PROFILER(1M)                                                      PROFILER(1M)

NAME
     profiler: prfld, prfstat, prfdc, prfsnap, prfpr - UNIX system profiler

SYNOPSIS
     prfld [ system_namelist ]

     prfstat [ range [ domain ] ]

     prfdc file [ period [ off_hour ] ]

     prfsnap file

     prfpr file [ cutoff [ system_namelist ] ]

DESCRIPTION
     Prfld, prfstat, prfdc, prfsnap, and prfpr form a system of programs to
     facilitate an activity study of the UNIX operating system.

     Prfld is used to initialize the recording mechanism in the system.  It
     generates a table containing the starting address of each system
     subroutine as extracted from system_namelist.

     Prfstat is used to enable or disable the sampling mechanism.  The range
     parameter selects what values will be sampled at the sampling points.
     The current choices for range are pc to select PC-style sampling, stack
     to sample stack backtraces, and off to disable profile sampling.  The
     domain parameter selects when the sample values will be collected and
     defaults to time, which uses a 1ms sampling clock.  The current choices
     for domain are:

          domain      description
          ______________________________________________________________
          time        time (1ms sampling clock)
          switch      context switches
          ipl         non-zero interrupt priority level
          cycles      instruction cycles
          dcache1     primary data cache misses
          dcache2     secondary data cache misses
          icache1     primary instruction cache misses
          icache2     secondary instruction cache misses
          scfail      failed store conditional instructions
          brmiss      mispredicted branch instructions
          upgclean    exclusive upgrades on clean secondary cache lines
          upgshared   exclusive upgrades on shared secondary cache lines

     All platforms support the time, switch, and ipl domains, but only
     platforms based on the R10K CPU and its successors support the other
     domains.  Samples which occur while executing user code will be
     attributed to the synthetic function user_code.

     The time and cycles domains both produce time-based samplings, but they
     are driven differently: time samples on the 1ms sampling clock, while
     cycles samples on counts of instruction cycles.
     The cycles domain can be useful when you believe that the activity of
     the kernel may be correlated with the time domain sampling.  Such
     correlations can occur when application activity is triggered by clock
     timeouts, for example.

     The switch domain allows profiling in situations where MP contention
     causes processes to be constantly descheduled, leaving the system idle.
     Profiling such a problem in the time domain would show most of the
     system's time being spent in the kernel idle() routine with only a
     smattering of time elsewhere, which is of little use.  Profiling in the
     switch domain lets you determine the common code paths leading up to
     each context switch.

     The ipl domain is a special subset of the time domain.  It produces a
     time-based sampling, but only samples that occur while the interrupt
     priority level is non-zero are taken.  All other samples are attributed
     to user_code or low_ipl, depending on whether the interrupt occurred
     while executing user code or while executing kernel code at IPL0,
     respectively.  This lets you quickly find where interrupts are being
     held off by code running at a non-zero interrupt priority level.

     For PC sampling, profiler overhead is less than 1%, as calculated for
     500 text addresses.  For stack sampling, profiler overhead is less than
     10% of run time.

     Without any arguments, prfstat displays the current sampling mode and
     the number of text addresses being measured.

     Prfdc and prfsnap perform the PC sampling data collection function of
     the profiler by copying the current value of all the text address
     counters to a file where the data can be analyzed.  Prfdc stores the
     counters into file every period minutes and turns off at off_hour.
     Valid values for period are greater than 0.017 (about one second), and
     valid values for off_hour are 0-24.  Prfsnap collects data at the time
     of invocation only, appending the counter values to file.

     Prfpr formats the data collected by prfdc or prfsnap.
     Each text address is converted to the nearest text symbol (as found in
     system_namelist) and is printed if the percent activity for that range
     is greater than cutoff.  Cutoff may be given as a floating-point number
     >= 0.01.  If cutoff is zero, all samples collected are printed, even if
     their percentage is less than 0.01%.

     For stack sampling, the SpeedShop kernprof(1) special executable and the
     rtmond(1M) kernel data transport are used to collect the stack trace
     data.  This data can then be analyzed with SpeedShop tools such as
     prof(1) to produce a performance profile that provides far more
     information than PC sampling offers.  The data may be collected on the
     machine being profiled or on any machine that can be reached via the
     network.  See kernprof(1) for a description of all the options it
     supports.

EXAMPLE
     PC sampling:

          # prfld
          # prfstat pc
          PC profiling enabled
          9055 kernel text addresses
          # prfsnap /tmp/P; find /usr/bin -name xxx -print; prfsnap /tmp/P
          # prfpr /tmp/P .3

          IRIX anchor 6.2 03131015 IP22 03/17/96 20:36 03/17/96 20:36
          CPU 0 - 1253 total samples; cutoff 0.300000
          wait_for_interrupt   51.1572
          bzero                 0.4789
          bcopy                 0.4789
          get_buf               1.4366
          bflush                0.3990
          syscall               0.3192
          idle                 37.1907
          dnlc_search           0.3192
          efs_dirlookup         0.3192
          iget                  0.3192
          user                  0.7981
          Total                93.22
          # prfstat off
          profiling disabled
          9055 kernel text addresses

     Stack sampling on machine alpha, with data collected on machine beta:

          alpha# prfld <kernel-file>
          alpha# prfstat stack
          STACK profiling enabled
          9055 kernel text addresses
          alpha# /usr/etc/rtmond
          ...
          beta% ssrun -usertime /usr/bin/kernprof -t 5 -p 0 alpha
          beta% prof -gprof <alpha's-kernel> kernprof.usertime.<pid>.cpu0

     (This assumes that rtmond(1M) has not been enabled with chkconfig(1M) on
     machine alpha and therefore must be started manually.)

FILES
     /dev/prf       interface to profile data and text addresses
     /unix          default for system namelist file
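     The address-to-symbol attribution and cutoff filtering described for
     prfpr can be sketched in Python.  This is a minimal, hypothetical
     illustration, not the prfpr implementation: it assumes that a sampled
     address belongs to the symbol with the largest starting address not
     exceeding it (the kind of table prfld builds from system_namelist),
     computes each symbol's share of all samples, and keeps only entries
     whose percentage is strictly greater than cutoff.  The names attribute
     and report, the "unknown" label, and the symbol table are invented for
     illustration.  The Total it computes mirrors the PC sampling example,
     where the printed percentages sum to 93.22.

```python
from bisect import bisect_right


def attribute(pcs, symtab):
    """Count samples per symbol (hypothetical sketch, not prfpr's code).

    symtab is a list of (start_address, name) pairs sorted by address,
    a stand-in for the table prfld builds from system_namelist.  Each
    sampled PC is charged to the symbol with the largest start <= pc.
    """
    starts = [addr for addr, _ in symtab]
    counts = {}
    for pc in pcs:
        i = bisect_right(starts, pc) - 1
        # Assumption: PCs below the first symbol are lumped under "unknown".
        name = symtab[i][1] if i >= 0 else "unknown"
        counts[name] = counts.get(name, 0) + 1
    return counts


def report(counts, cutoff):
    """Return (total, rows, printed_total) in the spirit of prfpr.

    A row is kept only if its percentage is strictly greater than cutoff,
    so cutoff == 0 keeps every symbol that received at least one sample.
    printed_total is the sum of the printed percentages.
    """
    total = sum(counts.values())
    rows = []
    for name, n in counts.items():
        pct = 100.0 * n / total
        if pct > cutoff:
            rows.append((name, round(pct, 4)))
    printed_total = round(sum(p for _, p in rows), 2)
    return total, rows, printed_total


if __name__ == "__main__":
    # Hypothetical symbol table and sampled PCs, for illustration only.
    symtab = [(0x1000, "bzero"), (0x1100, "bcopy"), (0x1200, "idle")]
    pcs = [0x1000, 0x10FF, 0x1100, 0x1250, 0x1250, 0x1300, 0x1400, 0x1208]
    cutoff = 20.0
    total, rows, printed = report(attribute(pcs, symtab), cutoff)
    print(f"{total} total samples; cutoff {cutoff:f}")
    for name, pct in rows:
        print(f"{name:<20}{pct:>10.4f}")
    print(f"{'Total':<20}{printed:>10.2f}")
```

     With a cutoff of 20.0, bzero (25%) and idle (62.5%) are printed while
     bcopy (12.5%) is dropped, and the printed Total is 87.50.  Bisecting a
     sorted start-address table keeps each lookup logarithmic in the number
     of symbols.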