speedshop(1)

NAME
     SpeedShop - An integrated package of performance tools

IMPLEMENTATION
     IRIX systems

DESCRIPTION
     SpeedShop is the generic name for an integrated package of performance
     tools that run performance experiments on executables and let you
     examine the results of those experiments.  It also supports starting a
     process in such a way as to permit a debugger to attach to it.  SpeedShop
     also runs Purify on executables.

     For Purify and for some experiments, instrumentation of the code is
     necessary.  When it is necessary, SpeedShop performs the instrumentation
     automatically and runs the instrumented executable to generate the data.

SUPPORTED EXECUTABLES
     SpeedShop works under IRIX 6.2 or later, and it supports executables
     compiled with the IRIX 6.2 compilers (o32, n32 and 64), or with the
     MIPSPro 7.x compilers (n32 and 64).  SpeedShop supports C, C++, Fortran
     77, Fortran 90, ADA, and assembler programs.  Programs must be built
     using shared libraries (DSOs); nonshared or stripped executables are not
     supported.

RECORDING EXPERIMENTS
     Experiments are recorded using the ssrun(1) command, as follows:

          ssrun [ssrun-options] -exp_type executable_name [executable_args]

     exp_type       One of the experiment names described in the EXPERIMENT
                    TYPES section.

     The result of an experiment is one or more files that are named by the
     following convention:

          executable_name.exp_type.[Rrank.][Tthread.]id

     rank    Rank number of the MPI process that generated this experiment
             file. This part of the file name is optional and will not be
             present for non-MPI targets. Ranks are given in terms of the
             MPI_COMM_WORLD communicator.

     thread  Number of the OpenMP thread that generated this experiment file.
             This part of the file name is optional and will not be present
             for non-OpenMP targets.  Thread numbers are those given by the
             function omp_get_thread_num().

     id      One of the following one- or two-letter codes followed by the
             process identifier (PID):

             m    For the master process created by ssrun;

             p    For a process created by a call to sproc();

             f    For a process created by a call to fork();

             e    For a process created by a call to exec();

             s    For a process created by a call to system(); and

             fe   For the exec'd process created by calls to fork() and
                  exec(), with environment variable
                  _SPEEDSHOP_TRACE_FORK_TO_EXEC set to False.
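
     The naming convention above can be taken apart with ordinary shell
     parameter expansion.  The following is a minimal sketch and not part of
     SpeedShop: parse_ss_name is a hypothetical helper, and it assumes that R
     and T are literal prefix letters in front of the rank and thread numbers
     and that the id field is the letter code immediately followed by the
     PID.  It splits from the right, so executable names that contain dots
     are handled.

```shell
# parse_ss_name FILE
# Hypothetical helper: split an experiment file name of the form
#   executable_name.exp_type.[Rrank.][Tthread.]id
# into its parts. Missing rank/thread fields are printed as "-".
parse_ss_name() (
    name=$1
    id=${name##*.}            # last dot-separated field: code + PID
    rest=${name%.*}
    rank=
    thread=
    last=${rest##*.}
    case $last in
        T[0-9]*) thread=${last#T}; rest=${rest%.*}; last=${rest##*.} ;;
    esac
    case $last in
        R[0-9]*) rank=${last#R}; rest=${rest%.*}; last=${rest##*.} ;;
    esac
    exp_type=$last
    exe=${rest%.*}            # everything before the experiment type
    code=$(printf '%s' "$id" | sed 's/[0-9]*$//')   # m, p, f, e, s or fe
    pid=${id#"$code"}
    printf 'exe=%s exp=%s rank=%s thread=%s code=%s pid=%s\n' \
        "$exe" "$exp_type" "${rank:--}" "${thread:--}" "$code" "$pid"
)

parse_ss_name generic.usertime.m12345
```

     Parsing from the right keeps the helper independent of how many dots
     the executable name itself contains.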

     To start the target process running and leave it in a state to attach a
     debugger, add the -hang flag:

          ssrun -hang -exp_type executable_name executable_args

     To get more detailed information about the run, add the -v flag, as in
     one of the following examples:

          ssrun -v -exp_type executable_name executable_args
          ssrun -v -hang -exp_type executable_name executable_args

     To run Purify on an executable, use the following:

          ssrun -purify executable_name executable_args

     Purify and performance experiments are mutually exclusive.

     ssrun takes additional arguments; see the ssrun(1) man page for further
     information.
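
     The command forms shown in this section can be assembled by a small
     wrapper.  This is only a sketch: ss_cmdline and the SS_VERBOSE and
     SS_HANG variables are hypothetical names, not part of SpeedShop, and the
     function prints the command line instead of running it so that it can be
     tried on a system without ssrun.  Because it takes a single mode word
     (an experiment name or purify), a Purify run and a performance
     experiment cannot be requested together.

```shell
# ss_cmdline MODE EXECUTABLE [ARGS...]
# MODE is an experiment name (for example usertime) or the word purify.
# SS_VERBOSE=1 adds -v and SS_HANG=1 adds -hang (hypothetical switches).
ss_cmdline() (
    mode=$1
    shift
    flags=
    if [ "${SS_VERBOSE:-0}" = 1 ]; then flags="$flags -v"; fi
    if [ "${SS_HANG:-0}" = 1 ]; then flags="$flags -hang"; fi
    # Print rather than execute; pipe the output to sh to actually run it.
    echo "ssrun$flags -$mode $*"
)

ss_cmdline usertime ./a.out input.dat
```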

EXPERIMENT TYPES
     The following experiment types are supported on all architectures:

     usertime       Returns CPU time, the time your program is actually
                    running plus the time the operating system is performing
                    services for your program.  The display generated by prof
                    breaks the program time down into the time used by each
                    function within the program.  Uses statistical callstack
                    profiling, based on CPU time, with a time sample interval
                    of 30 milliseconds.
                    Note: An o32 executable must explicitly link with -lexc
                    for this experiment to work. Program execution may show
                    significant slowdown compared to the original executable.
                    The stack unwind code sometimes fails to completely unwind
                    the stack; consequently, caller attribution cannot be done
                    beyond the point of failure.

     totaltime      Returns wall-clock time in a manner identical to that of
                    the usertime experiment. Uses statistical callstack
                    profiling, based on wall-clock time, with a time sample
                    interval of 30 milliseconds.


     [f]pcsamp[x]   Returns the estimated actual CPU time for each source code
                    line, machine code line, and function in your program.
                    Uses statistical PC sampling, using 16-bit bins, based on
                    user and system time, with a sample interval of 10
                    milliseconds.  If the optional f prefix is specified, a
                    sample interval of 1 millisecond will be used.  If the
                    optional x suffix is specified, a 32-bit bin size will be
                    used.

     bbcounts       Returns the calculated linear time of executed
                    instructions. This produces a complete call graph, but
                    does not take into consideration time spent in paging or
                    any reduction in time due to processor parallelism.  Uses
                    basic-block counting, done by instrumenting the
                    executable.

     fpe            Traces all floating-point exceptions.

     heap           Traces malloc and free calls and also supports various
                    options for debugging heap usage.  Use cvperf(1) to
                    display this information; it is not supported with
                    prof(1).

     io             Traces the following I/O system calls: read(2), readv(2),
                    pread(2), write(2), writev(2), pwrite(2), open(2),
                    close(2), dup(2), lseek(2), pipe(2), and creat(2).

     mpi            Traces calls to various MPI routines and generates a file
                    viewable in the prof(1) report generator.  For a list of
                    the routines that are traced, see the ssrun(1) manual
                    page.

     mpi_trace      Traces calls to various MPI routines and generates a file
                    viewable in the cvperf(1) performance analyzer window.
                    For a list of the routines that are traced, see the
                    ssrun(1) man page. This experiment is deprecated and will
                    be removed in a future release.
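
     For the PC sampling experiments, the reported time is an estimate
     derived from the number of samples and the sample interval.  The sketch
     below shows that arithmetic; pcsamp_seconds is a hypothetical helper,
     and it assumes each sample stands for exactly one interval of CPU time
     (10 milliseconds by default, 1 millisecond with the f prefix).

```shell
# pcsamp_seconds SAMPLES [INTERVAL_MS]
# Estimated CPU time in seconds for SAMPLES samples taken every
# INTERVAL_MS milliseconds (default 10, as for pcsamp; use 1 for fpcsamp).
pcsamp_seconds() {
    awk -v n="$1" -v ms="${2:-10}" 'BEGIN { printf "%.3f\n", n * ms / 1000 }'
}

pcsamp_seconds 250        # 250 samples at 10 ms
pcsamp_seconds 250 1      # 250 samples at 1 ms
pcsamp_seconds 65535      # largest value a 16-bit count can hold, at 10 ms
```

     The last call illustrates why the x suffix exists: if a bin is a plain
     16-bit count, it cannot represent more than 65535 samples, which is
     roughly 655 seconds at the default interval.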

     On machines with hardware performance counters (R10000, R12000, R14000,
     and R16000 machines), the following additional types are supported:

     [f|s]gi_hwc    Uses statistical PC sampling, based on overflows of the
                    graduated-instruction counter (counter 17), at an overflow
                    interval of 32771.  If the optional f prefix is used, the
                    overflow interval will be 6553.  If the optional s prefix
                    is used, the overflow interval will be 3999971.

     [f|s]cy_hwc    Uses statistical PC sampling, based on overflows of the
                    cycle counter (counter 0), at an overflow interval of
                    16411.  If the optional f prefix is used, the overflow
                    interval will be 3779.  If the optional s prefix is used,
                    the overflow interval will be 1999993.


     [f|s]ic_hwc    Uses statistical PC sampling, based on overflows of the
                    primary instruction-cache miss counter (counter 9), at an
                    overflow interval of 2053.  If the optional f prefix is
                    used, the overflow interval will be 419.  If the optional
                    s prefix is used, the overflow interval will be 524309.

     [f|s]isc_hwc   Uses statistical PC sampling, based on overflows of the
                    secondary instruction-cache miss counter (counter 10), at
                    an overflow interval of 131.  If the optional f prefix is
                    used, the overflow interval will be 29.  If the optional s
                    prefix is used, the overflow interval will be 65537.

     [f|s]dc_hwc    Uses statistical PC sampling, based on overflows of the
                    primary data-cache miss counter (counter 25), at an
                    overflow interval of 2053.  If the optional f prefix is
                    used, the overflow interval will be 419.  If the optional
                    s prefix is used, the overflow interval will be 524309.

     [f|s]dsc_hwc   Uses statistical PC sampling, based on overflows of the
                    secondary data-cache miss counter (counter 26), at an
                    overflow interval of 131.  If the optional f prefix is
                    used, the overflow interval will be 29.  If the optional s
                    prefix is used, the overflow interval will be 65537.

     [f|s]tlb_hwc   Uses statistical PC sampling, based on overflows of the
                    TLB miss counter (counter 23), at an overflow interval of
                    257.  If the optional f prefix is used, the overflow
                    interval will be 53.  If the optional s prefix is used,
                    the overflow interval will be 19997.

     [f|s]gfp_hwc   Uses statistical PC sampling, based on overflows of the
                    graduated floating-point instruction counter (counter 21),
                    at an overflow interval of 32771.  If the optional f
                    prefix is used, the overflow interval will be 6553.  If
                    the optional s prefix is used, the overflow interval will
                    be 3999971.

     [f|s]fsc_hwc   Uses statistical PC sampling, based on overflows of the
                    failed store conditionals counter (counter 5), at an
                    overflow interval of 2003.  If the optional f prefix is
                    used, the overflow interval will be 401.  If the optional
                    s prefix is used, the overflow interval will be 19997.

     prof_hwc       Uses statistical PC sampling, based on overflows of the
                    counter specified by the environment variable
                    _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval given by the
                    environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
                    Note that these environment variables cannot be used to
                    override the counter number or interval for the other
                    defined experiments.  They are examined only when the
                    prof_hwc experiment is specified.  The default counter is
                    the primary instruction-cache miss counter and the default
                    overflow interval is 2053.

     gi_hwctime     Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the graduated-instruction
                    counter (counter 17), at an overflow interval of 1000003.

     cy_hwctime     Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the cycle counter (counter
                    16), at an overflow interval of 10000019.

     ic_hwctime     Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the primary instruction-
                    cache-miss counter (counter 9), at an overflow interval of
                    8009.

     isc_hwctime    Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the secondary
                    instruction-cache-miss counter (counter 10), at an
                    overflow interval of 2003.

     dc_hwctime     Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the primary data-cache-
                    miss counter (counter 25), at an overflow interval of
                    8009.

     dsc_hwctime    Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the secondary data-cache-
                    miss counter (counter 26), at an overflow interval of
                    2003.

     tlb_hwctime    Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the TLB miss counter
                    (counter 23), at an overflow interval of 2521.

     gfp_hwctime    Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the graduated floating-
                    point instruction counter (counter 21), at an overflow
                    interval of 10007.

     fsc_hwctime    Profiles the cycle counter using statistical call-stack
                    sampling, based on overflows of the failed store
                    conditionals counter (counter 5), at an overflow interval
                    of 5003.

     prof_hwctime   Profiles the counter specified by the environment variable
                    _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER using statistical
                    call-stack sampling, based on overflows of the counter
                    specified by the environment variable
                    _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval given by the
                    environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
                    Note that these environment variables cannot be used to
                    override the counter numbers or interval for the other
                    defined experiments.  They are examined only when the
                    prof_hwctime experiment is specified.  The default
                    overflow and profiling counter is the cycle counter and
                    the default overflow interval is 10000019.
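
     In the overflow-based experiments, one sample is taken each time the
     counter has advanced by the overflow interval, so an event count can be
     estimated as samples times interval.  The following sketch shows the
     arithmetic; hwc_events is a hypothetical helper, and the sample counts
     are made-up inputs, not measured data.

```shell
# hwc_events SAMPLES INTERVAL
# Estimated number of hardware events represented by SAMPLES samples taken
# at the given overflow interval.
hwc_events() {
    echo $(( $1 * $2 ))
}

hwc_events 1200 2053      # e.g. ic_hwc or dc_hwc at the default interval
hwc_events 500 32771      # e.g. gi_hwc at the default interval
```

     A smaller interval (the f prefix) gives finer resolution at the cost of
     more sampling overhead; a larger one (the s prefix) does the opposite.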

     On SGI's Origin systems with the ccNUMA architecture, the following
     additional type is supported:

     numa           Profiles ccNUMA memory access patterns by statistically
                    sampling the memory accesses made by the application.
                    Records information about the memory being accessed and
                    which ccNUMA node is making the access.

REPORT GENERATION
     Report generation is done through the prof(1) command:

          prof output file . . . output file

     The prof(1) command adds the data from all of the output files and
     produces a listing that depends on the particular experiment type.  For
     all experiments, it produces a list of functions, annotated with the
     appropriate metric.

     For [f]pcsamp[x], and the various _hwc experiments, the function list is
     annotated with the exclusive metric. For the PC sampling experiments, the
     metric is exclusive time; for the various hardware counter profiling
     experiments, the metric is exclusive counts.

     For bbcounts experiments, the function list is annotated with a cycle
     count and percentage, a cumulative percentage for that function and all
     others above it in the list, an estimated linear time, an instruction
     execution count, and a call count.  If the -b[utterfly] flag is added, a
     list of callers and callees of each function is also produced.

     For usertime and totaltime and the various _hwctime experiments, the
     function list is annotated with percentage of time or counts for the
     function, the time in that function, and the time or counts in that
     function and its descendants, and a count of the number of callstacks
     containing that function.  If the -b[utterfly] flag is added, a list of
     callers and callees of each function is also produced.

     For fpe experiments, the function list is annotated with the percentage
     of FPEs in that function, and counts for the function and its
     descendants.  If the -b[utterfly] flag is added, a list of callers and
     callees of each function is also produced.

     For io experiments, the function list is annotated with the percentage of
     IO calls in that function, and counts for the function and its
     descendants.  If the -b[utterfly] flag is added, a list of callers and
     callees of each function is also produced.


     For mpi experiments, a call site list is produced that is annotated with
     the number of MPI calls made by that call site and the total amount of
     time taken by those calls.

     For numa experiments, the number of memory accesses sampled, the number
     of remote memory accesses, the percentage of remote memory accesses, and
     the average ccNUMA routing distance are all reported.

     There are many additional options to prof; see the prof(1) man page for
     further details and examples of some of the displays.
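
     Because prof adds the data from all of the files it is given, a common
     step is to collect every file that one experiment produced.  The sketch
     below builds such a command line from the file naming convention;
     ss_prof_cmd is a hypothetical helper that only prints the command, and
     it assumes file names without embedded spaces.

```shell
# ss_prof_cmd EXECUTABLE EXP_TYPE [DIRECTORY]
# Print a prof command naming every experiment file for EXECUTABLE and
# EXP_TYPE found in DIRECTORY (default: the current directory).
ss_prof_cmd() (
    exe=$1
    exp=$2
    cd "${3:-.}" || exit 1
    files=
    for f in "$exe.$exp".*; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            files="$files $f"
        fi
    done
    if [ -z "$files" ]; then
        echo "ss_prof_cmd: no $exp data found for $exe" >&2
        exit 1
    fi
    echo "prof$files"
)
```

     For example, "ss_prof_cmd a.out usertime" prints a prof command that
     names every a.out.usertime.* file in the current directory.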

CALIPER SAMPLES
     In the current releases, caliper samples can be recorded, and the
     -calipers option to prof will let you see the data for any caliper
     setting.

     Caliper samples are supported in three different ways:

     First, you can explicitly link with the SpeedShop runtime DSO and call
     its API routine to record a caliper sample.

     Second, you can define a signal to be used to record a caliper sample by
     specifying the environment variable _SPEEDSHOP_CALIPER_POINT_SIG and send
     the target the specified signal.

     Third, you can set a caliper-sample trap in either dbx or the WorkShop
     debugger.  In the current debuggers, this is done by planting a stop trap
     (breakpoint) and, when the process stops, evaluating the expression:

          ssrt_caliper_point(0, 0)

     The evaluation of the expression always returns zero, but a side effect
     of the evaluation is the recording of the appropriate data.  After
     evaluation, process execution may be resumed.

     See the ssapi(3) man page for further details.
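
     The signal-based method can be wrapped in a small helper.  This is a
     sketch under stated assumptions: caliper_mark is a hypothetical name,
     the variable is assumed to hold the same signal number that was exported
     before the target was started, and the PID passed in is assumed to be
     that of the target process.  The signal number 16 in the comments is
     purely illustrative; choose one that the target does not otherwise use.

```shell
# caliper_mark PID
# Send the signal named by _SPEEDSHOP_CALIPER_POINT_SIG to process PID.
caliper_mark() {
    if [ -z "${_SPEEDSHOP_CALIPER_POINT_SIG:-}" ]; then
        echo "caliper_mark: _SPEEDSHOP_CALIPER_POINT_SIG is not set" >&2
        return 1
    fi
    kill -"$_SPEEDSHOP_CALIPER_POINT_SIG" "$1"
}

# Typical use (commented out because it needs SpeedShop and a target):
#   _SPEEDSHOP_CALIPER_POINT_SIG=16
#   export _SPEEDSHOP_CALIPER_POINT_SIG
#   ssrun -usertime ./a.out &
#   caliper_mark <PID of the target process>
```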

USER ENVIRONMENT VARIABLE CONTROLS
     Various environment variables are normally used to control the operation
     of SpeedShop.  They are as follows:

     _SPEEDSHOP_VERBOSE
          Causes a log of each program's operation to be written to stderr.
          If it is set to an empty string, only major events are logged; if it
          is set to a non-empty string, more detailed events are logged.

     _SPEEDSHOP_SILENT
          If set, suppresses all output, other than fatal error messages from
          SpeedShop.  If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are
          set, _SPEEDSHOP_SILENT wins.


     _SPEEDSHOP_CALIPER_POINT_SIG signal-number
          If specified, gives a signal number to be used for recording a
          caliper-point in the experiment.

     _SPEEDSHOP_POLLPOINT_CALIPER_POINT timer_type, timer_interval
          Sets a caliper point every timer_interval seconds.  The timer_type
          argument is one of the following:

          0    Real time, or wall-clock time.  This is the total time a
               program spent while executing.  It includes both time spent
               when a program is swapped out waiting for a CPU and the time
               the operating system is in control, performing some task for
               the program such as I/O or executing a system call.

          1    Process virtual time.  This is the time spent when the program
               is actually running.  This does not include either the time
               spent when a program is swapped out waiting for a CPU or the
               time the operating system is in control, performing some task
               for the program such as I/O or executing a system call.

          2    User and system time.  This is process virtual time plus the
               time the system is running on behalf of the process.  The
               system time could include performing I/O or executing system
               calls.

     _SPEEDSHOP_OUTPUT_DIRECTORY
          If specified, the output data files will be put in the named
          directory.

     _SPEEDSHOP_OUTPUT_FD
          If specified, gives the number of the file descriptor to be used for
          writing the output file.  Note: this option is not supported in the
          current release.

     _SPEEDSHOP_REUSE_FILE_DESCRIPTORS
          If set, opens and closes the file descriptors for the output files
          every time performance data is to be written.

     _SPEEDSHOP_OUTPUT_FILENAME
          If specified, the given name will be used for the output file;  if
          _SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended
          to the name.

     _SPEEDSHOP_HWC_COUNTER_NUMBER
          Specifies the overflow counter to be used for prof_hwc,
          prof_hwctime, or numa experiments.  Counters are numbered between 0
          and 31 and are described in the MIPS R10000 Microprocessor User's
          Manual and the MIPS R12000 Microprocessor User's Manual.  Counter 0
          counters are numbered 0-15, and counter 1 counters are numbered
          16-31.


     _SPEEDSHOP_HWC_COUNTER_OVERFLOW
          Specifies the overflow value for the counter to be used in prof_hwc,
          prof_hwctime, or numa experiments.  The value chosen may be any
          number greater than 0.  Some choices may produce data that is not
          statistically random, but rather reflects a correlation between the
          overflow interval and a cyclic behavior in the application.  Users
          may want to do two or more runs with different overflow values.
          This is unnecessary with the numa experiment as it randomly varies
          the real overflow value with every sample.

     _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER
          Specifies the profiling counter to be used for prof_hwctime
          experiments.  Counters are numbered between 0 and 31, and are
          described in the MIPS R10000 Microprocessor User's Manual and the
          MIPS R12000 Microprocessor User's Manual.  Counter 0 counters are
          numbered 0-15, and counter 1 counters are numbered 16-31.

     _SPEEDSHOP_OUTPUT_NOCOMPRESS
          If set, disables the compression of performance data.
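
     The counter numbering used by _SPEEDSHOP_HWC_COUNTER_NUMBER and
     _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER can be worked out mechanically:
     numbers 0-15 belong to counter 0 and numbers 16-31 to counter 1.  The
     following sketch shows that split together with example settings for a
     prof_hwc run; hwc_split is a hypothetical helper, and the chosen values
     simply reuse the counter and interval listed for dsc_hwc.

```shell
# hwc_split N
# Print "<counter> <event>" for a combined counter number N in 0-31:
# 0-15 are events on counter 0, 16-31 are events on counter 1.
hwc_split() {
    case $1 in
        ''|*[!0-9]*) echo "hwc_split: not a number: $1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 31 ]; then
        echo "hwc_split: counter number must be 0-31" >&2
        return 1
    fi
    echo "$(( $1 / 16 )) $(( $1 % 16 ))"
}

# Example settings for prof_hwc: secondary data-cache miss counter (26)
# with the overflow interval that dsc_hwc uses (131).
_SPEEDSHOP_HWC_COUNTER_NUMBER=26
_SPEEDSHOP_HWC_COUNTER_OVERFLOW=131
export _SPEEDSHOP_HWC_COUNTER_NUMBER _SPEEDSHOP_HWC_COUNTER_OVERFLOW

hwc_split "$_SPEEDSHOP_HWC_COUNTER_NUMBER"
```

     These variables are examined only by the prof_hwc, prof_hwctime, and
     numa experiments, so exporting them does not change the predefined
     _hwc experiments.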

PROCESS TRACKING ENVIRONMENT VARIABLE CONTROLS
     The following environment variables are used for controlling the
     treatment of processes spawned from the original target:

     _SPEEDSHOP_TRACE_FORK {True|False}
          If True, specifies that processes spawned by calls to fork() will be
          monitored, if they do not call exec().  If they do call exec(), and
          _SPEEDSHOP_TRACE_FORK_TO_EXEC is not set to True, the data covering
          the time between the fork() and the exec() will be discarded.  It is
          True by default.  Note: in the current release, data will be
          recorded independent of whether the process calls exec() or not.

     _SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
          If True, specifies that processes spawned by calls to fork() will be
          monitored, even if they also call exec().  It is False by default.

     _SPEEDSHOP_TRACE_EXEC {True|False}
          If True, specifies that processes spawned by calls to any of the
          various flavors of exec() will be monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SPROC {True|False}
          If True, specifies that processes spawned by calls to sproc() will be
          monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SYSTEM {True|False}
          If True, specifies that processes spawned by calls to system() will be
          monitored.  It is False by default.

     _SPEEDSHOP_TRACE_MPI_RANKS mpi-ranks
          If specified, only the listed MPI ranks will be monitored.  The
          list is comma-separated, with optional dash-separated ranges, for
          example, "1-4,7".  Rank numbers are given in terms of the
          MPI_COMM_WORLD communicator.  Data is collected for ALL
          MPI ranks by default and this option is silently ignored for non-MPI
          executables.
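
     The rank-list syntax accepted by _SPEEDSHOP_TRACE_MPI_RANKS can be
     checked before a run by expanding it.  The sketch below is a
     hypothetical helper (expand_ranks is not part of SpeedShop) that turns
     a list such as "1-4,7" into the individual rank numbers; it assumes
     well-formed input made of non-negative integers.

```shell
# expand_ranks LIST
# Expand a comma-separated rank list with optional dash-separated ranges
# (for example "1-4,7") into a space-separated list of rank numbers.
expand_ranks() (
    out=
    IFS=,                       # split the argument on commas
    for part in $1; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;
            *)   lo=$part; hi=$part ;;
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    echo "$out"
)

_SPEEDSHOP_TRACE_MPI_RANKS=1-4,7
export _SPEEDSHOP_TRACE_MPI_RANKS
expand_ranks "$_SPEEDSHOP_TRACE_MPI_RANKS"
```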

EXPERT-MODE ENVIRONMENT VARIABLE CONTROLS
     The following additional environment variables are used for debugging and
     finer control of the operation of SpeedShop:

     _SPEEDSHOP_SAMPLING_MODE
          For PC-sampling and hardware-counter profiling, if set to 1, will
          generate data for the base executable only.  If it is not set, or
          set to anything other than 1, data is generated for the executable
          and all DSOs it uses.

     _SPEEDSHOP_INIT_DEFERRED_SIG signal-number
          If specified, initialization of the experiment will not be performed
          when the target process starts, but rather will be delayed until the
          specified signal is sent to the process.   A handler for the given
          signal will be installed when the process starts, and it is the
          user's responsibility to ensure that it is not overridden by the
          target code.  If the process terminates before the signal is
          received, no data will be recorded.

     _SPEEDSHOP_INIT_DEFERRED
          If specified, initialization of the experiment will not be performed
          when the target process starts, but rather will be delayed until the
          application calls the SpeedShop API routine ssrt_experiment_init.
          If the process terminates before that routine is called, no data
          will be recorded.

     _SPEEDSHOP_SHUTDOWN_SIG signal-number
          If specified, termination of the experiment will not be performed
          when the target process exits, but rather will happen when the
          specified signal is sent to the process.   A handler for the given
          signal will be installed when the process starts, and it is the
          user's responsibility to ensure that it is not overridden by the
          target code.  If the process terminates before the signal is
          received, data is recorded normally.

     _SPEEDSHOP_EXPERIMENT_TYPE
          Passes the name of the experiment to the runtime.  It is normally
          set by ssrun(1), but may be overwritten.

     _SPEEDSHOP_MARCHING_ORDERS
          Passes the marching orders of the experiment to the runtime.  It is
          normally set by ssrun(1) from the experiment type, but may be
          overwritten.

     _SPEEDSHOP_EXTRA_MARCHING_ORDERS
          Specifies additional marching orders.  This environment variable is
          useful when the experiment name is used to specify an experiment,
          but additional specification is also required via marching orders.


     _SPEEDSHOP_SBRK_BUFFER_LENGTH
          Defines the segment grow size for the internal malloc arena used.
          This arena is completely separate from the user's arena, and it
          usually grows in default segments of size 0x100000.

     _SPEEDSHOP_SBRK_BUFFER_ADDR
          Defines the preferred starting address to be used for the internal
          malloc arena. This option has to be used with extreme care since it
          might result in memory region overlap.

     _SPEEDSHOP_FILE_BUFFER_LENGTH
          Defines the size of the buffer used for writing the experiment
          files.  The default length is 64 Kbytes.  The buffer is only used
          for writing many small records to the file (as in tracing
          experiments); large records are written directly, to avoid the
          buffering overhead.

     _SPEEDSHOP_DEBUG_NO_SIG_TRAPS
          If set, disables the normal setting of signal handlers for all fatal
          and exit signals.

     _SPEEDSHOP_DEBUG_NO_STACK_UNWIND
          If set, suppresses the stack unwind as done in usertime or other
          callstack-based experiments.  The option is used as a workaround for
          various unwind bugs in libexc.

     _SPEEDSHOP_RLD
          Defines the full path name to rld and enables rld profiling (for
          pcsamp and _hwc experiments only).  If the path name does not lead
          to rld, SpeedShop determines the correct path name automatically.
          For example, if you set _SPEEDSHOP_RLD to 1, SpeedShop will locate
          rld automatically.

     _SPEEDSHOP_INSTR_ARGS
          Defines additional instrumentation arguments.

INSTRUMENTATION
     Instrumentation is invoked automatically by ssrun(1) and, if necessary,
     for DSOs that are opened during a run by the runtime library.

     By default, instrumented executables and DSOs appear in the current
     working directory.  You can direct them to a directory of your choice by
     setting the _SPEEDSHOP_OUTPUT_DIRECTORY environment variable.

SPEEDSHOP API ROUTINES
     The SpeedShop API routines are defined in the include file
     SpeedShop/api.h, which is installed in /usr/include.  It defines three
     entry points, described in the SpeedShop API man page, ssapi(3).


SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES
     The SpeedShop facility for users to add custom data capture routines is
     not available in the current release.

MISCELLANEOUS UTILITY PROGRAMS
     Several utility programs are provided in addition to the main
     functionality in SpeedShop.  They are:

     sscord, ssorder, and sswsextr
                    Generate cord feedback files from recorded data.  sswsextr
                    is a script to produce the working-set files used for cord
                    computations.  See their respective man pages for more
                    information.

     ssusage        A variant of time(1) that prints more information about
                    the resource usage of a program.  See the ssusage(1) man
                    page for more information.

     squeeze        Allocates and locks down memory, making the system behave
                    as if it had less physical memory than it really does.
                    See squeeze(1) for more information.

     thrash         Allocates memory and touches all of the pages in order to
                    force other pages out of the system's physical memory.
                    See the thrash(1) man page for more information.

CAVEATS
     The caveats described here may affect the results of SpeedShop
     experiments.

   R10000 Hardware Counter 14
     Revisions of the R10000 CPUs earlier than 3.1 differ from version 3.1 and
     later R10000 CPUs.  The difference is in the interpretation of counter
     number 14.  Before revision 3.1, counter 14 reflects the Virtual
     coherency condition.  With revision 3.1 and later R10000 releases,
     counter 14 reflects ALU/FPU completion cycles.  There are also some
     subtle differences in the semantics of some of the counters.  See
     r10k_counters(5) for more information.

     In systems with a homogeneous deployment of CPUs at the same revision,
     SpeedShop will adjust the reported information accordingly.

     For systems with a mixed deployment of CPU revisions, including some
     before 3.1 and some at or after 3.1, the interpretation of counter 14 is
     undefined, and there may be some slight inaccuracies due to aggregation
     of counters with different semantics across all CPUs.

     Use hinv -v to identify the revision levels for all CPUs.
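
     Once the revision of each CPU is known, the rule above for counter 14
     is a simple version comparison.  The following sketch encodes it;
     counter14_meaning is a hypothetical helper, and it assumes the revision
     is given as major.minor (for example 2.7 or 3.1), as a plain argument
     rather than parsed from hinv output.

```shell
# counter14_meaning REVISION
# Print what counter 14 reflects on an R10000 of the given revision:
# before 3.1 the Virtual coherency condition, from 3.1 on ALU/FPU
# completion cycles.
counter14_meaning() (
    major=${1%%.*}
    case $1 in
        *.*) minor=${1#*.} ;;
        *)   minor=0 ;;        # treat "3" as "3.0"
    esac
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 1 ]; }; then
        echo "ALU/FPU completion cycles"
    else
        echo "Virtual coherency condition"
    fi
)

counter14_meaning 2.7
counter14_meaning 3.1
```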

   Pthreads
     For applications that use pthreads, performance data from usertime
     experiments using SIGALRM on IRIX 6.2-6.5 and from _hwctime experiments
     on IRIX 6.5 may be subject to minor inaccuracies.

SEE ALSO
     perfex(1), prof(1), squeeze(1), sscord(1), ssorder(1), ssrun(1),
     ssusage(1), sswsextr(1), thrash(1)

     fpe_ss(3), io_ss(3), malloc_ss(3), ssapi(3)

     r10k_counters(5), speedshop_restrictions(5)

     SpeedShop User's Guide

