libperfex(3C)




LIBPERFEX(3C)                                                    LIBPERFEX(3C)


NAME
     libperfex, start_counters, read_counters, print_counters, print_costs,
     load_costs - A procedural interface to processor event counters

C SYNOPSIS
         int start_counters( int e0, int e1 );
         int read_counters( int e0, long long *c0, int e1, long long *c1);
         int print_counters( int e0, long long c0, int e1, long long c1);
         int print_costs( int e0, long long c0, int e1, long long c1);
         int load_costs(char *CostFileName);


FORTRAN SYNOPSIS
          INTEGER*8 c0, c1
          INTEGER   e0, e1
          CHARACTER(*n) CostFileName
          INTEGER*4 function start_counters( e0, e1 )
          INTEGER*4 function read_counters( e0, c0, e1, c1 )
          INTEGER*4 function print_counters( e0, c0, e1, c1 )
          INTEGER*4 function print_costs( e0, c0, e1, c1 )
          INTEGER*4 function load_costs( CostFileName )

DESCRIPTION
     These routines provide simple access to the hardware event counters.  The
     arguments e0 and e1 are int types specifying which events to count.  For
     descriptions of the counters themselves, see the perfex(1) or the
     r10k_counters(5) man page.
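
     Because a pair of events that maps to the same hardware counter is
     rejected with a conflicting counters error, it can help to check a
     pair before calling start_counters.  The sketch below ASSUMES the
     R10000 layout in which events 0-15 belong to counter 0 and events
     16-31 to counter 1; confirm the mapping for your processor in
     r10k_counters(5).  The helper names counter_for_event and
     events_conflict are hypothetical and are not part of libperfex.

```c
#include <assert.h>

/* ASSUMPTION: R10000 layout, 16 events per hardware counter. */
enum { EVENTS_PER_COUNTER = 16, NUM_EVENTS = 32 };

/* Return the hardware counter (0 or 1) assumed to count this event,
   or -1 if the event number is out of range. */
int counter_for_event(int event)
{
    if (event < 0 || event >= NUM_EVENTS)
        return -1;
    return event / EVENTS_PER_COUNTER;
}

/* Return 1 if both events need the same counter, 0 if they can be
   counted together, -1 if either event number is invalid. */
int events_conflict(int e0, int e1)
{
    int k0 = counter_for_event(e0);
    int k1 = counter_for_event(e1);

    if (k0 < 0 || k1 < 0)
        return -1;
    assert(k0 <= 1 && k1 <= 1);   /* only two hardware counters */
    return k0 == k1;
}
```

     A caller could reject a conflicting pair up front and report it,
     rather than waiting for start_counters to return an error.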

     The counts are returned in the long long arguments c0 and c1. The
     print_counters routine prints the counts to standard error.  Two events
     which must be counted on the same hardware counter will cause a
     conflicting counters error. The arguments e0 and e1 can be overridden by
     setting the environment variables T5_EVENT0 and T5_EVENT1.  Calls to
     start_counters implicitly zero out the internal software counters before
     starting them, while read_counters implicitly stops the counters after
     reading them.  Thus, to accumulate counts over multiple start/read
     calls, you must save the counts after each read and do the accumulation
     yourself.  Each start_counters or read_counters call incurs typical
     system call overhead, which can amount to a few hundred microseconds
     per start/read call pair.  The print_counters procedure is just a
     formatting convenience and has no effect on the state of the counters.
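
     Since start_counters zeroes the counts, accumulating over several
     measured intervals has to be done by the caller.  The following is a
     minimal sketch of one way to do that.  The counter_totals type, the
     measure_interval and sample_work functions, and the HAVE_LIBPERFEX
     macro are hypothetical names introduced for illustration; the
     stand-in start_counters and read_counters exist only so the sketch
     compiles on systems without libperfex and return made-up counts.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_LIBPERFEX   /* hypothetical macro: define when linking -lperfex */
int start_counters(int e0, int e1);
int read_counters(int e0, long long *c0, int e1, long long *c1);
#else
/* Stand-ins (NOT the real library) so the sketch compiles anywhere. */
static int fake_generation = 0;

int start_counters(int e0, int e1)
{
    (void)e0; (void)e1;
    return ++fake_generation;
}

int read_counters(int e0, long long *c0, int e1, long long *c1)
{
    (void)e0; (void)e1;
    *c0 = 100;                  /* made-up count for event e0 */
    *c1 = 7;                    /* made-up count for event e1 */
    return fake_generation;
}
#endif

/* Running totals kept by the caller across start/read pairs. */
typedef struct {
    long long total0;
    long long total1;
} counter_totals;

/* Placeholder for the code being measured. */
void sample_work(void)
{
    volatile int i;
    for (i = 0; i < 1000; i++)
        ;
}

/* Measure one interval and add its counts to the totals.
   Returns 0 on success, -1 on error or generation mismatch. */
int measure_interval(int e0, int e1, void (*work)(void), counter_totals *t)
{
    long long c0 = 0, c1 = 0;
    int gen_start, gen_read;

    assert(work != NULL && t != NULL);
    gen_start = start_counters(e0, e1);          /* zeroes the counts */
    if (gen_start < 0)
        return -1;
    work();
    gen_read = read_counters(e0, &c0, e1, &c1);  /* stops the counters */
    if (gen_read < 0 || gen_read != gen_start)
        return -1;                               /* counts not trustworthy */
    t->total0 += c0;
    t->total1 += c1;
    return 0;
}
```

     Intervals that fail the generation check are skipped rather than
     added, so the totals only ever contain counts from intact intervals.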

TIME ESTIMATES AND COST TABLES
     The print_costs routine prints the counts together with approximate time
     estimates as described for perfex.  The load_costs procedure allows a
     cost table to be loaded from a file.  Cost ranges reflect both the width
     of the cost distribution and uncertainty in the degree of overlap.  Costs
     for different events are not exclusive due to the overlap of one type of
     event with others.  By default a table of costs for each event
     appropriate to the host system is used.  If the file /etc/perfex.costs is
     present, it is used instead. If a table is loaded using load_costs, it is
     used instead.  Partial tables may be used; the remaining entries take
     the appropriate default values.  The format of the cost table and how to
     dump the default cost table for the current system are described in the
     perfex(1) man page.
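
     The precedence among the three cost tables can be sketched as
     follows.  This is an illustration under assumptions, not the
     library's implementation: it ASSUMES that load_costs reports failure
     with a negative return value like the other routines and that a
     failed load leaves the previous table in effect.  The cost_source
     type, the pick_cost_table and file_readable functions, and the
     HAVE_LIBPERFEX macro are hypothetical; the stand-in load_costs only
     checks that the file can be opened.

```c
#include <assert.h>
#include <stdio.h>

#ifdef HAVE_LIBPERFEX   /* hypothetical macro: define when linking -lperfex */
int load_costs(char *CostFileName);
#else
/* Stand-in (NOT the real library): succeeds if the file can be opened. */
int load_costs(char *CostFileName)
{
    FILE *f = fopen(CostFileName, "r");
    if (f == NULL)
        return -1;
    fclose(f);
    return 1;
}
#endif

typedef enum { COSTS_BUILTIN, COSTS_SYSTEM_FILE, COSTS_LOADED } cost_source;

/* Return 1 if the file exists and can be read, 0 otherwise. */
static int file_readable(const char *path)
{
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Report which table would be in effect: an explicitly loaded table
   wins, then the system-wide file, then the built-in defaults. */
cost_source pick_cost_table(char *custom_path, const char *system_path)
{
    if (custom_path != NULL && load_costs(custom_path) >= 0)
        return COSTS_LOADED;
    if (file_readable(system_path))
        return COSTS_SYSTEM_FILE;
    return COSTS_BUILTIN;
}
```

     On a real system the second argument would be /etc/perfex.costs; it
     is a parameter here only so the sketch can be exercised elsewhere.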

DIAGNOSTICS
     Normal completion returns a positive integer, the generation number.
     Negative return values signal an error.  The underlying operating system
     counter interface may produce other errors.  Among these are indications
     that the counter resource is in use, possibly because of monitoring by
     another user on the system (with root privileges) in system mode.  The
     generation numbers increment on a per-process basis each time the
     counters are enabled or stopped.  Because the counters may have been used
     by a supervisor-privilege process after being enabled by the user with
     start_counters, but before being read by read_counters, all correct uses
     of the interface must check that the generation number returned by
     read_counters matches the generation number from the corresponding
     start_counters, for each process.  For example, the following fragment
     illustrates correct use:

     #include <stdio.h>
     #include <stdlib.h>

     int e0, e1;
     long long c0, c1;
     int gen_start, gen_read;

     /* ... set e0 and e1 ... */

     if ((gen_start = start_counters(e0, e1)) < 0) {
         perror("start_counters");
         exit(1);
     }

     /* user code */

     if ((gen_read = read_counters(e0, &c0, e1, &c1)) < 0) {
         perror("read_counters");
         exit(1);
     }

     /* a mismatch means another process used the counters in between */
     if (gen_read != gen_start) {
         fprintf(stderr, "lost counters! aborting...\n");
         exit(1);
     }

     /* do something with c0, c1 */

     Counter measurements are typically not precisely
     reproducible for most events.  For example, counts of cache misses depend
     on the change in the cache population due to other processes when the
     target process is not running.  Invalidation and intervention counts may
     depend on the temporal ordering of memory accesses by different
     processors, which is not guaranteed.  Issued instruction counts depend on
     the R10000's dynamic scheduling of instructions, which in turn can depend
     on the pattern of instruction cache misses.  Since the cache misses are
     affected by the runtime environment as noted above, this count is also
     not precisely reproducible.  In almost all cases, however, while the
     exact count may not be reproducible, the amount of time attributable to
     any fluctuations is a tiny fraction of the execution time.  Conversely,
     any event which accounts for an important fraction of the execution time
     will have a count with small relative fluctuations from run to run.
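
     One way to judge how reproducible a count is in practice is to
     repeat the measurement and compare the spread of the counts with
     their mean.  The sketch below does only the arithmetic; the function
     name relative_spread is hypothetical and the counts are assumed to
     have been collected by the caller from separate runs.

```c
#include <assert.h>
#include <stddef.h>

/* Return (max - min) / mean for n counts from repeated runs.
   Returns 0.0 when n is 0 or the mean is 0. */
double relative_spread(const long long *counts, size_t n)
{
    long long lo, hi;
    double sum = 0.0, mean;
    size_t i;

    if (n == 0)
        return 0.0;
    assert(counts != NULL);
    lo = hi = counts[0];
    for (i = 0; i < n; i++) {
        if (counts[i] < lo)
            lo = counts[i];
        if (counts[i] > hi)
            hi = counts[i];
        sum += (double)counts[i];
    }
    mean = sum / (double)n;
    if (mean == 0.0)
        return 0.0;
    return (double)(hi - lo) / mean;
}
```

     For counts of 1000, 1010 and 990 the spread is 20 against a mean of
     1000, giving a relative spread of 0.02.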

RESTRICTIONS
     This interface is not reentrant or (pthread) thread-safe.  The
     start_counters, read_counters, print_counters and print_costs routines
     can be called by individual sproc threads.  However, there is only
     provision for a single global cost table, so load_costs should be called
     only in a sequential region, and that cost table must be applied to the
     counts from all threads.

FILES
     /usr/lib/libperfex.so /usr/lib32/libperfex.so /usr/lib64/libperfex.so

DEPENDENCIES
     These procedures are only available on systems with hardware performance
     counters (systems with R10000 or R12000 processors).  Use on systems with
     mixed processor types will have undefined results.

     Manipulating the counters with these routines and simultaneously through
     perfex or the OS counter interface ioctl procedures can produce an error
     if, for example, the counters are enabled twice without being released in
     the interim.

SEE ALSO
     perfex(1), r10k_counters(5), abi(5), mips4(5), mips3(5)

