NAME
     PCPIntro - introduction to the Performance Co-Pilot (PCP)

INTRODUCTION
     Performance Co-Pilot (PCP) is an SGI product designed for monitoring and managing system-level performance. These services are distributed and scalable to accommodate the most complex system configurations and performance problems.

     From the perspective of package installation managers (such as inst(1) on IRIX and rpm(1) on Linux), PCP is composed of two products.

     On Linux the products are:

     PCP Collector
          This is the part of PCP that collects and extracts performance data from various sources, e.g. the Linux /proc pseudo filesystem. It is available under GPL/LGPL from oss.sgi.com/projects/pcp.

     PCP Monitor
          This is the part of PCP that displays data collected from hosts (or archives) that have the PCP Collector installed.

     On IRIX the products are:

     pcp_eoe
          For IRIX 6.5, this is included in the IRIX CD set. For other IRIX versions, this comes from the PCP product distribution.

     pcp
          From the PCP product distribution.

     This manual entry describes the high-level features and options common to the images within both products on each platform.

OVERVIEW
     The PCP architecture is distributed in the sense that any PCP tool may be executing remotely. On the host (or hosts) being monitored, each domain of performance metrics, whether IRIX, a service layer, a database management system, a web server, an application, etc. requires a Performance Metrics Domain Agent (PMDA) which is responsible for collecting performance measurements from that domain. All PMDAs are controlled by the Performance Metrics Collector Daemon (pmcd(1)) on the same host.

     Client applications (the monitoring tools) connect to pmcd(1), which acts as a router for requests, forwarding each request to the appropriate PMDA and returning the response to the client. Clients may also access performance data from a PCP archive (created using pmlogger(1)) for retrospective analysis.
     The following performance monitoring applications may be launched directly from the command line, or from the PerfTools page of the IRIX Interactive Desktop (trademark) Icon Catalog. Each tool or command is documented completely in its own reference page.

     oview
          Displays a three-dimensional visualization of the topology and performance of an Origin system.

     pmkstat
          Outputs an ASCII high-level summary of system performance.

     pmie
          An inference engine that can evaluate predicate-action rules to perform alarms and automate system management tasks.

     pmem
          Outputs an ASCII report of memory usage by process.

     pminfo
          Interrogates specific performance metrics and the metadata that describes them.

     pmlogger
          Generates PCP archives of performance metrics suitable for replay by most PCP tools.

     pmval
          Simple periodic reporting for some or all instances of a performance metric.

     If the PCP product is installed (along with the associated valid PCP licenses) then the following additional tools are available.

     pmchart
          Displays trends over time of arbitrarily selected performance metrics from one or more hosts.

     osvis
          Displays a three-dimensional bar chart of high-level CPU, disk, memory and network activity.

     mpvis
          Displays a three-dimensional bar chart of multiprocessor CPU utilization.

     dkvis
          Displays a three-dimensional bar chart showing activity in the disk subsystem.

     pmgsys
          Displays a system-level visual monitor of a single host.

     nfsvis
          Displays a three-dimensional bar chart of Network File System (NFS) client and server activity.

     pmdumptext
          Produces ASCII reports for arbitrary combinations of performance metrics.

COMMON COMMAND LINE ARGUMENTS
     There is a set of common command line arguments that are used consistently by most PCP tools.

     -a archive
          Performance metric information is retrospectively retrieved from the Performance Co-Pilot (PCP) archive, previously generated by pmlogger(1). The -a and -h options are mutually exclusive.

     -a archive[,archive,...]
          An alternate form of -a for applications that are able to handle multiple archives.

     -h hostname
          Unless directed to another host by the -h option, or to an archive by the -a option, the source of performance metrics will be the Performance Metrics Collector Daemon (PMCD) on the local host. The -a and -h options are mutually exclusive.

     -n pmnsfile
          Normally the distributed Performance Metrics Name Space (PMNS) is used, however if the -n option is specified an alternative local PMNS is loaded from the file pmnsfile.

     -s samples
          The argument samples defines the number of samples to be retrieved and reported. If samples is 0 or -s is not specified, the application will sample and report continuously (in real time mode) or until the end of the PCP archive (in archive mode).

     -z
          Change the reporting timezone to the local timezone at the host that is the source of the performance metrics, as identified via either the -h or -a options.

     -Z timezone
          By default, applications report the time of day according to the local timezone on the system where the application is executed. The -Z option changes the timezone to timezone in the format of the environment variable TZ as described in environ(5).

INTERVAL SPECIFICATION AND ALIGNMENT
     Most PCP tools operate with periodic sampling or reporting, and the -t and -A options may be used to control the duration of the sample interval and the alignment of the sample times.

     -t interval
          Set the update or reporting interval.

          The interval argument is specified as a sequence of one or more elements of the form

               number[units]

          where number is an integer or floating point constant (parsed using strtod(3C)) and the optional units is one of: seconds, second, secs, sec, s, minutes, minute, mins, min, m, hours, hour, h, days, day and d. If the unit is empty, second is assumed.

          In addition, the upper case (or mixed case) version of any of the above is also acceptable.
          Spaces anywhere in the interval are ignored, so 4 days 6 hours 30 minutes, 4day6hour30min, 4d6h30m and 4d6.5h are all equivalent.

          Multiple specifications are additive, e.g. ``1hour 15mins 30secs'' is interpreted as 3600+900+30 seconds.

     -A align
          By default samples are not necessarily aligned on any natural unit of time. The -A option may be used to force the initial sample to be aligned on the boundary of a natural time unit. For example -A 1sec, -A 30min and -A 1hour specify alignment on whole seconds, half and whole hours respectively.

          The align argument follows the syntax for an interval argument described above for the -t option.

          Note that alignment occurs by advancing the time as required, and that -A acts as a modifier to advance both the start of the time window (see the next section) and the origin time (if the -O option is specified).

TIME WINDOW SPECIFICATION
     Many PCP tools are designed to operate in some time window of interest, e.g. to define a termination time for real-time monitoring or to define a start and end time within a PCP archive log.

     In the absence of the -O and -A options to specify an initial sample time origin and time alignment (see above), the PCP application will retrieve the first sample at the start of the time window.

     The following options may be used to specify a time window of interest.

     -S starttime
          By default the time window commences immediately in real-time mode, or coincides with time at the start of the PCP archive log in archive mode. The -S option may be used to specify a later time for the start of the time window.

          The starttime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):

          interval
               To specify an offset from the current time (in real-time mode) or the beginning of a PCP archive (in archive mode) simply specify the interval of time as the argument.
               For example -S 30min will set the start of the time window to be exactly 30 minutes from now in real-time mode, or exactly 30 minutes from the start of a PCP archive.

          -interval
               To specify an offset from the end of a PCP archive log, prefix the interval argument with a minus sign. In this case, the start of the time window precedes the time at the end of archive by the given interval. For example -S -1hour will set the start of the time window to be exactly one hour before the time of the last sample in a PCP archive log.

          @ctime
               To specify the calendar date and time (local time in the reporting timezone) for the start of the time window, use the ctime(3C) syntax preceded by an at sign. For example -S '@ Mon Mar 4 13:07:47 1996'

     -T endtime
          By default the end of the time window is unbounded (in real-time mode) or aligned with the time at the end of a PCP archive log (in archive mode). The -T option may be used to specify an earlier time for the end of the time window.

          The endtime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):

          interval
               To specify an offset from the start of the time window simply use the interval of time as the argument. For example -T 2h30m will set the end of the time window to be 2 hours and 30 minutes after the start of the time window.

          -interval
               To specify an offset back from the time at the end of a PCP archive log, prefix the interval argument with a minus sign. For example -T -90m will set the end of the time window to be 90 minutes before the time of the last sample in a PCP archive log.

          @ctime
               To specify the calendar date and time (local time in the reporting timezone) for the end of the time window, use the ctime(3C) syntax preceded by an at sign. For example -T '@ Mon Mar 4 13:07:47 1996'

     -O origin
          By default samples are fetched from the start of the time window (see description of -S option) to the end of the time window (see description of -T option).
          The -O option allows the specification of an origin within the time window to be used as the initial sample time. This is useful for interactive use of a PCP tool with the pmtime(1) VCR replay facility.

          The origin argument accepted by -O conforms to the same syntax and semantics as the endtime argument for the -T option.

          For example -O -0 specifies that the initial position should be at the end of the time window; this is most useful when wishing to replay ``backwards'' within the time window.

     The ctime argument for the -O, -S and -T options is based upon the calendar date and time format of ctime(3C), but may be a fully specified time string like Mon Mar 4 13:07:47 1996 or a partially specified time like Mar 4 1996, Mar 4, Mar, 13:07:50 or 13:08.

     For any missing low order fields, the default value of 0 is assumed for hours, minutes and seconds, 1 for day of the month and Jan for months. Hence, the following are equivalent: -S '@ Mar 1996' and -S '@ Mar 1 00:00:00 1996'.

     If any high order fields are missing, they are filled in by starting with the year, month and day from the current time (real-time mode) or the time at the beginning of the PCP archive log (archive mode) and advancing the time until it matches the fields that are specified. So, for example if the time window starts by default at ``Mon Mar 4 13:07:47 1996'', then -S @13:10 corresponds to 13:10:00 on Mon Mar 4, 1996, while -S @10:00 corresponds to 10:00:00 on Tue Mar 5, 1996 (note this is the following day).

     For greater precision than afforded by ctime(3C), the seconds component may be a floating point number.

     Also the 12 hour clock (am/pm notation) is supported, so for example 13:07 and 1:07 pm are equivalent.

PERFORMANCE METRICS - NAMES AND IDENTIFIERS
     The number of performance metric names supported by PCP in IRIX is of the order of a few thousand. There are fewer metrics on Linux, but still a considerable number.
     The PCP libraries and applications use an internal identification scheme that unambiguously associates a single integer with each known performance metric. This integer is known as the Performance Metric Identifier, or PMID. Although not a requirement, PMIDs tend to have global consistency across all systems, so a particular performance metric usually has the same PMID.

     For all users and most applications, direct use of the PMIDs would be inappropriate (e.g. this would limit the range of accessible metrics, make the code hard to maintain, force the user interface to be particularly baroque, etc.). Hence a Performance Metrics Name Space (PMNS) is used to provide external names and a hierarchic classification for performance metrics. A PMNS is represented as a tree, with each node having a label, a pointer to either a PMID (for leaf nodes) or a set of descendent nodes in the PMNS (for non-leaf nodes).

     A node label must begin with an alphabetic character, followed by zero or more characters drawn from the alphabetics, the digits and character `_' (underscore). For alphabetic characters in a node label, upper and lower case are distinguished.

     By convention, the name of a performance metric is constructed by concatenation of the node labels on a path through the PMNS from the root node to a leaf node, with a ``.'' as a separator. The root node in the PMNS is unlabeled, so all names begin with the label associated with one of the descendent nodes below the root node of the PMNS, e.g. kernel.percpu.syscall.

     Typically (although this is not a requirement) there would be at most one name for each PMID in a PMNS. For example kernel.all.cpu.idle and disk.dev.read are the unique names for two distinct performance metrics, each with a unique PMID.

     Groups of related PMIDs may be named by naming a non-leaf node in the PMNS tree, e.g. disk.

     There may be PMIDs with no associated name in a PMNS; this is most likely to occur when specific PMIDs are not available in all systems, e.g.
     if ORACLE is not installed on a system, there is no good reason to pollute the PMNS with names for all of the ORACLE performance metrics.

     Note also that there is no requirement for the PMNS to be the same on all systems, however in practice most applications would be developed against a stable PMNS that was assumed to be a subset of the PMNS on all systems. Indeed the PCP distribution includes a default local PMNS for just this purpose.

     The default local PMNS is located at $PCP_VAR_DIR/pmns/root however the environment variable PMNS_DEFAULT may be set to the full pathname of a different PMNS which will then be used as the default local PMNS.

     Most applications do not use the local PMNS, but rather import parts of the PMNS as required from the same place that performance metrics are fetched, i.e. from pmcd(1) for live monitoring or from a PCP archive for retrospective monitoring.

     To explore the PMNS use pminfo(1), or if the PCP product is installed the Metric Selection browser within pmchart(1).

PERFORMANCE METRIC SPECIFICATIONS
     In configuration files and (to a lesser extent) command line options, metric specifications adhere to the following syntax rules.

     If the source of performance metrics is real-time from pmcd(1) then the accepted syntax is

          host:metric[instance1,instance2,...]

     If the source of performance metrics is a PCP archive log then the accepted syntax is

          archive/metric[instance1,instance2,...]

     The host:, archive/ and [instance1,instance2,...] components are all optional.

     The , delimiter in the list of instance names may be replaced by white space.

     Special characters in instance names may be escaped by surrounding the name in double quotes or preceding the character with a backslash.

     White space is ignored everywhere except within a quoted instance name.

     An empty instance is silently ignored, and in particular ``[]'' is the same as no instance, while ``[one,,,two]'' is parsed as specifying just the two instances ``one'' and ``two''.
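
     The syntax rules above can be sketched in code. The following Python fragment is an illustration only and is not the parser used by the PCP libraries. The function names split_instances and parse_metric_spec and the host name moomba are hypothetical. The sketch assumes that metric names never contain ``:'', ``/'' or ``['', and it does not cope with a ``]'' inside a quoted instance name.

```python
def split_instances(body):
    """Split the text between '[' and ']' into a list of instance names."""
    names, current, in_quotes = [], [], False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # A backslash escapes the next character, whatever it is.
            current.append(body[i + 1])
            i += 2
            continue
        if ch == '"':
            # Double quotes protect commas and white space in a name.
            in_quotes = not in_quotes
        elif not in_quotes and (ch == "," or ch.isspace()):
            # Comma or white space ends a name; empty names are dropped.
            if current:
                names.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote in instance list")
    if current:
        names.append("".join(current))
    return names


def parse_metric_spec(spec, archive=False):
    """Return (source, metric, instances) for one metric specification.

    source is the host (or the archive when archive=True), or None if
    that optional component is absent.
    """
    head, bracket, rest = spec.partition("[")
    instances = []
    if bracket:
        rest = rest.rstrip()
        if not rest.endswith("]"):
            raise ValueError("missing closing ']'")
        instances = split_instances(rest[:-1])
    # White space outside the instance list is ignored.
    head = "".join(head.split())
    # Assumption: the last ':' (or '/') separates the source from the metric.
    source, _, metric = head.rpartition("/" if archive else ":")
    return (source or None), metric, instances
```

     For instance, parse_metric_spec("moomba:disk.dev.read[one,,,two]") yields the host moomba, the metric disk.dev.read and the two instances one and two, while a bare metric name yields no source and an empty instance list.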
PMCD AND ARCHIVE VERSIONS
     Since PCP version 2, version information has been associated with pmcd(1) and PCP archives. The version number is used in a number of ways, but most noticeably for the distributed pmns(4). In PCP version 1, the client applications would load the PMNS from the default PMNS file but in PCP version 2, the client applications extract the PMNS information from pmcd(1) or a PCP archive. Thus in PCP version 2, the version number is used to determine if the PMNS to use is from the default local file or from the actual current source of the metrics.

ENVIRONMENT
     In addition to the PCP run-time environment and configuration variables described in the PCP ENVIRONMENT section below, the following environment variables apply to all installations.

     PCP_STDERR
          Many PCP tools support the environment variable PCP_STDERR, which can be used to control where error messages are sent. When unset, the default behavior is that ``usage'' messages and option parsing errors are reported on standard error, other messages after initial startup are sent to the default destination for the tool, i.e. standard error for ASCII tools, or a dialog for GUI tools.

          If PCP_STDERR is set to the literal value DISPLAY then all messages will be displayed in a dialog. This is used for any tools launched from the IRIX Interactive Desktop (trademark) or from the PerfTools icon catalog page.

          If PCP_STDERR is set to any other value, the value is assumed to be a filename, and all messages will be written there.

     PCP_USE_STDERR
          This environment variable, previously used by pmlaunch(5), pmgsys(1), pmview(1) and the pmview(1) front-end scripts (such as mpvis(1)), has been deprecated from the PCP 2.0 release onward and replaced by PCP_STDERR.

     PMCD_CONNECT_TIMEOUT
          When attempting to connect to a remote pmcd(1) on a machine that is booting, the connection attempt could potentially block for a long time until the remote machine finishes its initialization.
          Most PCP applications and some of the PCP library routines will abort and return an error if the connection has not been established after some specified interval has elapsed. The default interval is 5 seconds. This may be modified by setting PMCD_CONNECT_TIMEOUT in the environment to a real number of seconds for the desired timeout. This is most useful in cases where the remote host is at the end of a slow network, requiring longer latencies to establish the connection correctly.

     PMCD_RECONNECT_TIMEOUT
          When a monitor or client application loses a connection to a pmcd(1), the connection may be re-established by calling a service routine in the PCP library. However, attempts to reconnect are controlled by a back-off strategy to avoid flooding the network with reconnection requests. By default, the back-off delays are 5, 10, 20, 40 and 80 seconds for consecutive reconnection requests from a client (the last delay will be repeated for any further attempts after the fifth). Setting the environment variable PMCD_RECONNECT_TIMEOUT to a comma separated list of positive integers will re-define the back-off delays, e.g. setting PMCD_RECONNECT_TIMEOUT to ``1,2'' will back-off for 1 second, then attempt another connection request every 2 seconds thereafter.

     PMCD_REQUEST_TIMEOUT
          For monitor or client applications connected to pmcd(1), there is a possibility of the application "hanging" on a request for performance metrics or metadata or help text. These delays may become severe if the system running pmcd crashes, or the network connection is lost. By setting the environment variable PMCD_REQUEST_TIMEOUT to a real number of seconds, requests to pmcd will time out after this number of seconds. The default behavior is to be willing to wait 10 seconds for a response from every pmcd for all applications.
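
          The back-off schedule described for PMCD_RECONNECT_TIMEOUT above can be sketched as follows. This Python fragment is an illustration only, not the code in the PCP library; the function name reconnect_delays is hypothetical, and falling back to the default delays when the value is malformed is an assumption of this sketch rather than documented behavior.

```python
import itertools

# Default back-off delays in seconds, as listed for PMCD_RECONNECT_TIMEOUT.
DEFAULT_DELAYS = (5, 10, 20, 40, 80)


def reconnect_delays(env_value=None):
    """Return an endless iterator of delays between reconnection attempts.

    env_value is the text of PMCD_RECONNECT_TIMEOUT, or None if unset.
    """
    delays = DEFAULT_DELAYS
    if env_value:
        try:
            parsed = tuple(int(field) for field in env_value.split(","))
        except ValueError:
            parsed = ()
        # Assumption: anything other than positive integers is ignored.
        if parsed and all(delay > 0 for delay in parsed):
            delays = parsed
    # The last delay is repeated for every further attempt.
    return itertools.chain(delays, itertools.repeat(delays[-1]))
```

          A caller would typically pass os.environ.get("PMCD_RECONNECT_TIMEOUT") and take one delay from the iterator before each reconnection attempt. With the value ``1,2'' the sequence is 1, 2, 2, 2 and so on.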
     PMCD_WAIT_TIMEOUT
          When pmcd(1) is started from $PCP_RC_DIR/pcp then the primary instance of pmlogger(1) will be started if the configuration flag pmlogger is chkconfig'ed on, some key applications from the pcp.sw.base subsystem are installed, and pmcd is running and accepting connections.

          The check on pmcd's readiness will wait up to PMCD_WAIT_TIMEOUT seconds. If pmcd has a long startup time (such as on a very large system), then PMCD_WAIT_TIMEOUT can be set to provide a maximum wait longer than the default 60 seconds.

     PMNS_DEFAULT
          If set, then interpreted as the full pathname to be used as the default local PMNS for pmLoadNameSpace(3). Otherwise, the default local PMNS is located at $PCP_VAR_DIR/pcp/pmns/root for base PCP installations.

     PCP_COUNTER_WRAP
          Many of the performance metrics exported from PCP agents have the semantics of a counter, meaning they are expected to be monotonically increasing. Under some circumstances, one value of these metrics may be smaller than the previously fetched value. This can happen when a counter of finite precision overflows, or when the PCP agent has been reset or restarted, or when the PCP agent is exporting values from some underlying instrumentation that is subject to some asynchronous discontinuity.

          The environment variable PCP_COUNTER_WRAP may be set to indicate that all such cases of a decreasing ``counter'' should be treated as a counter overflow, and hence the values are assumed to have wrapped once in the interval between consecutive samples. This ``wrapping'' behavior was the default in earlier PCP versions, but has been disabled by default in PCP releases from version 1.3 on.

     PCP_LICENCE_NOWARNING or PCP_LICENSE_NOWARNING
          Many of the PCP client programs require that a valid software license be present on the host on which the client is running (the license is node-locked).
          In the case that such a valid license is present, but is due to expire within the next 30 days, a message or popup notifier appears informing the user of this condition. These warnings can be disabled by setting PCP_LICENCE_NOWARNING or PCP_LICENSE_NOWARNING in the environment.

     PMDA_PATH
          The PMDA_PATH environment variable may be used to modify the search path used by pmcd(1) and pmNewContext(3) (for PM_CONTEXT_LOCAL contexts) when searching for a daemon or DSO PMDA. The syntax follows that for PATH in sh(1), i.e. a colon separated list of directories, and the default search path is ``/var/pcp/lib:/usr/pcp/lib'' (or ``/var/lib/pcp/lib'' on Linux, depending on the value of the $PCP_VAR_DIR environment variable).

     PMCD_PORT
          The TCP/IP port(s) used by pmcd(1) to create the socket for incoming connections and requests was historically 4321 and more recently the officially registered port 44321; in the current release, both port numbers are used by default as a transitional arrangement. This may be over-ridden by setting PMCD_PORT to a different port number, or a comma-separated list of port numbers. If a non-default port is used when pmcd(1) is started, then every monitoring application connecting to that pmcd(1) must also have PMCD_PORT set in its environment before attempting a connection.

     The following environment variables are relevant to installations in which pmlogger(1), the PCP archive logger, is used.

     PMLOGGER_PORT
          The environment variable PMLOGGER_PORT may be used to change the base TCP/IP port number used by pmlogger(1) to create the socket to which pmlc(1) instances will try to connect. The default base port number is 4330. When used, PMLOGGER_PORT should be set in the environment before pmlogger(1) is executed.

     If you have the PCP product installed, then the following environment variables are relevant to the Performance Metrics Domain Agents (PMDAs).
     PMDA_LOCAL_PROC
          If set, then a context established with the type of PM_CONTEXT_LOCAL will have access to the ``proc'' PMDA to retrieve performance metrics about individual processes.

     PMDA_LOCAL_SAMPLE
          If set, then a context established with the type of PM_CONTEXT_LOCAL will have access to the ``sample'' PMDA if this optional PMDA has been installed locally.

     PMIECONF_PATH
          If set, pmieconf(1) will form its pmieconf(4) specification (set of parameterized pmie(1) rules) using all valid pmieconf files found below each subdirectory in this colon-separated list of subdirectories. If not set, the default is $PCP_VAR_DIR/config/pmieconf.

FILES
     /etc/pcp.conf
          Configuration file for the PCP runtime environment, see pcp.conf(4).

     $PCP_RC_DIR/pcp
          Script for starting and stopping pmcd(1).

     $PCP_PMCDCONF_PATH
          Control file for pmcd(1).

     $PCP_PMCDOPTIONS_PATH
          Command line options passed to pmcd(1) when it is started from $PCP_RC_DIR/pcp. All the command line option lines should start with a hyphen as the first character. This file can also contain environment variable settings of the form "VARIABLE=value".

     $PCP_BINADM_DIR
          Location of PCP utilities for collecting and maintaining PCP archives, PMDA help text, PMNS files etc.

     $PCP_PMDAS_DIR
          Parent directory of the installation directory for Dynamic Shared Object (DSO) PMDAs.

     $PCP_LOG_DIR/pmcd
          Default location of log files for pmcd(1), current directory for running PMDAs. Archives generated by pmlogger(1) are generally below $PCP_LOG_DIR/pmlogger.

     $PCP_LOG_DIR/pmcd/pmcd.log
          Diagnostic and status log for the current running pmcd(1) process. The first place to look when there are problems associated with pmcd.

     $PCP_LOG_DIR/pmcd/pmcd.log.prev
          Diagnostic and status log for the previous pmcd(1) instance.

     $PCP_LOG_DIR/NOTICES
          Log of pmcd(1) and PMDA starts, stops, additions and removals.

     $PCP_VAR_DIR/config
          Contains directories of configuration files for several PCP tools.
     $PCP_VAR_DIR/config/pmcd/rc.local
          Local script for controlling PCP boot, shutdown and restart actions.

     $PCP_VAR_DIR/pmns
          Directory containing the set of PMNS files for all installed PMDAs.

     $PCP_VAR_DIR/pmns/root
          The ASCII pmns(4) exported by pmcd(1) by default. This PMNS is the superset of all other PMNS files installed in $PCP_VAR_DIR/pmns.

     In addition, if the PCP product is installed the following files and directories are relevant.

     $PCP_LOG_DIR/NOTICES
          In addition to the pmcd(1) and PMDA activity, may be used to log alarms and notices from pmie(1) via pmpost(1).

     $PCP_VAR_DIR/config/pmlogger/control
          Control file for pmlogger(1) instances launched from $PCP_RC_DIR/pcp and/or managed by pmlogger_check(1) and pmlogger_daily(1) as part of a production PCP archive collection setup.

     $PCP_VAR_DIR/config/pmsnap/control
          Control file for pmsnap(1) to produce GIF images of recent performance as displayed by pmchart(1) from PCP archives.

     $PCP_DEMOS_DIR
          Contains examples for using a variety of PCP tools and the PCP online tutorial.

PCP ENVIRONMENT
     Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(4).

SEE ALSO
     oview(1), pmcd(1), pmem(1), pmie(1), pminfo(1), pmkstat(1), pmlogger(1), pmval(1), pcp.conf(4), pcp.env(4), pmns(4) and pmlaunch(5).

     If the Performance Co-Pilot product is installed, then the following entries are also relevant: pmlogger_daily(1), dkvis(1), mpvis(1), nfsvis(1), osvis(1), pcp(1), pmchart(1), pmdumptext(1), pmgevctr(1) and pmgsys(1).

     Also refer to the Insight books Performance Co-Pilot User's and Administrator's Guide and Performance Co-Pilot Programmer's Guide.

     If you have the PCP product, relevant information is also available from the on-line PCP Tutorial.
     Provided the pcp.man.tutorial subsystem from the PCP images has been installed, access the URL file:$PCP_DOC_DIR/Tutorial/index.html from your web browser.
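
     As a closing illustration, the interval syntax described for the -t option (and reused by -A, -S, -T and -O) can be sketched in code. The following Python fragment is an illustration only and is not the PCP implementation. The function name interval_to_seconds is hypothetical, and the regular expression accepts only plain decimal numbers, which is a narrower set than strtod(3C) accepts.

```python
import re

# Unit names listed for the -t option, mapped to their length in seconds.
_UNIT_SECONDS = {}
for _names, _factor in (
    (("seconds", "second", "secs", "sec", "s"), 1.0),
    (("minutes", "minute", "mins", "min", "m"), 60.0),
    (("hours", "hour", "h"), 3600.0),
    (("days", "day", "d"), 86400.0),
):
    for _name in _names:
        _UNIT_SECONDS[_name] = _factor

# One element: a decimal number followed by an optional unit name.
_ELEMENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)([a-z]*)")


def interval_to_seconds(spec):
    """Convert an interval such as '4d6.5h' to a number of seconds."""
    # Spaces are ignored and unit names are case-insensitive.
    text = "".join(spec.split()).lower()
    if not text:
        raise ValueError("empty interval")
    total = 0.0
    pos = 0
    while pos < len(text):
        match = _ELEMENT.match(text, pos)
        if match is None:
            raise ValueError("bad interval element at offset %d" % pos)
        number, unit = match.groups()
        if unit == "":
            factor = 1.0  # an empty unit means seconds
        elif unit in _UNIT_SECONDS:
            factor = _UNIT_SECONDS[unit]
        else:
            raise ValueError("unknown unit: %s" % unit)
        # Multiple elements are additive.
        total += float(number) * factor
        pos = match.end()
    return total
```

     Under these assumptions, the four equivalent spellings given for -t (4 days 6 hours 30 minutes, 4day6hour30min, 4d6h30m and 4d6.5h) all convert to 369000 seconds, and ``1hour 15mins 30secs'' converts to 4530 seconds.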