proc(4)                                                                proc(4)

NAME
     proc - process (debug) filesystem

SYNOPSIS
     #include <sys/procfs.h>

DESCRIPTION
     /proc is a filesystem that provides access to the image of each active process in the system. This was historically mounted as /debug. /proc does not consume any disk resources. This interface provides a richer set of functionality and replaces the now obsolete dbg(4), debug(4) interface.

     The "files" of this filesystem are of the form /proc/nnnnn and /proc/pinfo/nnnnn, where nnnnn is a decimal number corresponding to the process-ID. These files actually consume no disk space, and are only convenient handles by which a debugger can attach to a process. The owner of each "file" is determined by the process's user-ID. Files of the form /proc/nnnnn have permission mode 0600 while files of the form /proc/pinfo/nnnnn have permission mode 0444. The /proc/pinfo files are intended for use by unprivileged programs that wish to access miscellaneous process information such as that provided by ps(1) and top(1).

     The statfs(2) system call will return valid information concerning the proc filesystem. The total and free blocks as reported by df(1) respectively represent the total virtual memory (real memory plus swap space) available and currently free.

     Standard system call interfaces are used to access /proc files: open(2), close(2), read(2), write(2), and ioctl(2). Note that read(2) and write(2) are not allowed for /proc/pinfo files. Furthermore only the PIOCACINFO, PIOCPSINFO, PIOCUSAGE, PIOCGETPTIMER and PIOCCRED commands may be specified to ioctl(2) for /proc/pinfo files. An open for reading and writing enables process control; a read-only open allows inspection but not control. As with ordinary files, more than one process can open the same /proc file at the same time.
     Exclusive open is provided to allow controlling processes to avoid collisions: an open(2) for writing that specifies O_EXCL fails if the file is already open for writing; if such an exclusive open succeeds, subsequent attempts to open the file for writing, with or without the O_EXCL flag, fail until the exclusively-opened file descriptor is closed. (Exception: a superuser open(2) that does not specify O_EXCL succeeds even if the file is exclusively opened.) There can be any number of read-only opens, even when an exclusive write open is in effect on the file.

     On a successful open the inherit-on-fork (PR_FORK) and run-on-last-close (PR_RLC) flags are set by default, if no other process has the file open. On the last close for writing, if the kill-on-last-close (PR_KLC) or the PR_RLC flag is set, then all the controlling flags are cleared and either a SIGKILL is sent to the process or the process is set running again. If neither of these two flags is set, the controlling flags are not cleared.

     Data may be transferred from or to any locations in the traced process's address space by applying lseek(2) to position the file at the virtual address of interest followed by read(2) or write(2). The PIOCMAP operation can be applied to determine the accessible areas (mappings) of the address space. A contiguous area of the address space may appear as multiple mappings due to varying read/write/execute permissions. I/O transfers may span contiguous mappings. An I/O request extending into an unmapped area is truncated at the boundary.

     Information and control operations are provided through ioctl(2). These have the form:

          #include <sys/types.h>
          #include <sys/signal.h>
          #include <sys/fault.h>
          #include <sys/syscall.h>
          #include <sys/procfs.h>

          void *p;
          retval = ioctl(fildes, code, p);

     The argument p is a generic pointer whose type depends on the specific ioctl code. Where not specifically mentioned below, its value should be zero.
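     The lseek(2)-then-read(2) transfer described above can be sketched as a small helper. This is only an illustration, not part of the /proc interface: the name read_at_address is hypothetical, and the sketch assumes the virtual address of interest fits in an off_t. A return value smaller than len corresponds to a transfer that was truncated, for example at the boundary of an unmapped area.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not defined by <sys/procfs.h>): position the
 * descriptor at the given address, then read up to len bytes from there.
 * Returns the number of bytes transferred, which may be less than len,
 * or -1 if the seek or the read fails.
 */
ssize_t read_at_address(int fd, off_t addr, void *buf, size_t len)
{
    assert(buf != NULL);

    if (lseek(fd, addr, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    return read(fd, buf, len);
}
```

     The same pattern with write(2) in place of read(2) would cover transfers in the other direction, given a descriptor opened for writing.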
     <sys/procfs.h> contains definitions of ioctl codes and data structures used by the operations.

     Process information and control operations involve the use of sets of flags. The set types sigset_t, fltset_t, and sysset_t correspond, respectively, to signal, fault, and system call enumerations defined in <sys/signal.h>, <sys/fault.h>, and <sys/syscall.h>. Each set type is large enough to hold flags for its own enumeration. Although they are of different sizes, they have a common structure and can be manipulated by these macros:

          prfillset(&set);             /* turn on all flags in set */
          premptyset(&set);            /* turn off all flags in set */
          praddset(&set, flag);        /* turn on the specified flag */
          prdelset(&set, flag);        /* turn off the specified flag */
          r = prismember(&set, flag);  /* != 0 iff flag is turned on */

     One of prfillset() or premptyset() must be used to initialize set before it is used in any other operation. flag must be a member of the enumeration corresponding to set.

IOCTL CODES
     The allowable ioctl codes follow. Certain of these can be used only if the process file descriptor is open for writing; these include all operations that affect process control. Those requiring write access are marked with an asterisk (*). Except where noted, an ioctl to a process that has terminated elicits the error ENOENT.
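     The behaviour of the set macros described above can be modelled in portable C. The sketch below is an assumption-laden stand-in, not the contents of <sys/procfs.h>: demo_set_t and the demo_* functions are hypothetical names, the set is fixed at 128 flags, and flags are assumed to be numbered from 1 as signal numbers are.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sigset_t/fltset_t/sysset_t: flags 1..128. */
#define DEMO_SET_WORDS 4
#define DEMO_SET_MAX   (DEMO_SET_WORDS * 32)

typedef struct demo_set {
    uint32_t word[DEMO_SET_WORDS];
} demo_set_t;

/* Turn on all flags (models prfillset). */
void demo_fillset(demo_set_t *s)  { memset(s, 0xff, sizeof *s); }

/* Turn off all flags (models premptyset). */
void demo_emptyset(demo_set_t *s) { memset(s, 0, sizeof *s); }

/* Turn on one flag (models praddset). */
void demo_addset(demo_set_t *s, int flag)
{
    assert(flag >= 1 && flag <= DEMO_SET_MAX);
    s->word[(flag - 1) / 32] |= (uint32_t)1 << ((flag - 1) % 32);
}

/* Turn off one flag (models prdelset). */
void demo_delset(demo_set_t *s, int flag)
{
    assert(flag >= 1 && flag <= DEMO_SET_MAX);
    s->word[(flag - 1) / 32] &= ~((uint32_t)1 << ((flag - 1) % 32));
}

/* Non-zero iff the flag is turned on (models prismember). */
int demo_ismember(const demo_set_t *s, int flag)
{
    assert(flag >= 1 && flag <= DEMO_SET_MAX);
    return (int)((s->word[(flag - 1) / 32] >> ((flag - 1) % 32)) & 1u);
}
```

     As with the real macros, a set in this model must be initialized with demo_fillset() or demo_emptyset() before any other operation is applied to it.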
   PIOCSTATUS
     PIOCSTATUS returns status information for the process; p is a pointer to a prstatus structure containing at least the following fields:

          typedef struct prstatus {
               long pr_flags;                  /* Flags */
               short pr_why;                   /* Reason for stop (if stopped) */
               short pr_what;                  /* More detailed reason */
               short pr_cursig;                /* Current signal */
               sigset_t pr_sigpend;            /* Set of pending signals */
               sigset_t pr_sighold;            /* Set of held signals */
               struct siginfo pr_info;         /* Info associated with signal/fault */
               struct sigaltstack pr_altstack; /* Alternate signal stack info */
               struct sigaction pr_action;     /* Signal action for current signal */
               short pr_syscall;               /* System call # (if in syscall) */
               short pr_nsysarg;               /* # of arguments to this syscall */
               long pr_errno;                  /* Error number from system call */
               long pr_rval1;                  /* System call return value 1 */
               long pr_rval2;                  /* System call return value 2 */
               long pr_sysarg[PRSYSARGS];      /* Arguments to this syscall */
               pid_t pr_pid;                   /* Process id */
               pid_t pr_ppid;                  /* Parent process id */
               pid_t pr_pgrp;                  /* Process group id */
               pid_t pr_sid;                   /* Session id */
               timespec_t pr_utime;            /* Process user cpu time */
               timespec_t pr_stime;            /* Process system cpu time */
               timespec_t pr_cutime;           /* Sum of children's user times */
               timespec_t pr_cstime;           /* Sum of children's system times */
               char pr_clname[8];              /* Scheduling class name */
               long pr_instr;                  /* Current instruction */
               gregset_t pr_reg;               /* General registers */
          } prstatus_t;

     pr_flags is a bit-mask holding these flags:

          PR_STOPPED   Process is stopped
          PR_ISTOP     Process is stopped on an event of interest (see PIOCSTOP).
          PR_DSTOP     Process has a stop directive in effect (see PIOCSTOP).
          PR_STEP      Process has a single-step directive in effect (see PIOCRUN).
          PR_ASLEEP    Process is in an interruptible sleep within a system call.
          PR_PCINVAL   Process's current instruction (pr_instr) is undefined.
          PR_ISSYS     Process is a system process (see PIOCSTOP).
          PR_FORK      Process has its inherit-on-fork flag set (see PIOCSET).
          PR_RLC       Process has its run-on-last-close flag set (see PIOCSET).
          PR_KLC       Process has its kill-on-last-close flag set (see PIOCSET).
          PR_PTRACE    Process is being traced via ptrace(2).

     pr_why and pr_what together describe, for a stopped process, the reason that the process is stopped. Possible values of pr_why are:

          PR_REQUESTED
               The stop occurred in response to a stop directive, normally because PIOCSTOP was applied. pr_what is unused in this case.

          PR_SIGNALLED
               The process stopped on receipt of a signal (see PIOCSTRACE); pr_what holds the signal number that caused the stop (for a newly-stopped process, the same value is in pr_cursig).

          PR_FAULTED
               The process stopped on incurring a hardware fault (see PIOCSFAULT); pr_what holds the fault number that caused the stop.

          PR_SYSENTRY and PR_SYSEXIT
               A stop on entry to or exit from a system call (see PIOCSENTRY and PIOCSEXIT); pr_what holds the system call number.

          PR_JOBCONTROL
               The process stopped due to the default action of a job control stop signal (see sigaction(2)); pr_what holds the stopping signal number.

     pr_cursig names the current signal, that is, the next signal to be delivered to the process. pr_sigpend identifies any other signals pending for the process. pr_sighold identifies those signals whose delivery is being delayed if sent to the process.

     pr_info, when the process is in a PR_SIGNALLED or PR_FAULTED stop, contains additional information pertinent to the particular signal or fault (see <sys/siginfo.h>).

     pr_altstack contains the alternate signal stack information for the process (see sigaltstack(2)). pr_action contains the signal action information pertaining to the current signal (see sigaction(2)); it is undefined if pr_cursig is zero.
     pr_syscall is the number of the system call, if any, being executed by the traced process; it is non-zero if the process is stopped on PR_SYSENTRY or PR_SYSEXIT, is asleep within a system call (PR_ASLEEP is set), or is stopped on a watchpoint trap incurred within a system call (see PIOCSWATCH). If pr_syscall is non-zero, pr_nsysarg is the number of arguments to the system call and the pr_sysarg array contains the actual arguments; pr_errno contains the value of errno returned at the last system call; and pr_rval1 and pr_rval2 contain the return values from the last system call.

     pr_pid, pr_ppid, pr_pgrp, and pr_sid are, respectively, the process id, the id of the process's parent, the process's process group id, and the process's session id.

     pr_utime, pr_stime, pr_cutime, and pr_cstime are, respectively, the user CPU and system CPU time consumed by the process, and the cumulative user CPU and system CPU time consumed by the process's children, in seconds and nanoseconds.

     pr_clname contains the name of the process's scheduling class.

     pr_instr contains the machine instruction to which the program counter refers. The amount of data retrieved from the process is machine-dependent; on SGI machines, it is a 32-bit word. In general, the size is that of the machine's smallest instruction. If PR_PCINVAL is set, pr_instr is undefined; this occurs whenever the process is not stopped or when the program counter refers to an invalid address.

     pr_reg is an array holding the contents of the general registers for a stopped process. For SGI machines the structure gregset_t is defined in <sys/ucontext.h>. If the process is not stopped, register values are undefined.

   PIOCTHREAD
     PIOCTHREAD returns thread-specific information.
     p is a pointer to a prthreadctl_t structure containing the following fields:

          typedef struct prthreadctl {
               tid_t pt_tid;     /* Id of the designated thread */
               int pt_cmd;       /* Command value for ioctl */
               int pt_flags;     /* Flags governing use of pt_tid */
               caddr_t pt_data;  /* Data pointer for command. */
          } prthreadctl_t;

     Possible values of pt_cmd are:

          PIOCGREG              get general registers
          PIOCSREG              set general registers
          PIOCGFPREG            get floating-point registers
          PIOCSFPREG            set floating-point registers
          PIOCSTATUS            get process status
          PIOCPSINFO            get ps(1) information
          PIOCSTOP              stop process thread(s) from running
          PIOCWSTOP             wait for process thread(s) to stop
          PIOCRUN               make process runnable
          PIOCSSIG              set current signal
          PIOCOPENM             open mapped object for reading
          PIOCNMAP              get number of memory mappings
          PIOCMAP               get memory map information
          PIOCMAP_SGI           get extended memory map information
          PIOCPGD_SGI           get page table information
          PIOCNWATCH            get number of watch points
          PIOCSWATCH            set watch point
          PIOCTLBMISS           turn utlbmiss counting on/off
          PIOCGUTID             get uthread id(s)
          PIOCGHOLD             get signal-hold mask
          PIOCSHOLD             set signal-hold mask
          PIOCUNKILL            delete a signal
          PIOCCFAULT            clear current fault
          PIOCREAD              read from target address space
          PIOCENEVCTRTHREADS    enable event counters for uthread; only for R10000/R12000 event counters.
          PIOCGETEVCTRTHREADS   dump out the counters for uthread; only for R10000/R12000 event counters.
          PIOCSETEVCTRTHREADS   set event counters for uthread; only for R10000/R12000 event counters.
          PIOCRELEVCTRTHREADS   release/stop event counters for thread; only for R10000/R12000 event counters.

     pt_flags is a bit-mask holding these flags:

          PTF_DIR        Flags giving direction.
          PTF_SET        Flags defining set of threads.
          PTFD_EQL       Only threads with exact tid.
          PTFD_GEQ       Only threads with equal or greater tid.
          PTFD_GTR       Only threads with greater tid.
          PTFD_MAX       Max valid direction.
          PTFS_ALL       Set includes all threads.
          PTFS_STOPPED   Set includes stopped threads.
          PTFS_EVENTS    Set includes threads with new events.
          PTFS_MAX       Max valid set of threads.

     pt_tid is the thread id.

     pt_data describes the data to be returned by the ioctl cmd.

     The following section of code shows an example of use for this interface:

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <errno.h>
          #include <unistd.h>
          #include <sys/hwperfmacros.h>
          #include <sys/fcntl.h>
          #include <sys/hwperftypes.h>
          #include <sys/procfs.h>
          #include <pthread.h>

          static int fd;   /* /proc descriptor shared with the thread */

          void *function1(void *arg);

          int
          main(void)
          {
               pid_t pid = getpid();
               pthread_t tid[3];
               char pname[32];
               pthread_attr_t pthread_attr;
               int status;

               /* open this process's own /proc entry */
               snprintf(pname, sizeof(pname), "/proc/%010d", (int)pid);
               if ((fd = open(pname, O_RDONLY)) < 0) {
                    perror("open");
                    exit(-1);
               }

               /* Initializes the thread attributes to default */
               status = pthread_attr_init(&pthread_attr);
               if (status) {
                    /* pthread calls return the error code; errno is not set */
                    fprintf(stderr, "pthread_attr_init: %s\n", strerror(status));
                    exit(-1);
               }

               /* create one pthread -- tid 0 */
               status = pthread_create(&tid[0], &pthread_attr, function1, (void *)0);
               if (status) {
                    fprintf(stderr, "pthread_create: %s\n", strerror(status));
                    exit(-1);
               }

               /* wait for pthread to finish */
               pthread_join(tid[0], NULL);
               return 0;
          }

          void *
          function1(void *arg)
          {
               prthreadctl_t ptc;
               hwperf_profevctrarg_t evctr_args;
               hwperf_cntr_t cnts;
               int i;

               /*
                * Zeroed only so that no uninitialized data is passed down;
                * a real program must describe the events to count here.
                */
               memset(&evctr_args, 0, sizeof(evctr_args));

               ptc.pt_tid = (tid_t)(long)arg;     /* thread id 0 */
               ptc.pt_flags = PTFD_GEQ | PTFS_ALL;
               ptc.pt_cmd = PIOCENEVCTRTHREADS;   /* enable event counters */
               ptc.pt_data = (caddr_t)&evctr_args;
               if (ioctl(fd, PIOCTHREAD, &ptc) < 0) {
                    perror("PIOCENEVCTRTHREADS");
                    exit(-1);
               }

               ptc.pt_cmd = PIOCGETEVCTRTHREADS;  /* read event counters */
               ptc.pt_data = (caddr_t)&cnts;
               if (ioctl(fd, PIOCTHREAD, (void *)&ptc) < 0) {
                    perror("PIOCGETEVCTRTHREADS");
                    ptc.pt_cmd = PIOCRELEVCTRTHREADS;
                    ioctl(fd, PIOCTHREAD, (void *)&ptc);
                    exit(-1);
               }

               /* print event counters */
               for (i = 0; i < HWPERF_EVENTMAX; i++) {
                    printf("cnts.hwp_evctr[%d] %lld\n", i, cnts.hwp_evctr[i]);
               }

               ptc.pt_cmd = PIOCRELEVCTRTHREADS;  /* release event counters */
               ioctl(fd, PIOCTHREAD, (void *)&ptc);
               return NULL;
          }

   *PIOCSTOP
   PIOCWSTOP
     PIOCSTOP directs the process to stop and waits until it has stopped; PIOCWSTOP simply waits for the process to stop.
     These operations complete when the process stops on an event of interest, immediately if already so stopped. If p is non-zero it points to an instance of prstatus_t to be filled with status information for the stopped process.

     An "event of interest" is either a PR_REQUESTED stop or a stop that has been specified in the process's tracing flags (set by PIOCSTRACE, PIOCSFAULT, PIOCSENTRY, and PIOCSEXIT). A PR_JOBCONTROL stop is specifically not an event of interest. (A process may stop twice due to a stop signal, first showing PR_SIGNALLED if the signal is traced and again showing PR_JOBCONTROL if the process is set running without clearing the signal.)

     If the process is controlled by ptrace(2), it comes to a PR_SIGNALLED stop on receipt of any signal; this is an event of interest only if the signal is in the traced signal set.

     If PIOCSTOP is applied to a process that is stopped, but not on an event of interest, the stop directive takes effect when the process is restarted by the competing mechanism; at that time the process enters a PR_REQUESTED stop before executing any user-level code.

     ioctl()s are interruptible by signals so that, for example, an alarm(2) can be set to avoid waiting forever for a process that may never stop on an event of interest. If PIOCSTOP is interrupted, the stop directive remains in effect even though the ioctl() returns an error.

     A system process (indicated by the PR_ISSYS flag) never executes at user level, has no user-level address space visible through /proc, and cannot be stopped. Applying PIOCSTOP or PIOCWSTOP to a system process elicits the error EBUSY.

   *PIOCRUN
     The traced process is made runnable again after a stop. If p is non-zero it points to a prrun structure describing additional actions to be performed.
     The prrun structure contains at least the following fields:

          typedef struct prrun {
               long pr_flags;        /* Flags */
               sigset_t pr_trace;    /* Set of signals to be traced */
               sigset_t pr_sighold;  /* Set of signals to be held */
               fltset_t pr_fault;    /* Set of faults to be traced */
               caddr_t pr_vaddr;     /* Virtual address at which to resume */
          } prrun_t;

     pr_flags is a bit-mask describing optional actions; the remainder of the entries are meaningful only if the appropriate bits are set in pr_flags. Flag definitions:

          PRCSIG
               Clears the current signal, if any (see PIOCSSIG).

          PRCFAULT
               Clears the current fault, if any (see PIOCCFAULT).

          PRSTRACE
               Sets the traced signal set to pr_trace (see PIOCSTRACE).

          PRSHOLD
               Sets the held signal set to pr_sighold (see PIOCSHOLD).

          PRSFAULT
               Sets the traced fault set to pr_fault (see PIOCSFAULT).

          PRSVADDR
               Sets the address at which execution resumes to pr_vaddr.

          PRSTEP
               Directs the process to single-step, that is, to run and to execute a single machine instruction. On completion of the instruction, a trace trap occurs. If FLTTRACE is being traced, the process stops, otherwise it is sent SIGTRAP; if SIGTRAP is being traced and not held, the process stops. This operation requires hardware and operating system support and may not be implemented on all processors. It is implemented on SGI machines.

          PRCSTEP
               Cancels any outstanding single-step directive and any PRSTEP directive set in the current request.

          PRSABORT
               Meaningful only if the process is in a PR_SYSENTRY stop or is marked PR_ASLEEP; it instructs the process to abort execution of the system call (see PIOCSENTRY, PIOCSEXIT).

          PRSTOP
               Directs the process to stop again as soon as possible after resuming execution (see PIOCSTOP). In particular if the process is stopped on PR_SIGNALLED or PR_FAULTED, the next stop will show PR_REQUESTED, no other stop will have intervened, and the process will not have executed any user-level code.
     PIOCRUN fails (EBUSY) if applied to a process that is not stopped on an event of interest. Once PIOCRUN has been applied, the process is no longer stopped on an event of interest even if, due to a competing mechanism, it remains stopped.

   *PIOCSTRACE
     This defines a set of signals to be traced: the receipt of one of these signals causes the traced process to stop. The set of signals is defined via an instance of sigset_t addressed by p. Receipt of SIGKILL cannot be traced.

     If a signal that is included in the held signal set is sent to the traced process, the signal is not received and does not cause a process stop until it is removed from the held signal set, either by the process itself or by setting the held signal set with PIOCSHOLD or the PRSHOLD option of PIOCRUN.

   PIOCGTRACE
     The current traced signal set is returned in an instance of sigset_t addressed by p.

   *PIOCSSIG
     The current signal and its associated signal information are set according to the contents of the siginfo structure addressed by p (see <sys/siginfo.h>). If the specified signal number is zero or if p is zero, the current signal is cleared. Setting the current signal to SIGKILL terminates the process immediately, even if it is stopped. All other signals will be sent after the process is made runnable, if it is currently stopped.

   *PIOCKILL
     A signal is sent to the process with semantics identical to those of kill(2). p points to an int naming the signal. Sending SIGKILL terminates the process immediately.

   *PIOCUNKILL
     A signal is deleted, that is, it is removed from the set of pending signals. The current signal (if any) is unaffected. p points to an int naming the signal. It is an error to attempt to delete SIGKILL.

   PIOCGHOLD
   *PIOCSHOLD
     PIOCGHOLD returns the set of held signals (signals whose delivery will be delayed if sent to the process) in an instance of sigset_t addressed by p. PIOCSHOLD correspondingly sets the held signal set but does not allow SIGKILL or SIGSTOP to be held.
   PIOCMAXSIG
   PIOCACTION
     These operations provide information about the signal actions associated with the traced process (see sigaction(2)). PIOCMAXSIG returns, in the int addressed by p, the maximum signal number understood by the system. This can be used to allocate storage for use with the PIOCACTION operation, which returns the traced process's signal actions in an array of sigaction structures addressed by p. Signal numbers are displaced by 1 from array indices, so that the action for signal number n appears in position n-1 of the array.

   *PIOCSFAULT
     This defines a set of hardware faults to be traced: on incurring one of these faults the traced process stops. The set is defined via an instance of fltset_t addressed by p. Fault names are defined in <sys/fault.h> and include the following. Some of these may not occur on all processors; there may be processor-specific faults in addition to these.

          FLTILL      illegal instruction
          FLTPRIV     privileged instruction
          FLTBPT      breakpoint trap
          FLTTRACE    trace trap
          FLTWATCH    watchpoint trap
          FLTKWATCH   kernel watchpoint trap
          FLTACCESS   memory access fault
          FLTBOUNDS   memory bounds violation
          FLTIOVF     integer overflow
          FLTIZDIV    integer zero divide
          FLTFPE      floating-point exception
          FLTSTACK    unrecoverable stack fault
          FLTPAGE     recoverable page fault

     When not traced, a fault normally results in the posting of a signal to the process that incurred the fault. If the process stops on a fault, the signal is posted to the process when execution is resumed unless the fault is cleared by PIOCCFAULT or by the PRCFAULT option of PIOCRUN. FLTPAGE and FLTKWATCH are exceptions; no signal is posted. There may be additional processor-specific faults like this. pr_info in the prstatus structure identifies the signal to be sent and contains machine-specific information about the fault.

   PIOCGFAULT
     The current traced fault set is returned in an instance of fltset_t addressed by p.
   *PIOCCFAULT
     The current fault (if any) is cleared; the associated signal is not sent to the process.

   *PIOCSENTRY
   *PIOCSEXIT
     These operations instruct the process to stop on entry to or exit from specified system calls. The set of system calls to be traced is defined via an instance of sysset_t addressed by p.

     When entry to a system call is being traced, the traced process stops after having begun the call to the system but before the system call arguments have been fetched from the process. When exit from a system call is being traced, the traced process stops on completion of the system call just prior to checking for signals and returning to user level. At this point all return values have been stored into the traced process's registers.

     If the traced process is stopped on entry to a system call (PR_SYSENTRY) or when sleeping in an interruptible system call (PR_ASLEEP is set), it may be instructed to go directly to system call exit by specifying the PRSABORT flag in a PIOCRUN request. Unless exit from the system call is being traced the process returns to user level showing error EINTR.

   PIOCGENTRY
   PIOCGEXIT
     These return the current traced system call entry or exit set in an instance of sysset_t addressed by p.

   PIOCNWATCH
     PIOCNWATCH returns, in the int addressed by p, the number of watched areas supported by the system. This can be used to allocate storage for use with the PIOCSWATCH and PIOCGWATCH operations, each of which must provide an array whose number of elements equals the supported number of watched areas.

   *PIOCSWATCH
     PIOCSWATCH establishes or clears a set of watched areas in the traced process; p points to a prwatch structure containing at least the following fields:

          typedef struct prwatch {
               caddr_t pr_vaddr;  /* Virtual address of watched area */
               u_long pr_size;    /* Size of watched area in bytes */
               long pr_wflags;    /* Watch type flags */
          } prwatch_t;

     pr_vaddr specifies the virtual address of an area of memory to be watched in the traced process.
     pr_size specifies the size of the area, in bytes. pr_wflags specifies the type of memory access to be monitored as a bit-mask of one or more of the following flags (see also PIOCMAP):

          MA_READ    read access
          MA_WRITE   write access
          MA_EXEC    execution access

     An entry with a zero value for pr_size clears any previously-established watched area starting at the specified virtual address. An entry with a non-empty pr_wflags bit-mask establishes a watched area for the virtual address range specified by pr_vaddr and pr_size. An entry with an empty pr_wflags bit-mask is ignored.

     A watchpoint is triggered when the traced process makes a memory reference that covers at least one byte of a watched area and the memory reference is a mode of interest as specified in pr_wflags. When a watchpoint is triggered, the process incurs a watchpoint trap. If FLTWATCH is being traced, the process stops; otherwise it is sent SIGTRAP; if SIGTRAP is being traced and not held, the process stops. If the access is a write access, the memory is not modified. If the process stops, its program counter refers to the instruction that triggered the watchpoint. pr_info in the prstatus structure contains information pertinent to the watchpoint trap. In particular, the si_addr field contains the virtual address of the memory reference that triggered the watchpoint and the si_code field contains one of MA_READ, MA_WRITE, or MA_EXEC, indicating read, write or execute access, respectively.

     A watchpoint may be triggered while executing a system call that makes reference to the traced process's memory. Such a system call completes normally; a kernel watchpoint fault is taken after the system call completes but before the process returns to user level. If more than one watchpoint would be triggered by the system call, the first one encountered is the one reported.
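     The triggering rule stated above (the reference must cover at least one byte of the watched area, and its access mode must be one of those selected in pr_wflags) amounts to a range-overlap test combined with a mask test. The following sketch models that rule in portable C. Everything in it is illustrative: demo_watch_t only mirrors the shape of prwatch_t, and the DEMO_MA_* values are made up for the example rather than taken from <sys/procfs.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical access-mode bits; the real MA_* values are system-defined. */
enum { DEMO_MA_READ = 1, DEMO_MA_WRITE = 2, DEMO_MA_EXEC = 4 };

/* Hypothetical mirror of the prwatch fields. */
typedef struct demo_watch {
    uintptr_t     vaddr;   /* start of watched area */
    unsigned long size;    /* size of watched area in bytes */
    long          wflags;  /* access modes of interest */
} demo_watch_t;

/*
 * Returns non-zero if a reference of len bytes at addr with the given
 * access mode would trigger the watchpoint. Wraparound at the top of
 * the address space is ignored in this sketch.
 */
int demo_watch_triggers(const demo_watch_t *w, uintptr_t addr,
                        unsigned long len, long mode)
{
    assert(w != NULL);

    if (w->size == 0 || len == 0)
        return 0;                      /* nothing watched, or no access */
    if ((w->wflags & mode) == 0)
        return 0;                      /* access mode not of interest */

    /* half-open ranges [addr, addr+len) and [vaddr, vaddr+size) overlap */
    return addr < w->vaddr + w->size && w->vaddr < addr + len;
}
```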
     PIOCSWATCH fails with EINVAL if an attempt is made to specify overlapping watched areas or to specify a watchpoint whose virtual address range includes invalid virtual addresses in the traced process. PIOCSWATCH fails with E2BIG if an attempt is made to establish more than the supported number of watched areas and with ESRCH if an attempt is made to delete a non-existent watchpoint. An attempt to delete watchpoints on a running process could result in failure with errno set to EBUSY. This is a temporary condition that occurs when the kernel is stepping over a watchpoint; a subsequent attempt should succeed. This does not happen if the process is stopped.

     Access to a process's memory through /proc will not trigger a watchpoint, even if the access is from the process itself (which must have opened its own /proc entry).

   PIOCGWATCH
     PIOCGWATCH returns, in the array of prwatch structures addressed by p, the set of watched areas currently in effect. Elements beyond the number of actually established watched areas are filled with zeros.

   *PIOCSET
   *PIOCRESET
     PIOCSET sets one or more modes of operation for the traced process. PIOCRESET resets these modes. The modes to be set or reset are specified by flags in a long addressed by p:

          PR_FORK (inherit-on-fork)
               When set, the process's tracing flags are inherited by the child of a fork(2). When reset, child processes start with all tracing flags cleared.

          PR_RLC (run-on-last-close)
               When set and the last writable /proc file descriptor referring to the traced process is closed, all of the process's tracing flags are cleared, any outstanding stop directive is canceled, and if the process is stopped, it is set running as though PIOCRUN had been applied to it. When reset, the process's tracing flags are retained and the process is not set running on last close.

          PR_KLC (kill-on-last-close)
               When set and the last writable /proc file descriptor referring to the traced process is closed, the process is terminated with SIGKILL.
     It is an error (EINVAL) to specify flags other than those described above or to apply these operations to a system process. The current modes are reported in the prstatus structure (see PIOCSTATUS). Note that a process using /proc cannot assume any default settings for these flags, as some other process may have attached to the target earlier, reset the flags, and then detached.

   PIOCGREG
   *PIOCSREG
     These operations respectively get and set the process general registers into or out of an array addressed by p; the array has type gregset_t. Register contents are accessible using a set of predefined indices (see PIOCSTATUS). No bits of the processor-status register (PSR) or other privileged registers can be modified by PIOCSREG. PIOCSREG fails (EBUSY) if applied to a process that is not stopped on an event of interest. If the process is not stopped, the register values returned by PIOCGREG are undefined.

   PIOCGFPREG
   *PIOCSFPREG
     These operations respectively get and set the process floating-point registers into or out of a structure addressed by p; the structure has type fpregset_t. An error (EINVAL) is returned if there is no floating-point hardware on the machine. PIOCSFPREG fails (EBUSY) if applied to a process that is not stopped on an event of interest. If the process is not stopped, the register values returned by PIOCGFPREG are undefined.

   *PIOCNICE
     The traced process's nice(2) priority is incremented by the amount contained in the int addressed by p. Only the superuser may better a process's priority in this way, but any user may make the priority worse.

   PIOCPSINFO
     This returns miscellaneous process information such as that reported by ps(1).
     p is a pointer to a prpsinfo structure containing at least the following fields:

          typedef struct prpsinfo {
               char pr_state;            /* numeric process state (see pr_sname) */
               char pr_sname;            /* printable character representing pr_state */
               char pr_zomb;             /* !=0: process terminated but not waited for */
               char pr_nice;             /* nice for cpu usage */
               u_long pr_flag;           /* process flags */
               uid_t pr_uid;             /* real user id */
               gid_t pr_gid;             /* real group id */
               pid_t pr_pid;             /* unique process id */
               pid_t pr_ppid;            /* process id of parent */
               pid_t pr_pgrp;            /* pid of process group leader */
               pid_t pr_sid;             /* session id */
               caddr_t pr_addr;          /* physical address of process */
               long pr_size;             /* size of process image in pages */
               long pr_rssize;           /* resident set size in pages */
               long pr_pagesize;         /* system page size, in bytes */
               caddr_t pr_wchan;         /* wait addr for sleeping process */
               timespec_t pr_start;      /* process start time, sec+nsec since epoch */
               timespec_t pr_time;       /* usr+sys cpu time for this process */
               long pr_pri;              /* priority, high value is high priority */
               char pr_oldpri;           /* pre-SVR4, low value is high priority */
               char pr_cpu;              /* pre-SVR4, cpu usage for scheduling */
               dev_t pr_ttydev;          /* controlling tty device (PRNODEV if none) */
               char pr_clname[8];        /* Scheduling class name */
               char pr_fname[PRCOMSIZ];  /* last component of exec()ed pathname */
               char pr_psargs[PRARGSZ];  /* initial characters of arg list */
               u_int pr_pset;            /* associated processor set name */
               cpuid_t pr_sonproc;       /* processor running on */
               timespec_t pr_ctime;      /* usr+sys cpu time for all children */
          } prpsinfo_t;

     Some of the entries in prpsinfo, such as pr_state and pr_flag, are system-specific and should not be expected to retain their meanings across different versions of the operating system. pr_addr is a vestige of the past and has no real meaning in current systems.

     PIOCPSINFO can be applied to a zombie process (one that has terminated but whose parent has not yet performed a wait(2) on it).
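     Several prpsinfo fields need a small conversion before they are shown to a user: pr_size and pr_rssize are counted in pages and must be scaled by pr_pagesize, and the timespec_t fields carry seconds and nanoseconds separately. The helpers below sketch both conversions. They are hypothetical and deliberately avoid <sys/procfs.h>: demo_timespec_t merely assumes the usual seconds/nanoseconds layout.

```c
#include <assert.h>

/* Hypothetical mirror of a seconds + nanoseconds time value. */
typedef struct demo_timespec {
    long tv_sec;
    long tv_nsec;
} demo_timespec_t;

/* Convert a page count (such as pr_size or pr_rssize) to bytes. */
unsigned long long demo_pages_to_bytes(long pages, long pagesize)
{
    assert(pages >= 0 && pagesize > 0);
    return (unsigned long long)pages * (unsigned long long)pagesize;
}

/* Convert a seconds + nanoseconds value (such as pr_time) to seconds. */
double demo_timespec_to_seconds(demo_timespec_t t)
{
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}
```

     Widening to unsigned long long before multiplying keeps the byte count from overflowing a 32-bit long for large process images.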
   PIOCNMAP
   PIOCMAP
     These operations provide information about the memory mappings (virtual address ranges) associated with the traced process. PIOCNMAP returns, in the int addressed by p, the number of mappings that are currently active. The PIOCMAP operation may be used to obtain the list of currently active mappings, which is an array of structures of type prmap_t. PIOCNMAP may be used to determine the minimum amount of storage that needs to be allocated to receive these structures, but the programmer should not assume that it is the maximum amount needed. If the PIOCNMAP and PIOCMAP calls are made on a process that is not stopped, the number of maps could change between the two ioctl calls and the caller could fault if too few maps were allocated to hold the results of PIOCMAP. Note: for a better interface, see PIOCMAP_SGI below.

     For PIOCMAP, p addresses an array of elements of type prmap_t; one array element (one structure) is returned for each mapping, plus an additional element containing all zeros to mark the end of the list.

     There is a possibility of reporting virtual mappings that overlap under certain conditions. A shared mapping of a given virtual address range can be partly or fully overlapped by a private mapping. No two shared mappings may overlap in the address space, and no two private mappings may overlap either. The pr_mflags bit MA_SREGION indicates whether a mapping is shared or private.

     The prmap structure contains at least the following fields:

          typedef struct prmap {
               caddr_t pr_vaddr;  /* Virtual address */
               u_long pr_size;    /* Size of mapping in bytes */
               off_t pr_off;      /* Offset into mapped object, if any */
               long pr_mflags;    /* Protection and attribute flags */
          } prmap_t;

     pr_vaddr is the virtual address of the mapping within the traced process and pr_size is its size in bytes. pr_off is the offset within the mapped object (if any) to which the virtual address is mapped.

pr_mflags is a bit-mask of protection and attribute flags:

     MA_READ            mapping is readable by the traced process
     MA_WRITE           mapping is writable by the traced process
     MA_EXEC            mapping is executable by the traced process
     MA_SHARED          mapping changes are shared by the mapped object
     MA_BREAK           mapping is grown by the brk(2) system call
     MA_STACK           mapping is grown automatically on stack faults
     MA_PHYS            mapping corresponds to a physical device mapping
     MA_MAPZERO         mapping is a /dev/zero mapping
     MA_FETCHOP         mapping is a fetchop page
     MA_PRIMARY         mapping is one of the process's core segments
     MA_SREGION         mapping is on shared region list
     MA_COW             mapping corresponds to a copy on write segment
     MA_NOTCACHED       mapped address segment is not cached
     MA_SHMEM           mapping corresponds to a shared memory mapping
     MA_REFCNT_SHIFT    amount to shift right mflags to get reference count

PIOCMAP_SGI
This operation provides detailed information about the memory mappings (virtual address ranges) associated with the traced process. In effect it performs both a PIOCNMAP and a PIOCMAP call (with additional information) with one ioctl. The PIOCMAP_SGI operation may be used to obtain the list of currently active mappings, which is an array of structures of type prmap_sgi_t. The user must preallocate an array of the maximum number of mapping structures they are willing to receive. One array element (one structure) is returned for each mapping, plus an additional element containing all zeros that also marks the end of the list. There is an upper limit to the number of memory mappings that can be returned by this call, which is defined as PRMAPMAX in the procfs.h header file. Attempts to request more than PRMAPMAX mappings result in only PRMAPMAX mappings being returned. PIOCMAP_SGI returns either -1 or the number of mappings that are currently active.

There is a possibility of reporting virtual mappings that overlap under certain conditions.
A shared mapping of a given virtual address range can be partly or fully overlapped by a private mapping. No two shared mappings may overlap in virtual address space, and no two private mappings may overlap in virtual address space. The pr_mflags bit MA_SREGION indicates whether a mapping is shared or private.

For PIOCMAP_SGI, p addresses a pointer to a structure called prmap_sgi_arg_t. It contains the following fields:

     typedef struct prmap_sgi_arg {
          caddr_t pr_vaddr;    /* Base of map buffer */
          ulong_t pr_size;     /* Size of buffer in bytes */
     } prmap_sgi_arg_t;

pr_vaddr is the virtual address of the buffer to hold the mappings for the traced process and pr_size is its size in bytes.

The prmap_sgi_t structure contains at least the following fields:

     typedef struct prmap_sgi {
          caddr_t pr_vaddr;    /* Virtual base address */
          ulong_t pr_size;     /* Size of mapping in bytes */
          off_t   pr_off;      /* Offset into mapped object, if any */
          ulong_t pr_mflags;   /* Protection and attribute flags */
          pgno_t  pr_vsize;    /* # valid pages in this segment */
          pgno_t  pr_psize;    /* # private pages in this segment */
          pgno_t  pr_wsize;    /* Cost for this proc weighted base 256 */
          pgno_t  pr_rsize;    /* # referenced pages in this segment */
          pgno_t  pr_msize;    /* # modified pages in this segment */
          dev_t   pr_dev;      /* Device # of segment iff mapped */
          ino_t   pr_ino;      /* Inode # of segment iff mapped */
     } prmap_sgi_t;

pr_vaddr is the virtual address of the mapping within the traced process and pr_size is its size in bytes. pr_off is the offset within the mapped object (if any) to which the virtual address is mapped. pr_vsize, pr_psize, pr_wsize, pr_rsize, and pr_msize are page counts for the virtual mapping. pr_dev and pr_ino identify the filesystem-resident object from which the mapping originates (if one exists).

pr_mflags is a bit-mask of protection and attribute flags:

     MA_READ            mapping is readable by the traced process
     MA_WRITE           mapping is writable by the traced process
     MA_EXEC            mapping is executable by the traced process
     MA_SHARED          mapping changes are shared by the mapped object
     MA_BREAK           mapping is grown by the brk(2) system call
     MA_STACK           mapping is grown automatically on stack faults
     MA_PHYS            mapping corresponds to a physical device mapping
     MA_MAPZERO         mapping is a /dev/zero mapping
     MA_FETCHOP         mapping is a fetchop page
     MA_PRIMARY         mapping is one of the process's core segments
     MA_SREGION         mapping is on shared region list
     MA_COW             mapping corresponds to a copy on write segment
     MA_NOTCACHED       mapped address segment is not cached
     MA_SHMEM           mapping corresponds to a shared memory mapping
     MA_REFCNT_SHIFT    amount to shift right mflags to get reference count

PIOCPGD_SGI
This operation provides information about the interior of the memory mappings (virtual address ranges) associated with the traced process. The PIOCPGD_SGI operation may be used to obtain the list of page descriptors, which is an array of structures of type pgd_t. The PIOCMAP_SGI ioctl may be used to determine the amount of storage that needs to be allocated to receive these structures.

For PIOCPGD_SGI, p addresses a pointer to a prpgd_sgi_t structure that contains an array of elements of type pgd_t.

The pgd_t structure contains at least the following fields:

     typedef struct pgd {            /* per-page data */
          short pr_flags;            /* flags */
          short pr_value;            /* page count/fault offset */
     } pgd_t;

The prpgd_sgi_t structure contains at least the following fields:

     typedef struct prpgd_sgi {
          caddr_t pr_vaddr;          /* virtual base address of region to stat */
          pgno_t  pr_pglen;          /* number of pages in data list... */
          pgd_t   pr_data[1];        /* variable length array of page flags */
     } prpgd_sgi_t;

pr_vaddr is the virtual address of the mapping within the traced process and pr_pglen is the length of the pr_data array.
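
Because pr_data is declared with one element but used as a variable-length array, the caller has to size the allocation for the number of pages it wants described. The sketch below shows one way to do that. The demo_ types are stand-ins introduced here (with long substituted for pgno_t, an assumption) so the example compiles without <sys/procfs.h>; prpgd_bytes() and prpgd_alloc() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins (assumptions) mirroring the documented structures. */
typedef struct demo_pgd {
    short pr_flags;              /* flags */
    short pr_value;              /* page count/fault offset */
} demo_pgd_t;

typedef struct demo_prpgd_sgi {
    char       *pr_vaddr;        /* virtual base address of region */
    long        pr_pglen;        /* number of pages in data list */
    demo_pgd_t  pr_data[1];      /* variable length array */
} demo_prpgd_sgi_t;

/*
 * Bytes needed for a request covering 'npages' pages. The structure
 * already holds one pgd element, so only npages - 1 more are added.
 */
size_t prpgd_bytes(size_t npages)
{
    if (npages == 0)
        npages = 1;
    return sizeof(demo_prpgd_sgi_t) + (npages - 1) * sizeof(demo_pgd_t);
}

/*
 * Allocate and initialise a zeroed request for 'npages' pages starting
 * at 'vaddr'. Returns NULL on allocation failure; the caller frees it.
 */
demo_prpgd_sgi_t *prpgd_alloc(char *vaddr, size_t npages)
{
    demo_prpgd_sgi_t *p = calloc(1, prpgd_bytes(npages));

    if (p == NULL)
        return NULL;
    p->pr_vaddr = vaddr;
    p->pr_pglen = (long)npages;
    return p;
}
```

The page count for a mapping would typically be derived from the pr_size reported by PIOCMAP_SGI divided by the system page size.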

The pr_flags field for each page contains the following flags:

     PGF_REFERENCED    page is currently valid in system page table
     PGF_GLOBAL        page is marked global in system page table
     PGF_WRITEABLE     page is currently writeable in system page table
     PGF_NOTCACHED     page is marked non-cacheable in system page table
     PGF_ISVALID       page is marked valid for this process
     PGF_ISDIRTY       page is marked dirty for this process
     PGF_PRIVATE       page is marked private to this process
     PGF_FAULT         the pr_value field contains a fault offset
     PGF_USRHISTORY    accumulating history flag for caller
     PGF_REFHISTORY    page has been marked referenced
     PGF_WRTHISTORY    page has been marked dirty
     PGF_VALHISTORY    page has been marked valid
     PGF_CLEAR         clear valid & writeable bits in page table

The pr_value field for each page contains either a reference count or, if the PGF_CLEAR operation was set on a previous call, a fault offset value. This can be used to determine which function or variable inside a page the process references or writes frequently.

PIOCOPENM
The return value retval provides a read-only file descriptor for a mapped object associated with the traced process. If p is zero the traced process's exec(2)ed file is found. This enables a debugger to find the object file symbol table without having to know the pathname of the executable file. If p is non-zero it points to a caddr_t containing a virtual address within the traced process and the mapped object, if any, associated with that address is found; this can be used to get a file descriptor for a shared library that is attached to the process. On error (invalid address, physical device mapping, or no mapped object for the designated address), -1 is returned and errno is set to EINVAL.

PIOCCRED
Fetch the set of credentials associated with the process. p points to an instance of prcred_t that is filled by the operation.
The prcred structure contains at least the following fields:

     typedef struct prcred {
          uid_t pr_euid;       /* Effective user id */
          uid_t pr_ruid;       /* Real user id */
          uid_t pr_suid;       /* Saved user id (from exec) */
          gid_t pr_egid;       /* Effective group id */
          gid_t pr_rgid;       /* Real group id */
          gid_t pr_sgid;       /* Saved group id (from exec) */
          u_int pr_ngroups;    /* Number of supplementary groups */
     } prcred_t;

PIOCGROUPS
Fetch the set of supplementary group IDs associated with the process. p points to an array of elements of type gid_t, that will be filled by the operation. PIOCCRED can be applied beforehand to determine the number of groups (pr_ngroups) that will be returned and the amount of storage that should be allocated to hold them.

PIOCTLBMISS
Enable special user TLB handling. The TLB is a hardware coprocessor that makes virtual-to-physical address translations. p points to an integer that specifies the handling desired. If the value is TLB_COUNT, a record will be kept of every virtual-address TLB refill that occurs while the process mapped by fildes is running. If the value is TLB_STD, counting will be disabled (the default mode). It is important to note that monitoring TLB efficiency can be a useful tool, but the performance of the code that refills the TLB will be degraded. The TLB refill counts can be obtained by PIOCUSAGE. The struct prusage field pu_utlb accounts for TLB refills that occurred while the process was running in user mode, and the field pu_ktlb accounts for refills that occurred while executing system calls on behalf of the user or while handling hardware interrupt code while the user process was scheduled.

PIOCUSAGE
PIOCUSAGE returns process usage information. p points to a prusage structure that is filled by the operation. The fields in a prusage structure are implementation dependent; no application can assume portability in this area. See <sys/procfs.h> for the exact definition for a particular implementation.
The SGI implementation supports the following fields:

     typedef struct prusage {
          timespec_t pu_tstamp;          /* time stamp */
          timespec_t pu_starttime;       /* process start time */
          timespec_t pu_utime;           /* user CPU time */
          timespec_t pu_stime;           /* system CPU time */
          u_long     pu_minf;            /* minor (mapping) page faults */
          u_long     pu_majf;            /* major (disk) page faults */
          u_long     pu_utlb;            /* user TLB misses */
          u_long     pu_nswap;           /* number of swaps */
          u_long     pu_gbread;          /* gigabytes ... */
          u_long     pu_bread;           /* and bytes read */
          u_long     pu_gbwrit;          /* gigabytes ... */
          u_long     pu_bwrit;           /* and bytes written */
          u_long     pu_sigs;            /* signals received */
          u_long     pu_vctx;            /* voluntary context switches */
          u_long     pu_ictx;            /* involuntary context switches */
          u_long     pu_sysc;            /* system calls */
          u_long     pu_syscr;           /* read() system calls */
          u_long     pu_syscw;           /* write() system calls */
          u_long     pu_syscps;          /* poll() or select() system calls */
          u_long     pu_sysci;           /* ioctl() system calls */
          u_long     pu_graphfifo;       /* graphics pipeline stalls */
          u_long     pu_graph_req[8];    /* graphics resource requests */
          u_long     pu_graph_wait[8];   /* graphics resource waits */
          u_long     pu_size;            /* size of swappable image in pages */
          u_long     pu_rss;             /* resident size of swappable image */
          u_long     pu_inblock;         /* block input operations */
          u_long     pu_oublock;         /* block output operations */
          u_long     pu_vfault;          /* total number of vfaults */
          u_long     pu_ktlb;            /* kernel TLB misses */
     } prusage_t;

PIOCGETPTIMER
PIOCGETPTIMER returns an array of timers indicating the amount of time the process has spent in each of the following states:

     #include <time.h>
     #include <sys/timers.h>

     struct timespec ptime[MAX_PROCTIMER];

     AS_USR_RUN         running in user mode
     AS_SYS_RUN         running in system mode
     AS_INT_RUN         running in interrupt mode
     AS_BIO_WAIT        waiting for block I/O
     AS_MEM_WAIT        waiting for memory
     AS_SELECT_WAIT     waiting in select
     AS_JCL_WAIT        stopped because of job control
     AS_RUNQ_WAIT       waiting to run on run queue
     AS_SLEEP_WAIT      waiting for resource
     AS_STRMON_WAIT     waiting for the stream monitor
     AS_PHYSIO_WAIT     waiting for raw I/O

p is a pointer to an array of MAX_PROCTIMER timespec structures.

PIOCOPENPD
PIOCOPENPD is not currently implemented on SGI machines. It is under consideration for future releases.

The return value retval provides a read-only file descriptor for a ``page data file'', enabling tracking of address space references and modifications on a per-page basis. A read(2) of the page data file descriptor returns structured page data and atomically clears the page data maintained for the file by the system. That is to say, each read returns data collected since the last read; the first read returns data collected since the file was opened. When the call completes, the read buffer contains the following structure as its header and thereafter contains a number of variable length structures that must be accessed by walking linearly through the buffer.

     typedef struct prpageheader {
          timespec_t tstamp;     /* real time time stamp */
          u_long     nmap;       /* number of address space mappings */
          u_long     npage;      /* total number of pages */
     } prpageheader_t;

The header is followed by nmap variable-length prasmap structures:

     typedef struct prasmap {
          caddr_t vaddr;         /* virtual address */
          u_long  npage;         /* number of pages in mapping */
          u_char  data[1];       /* referenced, modified, present flags */
     } prasmap_t;

The data[] array is of variable length, with one entry for each page in the mapping, npage entries altogether, rounded up with empty entries at the end so that the structure size is an integral number of longs. data[] entries may contain these flags:

     PG_PRESENT        page is resident in memory now
     PG_REFERENCED     page has been referenced since last read
     PG_MODIFIED       page has been modified since last read

If the read buffer is not large enough to contain all of the page data, the read fails with E2BIG and the page data is not cleared. The required size of the read buffer can be determined through fstat(2). Application of lseek(2) to the page data file descriptor is ineffective.
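
The linear walk through the variable-length prasmap structures can be sketched as follows. Since PIOCOPENPD is not implemented on SGI machines, this is purely an illustration of the documented layout. demo_prasmap_t is a stand-in introduced for this example, the helper names are hypothetical, and the rounding rule is this example's reading of "an integral number of longs": the fixed part plus npage flag bytes, rounded up to a multiple of sizeof(long). The buffer passed in is assumed to start just past the prpageheader.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in (assumption) mirroring the documented prasmap_t. */
typedef struct demo_prasmap {
    char          *vaddr;     /* virtual address */
    unsigned long  npage;     /* number of pages in mapping */
    unsigned char  data[1];   /* per-page flags, variable length */
} demo_prasmap_t;

/* Bytes occupied by one entry describing 'npage' pages. */
size_t prasmap_entry_bytes(unsigned long npage)
{
    size_t raw = offsetof(demo_prasmap_t, data) + (size_t)npage;
    size_t align = sizeof(long);

    return (raw + align - 1) / align * align;
}

/*
 * Walk 'nmap' entries stored back to back in 'buf' (the bytes that
 * follow the header) and sum their page counts into *total.
 * Returns 0 on success, -1 if an entry would run past 'len' bytes.
 */
int prasmap_total_pages(const unsigned char *buf, size_t len,
                        unsigned long nmap, unsigned long *total)
{
    const size_t fixed = offsetof(demo_prasmap_t, data);
    size_t off = 0;
    unsigned long sum = 0;

    for (unsigned long i = 0; i < nmap; i++) {
        demo_prasmap_t ent;
        size_t sz;

        if (len - off < fixed)
            return -1;
        /* Copy the fixed part out; the buffer may be unaligned. */
        memcpy(&ent, buf + off, fixed);
        if (ent.npage > len)
            return -1;
        sz = prasmap_entry_bytes(ent.npage);
        if (sz > len - off)
            return -1;
        sum += ent.npage;
        off += sz;
    }
    *total = sum;
    return 0;
}
```

The sum computed this way should agree with the npage field of the header, which gives a cheap consistency check on the walk.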
Closing the page data file terminates the system overhead associated with collecting the data.

PIOCGETPR
PIOCGETU
These operations copy, respectively, the traced process's proc structure and user area into the buffer addressed by p. They are provided for completeness but it should be unnecessary to access either of these structures directly since relevant status information is available through other control operations. Their use is discouraged because a program making use of them is tied to a particular version of the operating system. PIOCGETPR can be applied to a zombie process (see PIOCPSINFO).

PIOCACINFO
PIOCACINFO returns the currently accumulated accounting information for the process. p points to a pracinfo structure that is filled in by the operation. The fields in pracinfo are implementation dependent; no application can assume portability in this area. See <sys/procfs.h> and <sys/extacct.h> for the exact definition of a particular implementation. The SGI implementation supports the following fields:

     typedef struct pracinfo {
          char               pr_version;    /* Accounting data version */
          char               pr_flag;       /* Miscellaneous flags */
          char               pr_nice;       /* Nice value */
          unchar             pr_sched;      /* Scheduling discipline */
                                            /* (see sys/schedctl.h) */
          __int32_t          pr_spare1;     /* reserved */
          ash_t              pr_ash;        /* Array session handle */
          prid_t             pr_prid;       /* Project ID */
          time_t             pr_btime;      /* Begin time (in secs since 1970)*/
          time_t             pr_etime;      /* Elapsed time (in HZ) */
          __int32_t          pr_spare2[2];  /* reserved */
          struct acct_timers pr_timers;     /* Assorted timers: see extacct.h */
          struct acct_counts pr_counts;     /* Assorted counters: (ditto) */
          __int64_t          pr_spare3[8];  /* reserved */
     } pracinfo_t;

PIOCGETSN0EXTREFCNTRS
PIOCGETSN0REFCNTRS
PIOCGETSN0EXTREFCNTRS returns the extended memory reference counter values in an Origin system for a specified virtual address space range. See refcnt(5). The third argument is used to specify the virtual address space range and the user buffer where to store the counter values.
This argument is of type sn0_refcnt_args_t, as defined in <sys/SN/hwcntrs.h>:

     typedef struct sn0_refcnt_args {
          caddr_t           vaddr;
          long              len;
          sn0_refcnt_buf_t* buf;
     } sn0_refcnt_args_t;

The first field vaddr is the base of the virtual address space range, the field len is the corresponding length in bytes, and the field buf is a pointer to a user buffer where the system will store the counter values and additional information. This buffer is an array of elements of type sn0_refcnt_buf_t, where each element corresponds to the counter information associated with one hardware page:

     typedef struct sn0_refcnt_buf {
          sn0_refcnt_set_t refcnt_set;
          __uint64_t       paddr;
          __uint64_t       page_size;
          cnodeid_t        cnodeid;
     } sn0_refcnt_buf_t;

The field refcnt_set contains the set of counters associated with the virtual address passed via sn0_refcnt_args, paddr is the address of the physical page associated with this virtual address, page_size is the page size being used to map it, and cnodeid is the physical page home node, expressed in terms of Compact Node Identifiers which can be mapped back to node names using the command topology(1).

The refcnt_set type is defined by

     typedef struct sn0_refcnt_set {
          refcnt_t   refcnt[SN0_REFCNT_MAX_COUNTERS];
          __uint64_t flags;
     } sn0_refcnt_set_t;

The field refcnt is the actual set of counters (one counter per node), and flags is a state vector reserved for future use. The counters in refcnt are ordered according to the Compact Node Identifiers, also known as cnodeids (numa(5)).

PIOCGETSN0REFCNTRS instructs the system to return the actual hardware counter values instead of the extended software counter values returned by PIOCGETSN0EXTREFCNTRS.

The following section of code shows an example of use for this interface:

     #include <sys/types.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <unistd.h>
     #include <malloc.h>
     #include <sys/stat.h>
     #include <fcntl.h>
     #include <sys/prctl.h>
     #include <procfs/procfs.h>
     #include <sys/syssgi.h>
     #include <sys/sysmp.h>
     #include <sys/SN/hwcntrs.h>

     /*
      * This routine makes two assumptions that may not
      * be true in all systems:
      * Length of hardware page (counter granularity): 0x1000 bytes
      * Length of base software page (smallest mappable memory area): 0x4000 bytes
      */
     void
     print_refcounters(char* vaddr, int len)
     {
         pid_t pid = getpid();
         char pfile[256];
         int fd;
         sn0_refcnt_buf_t* refcnt_buffer;
         sn0_refcnt_buf_t* direct_refcnt_buffer;
         sn0_refcnt_args_t* refcnt_args;
         int npages;
         int gen_start;
         int numnodes;
         int page;
         int node;

         /* Open this process's own /proc entry read-only. */
         sprintf(pfile, "/proc/%05d", (int)pid);
         if ((fd = open(pfile, O_RDONLY)) < 0) {
             fprintf(stderr, "Can't open /proc/%d\n", (int)pid);
             exit(1);
         }

         /* Align the base down to a hardware page and count hardware pages. */
         vaddr = (char *)( (unsigned long)vaddr & ~0xfff );
         npages = (len + 0xfff) >> 12;

         /* One buffer for the extended counters, one for the hardware counters. */
         if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
             perror("malloc refcnt_buffer");
             exit(1);
         }
         if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
             perror("malloc direct_refcnt_buffer");
             exit(1);
         }
         if ((refcnt_args = malloc(sizeof(sn0_refcnt_args_t))) == NULL) {
             perror("malloc refcnt_args");
             exit(1);
         }

         /* Fetch the extended software counters. */
         refcnt_args->vaddr = vaddr;
         refcnt_args->len = len;
         refcnt_args->buf = refcnt_buffer;
         if ((gen_start = ioctl(fd, PIOCGETSN0EXTREFCNTRS, (void *)refcnt_args)) < 0) {
             perror("ioctl PIOCGETSN0EXTREFCNTRS returns error");
             exit(1);
         }

         /* Fetch the raw hardware counters for the same range. */
         refcnt_args->vaddr = vaddr;
         refcnt_args->len = len;
         refcnt_args->buf = direct_refcnt_buffer;
         if ((gen_start = ioctl(fd, PIOCGETSN0REFCNTRS, (void *)refcnt_args)) < 0) {
             perror("ioctl PIOCGETSN0REFCNTRS returns error");
             exit(1);
         }

         if ((numnodes = sysmp(MP_NUMNODES)) < 0) {
             perror("sysmp MP_NUMNODES");
             exit(1);
         }

         /* Print one line per hardware page, one counter pair per node. */
         for (page = 0; page < npages; page++) {
             printf("page[%05d, 0x%lx, 0x%llx (0x%llx)]:", page,
                    (unsigned long)(vaddr + page*0x1000),
                    refcnt_buffer[page].paddr,
                    refcnt_buffer[page].paddr >> 14);
             for (node = 0; node < numnodes; node++) {
                 printf(" %05lld (%06lld)",
                        refcnt_buffer[page].refcnt_set.refcnt[node],
                        direct_refcnt_buffer[page].refcnt_set.refcnt[node]);
             }
             printf("\n");
         }

         close(fd);
         free(refcnt_args);
         free(refcnt_buffer);
         free(direct_refcnt_buffer);
     }

PIOCGETINODE
PIOCGETINODE returns information about an open file for the process. p points to a prinodeinfo structure containing the file descriptor of interest (in pi_fd). On return pi_dev, pi_inum, and pi_gen contain the filesystem device, inode number, and inode generation number respectively. Further information about the file can then be obtained through syssgi(SGI_FS_BULKSTAT), for instance. The pi_dev value matches that returned by statvfs (f_fsid) and stat (st_dev). Bad values for pi_fd result in EBADF errors; if the referenced file is actually a socket then errno is set to EINVAL. Filesystems other than XFS and EFS return 0 for the pi_gen value.

NOTES
Each operation (ioctl or I/O) is guaranteed to be atomic with respect to the traced process, except when applied to a system process.

To wait for one or more of a set of processes to stop, /proc file descriptors can be used in a poll(2) system call. On successful return, the polling event POLLPRI indicates that the process has stopped on an ``event of interest'' (see PIOCSTOP above). Although they cannot be requested, the polling events POLLHUP, POLLERR and POLLNVAL may be returned. POLLHUP indicates that the process has terminated. POLLERR indicates that the file descriptor has become invalid. POLLNVAL is returned immediately if POLLPRI is requested on a file descriptor referring to either itself or a system process (see PIOCSTOP).

/proc file descriptors may also be used in a select(2) system call. Selecting for an exceptional event has the same semantics as polling for POLLPRI. Selecting for reading or writing or polling for POLLIN or POLLOUT will always return true.
See the poll(2) and select(2) man pages for further details. poll() or select() may not be used on the /proc directory itself.

For security reasons, except for the superuser, an open of a /proc file fails unless both the user-ID and group-ID of the caller match those of the traced process and the process's object file is readable by the caller. Files corresponding to setuid and setgid processes can be opened only by the superuser. Even if held by the superuser, an open process file descriptor becomes invalid if the traced process performs an exec() of a setuid/setgid object file or an object file that it cannot read. Any operation performed on an invalid file descriptor, except close(2), fails with EAGAIN. In this situation, if any tracing flags are set and the process file descriptor is open for writing, the process will have been directed to stop and its run-on-last-close flag will have been set (see PIOCSET). This enables a controlling process (if it has permission) to reopen the process file to get a new valid file descriptor, close the invalid file descriptor, and proceed. Just closing the invalid file descriptor causes the traced process to resume execution with no tracing flags set. Any process not currently open for writing via /proc but that has left-over tracing flags from a previous open and that execs a setuid/setgid or unreadable object file will not be stopped but will have all its tracing flags cleared.

Descriptions of structures in this document include only interesting structure elements, not filler and padding fields, and may show elements out of order for descriptive clarity. The actual structure definitions are contained in <sys/procfs.h>.

For reasons of symmetry and efficiency there are more control operations than strictly necessary.
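
The poll(2) events described earlier in these NOTES can be decoded with a small helper such as the one sketched below. Only the standard <poll.h> definitions are used; the proc_event_t enumeration and both function names are hypothetical, and the order in which the bits are tested (POLLNVAL, then POLLERR, then POLLHUP, then POLLPRI) is a choice made for this example, not something the interface prescribes.

```c
#include <poll.h>

/* Hypothetical classification of what poll(2) reported for a /proc fd. */
typedef enum {
    PROC_EV_NONE,          /* nothing of interest reported */
    PROC_EV_STOPPED,       /* POLLPRI: stopped on an event of interest */
    PROC_EV_TERMINATED,    /* POLLHUP: process has terminated */
    PROC_EV_INVALID_FD,    /* POLLERR: descriptor has become invalid */
    PROC_EV_NOT_POLLABLE   /* POLLNVAL: fd refers to self or a system process */
} proc_event_t;

/* Build a pollfd entry that waits for the process to stop. */
struct pollfd proc_pollfd(int fd)
{
    struct pollfd p;

    p.fd = fd;
    p.events = POLLPRI;    /* POLLHUP, POLLERR, POLLNVAL cannot be requested */
    p.revents = 0;
    return p;
}

/* Map the revents bits to a single outcome, most severe first. */
proc_event_t classify_revents(short revents)
{
    if (revents & POLLNVAL)
        return PROC_EV_NOT_POLLABLE;
    if (revents & POLLERR)
        return PROC_EV_INVALID_FD;
    if (revents & POLLHUP)
        return PROC_EV_TERMINATED;
    if (revents & POLLPRI)
        return PROC_EV_STOPPED;
    return PROC_EV_NONE;
}
```

A controlling process would fill an array with proc_pollfd() entries, call poll(2) on it, and pass each entry's revents to classify_revents().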

Programs compiled with the old 32-bit ABI calling convention can perform ioctls on programs compiled with the new 32-bit ABI or 64-bit ABI calling conventions by OR-ing the ioctl code with PIOC_IRIX5_N32 or PIOC_IRIX5_64, respectively, and passing in a pointer to a buffer that is big enough to hold the larger structure.

FILES
     /proc          directory (list of active processes)
     /proc/nnnnn    process image

SEE ALSO
mntproc(1M), ioctl(2), open(2), poll(2), ptrace(2), sigaction(2), signal(2), stat(2), statvfs(2), syssgi(2), siginfo(5), signal(5).

DIAGNOSTICS
Errors that can occur in addition to the errors normally associated with filesystem access:

     ENOENT    The traced process has terminated after being opened.

     EIO       I/O was attempted at an illegal address in the traced process.

     ENXIO     I/O was attempted to an isolated process's address space.

     EBADF     An I/O or ioctl operation requiring write access was attempted on a file descriptor not open for writing; PIOCGETINODE was applied to a process file which was not open.

     EBUSY     PIOCSTOP or PIOCWSTOP was applied to a system process; an exclusive open(2) was attempted on a process file already open for writing; an open(2) for writing was attempted and an exclusive open is in effect on the process file; PIOCRUN, PIOCSREG or PIOCSFPREG was applied to a process not stopped on an event of interest; an attempt was made to mount /proc when it is already mounted.

     EPERM     Someone other than the superuser attempted to better a process's priority by issuing PIOCNICE.

     ENOSYS    An attempt was made to perform an unsupported operation (such as create, remove, link, or unlink) on an entry in /proc.

     EFAULT    An I/O or ioctl request referred to an invalid address in the controlling process.

     EINVAL    In general this means that some invalid argument was supplied to a system call.
               The list of conditions eliciting this error includes: the ioctl code is undefined; the ioctl code is not implemented; an ioctl operation was issued on a file descriptor referring to the /proc directory; an out-of-range signal number was specified with PIOCSSIG, PIOCKILL, or PIOCUNKILL; SIGKILL was specified with PIOCUNKILL; an illegal virtual address was specified in a PIOCOPENM request; overlapping watched areas were specified in a PIOCSWATCH request; an attempt was made to establish more than the supported number of watched areas in a PIOCSWATCH request; PIOCGFPREG or PIOCSFPREG was issued on a machine without floating-point hardware; the file specified to PIOCGETINODE is a socket.

     E2BIG     Data to be returned in a read(2) of the page data file exceeds the size of the read buffer provided by the caller.

     EINTR     A signal was received by the controlling process while waiting for the traced process to stop via PIOCSTOP or PIOCWSTOP.

     EAGAIN    The traced process has performed an exec of a setuid/setgid object file or of an object file that it cannot read; all further operations on the process file descriptor (except close(2)) elicit this error.

BUGS
When a signal is sent to the target process, but it is cleared (either by PIOCUNKILL or by using the PRCSIG flag to PIOCRUN), most system calls complete normally and do not return EINTR. However, the specific system calls msgsnd(2), msgrcv(2), semop(2), uspsema(3P), poll(2) and ioctl(2) to the imon(7M) device are interrupted and do return EINTR.