realtime(5)                                                        realtime(5)

NAME
     realtime, scheduler - introduction to real time and scheduler facilities

DESCRIPTION
     The IRIX operating system provides a rich set of real-time programming features that are collectively referred to as the REACT extensions. This document introduces the components of REACT, including: bounded response time, clocks, timers, signals, virtual memory control, asynchronous I/O, POSIX threads, scheduling policies, the real-time priority band, processor isolation, process binding, interrupt redirection, and kernel thread placement.

   Bounded Response Time
     A real-time system provides bounded and usually fast response to specific external events, allowing applications to schedule a particular thread to run within a specified time limit after the occurrence of an event.

     IRIX guarantees deterministic response of one millisecond on certain uni-processor systems. This real-time strategy guarantees that the highest priority thread will execute within one millisecond of the time it was made runnable.

     On certain multi-processor machines (OCTANE, Origin200, Origin2000, Onyx2, Origin300, Origin3000 series, and Onyx3), the one millisecond bounded response time guarantee is controlled by the systune variable rtcpus. rtcpus is a threshold, expressed as a number of physical CPUs, that determines whether the scheduler functionality required to meet this guarantee is enabled. If rtcpus is greater than or equal to the number of physical processors in the machine, the guarantee is enabled. If rtcpus is less than the number of physical processors, the guarantee is NOT enabled. The default value for rtcpus is 0, so by default the guarantee is not enabled.

     As an example, consider a four processor system.
     If rtcpus is set to a value between 0 and 3 (inclusive), the real-time guarantee is not enabled. If rtcpus is set to 4 or greater, the real-time guarantee is enabled. Note that enabling the real-time guarantee may cause overall system performance to degrade.

     Real-time applications requiring a lower latency guarantee can use the multi-processor real-time strategy to obtain a deterministic response of 50 microseconds on Origin3000, Origin300, and Onyx3 series machines and 100 microseconds on Origin2000, Origin200, and Onyx2 series machines. This strategy typically consists of having one processor service unpredictable loads, such as interrupts and system daemons, while the other processor(s) service high-priority real-time jobs.

   Clocks
     In order to perform event timing, IRIX provides the POSIX 1003.1b clock_gettime(2) interface. This interface can be used to access various system clocks, including the real-time clock and a low-overhead, free-running hardware counter.

   Timers
     IRIX implements both BSD itimers and POSIX 1003.1b timers. POSIX timers are recommended for real-time application development, as they provide the highest resolution and flexibility (see timer_create(3c)).

     Timer expiration interrupts are dispatched to IRIX interrupt threads for handling. The priority at which these threads are scheduled is determined by the scheduling policy and priority of the thread which set the timer:

          If the thread setting the timer is running with a timeshare scheduling policy, then the associated interrupt thread will be scheduled at real-time priority one.

          If the thread setting the timer is running with a real-time scheduling policy, then the priority of the associated interrupt thread will be the priority of the setting thread plus one.

          Priority 255, being the maximum real-time band priority, is an exception. If the thread setting the timer is running at priority 255, then the interrupt thread will also be scheduled at priority 255.
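
The three timer cases above can be sketched as a small C function. This is only an illustration of the rule as stated in the text: the function name, its parameters, and the -1 error return are hypothetical and are not part of any IRIX interface.

```c
#define RT_PRI_MIN 0     /* lowest priority in the real-time band */
#define RT_PRI_MAX 255   /* highest priority in the real-time band */

/*
 * Hypothetical helper (not an IRIX interface): returns the real-time
 * priority at which the timer interrupt thread would be scheduled,
 * following the three cases in the text. Returns -1 if the setting
 * thread claims a real-time policy with a priority outside 0-255.
 * setter_pri is ignored when the setting thread is timeshare.
 */
int timer_intr_thread_pri(int setter_is_realtime, int setter_pri)
{
    if (!setter_is_realtime)
        return 1;                  /* timeshare setter: real-time priority one */
    if (setter_pri < RT_PRI_MIN || setter_pri > RT_PRI_MAX)
        return -1;                 /* not a valid real-time priority */
    if (setter_pri == RT_PRI_MAX)
        return RT_PRI_MAX;         /* exception: 255 stays at 255 */
    return setter_pri + 1;         /* otherwise one above the setting thread */
}
```

Under this rule a thread at priority 254 and a thread at priority 255 both end up with an interrupt thread at priority 255; only in the second case does the interrupt thread no longer run ahead of the thread that set the timer.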
          Hence, real-time applications depending on system services should not use priority 255 (see the Real-Time Priority Band section below).

     Once the timer expires, the interrupt thread will be scheduled ahead of the thread which set the timer.

   Signals
     IRIX supports the full semantics of both BSD and AT&T signals. In addition, IRIX implements POSIX 1003.1b queued signals, which provide signal priorities and queue signals so that exactly as many signals are received as were sent (see sigqueue(3)).

     IRIX does not guarantee the latency of signal delivery.

   Memory Locking
     A real-time application can avoid the overhead of page fault processing under IRIX by locking ranges of its text and data into memory. The POSIX mlockall(3c) system call can be used to lock down a process's entire virtual address space.

     Since it is not always desirable to lock down the entire virtual address space, IRIX provides the following system calls to lock and unlock a specified range of addresses in memory: mpin(2)/munpin(2) and mlock(3c)/munlock(3c). The major difference between the two sets is that mpin/munpin maintains a per-page lock counter and mlock/munlock does not. Developers should choose the set that best suits their application and stick with it, as mixing the interfaces may result in unexpected behavior.

   Asynchronous I/O
     IRIX implements the POSIX 1003.1b interface to asynchronous I/O. Using this facility, a programmer can queue a read or write request to a device and optionally receive a queued signal when the request completes. The aio_read(3) or aio_write(3) call returns as soon as the request is queued, rather than blocking the process pending completion of the I/O. Optionally, process priority can be used to establish the order in which queued requests are completed.

   POSIX Thread Scope
     POSIX threads (pthreads) support both process and system scope threads.
     System scope threads enable pthread applications to obtain predictable scheduling behavior on a system level by using the kernel scheduler directly, bypassing the user-level pthread scheduler. For more information about the pthread scheduling model, see pthread(5).

   Timeshare Scheduling
     IRIX has an earnings-based scheduler for timeshare threads. Processes earn CPU time, measured in microseconds, based on their proportional share of the system. Their share of the system, and thus the rate at which they accumulate earnings, is determined by their nice value.

     While timeshare threads are not priority scheduled, they do have an independent timeshare priority band to represent nice(2) values. This band ranges from a low priority of 1 to a high priority of 40. A change to the timeshare priority results in a corresponding change to the nice value, and vice versa.

     Timeshare threads which are not the beneficiaries of priority inheritance are never scheduled ahead of real-time threads.

   Batch Scheduling
     Refer to miser(5).

   Real-Time Scheduling
     IRIX supports the POSIX 1003.1b real-time scheduler interfaces, including sched_setscheduler(2) and sched_setparam(2). These interfaces provide privileged applications with the control necessary for managing the cycles of the system processor(s). Real-time scheduling policies, such as round-robin and first-in-first-out, may be selected along with a real-time priority.

   Real-Time Priority Band
     A real-time thread may select one of a range of 256 priorities (0-255) in the real-time priority band, using the POSIX interfaces sched_setparam() or sched_setscheduler(). The higher the numeric value of the priority, the more important the thread.

     Developers must consider the needs of the application, and how it should interact with the rest of the system, before selecting a real-time priority. To aid in this decision, the priorities of the system threads should be considered.
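
A minimal sketch of selecting a real-time policy and priority through the POSIX interfaces named above. The helper name request_fifo_priority is hypothetical; the calls it makes (sched_get_priority_min, sched_get_priority_max, sched_setscheduler) are standard POSIX 1003.1b. The range check uses whatever band the running system reports rather than hard-coding 0-255, and the call is expected to fail without suitable privilege.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: place the calling process under the
 * first-in-first-out real-time policy at priority `pri`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int request_fifo_priority(int pri)
{
    struct sched_param param;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo == -1 || hi == -1)
        return -1;                  /* errno was set by the query */
    if (pri < lo || pri > hi) {
        errno = EINVAL;             /* outside the band this system reports */
        return -1;
    }

    memset(&param, 0, sizeof param);
    param.sched_priority = pri;

    /* A pid of 0 applies the request to the calling process. */
    return sched_setscheduler(0, SCHED_FIFO, &param);
}
```

Which value to pass depends on how the application should interact with the system threads, as the remainder of this section discusses.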
     IRIX manages system threads to handle kernel tasks, such as paging and interrupts. System daemon threads execute in the priority range 90 through 109 inclusive, and system device driver interrupt threads execute in the priority range 200 through 239 inclusive (see the following section for more information about interrupt threads).

     An application may set the priorities of its threads above those of the system threads, but this may affect the behavior of the system. For example, if the disk interrupt thread is blocked by a higher priority user thread, disk data access will be delayed, pending completion of the user thread.

     Setting the priorities of application threads within or above the system thread ranges requires an advanced understanding of IRIX system threads and their priorities. The priorities of the IRIX system threads may be found in /var/sysgen/mtune/kernel. If necessary, these defaults may be changed using systune(1M), although this is not recommended for most users.

     Many soft real-time applications simply need to execute ahead of timeshare applications, in which case priority range 0 through and including 89 is best suited. Since timeshare applications are not priority scheduled, a thread running at the lowest real-time priority (0) will still execute ahead of all timeshare applications. Note, however, that at times the operating system briefly promotes timeshare threads into the real-time band to handle timeouts and avoid priority inversion. In these special cases, the promoted thread's real-time priority is never boosted higher than 1.

     Applications cannot depend on system services if they are running ahead of the system, without observing the system responsiveness timing guidelines below.

     Interactive real-time applications (such as digital media) need low latency response times from the operating system, but changing interrupt thread behavior is undesirable.
     In this case, priority range 110 through and including 199 is best suited, allowing execution ahead of system daemons but behind interrupt threads. Applications in this range are typically cooperating with a device driver, in which case the correct priority for the application is the priority of the device driver interrupt thread minus 50 (see the following section). If the application is multi-threaded, and multiple priorities are warranted, then the priorities of the threads should be no greater than the priority of the device driver interrupt thread minus 50. Note that threads running at a higher priority than system daemon threads should never run for more than a few milliseconds at a time, in order to preserve system responsiveness.

     Hard real-time applications may use priorities 240 through and including 254 for the most deterministic behavior and the lowest latencies. However, if a thread running at this priority ever gets into a state where it is using 100% of the processor, the system may become completely unresponsive. Threads running at a higher priority than the interrupt threads should never run for more than a few hundred microseconds at a time, in order to preserve system responsiveness.

     Priority 255, the highest real-time priority, should not be used by applications. This priority is reserved for system use, in order to handle timers for urgent real-time applications and kernel debugger interrupts. Applications executing at this priority run the risk of hanging the system.

     The proprietary IRIX interface for selecting a real-time priority, schedctl(), is still supported for binary compatibility, but it is no longer the interface of choice. The non-degrading real-time priority range of schedctl() is re-mapped onto the POSIX real-time priority band as priorities 90 through 118 as follows: 39=90, 38=110, 37=111, 36=112, 35=113, 34=114, and so on.
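
The priority ranges and the schedctl() re-mapping described in this section can be sketched in C as follows. All names here are hypothetical and are not IRIX interfaces. The text lists the mapping only as far as 34=114; continuing it one step at a time down to 30=118 is an assumption, chosen because it agrees with the stated upper bound of 118.

```c
/* Ranges of the real-time priority band, as described in this section. */
enum rt_band {
    RT_BAND_INVALID,      /* outside 0-255 */
    RT_BAND_SOFT,         /* 0-89: ahead of timeshare, behind system daemons */
    RT_BAND_SYS_DAEMON,   /* 90-109: system daemon threads */
    RT_BAND_INTERACTIVE,  /* 110-199: ahead of daemons, behind interrupt threads */
    RT_BAND_INTR_THREAD,  /* 200-239: device driver interrupt threads */
    RT_BAND_HARD,         /* 240-254: ahead of interrupt threads */
    RT_BAND_RESERVED      /* 255: reserved for system use */
};

/* Hypothetical helper: classify a POSIX real-time priority. */
enum rt_band rt_band_of(int pri)
{
    if (pri < 0 || pri > 255) return RT_BAND_INVALID;
    if (pri <= 89)            return RT_BAND_SOFT;
    if (pri <= 109)           return RT_BAND_SYS_DAEMON;
    if (pri <= 199)           return RT_BAND_INTERACTIVE;
    if (pri <= 239)           return RT_BAND_INTR_THREAD;
    if (pri <= 254)           return RT_BAND_HARD;
    return RT_BAND_RESERVED;
}

/*
 * Hypothetical helper: map a schedctl() non-degrading priority onto the
 * POSIX real-time band. 39 maps to 90; 38 down to 30 map to 110 up to 118
 * (values below 34 are extrapolated, see the assumption noted above).
 * Returns -1 for any other input.
 */
int schedctl_ndpri_to_posix(int ndpri)
{
    if (ndpri == 39)
        return 90;
    if (ndpri >= 30 && ndpri <= 38)
        return 110 + (38 - ndpri);
    return -1;
}
```

Under these assumptions, schedctl() value 39 lands at the bottom of the system daemon range, while values 38 through 30 land in the interactive range.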
     Note that the large gap between the first two priorities preserves the scheduling semantics of schedctl() threads and system daemons.

     Real-time users are encouraged to use tools such as par(1) and irixview(1) to observe the actual priorities and dynamic behaviors of all threads on a running system.

   Device Driver Interrupt Thread Priorities
     As of IRIX 6.4, device drivers employ interrupt threads to handle device interrupts. Interrupt threads have default priorities in the range 200 through and including 239.

     To make selecting an appropriate priority for an interrupt thread easier, IRIX defines device classes including: audio, video, network, disk, serial, parallel, tape, external. Each device class has a priority assigned to it. A complete listing of device classes, and their default priorities, can be found in /var/sysgen/mtune/kernel. For example, the value of network_intr_pri defines the interrupt thread priority of all network class devices.

     A device driver may set the priority of its interrupt thread to one of the defined classes by using the class directive in its driver configuration file (located in the /var/sysgen/master.d directory). For example, /var/sysgen/master.d/if_ef includes the directive

          +thread_class network

     which means that the value of the systune(1M) variable network_intr_pri will be used for the interrupt thread priority of this device. Devices whose class cannot be determined use the value of the variable default_intr_pri:

          +thread_class default

     The default priority of each device class may be changed using the appropriate systune(1M) variable in /var/sysgen/mtune/kernel.

     The thread_class value may be overridden for a particular driver by adding the thread_priority directive to the driver description file.
     For example:

          +thread_priority 205

     On systems supporting the hardware graph, both of these values may be overridden for a particular device by using the DEVICE_ADMIN directive with the INTR_SWLEVEL attribute in the /var/sysgen/system/irix.sm file (see that file for an example of this usage).

   Processor Control
     Using the sysmp() call, or the mpadmin and runon commands, a programmer may control the distribution of processes among the processors in a real-time system. For instance, it is possible to bind a particular process onto a processor and, conversely, to restrict a processor to run only those processes that are explicitly bound to it. This makes it possible to dedicate one or more processors to particular processes.

     Nominally, when IRIX is running on a multiprocessor, certain system services require synchronization of all processors in the complex. This is mainly done to synchronize the instruction caches and the virtual-to-physical translation caches (TLBs). In order to reduce the worst case dispatch latency, a processor can be isolated using the sysmp() call. This allows a process some control over when these synchronizing events take place. If the process never requests system services, then there is no need to synchronize. If the process is sharing address space with other processes through use of either sproc() or sprocsp(), then members of the share group should also avoid operations that would require IRIX to synchronize with the isolated processor. These include operations that explicitly flush caches, expand address space across 4 megabyte boundaries, release address space, or change address space protections. Creation of new share group members through the use of sproc() requires the creation of a stack area, which may result in a synchronization event. Use of the sprocsp() interface, specifying a stack in a section of locked memory, is recommended.
     sysmp() can also be used to turn off normal IRIX clock processing on a particular processor, so that normal IRIX time slicing will not preempt the running process.

     Thus, a user can achieve a fast bounded response time to an external event if a processor is isolated, no devices are configured onto that processor, the clock service is disabled, the application process is restricted to the isolated processor, and its virtual space is locked in memory.

   Frame Rate Scheduler
     The Frame Rate Scheduler (FRS) is part of SGI's REACT/Pro product and is a special scheduling mechanism. It allows real-time processes and threads to have CPUs run them for specific amounts of time, in a specific order. The FRS is very useful for real-time applications that need a high degree of control over the CPU on which they are running, and which need to be scheduled at a specific frequency.

   CPUsets
     A cpuset is a named set of CPUs, which may be defined to be restricted or open. A restricted cpuset only allows processes that are members of the cpuset to run on the set of CPUs. An open cpuset allows any process to run on its CPUs, but a process that is a member of the cpuset can only run on the CPUs belonging to the cpuset. Cpusets are useful for real-time applications because, as sets of restricted CPUs, they eliminate the interference caused by other processes running on the same CPU, while being more convenient to manage than individual restricted CPUs.

   Interrupt Redirection
     When the multi-processor real-time strategy is being used, it is often necessary to redirect unwanted PCI and VME interrupts away from the real-time processors. Control over which device interrupts are sent to which processor can be achieved by adding DEVICE_ADMIN directives to the /var/sysgen/system/irix.sm file. The NOINTR directive may also be used to guarantee that no interrupts are randomly assigned for handling by the real-time processor. After irix.sm is modified, lboot should be run to reconfigure the system.
   Kernel Thread Placement
     In some situations kernel threads need to run on specific processors, or with other special behavior, just like user threads. The XThread Control Interface (XTCI) was added in IRIX 6.5.16 to control these special behaviors. Users may add XTHREAD entries in the /var/sysgen/system/irix.sm file. Kernel threads not mentioned operate with default behavior. After irix.sm is modified, autoconfig should be run to reconfigure the system.

     To preserve compatibility, XTCI entries will defer to the legacy /var/sysgen/master.d/sgi interface in the event that conflicting entries are found. As in the master.d/sgi interface, system threads can also be specified, but they may later change their behavior, whereas interrupt threads must adhere throughout their lifetime. Up to 32 XTHREAD entries may be made in the irix.sm file. Entries may not combine any of the BOOT, FLOAT, or CPU options. Specific interface options include:

          XTHREAD: name[*] [BOOT] [FLOAT] [STACK s] [PRI p] [CPU m...n]

     XTHREAD: - Any line beginning with XTHREAD: will be for controlling kernel threads. All the information must be on the same line.

     name[*] - Any kernel thread with a name equal to "name" will be affected by the following directives. If [*] follows, any thread whose name begins with "name" will be affected. The list of kernel system and interrupt threads is available through the icrash command and the separate product IRIXView.

     BOOT - The thread will stay within the boot cpuset if one exists. If the system cpuset exists then it will stay there instead.

     FLOAT - The thread will never be bound to a CPU.

     STACK - The number following STACK will specify the starting thread stack size.

     PRI - The number following PRI will specify the starting thread CPU scheduling priority.

     CPU - The numbers following CPU will be a list of CPUs to attempt to place the thread on if possible. Threads that cannot be placed on their CPU list will be considered FLOAT.
     This is comparable to the sysmp() MP_MUSTRUN command for user threads.

SEE ALSO
     lboot(1), mpadmin(1), runon(1), systune(1M), mlockall(3c), mpin(2), munpin(2), plock(2), sched_setparam(2), sched_setscheduler(2), sproc(2), sysmp(2), syssgi(2), aio_error(3), aio_read(3), aio_return(3), aio_write(3), lio_listio(3), system(4), signal(5), sigqueue(3), timer_create(3c), pthread(3p), nice(1), renice(1m), frs(3), cpuset(5), icrash(1M), irixview(1)