array_services(5)                                          array_services(5)

NAME
     array_services - overview of array services

DESCRIPTION
     Along with the power and flexibility of clustered systems comes some
     additional complexity in the area of administering and managing the
     array as a whole. Array Services provides several services to help
     ease this situation. Some of these services revolve around the notion
     of an array session, which is a set of processes, perhaps running on
     different nodes in an array, that are conceptually related as a single
     "job". Additional services are provided by the array services daemon,
     which knows about the configuration of an array and is therefore able
     to provide functions for describing and administering it.

ARRAY SESSIONS
     A principal use of an array system is to run jobs that are large
     enough to span two or more machines. Unfortunately, the mechanisms
     that are typically used to manage multiple related processes (for
     example: process groups, terminal sessions) are limited in scope to a
     single machine. As a result, mundane tasks such as killing a job or
     accounting for all of its resource usage can become very difficult
     when the job runs across several machines. Some means of correlating
     related processes on different machines is required. Array Services
     provides this function with the notion of an "array session".

     In formal terms, an array session is a set of processes all related to
     each other by a single unique identifier, the array session handle
     (ASH). A child process ordinarily inherits the ASH of its parent when
     it is created, thus becoming a member of its parent's array session.
     However, it is possible for a process to leave its parent's array
     session and start a new one. This would be done by programs such as
     login(1) or rshd(1M) so that logging in to the system will effectively
     start a new array session. This is also done by programs like cron(1M)
     and su(1M) so that work done on behalf of another user will be done in
     its own array session.
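
     The inheritance rule above can be illustrated with a small Python
     model. This is only a sketch: the Kernel and Process classes, their
     methods, and the counter-based handles are all hypothetical and do not
     correspond to any real system interface.

```python
import itertools


class Kernel:
    """Toy stand-in for one machine's kernel (hypothetical, not a real API)."""

    def __init__(self):
        self.next_ash = itertools.count(1)    # source of new session handles
        self.next_pid = itertools.count(100)  # source of new process IDs

    def spawn_root(self):
        """Create an initial process in a fresh array session."""
        return Process(self, next(self.next_pid), next(self.next_ash))


class Process:
    def __init__(self, kernel, pid, ash):
        self.kernel, self.pid, self.ash = kernel, pid, ash

    def fork(self):
        """A child ordinarily inherits the ASH of its parent."""
        return Process(self.kernel, next(self.kernel.next_pid), self.ash)

    def new_session(self):
        """Leave the parent's session and start a new one, as a login would."""
        self.ash = next(self.kernel.next_ash)


kernel = Kernel()
init = kernel.spawn_root()
login = init.fork()    # starts out in init's session
login.new_session()    # logging in starts a new array session
shell = login.fork()   # the shell and everything below it share that session
job = shell.fork()
```

     In this model the shell and the job carry the same handle as the login
     process, while the login process no longer shares a handle with init.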

     When the last process with a given ASH exits, a session accounting
     record containing accumulated statistics for all of the processes that
     ran in the array session is written and the array session ceases to
     exist.

     The array session handle itself is a 64-bit value. By default, a
     unique, increasing value (similar to a process ID) is assigned to each
     new array session as its handle. This type of ASH is referred to as a
     local ASH: although it is guaranteed to be unique on the local
     machine, it may also be in use by a different session on another
     machine in the same array. Because of this, a local ASH is not
     appropriate for identifying multi-machine jobs. However, there is a
     second type of ASH known as a global ASH. These are assigned by the
     array services daemon (see below) and are supposed to be unique across
     the entire array. By arranging for the same global ASH to be
     associated with each process in a job, it is possible to treat the set
     of processes as a single entity, even though some of the processes may
     be running on different machines.

     The next trick is "arranging for the same global ASH to be associated
     with each process in a job". This involves several steps. First, each
     machine that is to run part of the job must start a new array session
     to contain the related processes. By default, this new array session
     will only have a local ASH, so it must "upgrade" its handle to a
     global ASH. If this is the first machine to run part of the job, it
     would need to obtain a new global ASH from the array services daemon
     (this is done with a single library call, asallocash(3X)). Additional
     machines that are called into service for the job would need to get a
     copy of the first machine's global ASH; presumably this information
     would be passed along at the same time as the rest of the information
     concerning the new job. Once an appropriate global ASH has been
     settled upon, it can then be assigned to the new array session,
     replacing the original local ASH.
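
     The steps above can be sketched as a toy Python model. Everything
     here is an assumption made for illustration: ToyDaemon and ToyMachine
     are hypothetical names, and the use of a high bit to mark global
     handles is a modelling convenience only, not a statement about how
     real handles are encoded.

```python
import itertools

GLOBAL_BIT = 1 << 63  # assumption for this model only; not the real encoding


class ToyDaemon:
    """Stands in for the daemon's global handle allocator (hypothetical)."""

    def __init__(self):
        self._serial = itertools.count(1)

    def alloc_global_ash(self):
        return GLOBAL_BIT | next(self._serial)


class ToyMachine:
    """One node; local handles come from the machine's own counter."""

    def __init__(self, name):
        self.name = name
        self.local_handles = itertools.count(1)
        self.procs = []  # (label, ash) pairs for processes on this machine

    def start_job_part(self, global_ash, workers):
        session_ash = next(self.local_handles)  # new session, local ASH
        session_ash = global_ash                # "upgrade" to the global ASH
        for i in range(workers):                # children inherit the ASH
            self.procs.append((f"{self.name}-w{i}", session_ash))
        return session_ash


daemon = ToyDaemon()
first, second = ToyMachine("m1"), ToyMachine("m2")

# Local handles can collide across machines, so they cannot identify a job.
local_collision = next(first.local_handles) == next(second.local_handles)

job_ash = daemon.alloc_global_ash()         # the first machine obtains it
first.start_job_part(job_ash, workers=2)
second.start_job_part(job_ash, workers=3)   # the handle travels with the job

job_ashes = {ash for m in (first, second) for _, ash in m.procs}
```

     After both calls, every worker on both machines carries the single
     handle in job_ash, which is what allows the job to be treated as one
     entity.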

     The process on each machine that started the new array session is now
     free to fork off any number of children to do the required work. These
     children will all have the same ASH and can therefore be correlated
     with each other for administrative tasks such as job control or
     accounting.

ARRAY SERVICES DAEMON
     Although being able to correlate related processes on different
     machines in an array is necessary for the stated goal of administering
     an array in a reasonable way, it is not sufficient: something still
     needs to find all of those related processes and act upon them. That
     is the job of the array services daemon. Each machine in an array
     should have an array services daemon running on it.

     The array services daemon (arrayd) performs several different tasks:

     - It allocates global array session handles

     - It knows the current array configuration and can provide that
       information to other commands and programs

     - It can determine which processes belong to a particular array
       session and provide that information to other commands and programs

     - It can forward commands to all of the machines in an array

   Global Array Session Handles
     As mentioned earlier, a global array session handle is important for
     keeping track of jobs that run on several machines in an array.
     Because the array services daemon knows the configuration of an array,
     it is better suited to providing a unique global ASH than the kernel,
     which necessarily knows only about the local machine.

     When a program needs to allocate a global ASH it invokes a single
     library call, specifying (optionally) which array the ASH is to be
     allocated for. The library call, which is part of libarray (see
     below), takes care of the pragmatic issues of contacting and
     communicating with the local array services daemon. The resulting
     global ASH can then be passed to the setash(2) system call. Note that
     while anybody can allocate a global ASH, only a process with root
     privileges can actually change its ASH using setash.

     The global ASH itself is an "opaque" value: it does not necessarily
     have any specific information embedded in it, other than to
     distinguish it from a local ASH (a library function is provided to
     make this distinction). Nevertheless, the identity of the specific
     machine that creates a global ASH and the array for which it is
     intended may play some role in the generation of the ASH value itself.
     System administrators may specify particular values to be used for
     this purpose if desired.

   Array Configuration Database
     Each array services daemon has knowledge about one or more arrays and
     the machines that make up each of them. This information can be
     provided to other programs and commands with straightforward library
     calls in libarray. Ideally, this should make it unnecessary for other
     array-oriented programs to maintain their own separate array
     configuration data.

     An array services daemon obtains its configuration information from a
     configuration file located in its local filespace. Each daemon has its
     own configuration file which must be synchronized by the system
     administrator with configuration files on other machines in the
     array(s).

   Array Session Information
     To take advantage of array sessions, it is necessary to be able to
     enumerate the processes that are contained in a given array session.
     For certain applications (monitor programs for example) it may also be
     useful to enumerate ALL of the known array sessions. The array
     services daemon can obtain both types of information and provide it to
     other programs via libarray functions.

   Command Forwarding
     Command forwarding pulls all of the other array services together: it
     allows a user on one machine to issue a single command and have it
     executed on all of the machines in an array, perhaps only affecting a
     particular array session. With the appropriate setup, this could be
     used for such tasks as killing a runaway job or shutting down an
     entire array.
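
     A task such as killing a runaway job depends on the two kinds of
     enumeration described under Array Session Information. The following
     Python sketch models both over an invented snapshot of per-machine
     process tables. The machine names, process IDs, handle values, and
     function names are all hypothetical; real programs would obtain this
     information through libarray.

```python
from collections import defaultdict

# Hypothetical snapshot: machine name -> list of (pid, ash) pairs.
process_tables = {
    "node1": [(101, 0xA1), (102, 0xA1), (103, 0xB7)],
    "node2": [(201, 0xA1), (202, 0xC3)],
    "node3": [(301, 0xB7)],
}


def processes_in_session(tables, ash):
    """Return {machine: [pids]} for every process carrying the given ASH."""
    found = {}
    for machine, procs in tables.items():
        pids = [pid for pid, proc_ash in procs if proc_ash == ash]
        if pids:  # omit machines that run no part of the session
            found[machine] = pids
    return found


def all_sessions(tables):
    """Return {ash: set of machines} for every session seen on any machine."""
    sessions = defaultdict(set)
    for machine, procs in tables.items():
        for _, ash in procs:
            sessions[ash].add(machine)
    return dict(sessions)


# The processes that a "kill this session" request would need to reach.
runaway = processes_in_session(process_tables, 0xA1)
```

     Here runaway names the processes on node1 and node2 that belong to
     session 0xA1, while all_sessions gives the view a monitor program
     might want.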

     Users use a simple client program (array(1)) to specify the command
     they want to execute, any arguments it may require and the array they
     want to execute it on. Such an array command might look like this:

          array -a DevArray killash 13543423

     This example says "execute the command 'killash 13543423' on the
     machines in the array 'DevArray'". The command "killash" is not
     necessarily an actual program on any machine in the array; instead it
     refers to an entry in each machine's array configuration file. The
     entry itself specifies which program to execute, which arguments
     should be passed to it, which user/group/project the command should be
     executed under, etc. This allows each machine in an array to handle a
     particular command differently, or not handle it at all.

     The "array" program itself is fairly basic: it simply passes the
     user's command to the local array services daemon. The local array
     services daemon forwards the command to each machine in the specified
     array, then gathers the results which are then passed back to the
     "array" program and then the user.

THE ARRAY SERVICES LIBRARY
     In general, users should never have any direct interaction with the
     array services daemon. Instead, all interaction with the array
     services daemon is done through the array services library, libarray.
     libarray provides functions for dealing with global ASH's, describing
     the current array configuration, and executing array commands. There
     are a number of libarray functions, all of which are documented in
     chapter 3X man pages.
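
     To make the forwarding path described under Command Forwarding more
     concrete, the Python sketch below models how one command name can
     resolve to a different program on each machine, or to nothing at all.
     It is a toy under stated assumptions: the dictionaries, the program
     paths, the "<args>" placeholder, and the forward() function are all
     invented for illustration and do not reflect the real configuration
     file syntax or the libarray interface. The model only resolves
     commands; it does not run anything.

```python
# Hypothetical per-machine command tables. Real entries live in each
# machine's array configuration file and use that file's own syntax.
machine_commands = {
    "node1": {"killash": ["/usr/local/bin/kill-session", "<args>"]},
    "node2": {"killash": ["/opt/site/kill-session", "--force", "<args>"]},
    "node3": {},  # this machine does not handle "killash" at all
}

# Hypothetical array membership, as the local daemon might know it.
arrays = {"DevArray": ["node1", "node2", "node3"]}


def expand(template, args):
    """Substitute the caller's arguments for the "<args>" placeholder."""
    words = []
    for word in template:
        if word == "<args>":
            words.extend(args)
        else:
            words.append(word)
    return words


def forward(array_name, command, args):
    """Resolve a command on every machine in the array and gather results."""
    results = {}
    for machine in arrays[array_name]:
        template = machine_commands[machine].get(command)
        # None records that the machine has no entry for this command.
        results[machine] = None if template is None else expand(template, args)
    return results


# Models: array -a DevArray killash 13543423
results = forward("DevArray", "killash", ["13543423"])
```

     Keeping the lookup on each machine, rather than in the client, is what
     lets one machine map "killash" to a different program than its
     neighbours while a third ignores the command entirely.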

     Some of the libarray functions include:

     ASH Functions
          asallocash        - Allocates a global ASH
          asashisglobal     - Indicates whether an ASH is global or local
          aslistashs_array  - Returns all global ASH's in specified array

     Configuration Functions
          aslistarrays      - Returns info on all known arrays
          aslistmachines    - Returns info on all machines in specified array

     Command Forwarding
          ascommand         - Execute an array command

SEE ALSO
     array(1), arrayd(1M), newsess(1), asallocash(3X), asashisglobal(3X),
     ascommand(3X), aslistarrays(3X), aslistashs_array(3X),
     aslistashs_server(3X), aslistmachines(3X), arrayd.conf(4),
     array_sessions(5).