MP(3F)                                                                MP(3F)

NAME
     MP_BARRIER, MP_BLOCK, MP_BLOCKTIME, MP_CREATE, MP_DESTROY,
     MP_IN_DOACROSS_LOOP, MP_IS_MASTER, MP_MY_THREADNUM, MP_NUMTHREADS,
     MP_SET_NUMTHREADS, MP_SET_SLAVE_STACKSIZE, MP_SETLOCK, MP_SETUP,
     MP_SUGGESTED_NUMTHREADS, MP_UNBLOCK, MP_UNSETLOCK, MP_SHMEM_GET32,
     MP_SHMEM_PUT32, MP_SHMEM_IGET32, MP_SHMEM_IPUT32, MP_SHMEM_GET64,
     MP_SHMEM_PUT64, MP_SHMEM_IGET64, MP_SHMEM_IPUT64 - Fortran
     multiprocessing utility routines

SYNOPSIS
     SUBROUTINE MP_BLOCK()

     SUBROUTINE MP_UNBLOCK()

     SUBROUTINE MP_BLOCKTIME(iters)
     INTEGER iters

     SUBROUTINE MP_SETUP()

     SUBROUTINE MP_CREATE(num)
     INTEGER num

     SUBROUTINE MP_DESTROY()

     INTEGER FUNCTION MP_NUMTHREADS()

     SUBROUTINE MP_SET_NUMTHREADS(num)
     INTEGER num

     INTEGER FUNCTION MP_MY_THREADNUM()

     INTEGER FUNCTION MP_IS_MASTER()

     SUBROUTINE MP_SETLOCK()

     INTEGER FUNCTION MP_SUGGESTED_NUMTHREADS(num)
     INTEGER num

     SUBROUTINE MP_UNSETLOCK()

     SUBROUTINE MP_BARRIER()

     LOGICAL FUNCTION MP_IN_DOACROSS_LOOP()

     SUBROUTINE MP_SET_SLAVE_STACKSIZE(size)
     INTEGER size

     MP_SHMEM_GET32(target, source, length, source_thread)
     MP_SHMEM_PUT32(target, source, length, target_thread)
     MP_SHMEM_IGET32(target, source, target_inc, source_inc, length,
                     source_thread)
     MP_SHMEM_IPUT32(target, source, target_inc, source_inc, length,
                     target_thread)
     MP_SHMEM_GET64(target, source, length, source_thread)
     MP_SHMEM_PUT64(target, source, length, target_thread)
     MP_SHMEM_IGET64(target, source, target_inc, source_inc, length,
                     source_thread)
     MP_SHMEM_IPUT64(target, source, target_inc, source_inc, length,
                     target_thread)

DESCRIPTION
     The multiprocessing routines help control the level of parallelism used
     in Fortran programs.  They should not be needed by most programs, but
     they can help to tune some applications.  These routines are as
     follows:

     *   MP_BLOCK uses the blockproc(2) system call to put all slave
         threads to sleep.  This frees the processors for use by other
         jobs.
         This routine is useful if it is known that the slaves will not be
         needed for some time and the machine is being shared by several
         users.  Calls to MP_BLOCK cannot be nested; a warning is issued if
         an attempt to do so is made.

     *   MP_UNBLOCK wakes up slave threads that were previously blocked by
         a call to MP_BLOCK.  You cannot unblock threads that are not
         currently blocked; a warning is issued if an attempt is made to do
         so.  It is not necessary to call MP_UNBLOCK explicitly.  When a
         Fortran parallel region is entered, a check is made, and if the
         slaves are currently blocked, a call is made to MP_UNBLOCK
         automatically.

     *   MP_BLOCKTIME controls the amount of time a slave thread waits for
         work before giving up.  When enough time has elapsed, the slave
         thread blocks itself.  This automatic blocking is independent of
         the user-level blocking provided by the MP_BLOCK and MP_UNBLOCK
         calls.  Slave threads that have blocked themselves are unblocked
         automatically upon entering a parallel region.

         The iters argument to MP_BLOCKTIME specifies the number of times
         to spin in the wait loop.  By default, it is set to 10,000,000,
         which takes about 0.25 seconds on a 200 MHz processor.  As a
         special case, an argument of 0 disables the automatic blocking,
         which allows the slaves to spin-wait without limit.

         The MP_BLOCKTIME environment variable can be set to an integer
         value.  It acts like an implicit call to MP_BLOCKTIME during
         program startup.  For more information on the MP_BLOCKTIME
         environment variable, see pe_environ(5).

     *   MP_DESTROY deletes the slave threads.  They are stopped by forcing
         them to call the exit(2) system call.  In general, doing this is
         discouraged; MP_BLOCK can be used in most cases.

     *   MP_CREATE creates and initializes threads.  Its num argument
         specifies the number of threads to create.  Because the calling
         thread already counts as one, MP_CREATE actually creates num - 1
         new slave threads.

     *   MP_SETUP creates and initializes threads.
         It calls MP_CREATE using the current default number of threads.
         Unless otherwise specified, the default number is the number of
         CPUs currently on the machine or 8, whichever is less.  If you
         have not called either of the thread creation routines already,
         MP_SETUP is invoked automatically when the first parallel region
         is entered.  If the MP_SETUP environment variable is set, the
         MP_SETUP routine is called during Fortran initialization, before
         any user code is executed.

     *   MP_NUMTHREADS returns the number of threads that would participate
         in an immediately following parallel region.  If the threads have
         already been created, it returns the current number of threads.
         If the threads have not been created, it returns the current
         default number of threads.  The count includes the master thread.
         Knowing this count can be useful in optimizing certain kinds of
         parallel loops by hand, but this function has the side effect of
         freezing the number of threads to the returned value.  As a
         result, this routine should be used sparingly.  To determine the
         number of threads without this side effect, see the description of
         the MP_SUGGESTED_NUMTHREADS routine on this man page.

     *   MP_SET_NUMTHREADS sets the current default number of threads to
         the specified value.  Note that this call does not directly create
         the threads; it only specifies the number that a subsequent
         MP_SETUP call should use.  If the MP_SET_NUMTHREADS environment
         variable is set, it acts like an implicit call to
         MP_SET_NUMTHREADS during program startup.  For more information on
         the MP_SET_NUMTHREADS environment variable, see the pe_environ(5)
         man page.

         For convenience when operating among several machines with
         different numbers of CPUs, the num argument to MP_SET_NUMTHREADS
         can be set to an expression involving integer literals; the binary
         operators + and -; the binary functions min and max; and the
         special symbolic value all.
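         The same default can also be set from program source before the
         threads are created.  The following is a minimal sketch, not a
         definitive recipe; the thread count of 4 is an assumed,
         illustrative value, and these routines exist only in the SGI
         Fortran multiprocessing library:

```fortran
C     Sketch: set the default thread count, then create the threads.
C     The value 4 is illustrative.  MP_SETUP creates the current
C     default number of threads, counting the calling master as one.
      CALL MP_SET_NUMTHREADS(4)
      CALL MP_SETUP()
```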
         The all specification requests the total number of available CPUs
         on the current machine.  For example, the following simple
         specification sets the number of threads to 7, which may be a fine
         choice on an 8-CPU machine but would be a very bad choice on a
         4-CPU machine:

              setenv MP_SET_NUMTHREADS 7

         A better specification would be the following, which sets the
         number of threads to one less than the number of CPUs on the
         current machine (but always at least one):

              setenv MP_SET_NUMTHREADS "max(1,all-1)"

         If your configuration includes some machines with large numbers of
         CPUs, setting an upper bound is a good idea.  A specification such
         as the following requests no more than 4 CPUs:

              setenv MP_SET_NUMTHREADS "min(all,4)"

         For compatibility with earlier releases, NUM_THREADS is supported
         as a synonym for MP_SET_NUMTHREADS.

     *   MP_MY_THREADNUM returns an integer between 0 and n - 1, where n is
         the value returned by MP_NUMTHREADS.  The master process is always
         thread 0.  This routine is occasionally useful for optimizing
         certain kinds of loops by hand.

     *   MP_IS_MASTER returns 1 if called by the master process, 0
         otherwise.

     *   MP_SETLOCK provides convenient (though limited) access to the
         locking routines.  The convenience is that no setup need be done;
         it can be called directly without any preliminaries.  The
         limitation is that there is only one lock.  It is analogous to the
         ussetlock(3P) routine, but it accepts no arguments and does not
         return a value.  This is useful for serializing access to shared
         variables (for example, counters) in a parallel region.  Note that
         it is frequently necessary to declare those variables VOLATILE to
         ensure that the optimizer does not assign them to a register.  For
         information on the VOLATILE statement, see your compiler's
         reference manual.

     *   MP_SUGGESTED_NUMTHREADS uses its num argument as a hint about how
         many threads to use in subsequent parallel regions.  It returns
         the previous value of the number of threads to be employed in
         parallel regions.
         It does not affect currently executing parallel regions, if any.
         The implementation may ignore this hint depending on factors such
         as overall system load.  This routine can also be called with
         num = 0, in which case it simply returns the number of threads to
         be employed in parallel regions, without the side effect present
         in MP_NUMTHREADS.

     *   MP_UNSETLOCK is the companion routine for MP_SETLOCK.

     *   MP_BARRIER provides a simple interface to a single barrier(3P).
         It can be used inside a parallel loop to force a barrier
         synchronization to occur among the parallel threads.  The routine
         accepts no arguments, returns no value, and does not require any
         initialization.

     *   MP_IN_DOACROSS_LOOP determines whether execution is currently
         inside a parallel loop.  This can be useful if you have an
         external routine that can be called both from inside and from
         outside a parallel loop, and the routine must do different things
         depending on whether or not it is being called in parallel.

     *   MP_SET_SLAVE_STACKSIZE specifies the stack size (in bytes) to be
         used by the slave processes when they are created by sprocsp(2).
         The default size is 16 MB.  Note that slave processes only
         allocate their local data onto their stack.  Shared data (even if
         allocated on the master's stack) is not counted.

     *   MP_SHMEM_GET32, MP_SHMEM_PUT32, MP_SHMEM_IGET32, MP_SHMEM_IPUT32,
         MP_SHMEM_GET64, MP_SHMEM_PUT64, MP_SHMEM_IGET64, and
         MP_SHMEM_IPUT64 specify SHMEM-like operations.  These routines
         allow you to manage communication explicitly, for reasons of
         performance or style, in a manner similar to SHMEM, one-sided
         communication, or message passing.  The operations allow a thread
         to fetch from (get) and send to (put) data belonging to other
         threads.  The MP_SHMEM routines can be used with OpenMP directives
         and with the Silicon Graphics DOACROSS directives.  These routines
         are identical to the original SHMEM routines, but they are
         prefixed by MP_.
         For information on the SHMEM routines, see the intro_shmem(3) man
         page or contact your sales representative.

         When using the MP_SHMEM routines in a Fortran program, the data to
         be operated on must be in a common block, and each thread must
         have its own private copy of that data.  If you are using OpenMP
         directives, use the THREADPRIVATE directive to declare the data to
         be private.  If you are using the DOACROSS directive, declare the
         data to be private by specifying it as an argument to the -Xlocal
         option on the ld(1) command.  (These methods are equivalent to
         using the TASKCOMMON directive in a Fortran program on a UNICOS
         system.)  A GET routine requires that source point to private
         data.  A PUT routine requires that target point to private data.

         For more information on the OpenMP directives and the DOACROSS
         directive, see your compiler reference manuals.  Note that the
         DOACROSS directive is outmoded; the preferred alternative is the
         OpenMP DO directive.

         These routines accept the following arguments:

         target    For the 32-bit versions of these routines, target is a
                   pointer to a 32-bit quantity.  For the 64-bit versions,
                   target is a pointer to a 64-bit quantity.  For a PUT
                   operation, target must be private data.

         source    For the 32-bit versions of these routines, source is a
                   pointer to a 32-bit quantity.  For the 64-bit versions,
                   source is a pointer to a 64-bit quantity.  For a GET
                   operation, source must be private data.

         length    Specifies the number of elements to be copied, in units
                   of 32-bit or 64-bit elements, as appropriate.

         source_thread, target_thread
                   Specify the numeric identifier of the remote source or
                   target thread.

         source_inc, target_inc
                   Specified for the strided routines, which are those with
                   _IGET and _IPUT in their names.  Specify the increment,
                   in units of 32-bit or 64-bit elements, along source and
                   target, respectively, when performing the data transfer.
         The number of elements copied during a strided get or put
         operation is determined by length.

         You can call the MP_SHMEM routines only after the threads have
         been created, typically in the first DO/DOACROSS/PARALLEL region.
         Performing these operations while the program is still serial
         leads to a run-time error because each thread's copy has not yet
         been created.  As library routines, they incur some run-time
         overhead.

DIRECTIVES
     The MIPSpro Fortran 77 and MIPSpro Fortran 90 compilers allow you to
     apply the capabilities of a Silicon Graphics multiprocessor computer
     to the execution of a single job.  By coding a few simple directives,
     you direct the compiler to split the job into concurrently executing
     pieces, thereby decreasing the wall-clock run time of the job.

     Directives enable, disable, or modify a feature of the compiler.
     Essentially, directives are command line options specified within the
     input file instead of on the command line.  Unlike command line
     options, directives have no default setting.  To invoke a directive,
     you must either toggle it on or set a desired value for its level.

     Directives placed on the first line of the input file are called
     global directives.  The compiler interprets them as if they appeared
     at the top of each program unit in the file.  Use global directives to
     ensure that the program is compiled with the correct command line
     options.

     Directives appearing anywhere else in the file apply only until the
     end of the current program unit.  The compiler resets the value of the
     directive to the global value at the start of the next program unit.
     You can set the global value with a command line option or a global
     directive.

     Some command line options act like global directives.  Other command
     line options override directives.  Many directives have corresponding
     command line options.  If you specify conflicting settings on the
     command line and in a directive, the compiler chooses the more
     restrictive setting.
     For Boolean options, if either the directive or the command line turns
     the option off, it is considered off.  For options that require a
     numeric value, the compiler uses the minimum of the command line
     setting and the directive setting.

     The Fortran compilers accept directives that generate code that can be
     run in parallel.  The compiler directives look like Fortran comments;
     if multiprocessing is not turned on, these statements are treated as
     comments.  This allows identical source to be compiled with a
     single-processing compiler or by Fortran without the multiprocessing
     option.

     The following directive sets are supported for multiprocessing:

     *   The OpenMP Fortran API directives.  These portable, standard
         multiprocessing directives are supported on IRIX and UNICOS
         systems.

     *   The Origin series directives.  These directives were developed
         specifically for the Origin series systems.

     *   Other directive sets, including the PCF directives and the
         Autotasking directives, are supported for multiprocessing, but
         they are outmoded.

     For more information on any of the multiprocessing directives, see the
     MIPSpro Fortran 77 Programmer's Guide or the MIPSpro Fortran 90
     Commands and Directives Reference Manual.

QUERY INTRINSICS
     The DSM(3I) man page describes several query intrinsics for
     distributed arrays.  You can use these intrinsics to obtain
     information about an individual dimension of a distributed array.

COMMAND LINE SUPPORT
     Various command line options must be in effect for multiprocessing to
     occur.  The following command line options are used when
     multiprocessing is desired:

     *   The -mp option enables all multiprocessing directives.

     *   The -MP: option selectively disables particular directive sets and
         controls other aspects of multiprocessing.  This option must be
         specified in conjunction with the -mp option.  For example, to
         disable the OpenMP directives, you would include both -mp and
         -MP:open_mp=OFF on your command line.
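         As a sketch of how these options combine on a compile line (the
         file name prog.f is hypothetical):

```shell
# Enable all multiprocessing directive sets:
f77 -mp prog.f

# Keep multiprocessing on, but disable only the OpenMP directive set:
f77 -mp -MP:open_mp=OFF prog.f
```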
     For more information on these command line options, see the f77(1) or
     f90(1) man pages.

EXAMPLES
     Example 1.  In the following example, assume that the command line
     includes -Wl,-Xlocal,mycommon_, which ensures that each thread has a
     private copy of X and Y:

          INTEGER X
          REAL(KIND=8) Y(100)
          COMMON /MYCOMMON/ X, Y

     Example 2.  The following example copies the value of X on thread 3
     into the private copy of X for the current thread:

          CALL MP_SHMEM_GET32 (X, X, 1, 3)

     Example 3.  The following example copies the value of LOCALVAR into
     the thread 5 copy of X:

          CALL MP_SHMEM_PUT32 (X, LOCALVAR, 1, 5)

     Example 4.  The following example fetches values from the thread 7
     copy of array Y into LOCALARRAY:

          CALL MP_SHMEM_GET64 (LOCALARRAY, Y, 100, 7)

     Example 5.  The following example copies the value of every other
     element of LOCALARRAY into the thread 9 copy of Y:

          CALL MP_SHMEM_IPUT64 (Y, LOCALARRAY, 2, 2, 50, 9)

SEE ALSO
     f77(1), f90(1), blockproc(2), exit(2), sprocsp(2), intro_shmem(3),
     DSM(3I), SYNC(3I), barrier(3P), ussetlock(3P), pe_environ(5)

     MIPSpro Fortran 77 Programmer's Guide

     MIPSpro Fortran 90 Commands and Directives Reference Manual