MP(3F)                                                                MP(3F)

NAME
     MP_BARRIER, MP_BLOCK, MP_BLOCKTIME, MP_CREATE, MP_DESTROY,
     MP_IN_DOACROSS_LOOP, MP_IS_MASTER, MP_MY_THREADNUM, MP_NUMTHREADS,
     MP_SET_NUMTHREADS, MP_SET_SLAVE_STACKSIZE, MP_SETLOCK, MP_SETUP,
     MP_SUGGESTED_NUMTHREADS, MP_UNBLOCK, MP_UNSETLOCK, MP_SHMEM_GET32,
     MP_SHMEM_PUT32, MP_SHMEM_IGET32, MP_SHMEM_IPUT32, MP_SHMEM_GET64,
     MP_SHMEM_PUT64, MP_SHMEM_IGET64, MP_SHMEM_IPUT64 - Fortran
     multiprocessing utility routines

SYNOPSIS
     SUBROUTINE MP_BLOCK()

     SUBROUTINE MP_UNBLOCK()

     SUBROUTINE MP_BLOCKTIME(iters)
     INTEGER iters

     SUBROUTINE MP_SETUP()

     SUBROUTINE MP_CREATE(num)
     INTEGER num

     SUBROUTINE MP_DESTROY()

     INTEGER FUNCTION MP_NUMTHREADS()

     SUBROUTINE MP_SET_NUMTHREADS(num)
     INTEGER num

     INTEGER FUNCTION MP_MY_THREADNUM()

     INTEGER FUNCTION MP_IS_MASTER()

     SUBROUTINE MP_SETLOCK()

     INTEGER FUNCTION MP_SUGGESTED_NUMTHREADS(num)
     INTEGER num

     SUBROUTINE MP_UNSETLOCK()

     SUBROUTINE MP_BARRIER()

     LOGICAL FUNCTION MP_IN_DOACROSS_LOOP()

     SUBROUTINE MP_SET_SLAVE_STACKSIZE(size)
     INTEGER size

     MP_SHMEM_GET32(target, source, length, source_thread)
     MP_SHMEM_PUT32(target, source, length, target_thread)
     MP_SHMEM_IGET32(target, source, target_inc, source_inc, length,
                     source_thread)
     MP_SHMEM_IPUT32(target, source, target_inc, source_inc, length,
                     target_thread)
     MP_SHMEM_GET64(target, source, length, source_thread)
     MP_SHMEM_PUT64(target, source, length, target_thread)
     MP_SHMEM_IGET64(target, source, target_inc, source_inc, length,
                     source_thread)
     MP_SHMEM_IPUT64(target, source, target_inc, source_inc, length,
                     target_thread)

DESCRIPTION
     The multiprocessing routines help control the level of parallelism used
     in Fortran programs.  They should not be needed by most programs, but
     they can help to tune some applications.  These routines are as
     follows:

     *   MP_BLOCK uses the blockproc(2) system call to put all slave
         threads to sleep.  This frees the processors for use by other
         jobs.
         This routine is useful if it is known that the slaves will not be
         needed for some time and the machine is being shared by several
         users.  Calls to MP_BLOCK cannot be nested; a warning is issued if
         an attempt to do so is made.

     *   MP_UNBLOCK wakes up slave threads that were previously blocked by
         a call to MP_BLOCK.  You cannot unblock threads that are not
         currently blocked; a warning is issued if an attempt is made to do
         so.  It is not necessary to call MP_UNBLOCK explicitly.  When a
         Fortran parallel region is entered, a check is made, and if the
         slaves are currently blocked, a call is made to MP_UNBLOCK
         automatically.

     *   MP_BLOCKTIME controls the amount of time a slave thread waits for
         work before giving up.  When enough time has elapsed, the slave
         thread blocks itself.  This automatic blocking is independent of
         the user-level blocking provided by the MP_BLOCK and MP_UNBLOCK
         calls.  Slave threads that have blocked themselves are unblocked
         automatically upon entering a parallel region.

         The iters argument to MP_BLOCKTIME specifies the number of times
         to spin in the wait loop.  By default, it is set to 10,000,000,
         which takes about 0.25 seconds on a 200 MHz processor.  As a
         special case, an argument of 0 disables the automatic blocking,
         which allows the slaves to spin-wait without limit.

         The MP_BLOCKTIME environment variable can be set to an integer
         value.  It acts like an implicit call to MP_BLOCKTIME during
         program startup.  For more information on the MP_BLOCKTIME
         environment variable, see pe_environ(5).

     *   MP_DESTROY deletes the slave threads.  They are stopped by forcing
         them to call the exit(2) system call.  In general, doing this is
         discouraged; MP_BLOCK can be used in most cases.

     *   MP_CREATE creates and initializes threads.  Its num argument
         specifies the number of threads to create.  Because the calling
         thread already counts as one, MP_CREATE actually creates num - 1
         new slave threads.

     *   MP_SETUP creates and initializes threads.
         It calls MP_CREATE using the current default number of threads.
         Unless otherwise specified, the default number is the number of
         CPUs currently on the machine or 8, whichever is less.  If you
         have not called either of the thread creation routines already,
         MP_SETUP is invoked automatically when the first parallel region
         is entered.  If the MP_SETUP environment variable is set, the
         MP_SETUP routine is called during Fortran initialization, before
         any user code is executed.

     *   MP_NUMTHREADS returns the number of threads that would participate
         in an immediately following parallel region.  If the threads have
         already been created, it returns the current number of threads.
         If the threads have not been created, it returns the current
         default number of threads.  The count includes the master thread.
         Knowing this count can be useful in optimizing certain kinds of
         parallel loops by hand, but this function has the side effect of
         freezing the number of threads to the returned value.  As a
         result, this routine should be used sparingly.  To determine the
         number of threads without this side effect, see the description of
         the MP_SUGGESTED_NUMTHREADS routine on this man page.

     *   MP_SET_NUMTHREADS sets the current default number of threads to
         the specified value.  Note that this call does not directly create
         the threads; it only specifies the number that a subsequent
         MP_SETUP call should use.  If the MP_SET_NUMTHREADS environment
         variable is set, it acts like an implicit call to
         MP_SET_NUMTHREADS during program startup.  For more information on
         the MP_SET_NUMTHREADS environment variable, see the pe_environ(5)
         man page.

         For convenience when operating among several machines with
         different numbers of CPUs, the num argument to MP_SET_NUMTHREADS
         can be set to an expression involving integer literals; the binary
         operators + and -; the binary functions min and max; and the
         special symbolic value all.
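         The same default can also be set from program source before the
         threads are created.  The following is a minimal sketch, not a
         definitive recipe; the thread count of 4 is an assumed,
         illustrative value, and these routines exist only in the SGI
         Fortran multiprocessing library:

```fortran
C     Sketch: set the default thread count, then create the threads.
C     The value 4 is illustrative.  MP_SETUP creates the current
C     default number of threads, counting the calling master as one.
      CALL MP_SET_NUMTHREADS(4)
      CALL MP_SETUP()
```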
         The all specification requests the total number of available CPUs
         on the current machine.  For example, the following simple
         specification sets the number of threads to 7, which may be a fine
         choice on an 8-CPU machine but would be a very bad choice on a
         4-CPU machine:

              setenv MP_SET_NUMTHREADS 7

         A better specification would be the following, which sets the
         number of threads to one less than the number of CPUs on the
         current machine (but always at least one):

              setenv MP_SET_NUMTHREADS "max(1,all-1)"

         If your configuration includes some machines with large numbers of
         CPUs, setting an upper bound is a good idea.  A specification such
         as the following requests no more than 4 CPUs:

              setenv MP_SET_NUMTHREADS "min(all,4)"

         For compatibility with earlier releases, NUM_THREADS is supported
         as a synonym for MP_SET_NUMTHREADS.

     *   MP_MY_THREADNUM returns an integer between 0 and n - 1, where n is
         the value returned by MP_NUMTHREADS.  The master process is always
         thread 0.  This routine is occasionally useful for optimizing
         certain kinds of loops by hand.

     *   MP_IS_MASTER returns 1 if called by the master process, 0
         otherwise.

     *   MP_SETLOCK provides convenient (though limited) access to the
         locking routines.  The convenience is that no setup need be done;
         it can be called directly without any preliminaries.  The
         limitation is that there is only one lock.  It is analogous to the
         ussetlock(3P) routine, but it accepts no arguments and does not
         return a value.  This is useful for serializing access to shared
         variables (for example, counters) in a parallel region.  Note that
         it is frequently necessary to declare those variables VOLATILE to
         ensure that the optimizer does not assign them to a register.  For
         information on the VOLATILE statement, see your compiler's
         reference manual.

     *   MP_SUGGESTED_NUMTHREADS uses its num argument as a hint about how
         many threads to use in subsequent parallel regions.  It returns
         the previous value of the number of threads to be employed in
         parallel regions.
         It does not affect currently executing parallel regions, if any.
         The implementation may ignore this hint depending on factors such
         as overall system load.  This routine can also be called with
         num = 0, in which case it simply returns the number of threads to
         be employed in parallel regions, without the side effect present
         in MP_NUMTHREADS.

     *   MP_UNSETLOCK is the companion routine for MP_SETLOCK.

     *   MP_BARRIER provides a simple interface to a single barrier(3P).
         It can be used inside a parallel loop to force a barrier
         synchronization to occur among the parallel threads.  The routine
         accepts no arguments, returns no value, and does not require any
         initialization.

     *   MP_IN_DOACROSS_LOOP determines whether execution is currently
         inside a parallel loop.  This can be useful if you have an
         external routine that can be called both from inside and from
         outside a parallel loop, and the routine must do different things
         depending on whether or not it is being called in parallel.

     *   MP_SET_SLAVE_STACKSIZE specifies the stack size (in bytes) to be
         used by the slave processes when they are created by sprocsp(2).
         The default size is 16 MB.  Note that slave processes only
         allocate their local data onto their stack.  Shared data (even if
         allocated on the master's stack) is not counted.

     *   MP_SHMEM_GET32, MP_SHMEM_PUT32, MP_SHMEM_IGET32, MP_SHMEM_IPUT32,
         MP_SHMEM_GET64, MP_SHMEM_PUT64, MP_SHMEM_IGET64, and
         MP_SHMEM_IPUT64 specify SHMEM-like operations.  These routines
         allow you to manage communication explicitly, for reasons of
         performance or style, in a manner similar to SHMEM, one-sided
         communication, or message passing.  The operations allow a thread
         to fetch from (get) and send to (put) data belonging to other
         threads.  The MP_SHMEM routines can be used with OpenMP directives
         and with the Silicon Graphics DOACROSS directives.  These routines
         are identical to the original SHMEM routines, but they are
         prefixed by MP_.
         For information on the SHMEM routines, see the intro_shmem(3) man
         page or contact your sales representative.

         When using the MP_SHMEM routines in a Fortran program, the data to
         be operated on must be in a common block, and each thread must
         have its own private copy of that data.  If you are using OpenMP
         directives, use the THREADPRIVATE directive to declare the data to
         be private.  If you are using the DOACROSS directive, declare the
         data to be private by specifying it as an argument to the -Xlocal
         option on the ld(1) command.  (These methods are equivalent to
         using the TASKCOMMON directive in a Fortran program on a UNICOS
         system.)  A GET routine requires that source point to private
         data.  A PUT routine requires that target point to private data.

         For more information on the OpenMP directives and the DOACROSS
         directive, see your compiler reference manuals.  Note that the
         DOACROSS directive is outmoded; the preferred alternative is the
         OpenMP DO directive.

         These routines accept the following arguments:

         target    For the 32-bit versions of these routines, target is a
                   pointer to a 32-bit quantity.  For the 64-bit versions,
                   target is a pointer to a 64-bit quantity.  For a PUT
                   operation, target must be private data.

         source    For the 32-bit versions of these routines, source is a
                   pointer to a 32-bit quantity.  For the 64-bit versions,
                   source is a pointer to a 64-bit quantity.  For a GET
                   operation, source must be private data.

         length    Specifies the number of elements to be copied, in units
                   of 32-bit or 64-bit elements, as appropriate.

         source_thread, target_thread
                   Specify the numeric identifier of the remote source or
                   target thread.

         source_inc, target_inc
                   Specified for the strided routines, which are those with
                   _IGET and _IPUT in their names.  Specify the increment,
                   in units of 32-bit or 64-bit elements, along source and
                   target, respectively, when performing the data transfer.
         The number of elements copied during a strided get or put
         operation is determined by length.

         You can call the MP_SHMEM routines only after the threads have
         been created, typically in the first DO/DOACROSS/PARALLEL region.
         Performing these operations while the program is still serial
         leads to a run-time error because each thread's copy has not yet
         been created.  As library routines, they incur some run-time
         overhead.

DIRECTIVES
     The MIPSpro Fortran 77 and MIPSpro Fortran 90 compilers allow you to
     apply the capabilities of a Silicon Graphics multiprocessor computer
     to the execution of a single job.  By coding a few simple directives,
     you direct the compiler to split the job into concurrently executing
     pieces, thereby decreasing the wall-clock run time of the job.

     Directives enable, disable, or modify a feature of the compiler.
     Essentially, directives are command line options specified within the
     input file instead of on the command line.  Unlike command line
     options, directives have no default setting.  To invoke a directive,
     you must either toggle it on or set a desired value for its level.

     Directives placed on the first line of the input file are called
     global directives.  The compiler interprets them as if they appeared
     at the top of each program unit in the file.  Use global directives to
     ensure that the program is compiled with the correct command line
     options.

     Directives appearing anywhere else in the file apply only until the
     end of the current program unit.  The compiler resets the value of the
     directive to the global value at the start of the next program unit.
     You can set the global value with a command line option or a global
     directive.

     Some command line options act like global directives.  Other command
     line options override directives.  Many directives have corresponding
     command line options.  If you specify conflicting settings on the
     command line and in a directive, the compiler chooses the more
     restrictive setting.
     For Boolean options, if either the directive or the command line turns
     the option off, it is considered off.  For options that require a
     numeric value, the compiler uses the minimum of the command line
     setting and the directive setting.

     The Fortran compilers accept directives that generate code that can be
     run in parallel.  The compiler directives look like Fortran comments;
     if multiprocessing is not turned on, these statements are treated as
     comments.  This allows identical source to be compiled with a
     single-processing compiler or by Fortran without the multiprocessing
     option.

     The following directive sets are supported for multiprocessing:

     *   The OpenMP Fortran API directives.  These portable, standard
         multiprocessing directives are supported on IRIX and UNICOS
         systems.

     *   The Origin series directives.  These directives were developed
         specifically for the Origin series systems.

     *   Other directive sets, including the PCF directives and the
         Autotasking directives, are supported for multiprocessing, but
         they are outmoded.

     For more information on any of the multiprocessing directives, see the
     MIPSpro Fortran 77 Programmer's Guide or the MIPSpro Fortran 90
     Commands and Directives Reference Manual.

QUERY INTRINSICS
     The DSM(3I) man page describes several query intrinsics for
     distributed arrays.  You can use these intrinsics to obtain
     information about an individual dimension of a distributed array.

COMMAND LINE SUPPORT
     Various command line options must be in effect for multiprocessing to
     occur.  The following command line options are used when
     multiprocessing is desired:

     *   The -mp option enables all multiprocessing directives.

     *   The -MP: option selectively disables particular directive sets and
         controls other aspects of multiprocessing.  This option must be
         specified in conjunction with the -mp option.  For example, to
         disable the OpenMP directives, you would include both -mp and
         -MP:open_mp=OFF on your command line.
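         As a sketch of how these options combine on a compile line (the
         file name prog.f is hypothetical):

```shell
# Enable all multiprocessing directive sets:
f77 -mp prog.f

# Keep multiprocessing on, but disable only the OpenMP directive set:
f77 -mp -MP:open_mp=OFF prog.f
```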
     For more information on these command line options, see the f77(1) or
     f90(1) man pages.

EXAMPLES
     Example 1.  In the following example, assume that the command line
     includes -Wl,-Xlocal,mycommon_, which ensures that each thread has a
     private copy of X and Y:

          INTEGER X
          REAL(KIND=8) Y(100)
          COMMON /MYCOMMON/ X, Y

     Example 2.  The following example copies the value of X on thread 3
     into the private copy of X for the current thread:

          CALL MP_SHMEM_GET32 (X, X, 1, 3)

     Example 3.  The following example copies the value of LOCALVAR into
     the thread 5 copy of X:

          CALL MP_SHMEM_PUT32 (X, LOCALVAR, 1, 5)

     Example 4.  The following example fetches values from the thread 7
     copy of array Y into LOCALARRAY:

          CALL MP_SHMEM_GET64 (LOCALARRAY, Y, 100, 7)

     Example 5.  The following example copies the value of every other
     element of LOCALARRAY into the thread 9 copy of Y:

          CALL MP_SHMEM_IPUT64 (Y, LOCALARRAY, 2, 2, 50, 9)

SEE ALSO
     f77(1), f90(1), blockproc(2), exit(2), sprocsp(2), intro_shmem(3),
     DSM(3I), SYNC(3I), barrier(3P), ussetlock(3P), pe_environ(5)

     MIPSpro Fortran 77 Programmer's Guide

     MIPSpro Fortran 90 Commands and Directives Reference Manual