OpenMP (Open MultiProcessing) is a parallel programming model based on compiler directives which allows application developers to incrementally add parallelism to their application codes.
The OpenMP API specification supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran on most platforms. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
Since OpenMP focuses on the parallelism within a node (shared-memory multiprocessing), it can be combined with message-passing programming models, such as MPI, to execute on multiple nodes.
You can use the OMP_NUM_THREADS environment variable or the num_threads clause on the #pragma omp parallel directive to set the number of executing threads for the whole application or for the specified region, respectively.
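As an illustration of the per-region setting, here is a minimal sketch in C. The function name count_team_threads and its parameter requested are hypothetical, chosen only for this example. The value passed to num_threads is a request: the runtime may provide fewer threads, and a build without OpenMP support runs the region on a single thread.

```c
/* Returns how many threads actually executed the parallel region.
   'requested' (hypothetical parameter) is forwarded to num_threads;
   it must be a positive integer. */
int count_team_threads(int requested)
{
    int count = 0;

    #pragma omp parallel num_threads(requested)
    {
        /* Each thread of the team increments the shared counter once. */
        #pragma omp atomic
        count++;
    }

    return count;
}
```

Calling count_team_threads(4) returns a value between 1 and 4, whatever OMP_NUM_THREADS is set to, because the clause takes precedence over the environment variable for that region.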
All four versions are valid, but they exemplify different aspects of a reduction.
By default, the first construct, using the reduction clause, should be preferred. Only if a specific issue has been explicitly identified should any of the three alternatives be explored.
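For reference, a reduction written with the reduction clause can look like the following minimal sketch. The function sum_array is a hypothetical example and is not necessarily identical to the versions discussed above.

```c
/* Sums the n elements of a.
   Each thread accumulates into its own private copy of 'sum';
   the copies are combined with '+' at the end of the loop. */
double sum_array(const double *a, int n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }

    return sum;
}
```

The clause lets the compiler and runtime choose how the partial results are combined, which is why it is usually the simplest form and a sensible first choice.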
The meaning of the schedule clause is as follows:
static[,chunk]
: Distribute the loop iterations statically (meaning that the distribution is decided before entering the loop) in batches of chunk size, in a round-robin fashion. If chunk isn't specified, the chunks are as even as possible and each thread gets at most one of them.
dynamic[,chunk]
: Distribute the loop iterations among the threads in batches of chunk size with a first-come-first-served policy, until no batch remains. If not specified, chunk is set to 1.
guided[,chunk]
: Like dynamic, but with batches whose sizes get smaller and smaller, down to chunk (1 if not specified).
auto
: Let the compiler and/or runtime library decide what is best suited.
runtime
: Defer the decision to run time by means of the OMP_SCHEDULE environment variable. If the environment variable is not defined at run time, the default scheduling is used.
The default for schedule is implementation-defined. On many environments it is static, but it can also be dynamic or could very well be auto. Therefore, make sure that your code doesn't implicitly rely on a particular default without setting the schedule explicitly.
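Setting the schedule explicitly can be sketched as follows. The function triangular_work and the chunk size of 4 are assumptions made for this example: the cost of iteration i grows with i, so the load is deliberately unbalanced and a dynamic schedule is a plausible choice.

```c
/* Iteration i performs i + 1 additions, so later iterations cost more.
   schedule(dynamic, 4) hands out batches of 4 iterations on a
   first-come-first-served basis instead of relying on the default. */
long triangular_work(int n)
{
    long total = 0;

    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long local = 0;
        for (int j = 0; j <= i; j++) {
            local += j;
        }
        total += local;
    }

    return total;
}
```

Replacing the clause with schedule(runtime) would instead let OMP_SCHEDULE select the policy when the program is launched, which is convenient for comparing schedules without recompiling.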
In the above examples, we used the fused form parallel for or parallel do. However, the loop construct can also be used without fusing it with the parallel directive, in the form of a standalone #pragma omp for [...] or !$omp do [...] directive within a parallel region.
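A minimal sketch of the standalone form in C, with a hypothetical function scale_and_shift: a single parallel region contains two work-shared loops, so the team of threads is created once and reused.

```c
/* Multiplies every element by 'factor', then adds 'offset'. */
void scale_and_shift(double *a, int n, double factor, double offset)
{
    #pragma omp parallel
    {
        /* First work-shared loop inside the parallel region. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            a[i] *= factor;
        }
        /* Implicit barrier here: all scaling is done before the next loop. */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            a[i] += offset;
        }
    }
}
```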
For the Fortran version only, the loop index variable(s) of the parallelized loop(s) is (are) always private by default. There is therefore no need to explicitly declare them private (although doing so isn't an error).
For the C and C++ version, the iteration variable of the loop(s) associated with the directive is also private by default, but the indexes of any inner loops are just like any other variables. Therefore, if their scope extends outside of the parallelized loop(s) (meaning they are not declared like for ( int j = ... ) but rather like int j; ... for ( j = ... )), then they have to be declared private.
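The C case with indexes declared outside the loops can be sketched like this. The function fill_matrix is a hypothetical example; both indexes are listed in the private clause, which is permitted for the parallelized loop's own index and required for the inner one.

```c
/* Fills an n x m row-major matrix with mat[i][j] = i + j. */
void fill_matrix(int *mat, int n, int m)
{
    int i, j;   /* declared outside the loops, so their scope extends beyond them */

    /* Without private(j), all threads would share a single j and race on it. */
    #pragma omp parallel for private(i, j)
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++) {
            mat[i * m + j] = i + j;
        }
    }
}
```

Declaring the indexes in the loop headers, as in for ( int j = ... ), avoids the need for the clause altogether.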
A common pitfall is to believe that all threads of a parallel region should instantiate (create) tasks. This is typically not what you want, because each thread would then create the full set of tasks, giving as many tasks as the number of threads times the number of elements to process. Therefore, in OpenMP task codes you'll usually find something similar to:
#pragma omp parallel
#pragma omp single
{
    ...
    #pragma omp task
    { code for a given task; }
    ...
}
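A concrete instance of this pattern, as a minimal sketch with a hypothetical function square_all: one thread walks over the elements and creates one task per element, while the other threads of the team pick up and execute the tasks.

```c
/* Squares every element of 'data', one task per element. */
void square_all(int *data, int n)
{
    #pragma omp parallel
    #pragma omp single
    {
        /* Only the thread executing the single region creates tasks. */
        for (int i = 0; i < n; i++) {
            /* Each task captures its own copy of i at creation time. */
            #pragma omp task firstprivate(i)
            data[i] = data[i] * data[i];
        }
    }
    /* The implicit barrier at the end of the single region guarantees
       that all tasks have completed before the function returns. */
}
```

Removing the single directive would make every thread run the loop, so each element would be squared once per thread, which is exactly the pitfall described above.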