Getting started with openmp

OpenMP (Open MultiProcessing) is a parallel programming model based on compiler directives which allows application developers to incrementally add parallelism to their application codes.

The OpenMP API specification defines an application programming interface (API) for portable shared-memory multiprocessing programming in C, C++, and Fortran, available on most platforms. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
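Here is a minimal sketch that uses all three ingredients together. It assumes a C compiler with OpenMP support (for example GCC with -fopenmp). The function name hello_parallel is hypothetical. The #ifdef _OPENMP guard is a common idiom that lets the same file still compile, serially, when OpenMP is disabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP library routines */
#endif

/* Runs one parallel region and returns how many threads executed it.
   The team size comes from OMP_NUM_THREADS (environment variable)
   or, if it is unset, from the implementation default. */
int hello_parallel(void)
{
    int visits = 0;

    #pragma omp parallel          /* compiler directive: fork a team of threads */
    {
        int id = 0, nthreads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();        /* library routine: this thread's number */
        nthreads = omp_get_num_threads(); /* library routine: size of the team */
#endif
        printf("Hello from thread %d of %d\n", id, nthreads);

        #pragma omp atomic
        visits++;                 /* each thread increments the shared counter once */
    }                             /* implicit barrier: the threads join here */

    return visits;
}
```

If you run the program with, say, OMP_NUM_THREADS=4, it should print one greeting per thread, in no particular order.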

Since OpenMP focuses on parallelism within a node (shared-memory multiprocessing), it can be combined with message-passing programming models such as MPI to execute on multiple nodes.

Simple parallel example

You can set the number of threads in two ways. The OMP_NUM_THREADS environment variable applies to the whole application. The num_threads clause of the #pragma omp parallel directive applies to that specific parallel region only.
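The following sketch shows the per-region form, using a hypothetical team_size helper. The num_threads clause sets the requested team size for that one region only. Regions without the clause fall back to OMP_NUM_THREADS or the implementation default. The runtime may grant fewer threads than requested, but never more.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the size of the team that executes a region requesting `requested`
   threads. num_threads(...) affects only this region. */
int team_size(int requested)
{
    int size = 1;   /* stays 1 when compiled without OpenMP */

    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single          /* a single thread records the team size */
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }                               /* barrier: size is written before we return */
    return size;
}
```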

OpenMP reductions

All four versions are valid, but they illustrate different aspects of a reduction.

By default, prefer the first version, which uses the reduction clause. Explore the three alternatives only if specific issues with it have been explicitly identified.
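The four versions mentioned above are not reproduced here. As a hedged illustration, the sketch below sums 1..n in two ways. The first uses the preferred reduction clause. The second uses one common hand-written alternative, per-thread partial sums combined through an atomic update, which is not necessarily one of the original alternatives. The function names are hypothetical.

```c
/* Preferred form: the reduction clause gives each thread a private copy of
   `sum` initialised to 0 and combines the copies with + at the end of the loop. */
long sum_reduction(long n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Hand-written alternative: each thread accumulates into its own partial sum,
   then adds it to the shared total with one atomic update per thread. */
long sum_atomic(long n)
{
    long sum = 0;
    #pragma omp parallel
    {
        long partial = 0;             /* declared inside the region: private */
        #pragma omp for
        for (long i = 1; i <= n; i++)
            partial += i;
        #pragma omp atomic
        sum += partial;               /* synchronised update of the shared total */
    }
    return sum;
}
```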

Loop parallelism in OpenMP

The meaning of the schedule clause is as follows:

static[,chunk]: Distribute the loop iterations statically (that is, the distribution is decided before entering the loop) in batches of chunk iterations, assigned to the threads in round-robin fashion. If chunk isn't specified, the iterations are divided into chunks that are as even as possible, and each thread gets at most one of them.
dynamic[,chunk]: Distribute the loop iterations among the threads in batches of chunk iterations on a first-come, first-served basis, until no batch remains. If chunk isn't specified, it defaults to 1.
guided[,chunk]: Like dynamic, but the batches get smaller and smaller as the loop proceeds, down to chunk iterations (1 if chunk isn't specified).
auto: Let the compiler and/or run-time library decide what is best suited.
runtime: Defer the decision to run time by means of the OMP_SCHEDULE environment variable. If the environment variable is not defined at run time, an implementation-defined default schedule is used.
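Here is a minimal sketch that uses a hypothetical run_loop helper. The integer encoding of the schedule kind was chosen only for illustration. Whatever the schedule, every iteration runs exactly once. Only the mapping of iterations to threads changes.

```c
#include <string.h>

/* Runs iterations 0..n-1 under the schedule selected by `kind`
   (0 = static,4  1 = dynamic,4  2 = guided,4  otherwise auto) and records
   in hits[i] how many times iteration i executed. */
void run_loop(int kind, int n, int *hits)
{
    memset(hits, 0, (size_t)n * sizeof *hits);

    if (kind == 0) {
        /* static: batches of 4 dealt round-robin before the loop starts */
        #pragma omp parallel for schedule(static, 4)
        for (int i = 0; i < n; i++) hits[i]++;
    } else if (kind == 1) {
        /* dynamic: an idle thread grabs the next batch of 4 */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < n; i++) hits[i]++;
    } else if (kind == 2) {
        /* guided: batches shrink as the loop proceeds, down to 4 (last one may be smaller) */
        #pragma omp parallel for schedule(guided, 4)
        for (int i = 0; i < n; i++) hits[i]++;
    } else {
        /* auto: the compiler and/or runtime chooses */
        #pragma omp parallel for schedule(auto)
        for (int i = 0; i < n; i++) hits[i]++;
    }
}
```

Each schedule(...) clause here fixes the kind at compile time. By contrast, schedule(runtime) reads it from OMP_SCHEDULE.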

The default for schedule is implementation-defined. In many environments it is static, but it can also be dynamic or even auto. Therefore, make sure your code does not implicitly rely on the default, and set the schedule explicitly whenever it matters.
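One way to follow this advice and still keep the choice configurable is to combine schedule(runtime) with omp_set_schedule. This is a standard OpenMP library routine that overrides the value read from OMP_SCHEDULE. The function name and the dynamic/16 setting are illustrative assumptions, not recommendations.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums 0..n-1 with schedule(runtime): the schedule is taken from the
   run-sched-var setting, which OMP_SCHEDULE initialises. */
long sum_runtime_schedule(long n)
{
#ifdef _OPENMP
    /* Set the schedule explicitly instead of relying on the default.
       Note: this also affects later schedule(runtime) loops of this thread. */
    omp_set_schedule(omp_sched_dynamic, 16);
#endif
    long sum = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}
```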

In the above examples, we used the combined form parallel for (C/C++) or parallel do (Fortran). However, the loop construct can also be used on its own within a parallel region, as a standalone #pragma omp for [...] or !$omp do [...] directive.
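The following minimal sketch uses a hypothetical fill_and_shift function. A single parallel region contains two standalone omp for loops, so the team of threads is created once and reused. The implicit barrier at the end of the first loop ensures that a[] is complete before the second loop reads it.

```c
/* One parallel region, two work-shared loops. */
void fill_and_shift(int n, int *a, int *b)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;
        /* implicit barrier at the end of the first omp for */

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = a[n - 1 - i];   /* may read elements written by another thread */
    }
}
```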

In Fortran, the index variables of the parallelized loop(s), and of any sequential loops inside the parallel region, are private by default. There is therefore no need to declare them private explicitly (although doing so isn't an error).
In C and C++, only the index of the loop(s) directly associated with the for construct is made private automatically. The indices of inner loops are just like any other variables. Therefore, if their scope extends outside the parallelized loop, they have to be declared private. This is the case when they are declared as int j; ... for ( j = ... ) rather than as for ( int j = ... ).
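Here is a minimal sketch of the C case, using a hypothetical row_sums function. The outer index i belongs to the parallel for. The inner index j is declared before the loops, so it has to appear in private(j).

```c
/* Row sums of an n x m matrix stored row-major.
   i: loop variable of the parallel for, private automatically.
   j: declared outside the loops, so without private(j) it would be shared
      and the threads would race on it. */
void row_sums(int n, int m, const int *mat, long *sums)
{
    int i, j;
    #pragma omp parallel for private(j)
    for (i = 0; i < n; i++) {
        sums[i] = 0;
        for (j = 0; j < m; j++)
            sums[i] += mat[i * m + j];
    }
}
```

If you declared the inner index as for (int j = 0; ...) instead, the private clause would be unnecessary.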

Conditional parallel execution

Irregular OpenMP parallelism

A common pitfall is to believe that every thread of a parallel region should create tasks. This is usually not what you want, because it creates as many tasks as the number of threads times the number of elements to process. Instead, a single thread typically creates the tasks, and the whole team then executes them. Therefore, OpenMP task codes usually contain something similar to

#pragma omp parallel
#pragma omp single
...
   #pragma omp task
   { code for a given task; }
...
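Here is a runnable sketch of this pattern. It assumes a hypothetical linked list (struct node) whose elements can be processed independently. One thread, inside single, walks the list and creates one task per element. Any thread of the team may then execute those tasks.

```c
#include <stddef.h>

/* Hypothetical list element: result is computed from value by a task. */
struct node { int value; int result; struct node *next; };

static void process(struct node *p) { p->result = p->value * p->value; }

void process_list(struct node *head)
{
    #pragma omp parallel
    #pragma omp single          /* only one thread creates the tasks */
    {
        for (struct node *p = head; p != NULL; p = p->next) {
            #pragma omp task firstprivate(p)   /* one task per element */
            process(p);
        }
    }                           /* barrier: all tasks have completed here */
}
```

The implicit barrier at the end of the single construct guarantees that all tasks have completed before process_list returns.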
