OpenMP Work Sharing Constructs
Overview
Teaching: 20 min
Exercises: 10 min
Questions
During parallel code execution, how is the work distributed between threads?
Objectives
Learn about the OpenMP constructs for work sharing.
- OpenMP offers several types of work-sharing constructs that assist in parallelization.
- The work-sharing constructs do not create new threads; instead, they use the team of threads created by the parallel directive (see the sketch below).
- At the end of a work-sharing construct, the program waits for all threads to complete. This behavior is called an implied barrier.
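As a minimal sketch (our own illustration, not part of the lesson code), a for work-sharing directive placed inside an existing parallel region divides the loop among the threads of that region, and the implied barrier makes them all rendezvous at its end:

```c
#include <stdio.h>

#define N 8

int main(void) {
    int i, a[N];

    #pragma omp parallel          /* the parallel directive creates the team */
    {
        #pragma omp for           /* work sharing: iterations split across the team */
        for (i = 0; i < N; i++)
            a[i] = i * i;
        /* implied barrier: every thread waits here until all iterations finish */

        #pragma omp single        /* one thread reports the result */
        printf("a[%d] = %d\n", N - 1, a[N - 1]);
    }
    return 0;
}
```

Any of the examples in this episode can be compiled with an OpenMP-capable compiler, e.g. gcc -fopenmp.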
The omp parallel for
- divides the iterations of a loop across the threads in a team.
- the combined parallel for directive creates a parallel region and shares the loop in one step; the for directive on its own is used inside an already created parallel region.
- each thread executes the same instructions on a different part of the data (data parallelism).
```c
...
#pragma omp parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];
...
```
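The snippet above is only a fragment. One way it could sit in a complete program is shown below; the array type, size, and initialization are our own assumptions for illustration:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    int i;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++) {     /* arbitrary input data */
        a[i] = i * 1.0f;
        b[i] = i * 2.0f;
    }

    #pragma omp parallel for      /* iterations divided among the threads */
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", N - 1, c[N - 1]);
    return 0;
}
```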
The omp sections
- used to divide the work into distinct, discrete parts.
- specifies that the enclosed sections of code are to be divided among the threads.
- a non-iterative work-sharing construct.
- a single thread may execute more than one section.
Functional parallelism can be implemented using sections.
```c
...
#pragma omp parallel shared(a,b,c,d) private(i)
{
    #pragma omp sections nowait
    {
        #pragma omp section
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        #pragma omp section
        for (i = 0; i < N; i++)
            d[i] = a[i] * b[i];
    } /* end of sections */
} /* end of parallel region */
...
```
nowait
- removes the implied barrier at the end of the construct: threads that finish their sections do not wait for the others, as the sketch below demonstrates.
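A small sketch (our own, not from the lesson) that makes the effect visible; with nowait, the final printf from a thread can appear before the other thread's section has finished:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp sections nowait   /* no barrier at the end of the sections */
        {
            #pragma omp section
            printf("thread %d runs section 1\n", omp_get_thread_num());

            #pragma omp section
            printf("thread %d runs section 2\n", omp_get_thread_num());
        }
        /* without nowait every thread would wait here until both sections
           had completed; with nowait each thread carries on immediately */
        printf("thread %d passed the sections construct\n", omp_get_thread_num());
    }
    return 0;
}
```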
Using parallel sections with different thread counts
Compile the file sections.c and run it with different numbers of threads. Start with a single thread:

```
srun -c1 ./a.out
```
In this example there are two sections, and the program prints out which thread is handling which section.
- What happens if the number of threads and the number of sections are different?
- More threads than sections?
- Fewer threads than sections?
Solution
If there are more threads than sections, the extra threads execute no section and simply wait at the end of the construct. If there are more sections than threads, how the remaining sections are distributed among the threads is implementation defined; a single thread may end up executing more than one section.
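If you do not have the lesson's sections.c at hand, a minimal program consistent with the behavior described above might look like this (a hypothetical reconstruction, not the actual lesson file):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section 1 handled by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 handled by thread %d\n", omp_get_thread_num());
        }
    }
    return 0;
}
```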
Applying the parallel directive
What will the following code do?
```c
omp_set_num_threads(8);
#pragma omp parallel
for (i = 0; i < N; i++) {
    C[i] = A[i] + B[i];
}
```
Answers:
1. One thread will execute each iteration sequentially.
2. The iterations will be evenly distributed across 8 threads.
3. Each of the 8 threads will execute all of the iterations, overwriting the values of C.
Solution
The correct answer is 3. The directive is parallel, not the combined parallel for, so the loop is not work-shared: every thread in the team executes the entire loop, and each element of C is written 8 times.
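To actually distribute the iterations across the 8 threads (answer 2), the combined parallel for directive is needed. A minimal runnable sketch (the array names and size are our own illustration) that also prints which thread handles each iteration:

```c
#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int i;
    float A[N], B[N], C[N];

    for (i = 0; i < N; i++) {   /* arbitrary input data */
        A[i] = i;
        B[i] = i;
    }

    omp_set_num_threads(8);
    #pragma omp parallel for    /* combined directive: iterations are divided */
    for (i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
        printf("iteration %2d done by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```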
Key Points
Data parallelism refers to the execution of the same task simultaneously on multiple computing cores.
Functional parallelism refers to the concurrent execution of different tasks on multiple computing cores.