OpenMP Work Sharing Constructs
Overview
Teaching: 20 min
Exercises: 10 min
Questions
During parallel code execution, how is the work distributed between threads?
Objectives
Learn about the OpenMP constructs for work sharing.
- OpenMP offers several types of work-sharing constructs that assist in parallelization.
- The work-sharing constructs do not create new threads; instead, they use the team of threads created by the parallel directive (see the sketch below).
- At the end of a work-sharing construct, the program waits for all threads to complete. This behavior is called an implied barrier.
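As a minimal sketch (our own illustration, not part of the lesson code), a for work-sharing directive placed inside an existing parallel region divides the loop among the threads of that region, and the implied barrier makes them all rendezvous at its end:

```c
#include <stdio.h>

#define N 8

int main(void) {
    int i, a[N];

    #pragma omp parallel          /* the parallel directive creates the team */
    {
        #pragma omp for           /* work sharing: iterations split across the team */
        for (i = 0; i < N; i++)
            a[i] = i * i;
        /* implied barrier: every thread waits here until all iterations finish */

        #pragma omp single        /* one thread reports the result */
        printf("a[%d] = %d\n", N - 1, a[N - 1]);
    }
    return 0;
}
```

Any of the examples in this episode can be compiled with an OpenMP-capable compiler, e.g. gcc -fopenmp.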
The omp parallel for
- divides the iterations of a loop across the threads in a team.
- the combined parallel for directive creates a parallel region and shares the loop in one step; the for directive on its own is used inside an already created parallel region.
- each thread executes the same instructions on a different part of the data (data parallelism).
```c
...
#pragma omp parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];
...
```
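The snippet above is only a fragment. One way it could sit in a complete program is shown below; the array type, size, and initialization are our own assumptions for illustration:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    int i;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++) {     /* arbitrary input data */
        a[i] = i * 1.0f;
        b[i] = i * 2.0f;
    }

    #pragma omp parallel for      /* iterations divided among the threads */
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", N - 1, c[N - 1]);
    return 0;
}
```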
The omp sections
- used to divide the work into distinct, discrete parts.
- specifies that the enclosed sections of code are to be divided among the threads.
- a non-iterative work-sharing construct.
- a single thread may execute more than one section.
Functional parallelism can be implemented using sections.
```c
...
#pragma omp parallel shared(a,b,c,d) private(i)
{
    #pragma omp sections nowait
    {
        #pragma omp section
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        #pragma omp section
        for (i = 0; i < N; i++)
            d[i] = a[i] * b[i];
    } /* end of sections */
} /* end of parallel region */
...
```
nowait
- removes the implied barrier at the end of the construct: threads that finish their sections do not wait for the others, as the sketch below demonstrates.
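A small sketch (our own, not from the lesson) that makes the effect visible; with nowait, the final printf from a thread can appear before the other thread's section has finished:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp sections nowait   /* no barrier at the end of the sections */
        {
            #pragma omp section
            printf("thread %d runs section 1\n", omp_get_thread_num());

            #pragma omp section
            printf("thread %d runs section 2\n", omp_get_thread_num());
        }
        /* without nowait every thread would wait here until both sections
           had completed; with nowait each thread carries on immediately */
        printf("thread %d passed the sections construct\n", omp_get_thread_num());
    }
    return 0;
}
```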
Using parallel sections with different thread counts
Compile the file sections.c and run it with different numbers of threads. Start with a single thread:

```
srun -c1 ./a.out
```
In this example there are two sections, and the program prints out which thread is handling which section.
- What happens if the number of threads and the number of sections are different?
- More threads than sections?
- Fewer threads than sections?
Solution
If there are more threads than sections, the extra threads execute no section and simply wait at the end of the construct. If there are more sections than threads, how the remaining sections are distributed among the threads is implementation defined; a single thread may end up executing more than one section.
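If you do not have the lesson's sections.c at hand, a minimal program consistent with the behavior described above might look like this (a hypothetical reconstruction, not the actual lesson file):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section 1 handled by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 handled by thread %d\n", omp_get_thread_num());
        }
    }
    return 0;
}
```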
Applying the parallel directive
What will the following code do?
```c
omp_set_num_threads(8);
#pragma omp parallel
for (i = 0; i < N; i++) {
    C[i] = A[i] + B[i];
}
```
Answers:
1. One thread will execute each iteration sequentially.
2. The iterations will be evenly distributed across 8 threads.
3. Each of the 8 threads will execute all of the iterations, overwriting the values of C.
Solution
The correct answer is 3. The directive is parallel, not the combined parallel for, so the loop is not work-shared: every thread in the team executes the entire loop, and each element of C is written 8 times.
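To actually distribute the iterations across the 8 threads (answer 2), the combined parallel for directive is needed. A minimal runnable sketch (the array names and size are our own illustration) that also prints which thread handles each iteration:

```c
#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int i;
    float A[N], B[N], C[N];

    for (i = 0; i < N; i++) {   /* arbitrary input data */
        A[i] = i;
        B[i] = i;
    }

    omp_set_num_threads(8);
    #pragma omp parallel for    /* combined directive: iterations are divided */
    for (i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
        printf("iteration %2d done by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```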
Key Points
Data parallelism refers to the execution of the same task simultaneously on multiple computing cores.
Functional parallelism refers to the concurrent execution of different tasks on multiple computing cores.