A Parallel Hello World Program
Overview
Teaching: 20 min
Exercises: 10 min
Questions
How do you compile and run an OpenMP program?
What are OpenMP pragmas?
How to identify threads?
Objectives
Write, compile and run a multi-threaded program where each thread prints “hello world”.
Adding Parallelism to a Program
OpenMP core syntax
#pragma omp <directive> [clause] ... [clause]
- All OpenMP pragmas begin with #pragma omp
- Directive specifies the parallel action
- Optional clause[s] describe additional behavior
How to add parallelism to the basic hello_world program?
- Include OpenMP header
- Use the parallel directive
Begin with the file hello_template.c.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    printf("Hello World\n");
}
Make a copy named hello_omp.c.
- To make hello_omp.c parallel, add the following code:
...
#include <omp.h>
...
int main(int argc, char **argv) {
    #pragma omp parallel
    ...
}
hello_omp.c
How to compile an OpenMP program?
- Compiling OpenMP code requires a compiler flag: -fopenmp for GCC, -qopenmp for Intel.
gcc -fopenmp -o hello hello_omp.c
icc -qopenmp -o hello hello_omp.c
Running an OpenMP program
- Use the environment variable OMP_NUM_THREADS to control the number of threads.
export OMP_NUM_THREADS=3
./hello
How does the parallel hello_world program work?
- The compiler generates code that starts a team of threads. If OMP_NUM_THREADS is set, its value determines the team size; otherwise the implementation chooses a default, typically the number of available cores.
- Each thread in the team then executes the function printf("Hello World\n").
- Threads rejoin the main thread when they return from the printf() function. At this point, team threads are terminated and only the main thread remains.
- After reaching the end of the program, the main thread terminates.
Using multiple cores
Try running the hello program with different numbers of threads.
- Is it possible to use more threads than the number of cores on the machine?
You can use the nproc command to find out how many cores are on the machine.
Solution
Since threads are a programming abstraction, there is no direct relationship between them and cores. In theory, you can launch as many threads as you like. However, if you use more threads than physical cores, performance may suffer. There is also a possibility that the OS and/or OpenMP implementation can limit the number of threads.
OpenMP with SLURM
To submit an OpenMP job to the SLURM scheduler, you can use the following submission script template:
#!/bin/bash
#SBATCH --cpus-per-task=4
./hello
Submission scripts are submitted to the queue with the sbatch command:
sbatch <submission_script>
You can also ask for an interactive session with multiple cores like so:
[user45@login1 ~]$ salloc --mem-per-cpu=1000 --cpus-per-task=2 --time=1:0:0
salloc: Granted job allocation 179
salloc: Waiting for resource configuration
salloc: Nodes node1 are ready for job
[user45@node1 ~]$
The most practical way to run our short parallel program on our test cluster is to use the srun command, which runs the program on the cluster from the interactive shell.
- All three commands (sbatch, salloc and srun) accept the same keywords.
srun --cpus-per-task=4 hello
# or even shorter:
srun -c4 hello
Identifying threads
How can we tell which thread is doing what?
- The function omp_get_thread_num() returns the ID of the calling thread within its team. IDs run from 0 to the team size minus one.
...
#pragma omp parallel
{
    int id = omp_get_thread_num();
    printf("Hello World from thread %d\n", id);
}
...
- Another useful function is omp_get_num_threads(), which returns the number of threads in the current team.
Thread ordering
Run the hello program several times.
In what order do the threads write out their messages?
What is happening here?
Solution
The messages are emitted in no particular order, and the order can change from run to run. This is an important rule of not only OpenMP programming, but parallel programming in general: parallel elements are scheduled to run by the operating system, and the order of their execution is not guaranteed.
Conditional Compilation
We said earlier that you should be able to use the same code for both OpenMP and serial compilation. Try compiling the code without the -fopenmp flag.
- What happens?
- Can you figure out how to fix it?
Hint: The -fopenmp option defines the preprocessor macro _OPENMP. The #ifdef _OPENMP preprocessor directive can be used to tell the compiler to process the line containing the omp_get_thread_num() function only if the macro exists.
Solution
...
#ifdef _OPENMP
printf("Hello World %i\n", omp_get_thread_num());
#else
printf("Hello World \n");
#endif
...
hello_omp.c
OpenMP Constructs
- Directives combined with code form a construct:
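#pragma omp parallel private(thread_id) shared(nthreads)
{
    /* every thread stores the same team size in the shared variable */
    nthreads = omp_get_num_threads();
    /* each thread keeps its own ID in its private copy */
    thread_id = omp_get_thread_num();
    printf("This thread %d of %d is first\n", thread_id, nthreads);
}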
Parallel Construct
- In a general parallel construct, it is up to you to decide what work each thread takes on.
The omp single directive
- With the single directive, a block of code is executed by just one thread: whichever thread encounters it first.
View more information about the omp single directive
Which thread runs first?
Modify the following code to print out only the thread that gets to run first in the parallel section. You should be able to do it by adding only one line. Here’s a (rather dense) reference guide in which you can look for ideas: Directives and Constructs for C/C++.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main(int argc, char **argv) {
    int id, size;
    #pragma omp parallel private(id,size)
    {
        size = omp_get_num_threads();
        id = omp_get_thread_num();
        printf("This thread %d of %d is first\n", id, size);
    }
}
Solution
...
id = omp_get_thread_num();
#pragma omp single
...
first_thread.c
Key Points
OpenMP pragmas tell the compiler what to parallelize and how to parallelize it.
By using the environment variable OMP_NUM_THREADS, it is possible to control how many threads are used.
The order in which parallel elements are executed cannot be guaranteed.
A compiler that isn’t aware of OpenMP pragmas will compile a single-threaded program.