A Parallel Hello World Program
Overview
Teaching: 20 min
Exercises: 10 min
Questions
How do you compile and run an OpenMP program?
What are OpenMP pragmas?
How do you identify threads?
Objectives
Write, compile and run a multi-threaded program where each thread prints “hello world”.
Adding Parallelism to a Program
OpenMP core syntax
#pragma omp <directive> [clause] ... [clause]
- All OpenMP pragmas begin with #pragma omp
- The directive specifies the parallel action
- Optional clause[s] describe additional behavior (see the example after this list)
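For example, the parallel directive you will use below can take optional clauses such as num_threads; this line is only an illustration of the syntax, not part of the hello program:

#pragma omp parallel num_threads(4)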
 
How to add parallelism to the basic hello_world program?
- Include OpenMP header
- Use the parallel directive
Begin with the file hello_template.c.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
   printf("Hello World\n");
}
Make a copy named hello_omp.c.
- To make hello_omp.c parallel, add the following code:
 
...
#include <omp.h>
...
int main(int argc, char **argv) {
#pragma omp parallel
...
}
hello_omp.c
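Putting the pieces together, the complete hello_omp.c might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
#pragma omp parallel
   printf("Hello World\n");
}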
How to compile an OpenMP program?
- Compiling OpenMP code requires the use of a compiler flag (-fopenmp for GCC, -qopenmp for Intel).
gcc -fopenmp -o hello hello_omp.c
icc -qopenmp -o hello hello_omp.c
Running an OpenMP program
- Use the environment variable OMP_NUM_THREADS to control the number of threads.
export OMP_NUM_THREADS=3
./hello
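With three threads requested, each thread executes the printf call once, so the output is simply three copies of the greeting (their order carries no meaning):

Hello World
Hello World
Hello World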
How does the parallel hello_world program work?
- The compiler generates code that starts a number of threads equal to the number of available cores or to the value of the variable OMP_NUM_THREADS (if it is set).
- Each thread in the team then executes the call printf("Hello World\n").
- Threads rejoin the main thread when they return from the printf() call. At this point, the team threads are terminated and only the main thread remains.
- After reaching the end of the program, the main thread terminates.
 
Using multiple cores
Try running the hello program with different numbers of threads.
- Is it possible to use more threads than the number of cores on the machine?
You can use the nproc command to find out how many cores are on the machine.
Solution
Since threads are a programming abstraction, there is no direct relationship between them and cores. In theory, you can launch as many threads as you like. However, if you use more threads than physical cores, performance may suffer. There is also a possibility that the OS and/or OpenMP implementation can limit the number of threads.
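To experiment quickly, you can check the core count with nproc and then loop over a few thread counts (a sketch; pick whatever values you like):

nproc
for n in 1 2 4 8; do
    OMP_NUM_THREADS=$n ./hello
done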
OpenMP with SLURM
To submit an OpenMP job to the SLURM scheduler, you can use the following submission script template:
#!/bin/bash
#SBATCH --cpus-per-task=4
./hello

Submission scripts are submitted to the queue with the sbatch command:

sbatch <submission_script>

You can also ask for an interactive session with multiple cores like so:

[user45@login1 ~]$ salloc --mem-per-cpu=1000 --cpus-per-task=2 --time=1:0:0
salloc: Granted job allocation 179
salloc: Waiting for resource configuration
salloc: Nodes node1 are ready for job
[user45@node1 ~]$

The most practical way to run our short parallel program on our test cluster is using the srun command. It will run the program on the cluster from the interactive shell.

- All three commands (sbatch, salloc and srun) accept the same keywords.

srun --cpus-per-task=4 hello
# or even shorter:
srun -c4 hello
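One common refinement, not required by the template above but useful to know, is to set OMP_NUM_THREADS inside the script from SLURM's own variable so the thread count always matches the cores you requested:

#!/bin/bash
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello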
Identifying threads
How can we tell which thread is doing what?
- The function omp_get_thread_num() returns the ID of the calling thread.
...
#pragma omp parallel
   {
     int id = omp_get_thread_num();
     printf("Hello World from thread %d\n", id);
   }
...
- Another useful function is omp_get_num_threads(), which returns the number of threads. A complete program using both functions is sketched below.
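Putting both functions into the program, hello_omp.c might now look like this (the variable name nthreads is my own choice):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
#pragma omp parallel
   {
      int id = omp_get_thread_num();        /* ID of this thread */
      int nthreads = omp_get_num_threads(); /* number of threads in the team */
      printf("Hello World from thread %d of %d\n", id, nthreads);
   }
}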
Thread ordering
Run the hello program several times.
In what order do the threads write out their messages?
What is happening here?
Solution
The messages are emitted in random order. This is an important rule of not only OpenMP programming, but parallel programming in general: parallel elements are scheduled to run by the operating system and order of their execution is not guaranteed.
Conditional Compilation
We said earlier that you should be able to use the same code for both OpenMP and serial compilation. Try compiling the code without the -fopenmp flag.
- What happens?
 - Can you figure out how to fix it?
Hint: The -fopenmp option defines the preprocessor macro _OPENMP. The #ifdef _OPENMP preprocessor directive can be used to tell the compiler to process the line containing the omp_get_thread_num() call only if the macro is defined.
Solution
...
#ifdef _OPENMP
printf("Hello World %i\n", omp_get_thread_num());
#else
printf("Hello World\n");
#endif
...

hello_omp.c
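For reference, the whole conditionally compiled file might look like the sketch below. Guarding the #include <omp.h> line as well is an addition of mine, since the header is only needed for the OpenMP build:

#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char **argv) {
#pragma omp parallel
   {
#ifdef _OPENMP
      printf("Hello World %i\n", omp_get_thread_num());
#else
      printf("Hello World\n");
#endif
   }
}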
OpenMP Constructs
- Directives combined with code form a construct:
 
#pragma omp parallel private(thread_id) shared(nthreads)
{
  nthreads = omp_get_num_threads();
  thread_id = omp_get_thread_num();
  printf("This thread %d of %d is first\n", thread_id, nthreads);
}
Parallel Construct
- In the general parallel construct, it is up to you to decide what work each thread takes on, as in the sketch below.
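A minimal sketch of that idea, using the thread ID to decide which job each thread performs (the "jobs" here are just placeholder printf calls):

#include <stdio.h>
#include <omp.h>

int main(void) {
#pragma omp parallel
   {
      int thread_id = omp_get_thread_num();
      if (thread_id == 0)
         printf("Thread %d takes care of one task\n", thread_id);   /* one thread does this */
      else
         printf("Thread %d works on the other tasks\n", thread_id); /* the rest do that */
   }
}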
 
The omp single
- By using a single directive, we can have a block of code executed by just one thread, whichever thread encounters it first.
 
View more information about the omp single directive
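As a quick illustration (a minimal sketch, separate from the exercise below), single lets one thread execute a statement once while the rest of the team waits at the implicit barrier that ends the single region:

#include <stdio.h>
#include <omp.h>

int main(void) {
#pragma omp parallel
   {
#pragma omp single
      printf("Printed by exactly one thread\n"); /* only the first thread to arrive runs this */
      printf("Printed by every thread\n");       /* all threads run this after the barrier */
   }
}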
Which thread runs first?
Modify the following code to print out only the thread that gets to run first in the parallel section. You should be able to do it by adding only one line. Here’s a (rather dense) reference guide in which you can look for ideas: Directives and Constructs for C/C++.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
   int id, size;
#pragma omp parallel private(id,size)
   {
      size = omp_get_num_threads();
      id = omp_get_thread_num();
      printf("This thread %d of %d is first\n", id, size);
   }
}

Solution
...
      id = omp_get_thread_num();
#pragma omp single
...

first_thread.c
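For completeness, the whole solution file might look like this; the only change from the exercise code is the single directive in front of the printf:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
   int id, size;
#pragma omp parallel private(id,size)
   {
      size = omp_get_num_threads();
      id = omp_get_thread_num();
#pragma omp single
      printf("This thread %d of %d is first\n", id, size);
   }
}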
Key Points
OpenMP pragmas tell the compiler what to parallelize and how to parallelize it.
By using the environment variable OMP_NUM_THREADS, it is possible to control how many threads are used.
The order in which parallel elements are executed cannot be guaranteed.
A compiler that isn’t aware of OpenMP pragmas will compile a single-threaded program.