This lesson is being piloted (Beta version)

A Parallel Hello World Program

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How do you compile and run an OpenMP program?

  • What are OpenMP pragmas?

  • How do you identify threads?

Objectives
  • Write, compile and run a multi-threaded program where each thread prints “hello world”.

Adding Parallelism to a Program

OpenMP core syntax

#pragma omp <directive> [clause] ... [clause]

How to add parallelism to the basic hello_world program?

Begin with the file hello_template.c.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
   printf("Hello World\n");
}

hello_serial.c

Make a copy named hello_omp.c.

...
#include <omp.h>
...
int main(int argc, char **argv) {
#pragma omp parallel
...
}

hello_omp.c

How to compile an OpenMP program?

gcc -fopenmp -o hello hello_omp.c
icc -qopenmp -o hello hello_omp.c

Running an OpenMP program

export OMP_NUM_THREADS=3
./hello

How does the parallel hello_world program work?

  1. At the parallel directive, the OpenMP runtime starts a team of threads. The team size is the value of the environment variable OMP_NUM_THREADS if it is set; otherwise it usually defaults to the number of available cores.
  2. Each thread in the team then executes the function printf("Hello World\n").
  3. Threads rejoin the main thread when they return from the printf() function. At this point, team threads are terminated and only the main thread remains.
  4. After reaching the end of the program, the main thread terminates.

Using multiple cores

Try running the hello program with different numbers of threads.

  • Is it possible to use more threads than the number of cores on the machine?

You can use the nproc command to find out how many cores are on the machine.

Solution

Since threads are a programming abstraction, there is no direct relationship between them and cores. In theory, you can launch as many threads as you like. However, if you use more threads than physical cores, performance may suffer. There is also a possibility that the OS and/or OpenMP implementation can limit the number of threads.

OpenMP with SLURM

To submit an OpenMP job to the SLURM scheduler, you can use the following submission script template:

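#!/bin/bash
#SBATCH --cpus-per-task=4
# Match the OpenMP thread count to the cores granted by SLURM
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello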

Submission scripts are submitted to the queue with the sbatch command: sbatch <submission_script>

You can also ask for an interactive session with multiple cores like so:

[user45@login1 ~]$ salloc --mem-per-cpu=1000 --cpus-per-task=2 --time=1:0:0
salloc: Granted job allocation 179
salloc: Waiting for resource configuration
salloc: Nodes node1 are ready for job
[user45@node1 ~]$ 

The most practical way to run our short parallel program on our test cluster is using the srun command. It will run the program on the cluster from the interactive shell.

  • All three commands (sbatch, salloc and srun) accept the same resource options, such as --cpus-per-task:
srun --cpus-per-task=4 hello
# or even shorter:
srun -c4 hello

Identifying threads

How can we tell which thread is doing what?

...
#pragma omp parallel
   {
     int id = omp_get_thread_num();
     printf("Hello World from thread %d\n", id);
   }

...

Thread ordering

Run the hello program several times.
In what order do the threads write out their messages?
What is happening here?

Solution

The order of the messages is unpredictable and can change from run to run. This is an important rule not only of OpenMP programming but of parallel programming in general: parallel elements are scheduled to run by the operating system, and the order of their execution is not guaranteed.

Conditional Compilation

We said earlier that you should be able to use the same code for both OpenMP and serial compilation. Try compiling the code without the -fopenmp flag.

  • What happens?
  • Can you figure out how to fix it?

Hint: The -fopenmp option defines the preprocessor macro _OPENMP. The #ifdef _OPENMP preprocessor directive can be used to tell the compiler to process the line containing the omp_get_thread_num() function only if the macro is defined.

Solution


  ...
  #ifdef _OPENMP
  printf("Hello World %i\n", omp_get_thread_num());
  #else
  printf("Hello World \n");
  #endif
  ...

hello_omp.c

OpenMP Constructs

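#pragma omp parallel private(thread_id) shared(nthreads)
{
  /* nthreads is shared: every thread stores the same team size in it */
  nthreads = omp_get_num_threads();
  /* thread_id is private: each thread keeps its own copy */
  thread_id = omp_get_thread_num();
  printf("This thread %d of %d is first\n", thread_id, nthreads);
}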

Parallel Construct

The omp single

View more information about the omp single directive

Which thread runs first?

Modify the following code to print out only the thread that gets to run first in the parallel section. You should be able to do it by adding only one line. Here’s a (rather dense) reference guide in which you can look for ideas: Directives and Constructs for C/C++.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
   int id, size;

   #pragma omp parallel private(id,size)
   {
      size = omp_get_num_threads();
      id = omp_get_thread_num();
      printf("This thread %d of %d is first\n", id, size);
   }
}

first_thread_template.c

Solution

...
      id = omp_get_thread_num();
      #pragma omp single
...

first_thread.c

Key Points

  • OpenMP pragmas tell the compiler what to parallelize and how to parallelize it.

  • By using the environment variable OMP_NUM_THREADS, it is possible to control how many threads are used.

  • The order in which parallel elements are executed cannot be guaranteed.

  • A compiler that isn’t aware of OpenMP pragmas ignores them and compiles a single-threaded program, provided that calls to OpenMP library functions are guarded with #ifdef _OPENMP.