Race Conditions with OpenMP

Overview

Teaching: 25 min
Exercises: 5 min
Questions
  • How can we calculate integrals in parallel?

  • How do we compile a program that uses math functions with gcc?

Objectives
  • Understand what a race condition is

  • Learn how to avoid race conditions

  • Learn how to use reduction variables


Parallel Numerical Integration

Here is an example of an integration of the sine function from 0 to $\pi$.

Serial version is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv) {
	int steps = 1e7;           /* Number of rectangles */ 
	double delta = M_PI/steps; /* Width of each rectangle */
	double total = 0.0;        /* Accumulator */
	int i;

	printf("Using %.0e steps\n", (float)steps);
	for (i=0; i<steps; i++) {
		total = total + sin(delta*i) * delta;
	}
	printf("The integral of sine from 0 to Pi is %.12f\n", total);
}

integrate_sin_serial.c

Make a copy of the file integrate_sin_serial.c that you will work with to parallelize. You can call it integrate_sin_omp.c.

Compiling Code with Mathematical Functions

To compile a C program that uses the math library with GCC, we need to link to it explicitly:

 gcc -o integrate integrate_sin_omp.c -lm

Let’s run the program.

./integrate
Using 1e+07 steps
The integral of sine from 0 to Pi is 2.000000000000
srun time -p ./integrate
Using 1e+07 steps
The integral of sine from 0 to Pi is 2.000000000000
real 0.94
user 0.94
sys 0.00

The real output is the useful one. This example took 0.94 seconds to run.

The user and sys lines describe how much time was spent in the “user” code and how much in the “system” code, a distinction that doesn’t interest us today.

Parallelizing Numerical Integration

Let’s parallelize the main loop and execute the code.

...
	printf("Using %.0e steps\n", (float)steps);
	#pragma omp parallel for
...	

integrate_sin_omp.c

How to Avoid Data Race Conditions?

The omp critical Directive

Race conditions can be avoided by adding a critical directive.

Add the omp critical directive to the statement in the main loop and rerun the code.

...
	printf("Using %.0e steps\n", (float)steps);
	#pragma omp parallel for
	for (i=0; i<steps; i++) {
	#pragma omp critical
...	

integrate_sin_omp.c

gcc -o integrate integrate_sin_omp.c -lm -fopenmp
srun -c1 time -p ./integrate
srun -c2 time -p ./integrate

The omp atomic Directive

Examples demonstrating how to use atomic:

#pragma omp atomic update
	k += n*mass;     /* update k atomically */
#pragma omp atomic read
	tmp = c;         /* read c atomically */
#pragma omp atomic write
	count = n*m;     /* write count atomically */
#pragma omp atomic capture
{	/* capture original value of v in d, then atomically update v */
	d = v;
	v += n;
}

The atomic clauses: update (default), write, read, capture

Parallel Performance

  • Replace the critical directive with the atomic directive, recompile the code and run it on more than one thread. Try different numbers of threads (1-8) and record the execution time.
  • How does the execution time compare to the code using critical?

Solution

Version                      Threads  Time (s)
serial                          -       0.94
parallelized using critical     1       1.11
parallelized using critical     2       1.32
parallelized using critical     4       1.71
parallelized using atomic       1       1.03
parallelized using atomic       2       0.69
parallelized using atomic       4       0.72

The Optimal Way to Parallelize Integration

The best option is to let each thread work on its own chunk of data and then combine the partial sums from all threads. The reduction clause lets you specify thread-private variables that are subject to a reduction operation at the end of the parallel region.

#pragma omp parallel for reduction(+: total)

Recompile the code and execute it. Now we get the right answer, and a 3.7x speedup with 4 threads!

Finding the Maximum Value in an Array

Imagine that you are trying to find the largest value in an array. Is there a way to do this type of search in parallel? Begin with the serial version:

/* --- File array_max_serial.c --- */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <math.h>
#include <time.h>

int main(int argc, char **argv) {
	int size = 1e8;
	int *rand_nums;
	int i;
	int curr_max;
	time_t t;

	rand_nums = malloc(size*sizeof(int));
	/* Initialize random number generator */
	srand((unsigned) time(&t));
	/* Initialize array with random values */
	for (i=0; i<size; i++) {
		rand_nums[i] = rand();
	}
	/* Find maximum */
	curr_max = 0;
	for (i=0; i<size; i++) {
		curr_max = fmax(curr_max, rand_nums[i]);
	}
	printf("Max value is %d\n", curr_max);
	free(rand_nums);
	return 0;
}

Solution

This problem is analogous to the summation.

  • Because the curr_max variable will be written to, each thread should have its own copy.
  • Once all threads have found the maximum value in their share of data, you need to determine which thread has the largest value.
    ...
    curr_max = 0;
    #pragma omp parallel for reduction(max:curr_max)
    ...
    

Key Points

  • Race conditions can be avoided by using the omp critical or the omp atomic directives

  • The best option to parallelize summation is to use the reduction clause