Drawing the Mandelbrot set
Overview
Teaching: 20 min
Exercises: 100 minQuestions
How do we handle irregular tasks?
Objectives
Learn about the schedule() clause
Here’s today’s only example in Fortran. Sorry.
The Mandelbrot set was a hot subject of computer art in the 1980s. The algorithm is quite simple: For each point on the screen, do an iterative calculation and decide whether the calculation diverges or not. Color that spot on the screen according to how many iterations it took to diverge — or black if it didn’t diverge in 1000 iterations.
For simplicity this implementation just prints out a number instead of coloring a pixel.
program mandel
! Mandelbrot set generator, after
! Chandra et al, "Parallel Programming in OpenMP", and
! http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
! Ross Dickson, ACEnet, 2010.06.24
integer,parameter:: maxiter=999, m=30, n=20
double precision,parameter:: xmin=-2.d0,xmax=1.d0
double precision,parameter:: ymin=-1.d0,ymax=1.d0
integer depth(n,m)
integer iter, j, k
integer starttime, endtime, tickspersec
double precision x, y
call system_clock(starttime)
!$omp parallel do private(..?...)
do j = 1, m
do k = 1, n
x = xmin + j*(xmax-xmin)/m
y = ymin + k*(ymax-ymin)/n
depth(k, j) = mandel_val(x, y, maxiter)
end do
end do
!$omp end parallel do
call system_clock(endtime,tickspersec)
do k = n, 1, -1
write(*,'(30I4)') depth(k,:)
end do
write(*,*) 'working time: ',real(endtime-starttime)/real(tickspersec),' sec'
end program
integer function mandel_val(x0, y0, maxiter)
double precision x0, y0
integer maxiter
double precision x, y, xtemp
integer iter
x = 0.d0
y = 0.d0
iter = 0
do while (( x*x + y*y <= 4.d0 ) .and. ( iter < maxiter ))
xtemp = x*x - y*y + x0
y = 2*x*y + y0
x = xtemp
iter = iter + 1
end do
mandel_val = iter
end function
First, compile and run the program without OpenMP like so:
gfortran mandel.f90 -o mandel-serial
./mandel-serial
Appreciate the retro ASCII art, and note how long it took to run. A millisecond is not enough to get good performance measurements on.
Next, increase the dimensions m,n
to 3000,2000
and recompile. You don’t
want to dump 6,000,000 numbers to your screen, so send the output to a file:
./mandel-serial >mandel-serial.out
.
Check the run time: tail -1 mandel-serial.out
Now comes the parallelization.
Exercise 1: Parallelize the code
There is a
parallel do
directive in the code, exactly analogous to theparallel for
in C. But theprivate()
clause is not complete. Decide what variable or variables should be made private, and then compile and test the code like so:gfortran -fopenmp mandel.f90 -o mandel-omp OMP_NUM_THREADS=2 ./mandel-omp >mandel-2.out
Try a few different values of
OMP_NUM_THREADS
. How does the performance scale?
The schedule() clause
OpenMP loop directives (parallel for, parallel do
) can take several other
clauses besides the private()
clause we’ve already seen. One is schedule()
,
which allows us to specify how loop iterations are divided up among the
threads.
The default is static scheduling, in which all iterations are allocated to threads before they execute any loop iterations. In dynamic scheduling, only some of the iterations are allocated to threads at the beginning of the loop’s execution. Threads that complete their iterations are then eligible to get additional work. The allocation process continues until all the iterations have been distributed to threads.
There’s a tradeoff between overhead (i.e., how much time is spent setting up the schedule) and load balancing (i.e., how much time is spent waiting for the most heavily-worked thread to catch up). Static scheduling has low overhead but may be badly balanced; dynamic scheduling has higher overhead. Both can also take a chunk size; larger chunks mean less overhead and greater memory locality, smaller chunks may mean finer load balancing. You can omit the chunk size, it defaults to 1.
Bad load balancing might be what’s causing this Mandelbrot code not to
parallelize very well. Let’s add a schedule()
clause and see what happens.
!$omp parallel do private(...?...) schedule(dynamic,?)
Exercise 2: Play with the schedule() clause
Try different
schedule()
clauses and tabulate the run times with different thread numbers. What seems to work best for this problem?Does it change much if you grow the problem size? That is, if you make
m,n
bigger?There’s a third option,
guided
, which starts with large chunks and gradually decreases the chunk size as it works through the iterations. Try it out too, if you like. Withschedule(guided,<chunk>)
, the chunk parameter is the smallest chunk size it will try.
Key Points
Different loop scheduling may compensate for unbalanced loop iterations