Introduction

Overview

Teaching: 15 min
Exercises: 5 min
Questions
  • What is a GPU, and why do we use them?

  • What is CUDA?

Objectives
  • To know the names of some GPU programming tools

  • To get access to a GPU node with Slurm

What is a GPU, and why?

A Graphics Processing Unit (GPU) is a specialized piece of computer circuitry designed to accelerate the creation of images. GPUs are key to the performance of many current computer games; a machine with only CPUs cannot update the picture on the screen fast enough to make the game playable.

Difference between CPUs and GPUs

[Figure: diagram comparing CPUs and GPUs]

CPUs                                        | GPUs
extremely versatile (“Jack of all trades”)  | excel at number-crunching
task parallelism for diverse tasks          | data parallelism (single task)
minimize latency                            | maximize throughput
multithreaded                               | super-threaded
SIMD (Single Instruction Multiple Data)     | SIMT (Single-Instruction, Multiple-Thread)

In Summary

A GPU is effectively a small, highly specialized, parallel computer.

The GPU is especially well-suited to address problems that can be expressed as data-parallel computations - the same program is executed on many data elements in parallel - with high arithmetic intensity - the ratio of arithmetic operations to memory operations.

(CUDA C Programming Guide)
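To make "arithmetic intensity" concrete, here is a small sketch that computes the ratio for two textbook cases. The operation counts are idealized assumptions (each array element is counted as moved to or from memory exactly once), and the function name intensity is invented for this illustration.

```shell
# intensity FLOPS MEMOPS: print arithmetic operations per memory operation
intensity() {
    awk -v f="$1" -v m="$2" 'BEGIN { printf "%.2f\n", f / m }'
}

n=1024

# Vector addition c[i] = a[i] + b[i]:
# n additions versus 2n loads plus n stores
vec_add=$(intensity "$n" "$((3 * n))")

# Matrix multiply of two n x n matrices:
# 2*n^3 operations (n^3 multiplies, n^3 adds) versus 3*n^2 element
# accesses, assuming every element of A, B and C is moved only once
mat_mul=$(intensity "$((2 * n * n * n))" "$((3 * n * n))")

echo "vector add:      $vec_add operations per memory access"
echo "matrix multiply: $mat_mul operations per memory access"
```

Under these assumptions vector addition does less than one operation per memory access, while matrix multiplication does hundreds, which is why the latter is the kind of problem the quotation describes as well suited to a GPU.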

What is CUDA?

There are a number of packages you can use to program GPUs. Prominent ones are CUDA (only for NVIDIA GPUs), OpenCL, OpenACC, ROCm/HIP (primarily for AMD GPUs), and SYCL.

CUDA is an NVIDIA product. It once stood for Compute Unified Device Architecture, but not even NVIDIA uses that expansion any more. CUDA has two components. One part is a driver that actually handles communications with the card. The second part is a set of libraries that allow your code to interact with the driver portion. CUDA only works with NVIDIA graphics cards.

Why should I learn CUDA if it doesn’t work with AMD GPUs?

In fact, the way ROCm/HIP is used to program AMD GPUs is very similar to CUDA, and CUDA kernels can easily be transformed into HIP kernels.
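To give a feel for how close the two interfaces are, the toy sketch below renames the CUDA runtime calls in a short host-code fragment to their HIP equivalents using sed. This is only an illustration: the fragment and the kernel name scale are made up, and real projects would normally use the hipify-perl or hipify-clang tools from ROCm rather than a hand-written substitution.

```shell
# A short CUDA host-code fragment, held in a shell variable for the demo.
# The kernel name "scale" and the variable names are hypothetical.
cuda_src='#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
scale<<<blocks, threads>>>(d_x, n);
cudaDeviceSynchronize();
cudaFree(d_x);'

# Swap the header, then rename every cudaXxx identifier to hipXxx.
# The kernel launch line is left untouched.
hip_src=$(printf '%s\n' "$cuda_src" |
    sed -e 's|<cuda_runtime.h>|<hip/hip_runtime.h>|' \
        -e 's/cuda\([A-Z]\)/hip\1/g')

printf '%s\n' "$hip_src"
```

The result uses hipMalloc, hipMemcpy, hipMemcpyHostToDevice, hipDeviceSynchronize and hipFree, with the kernel launch unchanged.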

As of 2024, only NVIDIA GPUs are in use across ACENET’s and the Alliance’s HPC systems.

Monitoring your graphics card

Along with drivers, library files, and compilers, CUDA comes with several utilities that you can use to manage your work. One very useful tool is ‘nvidia-smi’, which lets you check the basic status of your GPU card by answering questions such as “Do I have one?” and “Is it working?”. There are also tools for profiling and debugging CUDA code.

nvidia-smi

Running on a GPU node

If you run nvidia-smi on a login node or on a regular compute node, it will complain that it can’t communicate with the NVIDIA driver. This should be no surprise: No NVIDIA driver needs to be running on a node that has no GPU.
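A script can make the same check before it tries to do any GPU work. The sketch below is one possible way to do it; the function name gpu_status and its three status words are invented here, and it assumes only that nvidia-smi exits with a non-zero status when it cannot reach a driver.

```shell
# Report whether this node can see a GPU.
gpu_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "no-tool"      # CUDA utilities are not installed or not on PATH
    elif ! nvidia-smi >/dev/null 2>&1; then
        echo "no-driver"    # tool exists but cannot talk to an NVIDIA driver
    else
        echo "gpu-ok"
    fi
}

status=$(gpu_status)
echo "GPU status on $(uname -n): $status"

if [ "$status" = "gpu-ok" ]; then
    # One CSV line per GPU: model, memory in use, total memory, utilization
    nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv
fi
```

On a login node you would expect "no-tool" or "no-driver"; inside a job that was granted a GPU you would expect "gpu-ok" followed by the CSV summary.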

To get access to a GPU we have to go through Slurm. The “normal” way is to use sbatch, which will queue up a job for execution later:

$ cat testjob
#!/bin/bash
#SBATCH --gres=gpu:1    # THIS IS THE KEY LINE
#SBATCH --cpus-per-task=10
#SBATCH --mem=40G
#SBATCH --time=0:5:0
nvidia-smi
$ sbatch testjob

That asks for one GPU card, 10 CPU cores, and 40 GB of memory. This would be perfect for the national Béluga cluster: the GPU nodes there have 4 GPUs, 40 CPU cores, and more than 160 GB of RAM, so the request is exactly one quarter of a node.
Five minutes of run time is foolishly short for a production job, but we’re testing, so this should be okay.
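The proportions in that request can be checked with a little shell arithmetic. This sketch uses the Béluga node figures quoted above, taking 160 GB as a lower bound for the RAM; the variable names are invented for the illustration, and other clusters will have different node shapes.

```shell
# Node shape (assumed from the Béluga figures in the text)
gpus_per_node=4
cores_per_node=40
mem_per_node_gb=160

# How many GPUs the job wants
gpus_wanted=1

# Ask for the same fraction of cores and memory as of GPUs
cores=$(( cores_per_node * gpus_wanted / gpus_per_node ))
mem_gb=$(( mem_per_node_gb * gpus_wanted / gpus_per_node ))

echo "#SBATCH --gres=gpu:${gpus_wanted}"
echo "#SBATCH --cpus-per-task=${cores}"
echo "#SBATCH --mem=${mem_gb}G"
```

With one GPU requested this prints the same three directives as the testjob script; changing gpus_wanted to 2 would give 20 cores and 80G.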

Alternatively we could use salloc to request a GPU node, or part of one, and start an interactive shell there. Because we don’t have enough nodes on our virtual cluster to provide a GPU for each person in the course, we’ll use yet a third form that you already saw in a previous week, srun:

$ srun --gres=gpu:1  nvidia-smi
Thu Jun 20 14:39:56 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.239.06   Driver Version: 470.239.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
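For comparison, the interactive salloc route mentioned earlier accepts the same resource options as the batch script. The sketch below is a hedged illustration, assuming the same one-GPU request and a hypothetical 30-minute time limit; it only prints the command, so nothing is submitted unless you run that command yourself on a cluster.

```shell
# Resource options accepted by sbatch, salloc and srun alike
gpu_flags="--gres=gpu:1 --cpus-per-task=10 --mem=40G --time=0:30:0"

if command -v salloc >/dev/null 2>&1; then
    echo "To start an interactive shell on a GPU node, run:"
else
    echo "Slurm was not found here; on a cluster the command would be:"
fi
echo "salloc $gpu_flags"
```

Once the allocation is granted, running nvidia-smi in the new shell should show the GPU assigned to the job.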

Key Points

  • A GPU (Graphics Processing Unit) is best at data-parallel, arithmetically intensive calculations

  • CUDA is one of several programming interfaces for general-purpose computing on GPUs

  • Alliance clusters have special GPU-equipped nodes, which must be requested from the scheduler