Introduction
Overview
Teaching: 15 min
Exercises: 5 min
Questions
What is a GPU, and why do we use them?
What is CUDA?
Objectives
To know the names of some GPU programming tools
To get access to a GPU node with Slurm
What is a GPU, and why?
A Graphics Processing Unit is a specialized piece of computer circuitry designed to accelerate the creation of images. GPUs are key to the performance of many current computer games; a machine with only CPUs cannot update the picture on the screen fast enough to make the game playable.
Difference between CPUs and GPUs
| CPUs | GPUs |
|---|---|
| extremely versatile (“Jack of all trades”) | excel at number-crunching |
| task parallelism for diverse tasks | data parallelism (single task) |
| minimize latency | maximize throughput |
| multithreaded | super-threaded |
| SIMD (Single Instruction, Multiple Data) | SIMT (Single Instruction, Multiple Thread) |
In Summary
A GPU is effectively a small, highly specialized, parallel computer.
The GPU is especially well suited to problems that can be expressed as data-parallel computations (the same program executed on many data elements in parallel) with high arithmetic intensity (a high ratio of arithmetic operations to memory operations).
What is CUDA?
There are a number of packages you can use to program GPUs. Prominent ones are CUDA (only for NVIDIA GPUs), OpenCL, OpenACC, ROCm/HIP (only for AMD GPUs) and SYCL.
CUDA is an NVIDIA product. It once stood for Compute Unified Device Architecture, but not even NVIDIA uses that expansion any more. CUDA has two components. One part is a driver that actually handles communications with the card. The second part is a set of libraries that allow your code to interact with the driver portion. CUDA only works with NVIDIA graphics cards.
Why should I learn CUDA if it doesn’t work with AMD GPUs?
In fact, the way ROCm/HIP is used to program AMD GPUs is very similar to CUDA, and CUDA kernels can easily be translated into HIP kernels.
As of 2024, only NVIDIA GPUs are in use across ACENET’s and the Alliance’s HPC systems.
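To see how mechanical the CUDA-to-HIP translation can be, here is a toy sketch: for many runtime calls only the prefix differs, so even a naive search-and-replace gets most of the way. (Real projects use the ROCm tool `hipify-perl`, which handles the full API; the file name `snippet.cu` is just an example.)

```shell
# Many CUDA runtime calls map to HIP by changing the prefix alone.
# Write a tiny CUDA fragment...
cat > snippet.cu <<'EOF'
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
EOF
# ...and "translate" it with a naive substitution:
sed -e 's/cuda/hip/g' snippet.cu
```

The result, `hipMalloc(...)` and `hipMemcpy(..., hipMemcpyHostToDevice)`, is valid HIP, and kernels themselves (`__global__` functions) usually need no changes at all.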
Monitoring your graphics card
Along with drivers, library files, and compilers, CUDA comes with several utilities that you can use to manage your work. One very useful tool is `nvidia-smi`, which lets you check the basic status of your GPU card: “Do I have one?” and “Is it working?” There are also tools for profiling and debugging CUDA code.
```shell
nvidia-smi
```
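Beyond the full dashboard, `nvidia-smi` can report just the fields you ask for. A small sketch (the `--query-gpu` field names below are standard ones; `nvidia-smi --help-query-gpu` lists them all), with a guard so it degrades gracefully on a node without a GPU:

```shell
# Print selected GPU properties as CSV; fall back to a message when the
# NVIDIA driver (and hence nvidia-smi) is absent, e.g. on a login node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv
else
    echo "nvidia-smi not found: this node has no NVIDIA driver"
fi
```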
Running on a GPU node
If you run `nvidia-smi` on a login node or on a regular compute node, it will complain that it can’t communicate with the NVIDIA driver. This should be no surprise: no NVIDIA driver needs to be running on a node that has no GPU.

To get access to a GPU we have to go through Slurm. The “normal” way is to use `sbatch`, which will queue up a job for execution later:
```shell
$ cat testjob
#!/bin/bash
#SBATCH --gres=gpu:1       # THIS IS THE KEY LINE
#SBATCH --cpus-per-task=10
#SBATCH --mem=40G
#SBATCH --time=0:5:0
nvidia-smi

$ sbatch testjob
```
That asks for one GPU card and 10 CPU cores. This would be perfect for the national
Béluga cluster,
since the GPU nodes there have 4 GPUs, 40 CPU cores, and more than 160GB of RAM.
Five minutes of run time is foolishly short for a production job, but we’re testing,
so this should be okay.
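After `sbatch` accepts the job, you can watch its progress and, once it finishes, read its output file. A sketch of a typical session (the job ID `123456` is made up, yours will differ; the output file name follows Slurm’s default `slurm-<jobid>.out` pattern):

```shell
$ sbatch testjob
Submitted batch job 123456
$ squeue -u $USER          # still PD (pending) or R (running)?
$ cat slurm-123456.out     # nvidia-smi output appears here when done
```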
Alternatively, we could use `salloc` to request a GPU node, or part of one, and start an interactive shell there. Because we don’t have enough nodes on our virtual cluster to provide a GPU for each person in the course, we’ll use yet a third form that you already saw in a previous week, `srun`:
```shell
$ srun --gres=gpu:1 nvidia-smi
Thu Jun 20 14:39:56 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.239.06   Driver Version: 470.239.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
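For completeness, the `salloc` route mentioned earlier looks like this. A sketch using the same `--gres` option (the time limit and other options are illustrative):

```shell
$ salloc --gres=gpu:1 --cpus-per-task=10 --mem=40G --time=1:0:0
$ nvidia-smi        # now runs on the allocated GPU node
$ exit              # release the allocation when done
```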
Key Points
A GPU (Graphics Processing Unit) is best at data-parallel, arithmetically intensive calculations
CUDA is one of several programming interfaces for general-purpose computing on GPUs
Alliance clusters have special GPU-equipped nodes, which must be requested from the scheduler