ACENET Summer School - GPGPU: Glossary

Key Points

Introduction
  • A GPU (Graphics Processing Unit) is best at data-parallel, arithmetic-intensive calculations

  • CUDA is one of several programming interfaces for general-purpose computing on GPUs

  • Alliance clusters have special GPU-equipped nodes, which must be requested from the scheduler

Hello World
  • Use nvcc to compile

  • CUDA source files are suffixed .cu

  • Use salloc to get an interactive session on a GPU node for testing
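The points above can be sketched as a minimal CUDA "Hello World". The file name and thread count are arbitrary choices for illustration:

```cuda
// hello.cu -- a minimal CUDA hello world (illustrative sketch)
// Compile on a GPU node with:   nvcc hello.cu -o hello
// Get such a node for testing:  salloc --gpus-per-node=1 ...
#include <cstdio>

// __global__ marks a function (a "kernel") that runs on the GPU
__global__ void hello_kernel() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello_kernel<<<1, 4>>>();   // launch 1 block of 4 threads
    cudaDeviceSynchronize();    // wait for the GPU to finish printing
    return 0;
}
```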

Adding Two Integers
  • The CPU (the ‘host’) and the GPU (the ‘device’) have separate memory banks

  • This requires explicit copying of data to and from the GPU
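A sketch of what that explicit copying looks like in practice, using the standard cudaMalloc/cudaMemcpy calls (variable names are illustrative):

```cuda
// add.cu -- add two integers on the device, copying data both ways
#include <cstdio>

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;               // runs on the GPU (the device)
}

int main() {
    int a = 2, b = 7, c;        // host copies
    int *d_a, *d_b, *d_c;       // device pointers

    // allocate memory in the GPU's separate memory bank
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));
    cudaMalloc((void **)&d_c, sizeof(int));

    // copy inputs: host -> device
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    add<<<1, 1>>>(d_a, d_b, d_c);   // one block, one thread

    // copy result: device -> host
    cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", a, b, c);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```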

Adding vectors with GPU threads
  • Threads are the lowest level of parallelization on a GPU

  • A kernel function replaces the code inside the loop to be parallelized

  • The CUDA <<<M,N>>> notation replaces the loop itself
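For example, a serial loop `for (int i = 0; i < N; i++) c[i] = a[i] + b[i];` becomes a kernel in which each thread handles one value of `i` (a sketch; the kernel name is illustrative):

```cuda
// Each thread adds one element; the loop itself disappears.
__global__ void vec_add(int *a, int *b, int *c) {
    int i = threadIdx.x;        // this thread's index within the block
    c[i] = a[i] + b[i];         // one former loop iteration per thread
}

// Launched from the host with N threads in a single block:
//     vec_add<<<1, N>>>(d_a, d_b, d_c);
```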

Using blocks instead of threads
  • Use nvprof to profile CUDA functions

  • Blocks are the batches in which a GPU handles data

  • Blocks are handled by streaming multiprocessors (SMs)

  • Each block can have up to 1024 threads (on our current GPU cards)
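The same vector addition can be written with one block per element instead of one thread per element, with `blockIdx.x` playing the role `threadIdx.x` played before (a sketch):

```cuda
// One block (of one thread) per element.
__global__ void vec_add_blocks(int *a, int *b, int *c) {
    int i = blockIdx.x;         // this block's index in the grid
    c[i] = a[i] + b[i];
}

// Launched with N blocks of 1 thread each:
//     vec_add_blocks<<<N, 1>>>(d_a, d_b, d_c);
// Profile the executable's CUDA calls with:  nvprof ./a.out
```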

Putting It All Together
  • A typical kernel indexes data using both blocks and threads
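A sketch of that typical combined indexing, assuming the 1024-thread block limit mentioned above (names are illustrative):

```cuda
#define THREADS_PER_BLOCK 1024  // maximum block size on our current cards

__global__ void vec_add_all(int *a, int *b, int *c, int n) {
    // combine block and thread indices into one global element index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: the last block may be partial
        c[i] = a[i] + b[i];
}

// Launch enough blocks to cover all n elements (rounding up):
//     int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
//     vec_add_all<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
```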

Glossary

  • GPU: Graphics Processing Unit; a processor best suited to data-parallel, arithmetic-intensive work

  • CUDA: NVIDIA's programming interface for general-purpose computing on GPUs

  • nvcc: the CUDA compiler; CUDA source files are suffixed .cu

  • host: the CPU and its memory

  • device: the GPU and its memory, separate from the host's

  • kernel: a function that runs on the GPU, replacing the body of a parallelized loop

  • thread: the lowest level of parallelization on a GPU

  • block: a batch of threads (up to 1024 on our current cards) handled as a unit by the GPU

  • streaming multiprocessor (SM): the GPU hardware unit that executes blocks