Introduction
|
A GPU (Graphics Processing Unit) excels at data-parallel, arithmetic-intensive calculations
CUDA is one of several programming interfaces for general-purpose computing on GPUs
Alliance clusters have special GPU-equipped nodes, which must be requested from the scheduler
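As a sketch of how such a request might look, here is a minimal Slurm job script (flags and accounting details vary by cluster and allocation; the account name is a placeholder):

```shell
#!/bin/bash
#SBATCH --account=def-someuser     # placeholder allocation name
#SBATCH --gpus-per-node=1          # request one GPU on the node
#SBATCH --mem=4000M
#SBATCH --time=0:15:00

nvidia-smi                         # confirm which GPU was assigned
./my_cuda_program                  # placeholder for your compiled binary
```

Submitted with `sbatch job.sh`; an interactive session can be requested with similar flags via `salloc`.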
|
Hello World
|
|
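A minimal "hello world" in CUDA might look like the following sketch (device-side printf is supported on all current NVIDIA GPUs):

```cuda
#include <stdio.h>

// Kernel: the __global__ qualifier marks a function that runs on the GPU
__global__ void hello() {
    printf("Hello World from the GPU!\n");
}

int main() {
    hello<<<1, 1>>>();          // launch 1 block containing 1 thread
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to finish
    return 0;
}
```

Compile with `nvcc hello.cu -o hello` on a GPU node.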
Adding Two Integers
|
|
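A sketch of the standard pattern: allocate device memory, copy the inputs over, run a one-thread kernel, and copy the result back.

```cuda
#include <stdio.h>

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;               // runs once, on a single GPU thread
}

int main() {
    int a = 2, b = 7, c;
    int *d_a, *d_b, *d_c;       // device copies

    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));
    cudaMalloc(&d_c, sizeof(int));

    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    add<<<1, 1>>>(d_a, d_b, d_c);

    cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", a, b, c);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```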
Adding Vectors with GPU Threads
|
Threads are the lowest level of parallelization on a GPU
A kernel function replaces the code inside the loop to be parallelized
The CUDA <<<M,N>>> launch notation replaces the loop itself, launching M blocks of N threads each
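The three points above can be sketched as a kernel in which each thread's index selects one vector element (the loop body survives; the loop is gone):

```cuda
// Each thread adds one element; threadIdx.x is this thread's
// index within its block
__global__ void add(int *a, int *b, int *c) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Launched with 1 block of N threads, one per element:
//   add<<<1, N>>>(d_a, d_b, d_c);
```

With a single block, N is limited by the per-block thread maximum, which motivates the next section.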
|
Using Blocks Instead of Threads
|
Use nvprof (superseded by Nsight Systems in recent CUDA toolkits) to profile CUDA kernels and API calls
A kernel launch is divided into blocks: independent batches of threads that the GPU schedules onto its hardware
Blocks are handled by streaming multiprocessors (SMs)
Each block can have up to 1024 threads (on our current GPU cards)
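The same vector addition can instead be parallelized over blocks, with one single-thread block per element; blockIdx.x plays the role threadIdx.x played before:

```cuda
// Each block (of one thread) adds one element; blockIdx.x is
// this block's index within the launch grid
__global__ void add(int *a, int *b, int *c) {
    int i = blockIdx.x;
    c[i] = a[i] + b[i];
}

// Launched with N blocks of 1 thread each:
//   add<<<N, 1>>>(d_a, d_b, d_c);
```

The grid can contain far more blocks than a block can contain threads, so this form scales past the 1024-thread limit, at the cost of using each SM inefficiently.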
|
Putting It All Together
|
|
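Combining the two levels gives the standard CUDA idiom: many blocks, each with many threads, and a global index computed from both. A sketch, assuming n elements and 256 threads per block:

```cuda
__global__ void add(int *a, int *b, int *c, int n) {
    // Global index: which block we are in, times the block size,
    // plus our position within the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: n need not be a multiple of the block size
        c[i] = a[i] + b[i];
}

// Host-side launch: round the block count up so every element is covered
//   int threadsPerBlock = 256;
//   int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
//   add<<<numBlocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
```

The bounds check matters because the rounded-up launch may create a few surplus threads in the final block.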