Where To Go Next?

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What software libraries can I use so that I don’t need to write my own kernels?

  • Where can I find additional resources?

Objectives
  • Learn where to find GPU-accelerated libraries and additional resources.

Use of optimized libraries

If the problem that needs to be accelerated is a common, well-understood mathematical problem, it is often not necessary to write custom CUDA kernels, because a variety of GPU-accelerated libraries already exist. These usually perform much better than kernels we could come up with ourselves, because teams of programmers have spent countless hours optimizing them.

| Description                              | Name     | URL                                     |
|------------------------------------------|----------|-----------------------------------------|
| BLAS (Basic Linear Algebra Subprograms)  | cuBLAS   | https://docs.nvidia.com/cuda/cublas/    |
| Drop-in BLAS (based on cuBLAS)           | NVBLAS   | https://docs.nvidia.com/cuda/nvblas/    |
| FFT (Fast Fourier Transform)             | cuFFT    | https://docs.nvidia.com/cuda/cufft/     |
| Sparse matrices                          | cuSPARSE | https://docs.nvidia.com/cuda/cusparse/  |
| Solver for dense and sparse matrices     | cuSOLVER | https://docs.nvidia.com/cuda/cusolver/  |
| LAPACK (Linear Algebra Package)          | MAGMA    | https://icl.utk.edu/magma/              |
| Matrix, signal, and image processing     | ArrayFire | https://arrayfire.org/docs/            |

More GPU-accelerated libraries: https://developer.nvidia.com/gpu-accelerated-libraries
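To illustrate how little code such a library call requires, here is a minimal sketch of a SAXPY operation (y = αx + y) using cuBLAS instead of a hand-written kernel. It assumes the CUDA toolkit is installed and a GPU is available; compile with something like `nvcc saxpy_cublas.c -lcublas -o saxpy`.

```c
// saxpy_cublas.c: compute y = alpha*x + y on the GPU via cuBLAS.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 1 << 20;       // one million elements
    const float alpha = 2.0f;

    // Initialize host vectors: x = 1.0 everywhere, y = 2.0 everywhere.
    float *x = (float *)malloc(n * sizeof(float));
    float *y = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Allocate device memory and copy the data over.
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

    // One library call replaces the custom kernel: single-precision
    // a*x + y over both vectors with a stride of 1.
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
    cublasDestroy(handle);

    // Copy the result back and spot-check one element (2*1 + 2 = 4).
    cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 4.0)\n", y[0]);

    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}
```

Note that apart from the memory transfers, the entire computation is the single `cublasSaxpy` call; the library chooses the launch configuration and tuning for us.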

Where to go next?

This has been the barest of introductions to CUDA and GPU programming. Don’t forget the CUDA Programming Guide we mentioned earlier: https://docs.nvidia.com/cuda/cuda-c-programming-guide/

As mentioned in episode 1, there are other ways to program GPUs than CUDA. Here are two OpenACC tutorials, including one from the Alliance:

If you’re using GPUs, then performance obviously matters to you. A lot. Here’s a great blog post that summarizes the most important performance issues around GPUs. It’s written from a deep-learning perspective, but the points are quite general:

Key Points

  • Many software libraries implement highly-optimized solutions for common problems.