Thread Addressing

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How can I address multidimensional data structures more intuitively?

Objectives
  • Use 2D and 3D Grids and Blocks.

So far we have used the grid-stride loop to process one-dimensional vector data on the GPU. This works well for operations such as adding or multiplying two vectors.

We launched the kernel with:

myKernel<<<numBlocks, blockSize>>>(paramlist);

Here both numBlocks and blockSize are integers, and inside the kernel we use the variables blockDim.x, blockIdx.x and threadIdx.x to calculate our position in the vector.

However, this becomes cumbersome when the data has more dimensions, as with matrices or tensors.

Luckily, CUDA allows blocks and grids to have more than one dimension through the more general notation:

myKernel<<<GridDef, BlockDef>>>(paramlist);

Here GridDef and BlockDef are of type dim3 instead of int, and each of them can be 1-, 2- or 3-dimensional.
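As a small illustration of dim3, the sketch below relies on the fact that components left out of the constructor default to 1, so dim3(100) and dim3(100,1,1) describe the same shape. The helpers totalThreads and omittedComponentsAreOne are hypothetical names introduced for this illustration and are not part of CUDA.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): total number of threads in a launch.
unsigned long long totalThreads(dim3 grid, dim3 block)
{
    unsigned long long blocks  = (unsigned long long)grid.x * grid.y * grid.z;
    unsigned long long threads = (unsigned long long)block.x * block.y * block.z;
    return blocks * threads;
}

// Hypothetical check: dim3 components that are not given are set to 1.
bool omittedComponentsAreOne()
{
    dim3 a(100);      // same as dim3(100, 1, 1)
    dim3 b(10, 10);   // same as dim3(10, 10, 1)
    return a.y == 1 && a.z == 1 && b.z == 1;
}
```

Because the helper multiplies all three components, it works unchanged for 1D, 2D and 3D definitions.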

1D Addressing Example: 100 blocks with 256 threads per block:

With this notation, we can still create 1D-indices:

dim3 gridDef1(100,1,1);
dim3 blockDef1(256,1,1);
myKernel<<<gridDef1, blockDef1>>>(paramlist);

which is equivalent to:

myKernel<<<100, 256>>>(paramlist);

This will create \(100 \cdot 256 = 25600\) threads in total.
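To make the thread count concrete, here is a minimal sketch in which every thread writes its own global index into an array. It assumes unified memory (cudaMallocManaged) is available; the names globalIndex, writeIndex1D and countCorrect1D are hypothetical and chosen only for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: global 1D position from block index, block size and thread index.
__host__ __device__ inline int globalIndex(int block, int blockSize, int thread)
{
    return block * blockSize + thread;
}

// Each thread stores its own global index.
__global__ void writeIndex1D(int *out, int n)
{
    int i = globalIndex(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n)                       // guard in case more threads than elements exist
        out[i] = i;
}

// Hypothetical host driver: returns how many elements hold their own index,
// or -1 if a CUDA call fails.
int countCorrect1D()
{
    const int n = 100 * 256;
    int *out = nullptr;
    if (cudaMallocManaged(&out, n * sizeof(int)) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) out[i] = -1;

    dim3 gridDef1(100, 1, 1);
    dim3 blockDef1(256, 1, 1);
    writeIndex1D<<<gridDef1, blockDef1>>>(out, n);
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(out); return -1; }

    int correct = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] == i) ++correct;
    cudaFree(out);
    return correct;
}
```

If the launch succeeds, countCorrect1D() should report one correct element per thread, since each of the 100 blocks contributes 256 consecutive indices.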

2D Addressing Example: 10x10 blocks with 16x16 threads per block:

With the same notation, we can create 2D indices:

dim3 gridDef2(10,10,1);
dim3 blockDef2(16,16,1);
myKernel<<<gridDef2, blockDef2>>>(paramlist);

Like the previous example, this creates \(10 \cdot 10 = 100\) blocks with \(16 \cdot 16 = 256\) threads per block. But instead of only blockDim.x, blockIdx.x and threadIdx.x, the kernel can now also use blockDim.y, blockIdx.y and threadIdx.y.

In the 2D case, we can now calculate indices x and y to access data points in a 2D matrix:

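// idata and odata are flat, row-major arrays of width * height floats.
// A float* cannot be indexed as [y][x], so the 2D position is flattened.
// func stands for any __device__ function applied to one element.
__global__ void kernel2(const float *idata, float *odata, int width, int height)
{
    int x, y;
    x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height)                // skip threads outside the matrix
        odata[y * width + x] = func(idata[y * width + x]);
}
...
dim3 gridDef2(10,10,1);
dim3 blockDef2(16,16,1);
// 10 * 16 = 160 threads per direction, so this covers a 160 x 160 matrix.
kernel2<<<gridDef2, blockDef2>>>(idata, odata, 160, 160);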
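The launch above covers exactly \(160 \times 160\) elements. For matrices whose size is not a multiple of the block size, one common approach is to round the number of blocks up and let surplus threads do nothing. The following is a minimal sketch of that idea, assuming row-major storage in a flat array and unified memory; divUp, scale2D and runScale2D are hypothetical names, and the scaling operation only stands in for a real computation.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
__host__ __device__ inline unsigned int divUp(unsigned int n, unsigned int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Multiplies every element of a row-major width x height matrix by factor.
__global__ void scale2D(const float *idata, float *odata,
                        int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height)                    // threads past the edge do nothing
        odata[y * width + x] = factor * idata[y * width + x];
}

// Hypothetical host driver: returns true if every element was doubled.
bool runScale2D(int width, int height)
{
    size_t n = (size_t)width * height;
    float *idata = nullptr, *odata = nullptr;
    if (cudaMallocManaged(&idata, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&odata, n * sizeof(float)) != cudaSuccess) {
        cudaFree(idata);
        return false;
    }
    for (size_t i = 0; i < n; ++i) { idata[i] = (float)i; odata[i] = 0.0f; }

    dim3 blockDef(16, 16, 1);
    dim3 gridDef(divUp(width, blockDef.x), divUp(height, blockDef.y), 1);
    scale2D<<<gridDef, blockDef>>>(idata, odata, width, height, 2.0f);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (size_t i = 0; ok && i < n; ++i)
        ok = (odata[i] == 2.0f * idata[i]);
    cudaFree(idata);
    cudaFree(odata);
    return ok;
}
```

For a 150 x 100 matrix, divUp gives a grid of 10 x 7 blocks, so the launch creates 160 x 112 threads and the bounds check discards those that fall outside the matrix.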

Comparing 1D and 2D Examples

Description | 1D Addressing Example | 2D Addressing Example
Grid definition | dim3 gridDef1(100,1,1); | dim3 gridDef2(10,10,1);
Block definition | dim3 blockDef1(256,1,1); | dim3 blockDef2(16,16,1);
Grid dimensions | gridDim.x (\(100\)) | gridDim.x (\(10\)), gridDim.y (\(10\))
Block dimensions | blockDim.x (\(256\)) | blockDim.x (\(16\)), blockDim.y (\(16\))
Block indices | blockIdx.x (\(0 \dots 99\)) | blockIdx.x (\(0 \dots 9\)), blockIdx.y (\(0 \dots 9\))
Thread indices | threadIdx.x (\(0 \dots 255\)) | threadIdx.x (\(0 \dots 15\)), threadIdx.y (\(0 \dots 15\))

This can also be extended to a third (z) dimension. However, we are still limited by the maximum of 1024 threads per block and by the maximum thread dimensions discussed in the previous episode.
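As a hedged sketch of the 3D case, the code below checks a block shape against the limits reported in cudaDeviceProp (maxThreadsPerBlock and maxThreadsDim) before launching a 3D kernel. The names blockFits, kernel3 and launchKernel3 are hypothetical, the 8 x 8 x 8 block shape is just one possible choice, and the volume is assumed to be stored as a flat array with x varying fastest.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: does a block shape respect the limits reported by the device?
bool blockFits(dim3 block, const cudaDeviceProp &prop)
{
    unsigned long long threads = (unsigned long long)block.x * block.y * block.z;
    return threads <= (unsigned long long)prop.maxThreadsPerBlock
        && block.x <= (unsigned int)prop.maxThreadsDim[0]
        && block.y <= (unsigned int)prop.maxThreadsDim[1]
        && block.z <= (unsigned int)prop.maxThreadsDim[2];
}

// Each thread handles one element of an nx x ny x nz volume and adds 1 to it.
__global__ void kernel3(const float *idata, float *odata, int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz) {
        int i = (z * ny + y) * nx + x;   // flattened 3D position
        odata[i] = idata[i] + 1.0f;
    }
}

// Hypothetical host driver; idata and odata must be accessible from the device.
bool launchKernel3(const float *idata, float *odata, int nx, int ny, int nz)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;

    dim3 blockDef3(8, 8, 8);             // 8 * 8 * 8 = 512 threads per block
    if (!blockFits(blockDef3, prop)) return false;

    // Round up so that the grid covers the whole volume.
    dim3 gridDef3((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);
    kernel3<<<gridDef3, blockDef3>>>(idata, odata, nx, ny, nz);
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

A block of 16 x 16 x 16 would contain 4096 threads, so blockFits rejects it on a device that allows 1024 threads per block even though each individual dimension is small.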

Key Points

  • Using 2D or 3D GridDefs and BlockDefs can make it easier to address multi-dimensional data.

  • CUDA has a special type dim3 to define multi-dimensional grid and block definitions.