Thread Addressing

Overview

Teaching: 10 min
Exercises: 0 min

Questions

How can I address multidimensional data structures more intuitively?

Objectives

Use 2D and 3D grids and blocks.
So far we have used the grid-stride loop to process one-dimensional vector data on the GPU. This works well for simple operations such as adding or multiplying two vectors.
We've used this by launching our kernel with:

myKernel<<<numBlocks, blockSize>>>(paramlist);

where both numBlocks and blockSize are integers, and we use the variables blockDim.x, blockIdx.x and threadIdx.x inside the kernel to calculate our position in the vector.
However, this can get cumbersome if the data we're trying to process has more dimensions, like matrices or tensors. Luckily, CUDA allows blocks and grids to have more than one dimension, using the more general notation:

myKernel<<<GridDef, BlockDef>>>(paramlist);

where GridDef and BlockDef are data structures of type dim3 instead of int, each of which can be 1-, 2- or 3-dimensional.
1D Addressing Example: 100 blocks with 256 threads per block:
With this notation, we can still create 1D-indices:
dim3 gridDef1(100,1,1);
dim3 blockDef1(256,1,1);
myKernel<<<gridDef1, blockDef1>>>(paramlist);
which is equivalent to:
myKernel<<<100, 256>>>(paramlist);
This will create \(100 \cdot 256 = 25600\) threads in total.
2D Addressing Example: 10x10 blocks with 16x16 threads per block:
With this notation, we can also create 2D indices:
dim3 gridDef2(10,10,1);
dim3 blockDef2(16,16,1);
myKernel<<<gridDef2, blockDef2>>>(paramlist);
Like the previous example, this will create \(10 \cdot 10 = 100\) blocks and \(16 \cdot 16 = 256\) threads per block, but instead of just the variables blockDim.x, blockIdx.x, and threadIdx.x, it uses blockDim.x, blockDim.y, blockIdx.x, blockIdx.y, threadIdx.x, and threadIdx.y.
In the 2D case, we can now calculate indices x and y to access data points in a 2D matrix:
__global__ void kernel2(float *idata, float *odata, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // idata and odata are plain float pointers, so the 2D index (x, y)
    // is flattened to a linear offset: row y, column x, row length width
    odata[y * width + x] = func(idata[y * width + x]);
}
...
dim3 gridDef2(10,10,1);
dim3 blockDef2(16,16,1);
kernel2<<<gridDef2, blockDef2>>>(paramlist);
Comparing 1D and 2D Examples

| Description | 1D Addressing Example | 2D Addressing Example |
|---|---|---|
| Grid definition | dim3 gridDef1(100,1,1); | dim3 gridDef2(10,10,1); |
| Block definition | dim3 blockDef1(256,1,1); | dim3 blockDef2(16,16,1); |
| Block dimensions | blockDim.x (\(256\)) | blockDim.x (\(16\)), blockDim.y (\(16\)) |
| Block indices | blockIdx.x (\(0...99\)) | blockIdx.x (\(0...9\)), blockIdx.y (\(0...9\)) |
| Thread indices | threadIdx.x (\(0...255\)) | threadIdx.x (\(0...15\)), threadIdx.y (\(0...15\)) |
This can also be extended to a third (z) dimension; however, we are still limited by the maximum of 1024 threads per block and the maximum thread dimensions we discussed in the previous episode.
Key Points

Using 2D or 3D GridDefs and BlockDefs can make it easier to address multi-dimensional data.

CUDA has a special type dim3 to define multi-dimensional grid and block definitions.