Clarification on template BLOCK_DIM_Y parameter #1653
Have you noted the
in the documentation, e.g. the last point here? That is, the algorithm accepts multi-dimensional blocks, but flattens them. The algorithm itself is always one-dimensional.
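To make the flattening concrete, here is a minimal host-side sketch in plain C++ (not CUB code). `Dim3`, `linear_tid` and `first_item` are hypothetical names introduced for illustration. It assumes the usual row-major convention (x varies fastest, then y, then z) and a blocked arrangement in which each thread owns a contiguous run of items.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for CUDA's threadIdx / blockDim.
struct Dim3 {
    unsigned x, y, z;
};

// Row-major flattening of a 3D thread index: x varies fastest, then y, then z.
inline unsigned linear_tid(Dim3 tid, Dim3 block) {
    assert(tid.x < block.x && tid.y < block.y && tid.z < block.z);
    return (tid.z * block.y + tid.y) * block.x + tid.x;
}

// Assumed blocked arrangement: the thread with linear id t owns the contiguous
// items [t * items_per_thread, (t + 1) * items_per_thread).
inline std::size_t first_item(unsigned tid, unsigned items_per_thread) {
    return static_cast<std::size_t>(tid) * items_per_thread;
}
```

Under these assumptions, thread (3, 2, 0) in an 8 x 4 x 1 block has linear id 2 * 8 + 3 = 19, so the 2D shape only changes how a thread is named, not which contiguous slice of the 1D input it touches.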
@johnmansell-dyndrite thank you for reaching out! We should provide examples illustrating CUB usage in the context of multi-dimensional thread blocks. I'll file an issue to make sure this is tracked somewhere. As to your question, I believe there are multiple factors at play.
If you load from
Now, let's see what happens with multi-dimensional thread block.
As @pauleonix rightfully noted, having block dimensions at compile time allows us to avoid unnecessary linearized thread index computations. If the block is defined as From the functional point of view, nothing else changes, CUB internals use
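The compile-time point can be illustrated with a small sketch. This is plain C++ written for this discussion, not the CUB implementation. `BlockShape` and its members are hypothetical names. The idea is that when a dimension is known to be 1 at compile time, the corresponding term of the linearized index is a constant zero and the compiler can drop it.

```cpp
// Hypothetical helper: block dimensions are template parameters, so terms
// for dimensions equal to 1 fold away at compile time.
template <int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1>
struct BlockShape {
    // Total number of threads in the (flattened) block.
    static constexpr int BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;

    // Row-major linear thread index; x, y, z play the role of threadIdx.
    static constexpr int linear_tid(int x, int y, int z) {
        return ((BLOCK_DIM_Z == 1) ? 0 : z * BLOCK_DIM_X * BLOCK_DIM_Y)
             + ((BLOCK_DIM_Y == 1) ? 0 : y * BLOCK_DIM_X)
             + x;
    }
};

// For a 1D block the index reduces to x alone.
static_assert(BlockShape<128>::linear_tid(5, 0, 0) == 5, "1D block: tid == x");
// For an 8 x 4 block, thread (3, 2) is linear thread 2 * 8 + 3.
static_assert(BlockShape<8, 4>::linear_tid(3, 2, 0) == 19, "2D block: tid == y * 8 + x");
```

In this sketch, `BlockShape<8, 4>` and `BlockShape<32>` both describe 32 threads and produce the same set of linear ids. Only the mapping from (x, y) to the linear id differs.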
I believe that the main issue that introduces confusion in your case is
The linearized data looks like:
In the code you attached, the padding is 0, because pitch is equal to 512 bytes, int2 is 8 bytes, giving us

I hope this helps describe the behavior you observe. Please let us know if you find other gaps in our documentation. This would greatly help us improve it.
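The pitch arithmetic can be checked with a short host-side sketch in plain C++. `Int2` is a hypothetical stand-in for CUDA's `int2`, and `row_stride_elems`, `padding_elems` and `pitched_index` are illustrative helper names. The column counts in the usage note are assumptions chosen to show a zero-padding and a non-zero-padding case, not values taken from the attached code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host stand-in for CUDA's int2 (two 32-bit ints, 8 bytes).
struct Int2 {
    int x, y;
};

// Number of element slots in one pitched row.
inline std::size_t row_stride_elems(std::size_t pitch_bytes, std::size_t elem_bytes) {
    assert(elem_bytes != 0 && pitch_bytes % elem_bytes == 0);
    return pitch_bytes / elem_bytes;
}

// Unused element slots at the end of each row.
inline std::size_t padding_elems(std::size_t pitch_bytes, std::size_t elem_bytes,
                                 std::size_t cols) {
    const std::size_t stride = row_stride_elems(pitch_bytes, elem_bytes);
    assert(cols <= stride);
    return stride - cols;
}

// Position of element (row, col) when the pitched buffer is viewed as a
// flat 1D array of elements, padding included.
inline std::size_t pitched_index(std::size_t row, std::size_t col,
                                 std::size_t pitch_bytes, std::size_t elem_bytes) {
    return row * row_stride_elems(pitch_bytes, elem_bytes) + col;
}
```

With a 512-byte pitch and 8-byte elements, each row holds 64 slots. A row of 64 elements (an assumed width) therefore has no padding, while a row of 48 elements would leave 16 padding slots that a purely linear load would read as if they were data.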
Problem
I'm having trouble understanding how to utilize the
BLOCK_DIM_Y
parameter in CUB templates. From what I can see, all the examples only use 1D templates. The documentation states:

However, it is not behaving the way I would expect, so I think I must be misunderstanding what is supposed to happen, or how one should process 2D data.
Example
I've created an example which is the simplest case demonstrating my confusion.
cudaMemcpy2D
cub::BlockLoad
I've created the matrix with
int2
values so each (x,y) pair can be tracked to see where it ends up, or what's missing. The example works as is, but there seem to be some unknown constraints, or something I'm not understanding. If I change the matrix dimensions, one of the following happens:

The copy seems to work correctly if
BLOCK_DIM_X == ROWS * ITEMS * (constant)
and
BLOCK_DIM_Y == COLS * ITEMS * (constant)
Question
BLOCK_DIM_Y
is used in the templates, and if it's possible to iterate over the data in this manner?