libcu++
CUDA provides warp shuffle intrinsics, but they support only a limited set of types. In addition, they perform no checks to validate their inputs. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
Provide:
cuda::shfl(T var, int srcLane, unsigned mask = 0xFFFFFFFF, int width=warpSize)
cuda::shfl_up(T var, int delta, unsigned mask = 0xFFFFFFFF, int width=warpSize)
cuda::shfl_down(T var, int delta, unsigned mask = 0xFFFFFFFF, int width=warpSize)
cuda::shfl_xor(T var, int delta, unsigned mask = 0xFFFFFFFF, int width=warpSize)
These functions should work with any trivially copyable type and, in debug mode, validate their inputs (for example, width must be a power of two).
I suggest putting mask at the end rather than first, as the intrinsics do, because most of the time warp shuffles are used with all threads of the warp.
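To make the request concrete, here is a minimal sketch of how such a wrapper might look. It is not an existing libcu++ API: the `sketch` namespace, the helpers `is_valid_width` and `word_count`, and the `SKETCH_HD` macro are hypothetical names chosen for illustration. The idea is to copy the object into 32-bit words, shuffle each word with `__shfl_sync`, and check `width` with `assert`, which is compiled out when `NDEBUG` is defined. The sketch also assumes `T` is default-constructible, which trivially copyable alone does not guarantee.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical macro so the helpers also compile with a plain host compiler.
#if defined(__CUDACC__)
#  define SKETCH_HD __host__ __device__
#else
#  define SKETCH_HD
#endif

namespace sketch {  // hypothetical namespace, not part of libcu++

// True when width is a power of two in [1, 32], as the shuffle intrinsics require.
SKETCH_HD constexpr bool is_valid_width(int width) {
  return width > 0 && width <= 32 && (width & (width - 1)) == 0;
}

// Number of 32-bit words needed to carry sizeof(T) bytes.
template <class T>
SKETCH_HD constexpr int word_count() {
  return static_cast<int>((sizeof(T) + sizeof(unsigned) - 1) / sizeof(unsigned));
}

#if defined(__CUDACC__)
// Shuffle any trivially copyable T by splitting it into 32-bit words.
// mask comes after the value and lane arguments and defaults to the full warp.
template <class T>
__device__ T shfl(const T& var, int src_lane,
                  unsigned mask = 0xFFFFFFFFu, int width = 32) {
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable");
  assert(is_valid_width(width));  // debug-only check; removed under NDEBUG

  constexpr int n = word_count<T>();
  unsigned words[n] = {};
  memcpy(words, &var, sizeof(T));
  for (int i = 0; i < n; ++i) {
    words[i] = __shfl_sync(mask, words[i], src_lane, width);
  }

  T result;  // assumes T is default-constructible
  memcpy(&result, words, sizeof(T));
  return result;
}
#endif

}  // namespace sketch
```

With this shape, a kernel could broadcast a small struct from lane 0 with `sketch::shfl(value, 0)`, without spelling out the mask. The `shfl_up`, `shfl_down`, and `shfl_xor` variants would follow the same pattern over `__shfl_up_sync`, `__shfl_down_sync`, and `__shfl_xor_sync`.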