So we've run into issues like this before, and CUB actually already has some logic to optionally disable any mention of the half/bfloat types (lines 49 to 66 in f8a26b2). Can you try adding …
---
We've encountered another interesting conundrum while trying to build torch.
Torch builds with a bunch of `__CUDA_NO_{HALF/BFLOAT16}...` defines: https://github.com/pytorch/pytorch/blob/faf0015052ee37db718bc5efa6673e0c25be1e8d/cmake/Dependencies.cmake#L1609

It works well enough for them, at least up until the CUB version included with cuda-12.4 (v2.3.0?).
However, attempting to build it with cub v2.3.2 runs into an issue: apparently the newer version of CUB needs some of the fp16/bf16 overloads that the torch build flags disable.
Note that the ambiguous overload here is a red herring. The real problem is that the correct overloads were not found because they were disabled.
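For illustration, here is a minimal sketch of the mechanism (not the actual torch or CUB code, and the exact diagnostics in the real build will differ): the `__half`/`__nv_bfloat16` operator and conversion overloads in `cuda_fp16.h`/`cuda_bf16.h` are wrapped in guards like `#if !defined(__CUDA_NO_HALF_OPERATORS__)`, so once the macros are defined those overloads simply don't exist, and whatever error the compiler reports comes from the conversion paths it tries instead.

```cuda
// repro.cu -- minimal sketch, assuming an sm_53+ target so the built-in
// __half operators exist when the macro is NOT defined:
//   nvcc -arch=sm_70 -c repro.cu                               // compiles
//   nvcc -arch=sm_70 -D__CUDA_NO_HALF_OPERATORS__ -c repro.cu  // fails
#include <cuda_fp16.h>

__device__ __half device_max(__half a, __half b)
{
  // Relies on the operator<(__half, __half) overload that cuda_fp16.h only
  // declares when __CUDA_NO_HALF_OPERATORS__ is not defined. With the macro
  // defined, the overload is preprocessed away and the compiler falls back
  // to whatever implicit conversions are still visible -- which is where the
  // confusing "ambiguous"/"no matching operator" diagnostics come from.
  return (a < b) ? b : a;
}
```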
If the CUDA-provided overloads are enabled, on the other hand, then the torch build fails:
https://github.com/pytorch/pytorch/actions/runs/8991128640/job/24733299645?pr=125707
So, to keep both cub and torch happy, we somehow need to have the cake (CUDA-provided overloads present) and eat it too (those overloads disabled).
Ideally, the CUDA headers would stash the overloads in some known namespace, which would allow cub to pull them into its own namespace, but that's not an option today: those overloads are simply preprocessed away.
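Purely hypothetical sketch of what that could look like (the namespace names here are made up; the real cuda_fp16.h defines these operators at global scope and removes them entirely under `__CUDA_NO_HALF_OPERATORS__`):

```cuda
#include <cuda_fp16.h>

// Made-up namespace standing in for "some known namespace in the CUDA headers".
namespace hypothetical_cuda_fp16_ops
{
__device__ inline __half operator+(const __half& a, const __half& b)
{
  return __float2half(__half2float(a) + __half2float(b));
}
__device__ inline bool operator<(const __half& a, const __half& b)
{
  return __half2float(a) < __half2float(b);
}
} // namespace hypothetical_cuda_fp16_ops

// Torch-style builds would simply never pull the namespace in, while CUB could
// adopt the operators inside its own namespace without re-enabling them for
// everyone. (Assumes -D__CUDA_NO_HALF_OPERATORS__ is in effect; otherwise the
// real global operators would also be found via ADL and calls would be ambiguous.)
namespace cub_like_detail
{
using namespace hypothetical_cuda_fp16_ops;

__device__ inline __half device_max(__half a, __half b)
{
  return (a < b) ? b : a; // finds the namespaced operator< via the using-directive
}
} // namespace cub_like_detail
```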
Edit: Not sure if it's possible, though, as the `__CUDA_NO_*` macros also disable member functions, not just free-standing ones.

Fixing torch would also be nice, but at the moment things work fine for them, and as far as torch is concerned, this is not a supported build configuration. They are still building with CUDA-12.1, and very recent versions of CUDA and related libraries are not their problem yet.
Considering that we're missing relatively few overload functions, I wonder if it would make sense for CUB to carry its own set, and either always use them, or fall back to them when CUB is used in a build that disables those overloads in the CUDA headers.
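A rough sketch of the fall-back flavor of that idea, with made-up names (the real thing would live in CUB's detail headers, would need the bf16 counterparts and whichever operators/conversions CUB's internals actually rely on, and the "always use our own set" variant would simply drop the `#if` guard):

```cuda
#include <cuda_fp16.h>

namespace cub_detail_fp16_fallback // made-up namespace
{
#if defined(__CUDA_NO_HALF_OPERATORS__)
// The CUDA-provided global operators were preprocessed away by the user's
// build flags (as in the torch build), so provide equivalents for CUB's
// internals without re-enabling the global ones for the rest of the program.
__host__ __device__ inline bool operator<(const __half& a, const __half& b)
{
  return __half2float(a) < __half2float(b);
}
__host__ __device__ inline __half operator+(const __half& a, const __half& b)
{
  return __float2half(__half2float(a) + __half2float(b));
}
#endif // __CUDA_NO_HALF_OPERATORS__
} // namespace cub_detail_fp16_fallback
```

The choice between "always use them" and "fall back only when disabled" probably hinges on the coexistence problem above: if both CUB's set and the global CUDA-provided set are visible at a call site, overload resolution can see the global ones via ADL as well.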
@jrhemstad @miscco @voznesenskym @malfet