[BUG]: cuda::ptx takes long to compile #2933
Comments
I have attached a trace of the compile time. It can be inspected in perfetto.dev. It turns out that a large portion of the time is spent preprocessing the CUDA fp16 and bf16 headers. They are transitively included as follows:
|
Yep, looks like the extended FP type headers are quite expensive, but since they are included as part of the CCCL config, they will affect each translation unit. @miscco could we consider only defining |
yeah that would definitely be better |
With #2981, we are down to 12ms. |
Is this a duplicate?
Type of Bug
Performance
Component
libcu++
Describe the bug
Including <cuda/ptx> takes ~800ms on my workstation.
How to Reproduce
Compare the time to compile an empty file, a file including cuda/ptx, and a file including cuda/std/__type_traits/integral_constant.h (which is included from cuda/ptx).
Expected behavior
This should not be a heavy header.
Reproduction link
No response
Operating System
Ubuntu Linux 22.04
nvidia-smi output
NA
NVCC version
The benchmark was performed using a prerelease version of nvcc, but it should be reproducible with any recent version.
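The comparison described under How to Reproduce could be scripted roughly as follows. This is only a sketch, not the script used for the numbers above: the file names, the helper functions (write_sources, best_time, main) and the repeat count are made up for illustration, and it assumes nvcc is on PATH and can find the CCCL headers (pass extra include directories to main if it cannot).

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path

# The three translation units compared in the report (file names are hypothetical).
SOURCES = {
    "empty.cu": "",
    "integral_constant.cu": "#include <cuda/std/__type_traits/integral_constant.h>\n",
    "ptx.cu": "#include <cuda/ptx>\n",
}


def write_sources(directory):
    """Write the three test files into `directory` and return their paths."""
    paths = []
    for name, text in SOURCES.items():
        path = Path(directory) / name
        path.write_text(text)
        paths.append(path)
    return paths


def best_time(cmd, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` runs of `cmd`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def main(include_dirs=()):
    """Compile each file with nvcc and print the best time per file."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH; skipping measurement")
        return
    with tempfile.TemporaryDirectory() as tmp:
        for src in write_sources(tmp):
            cmd = [nvcc, "-c", str(src), "-o", str(src.with_suffix(".o"))]
            for inc in include_dirs:
                cmd += ["-I", inc]  # optional: point nvcc at a CCCL checkout
            print(f"{src.name}: {best_time(cmd) * 1000:.0f} ms")
```

Calling main() prints one line per file. The cost of a header is the difference to the empty.cu baseline, since that baseline already contains nvcc's fixed startup and front-end overhead.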