Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDAX] Remove launch overloads taking dimensions and make everything except make_hierarchy return kernel_config #2979

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pciolkosz
Copy link
Contributor

@pciolkosz pciolkosz commented Nov 28, 2024

I believe that at this point launch overloads taking hierarchy_dimensions instead of kernel_config is more of a past design artifact.
Hierarchy design predates kernel config and initially launch overloads taking a hierarchy were provided to make sure user won't need to take that extra step of converting hierarchy to a config (implicit conversion was not possible as well).
Recently make_config was extended to accept hierarchy levels. Now given a set of levels, if no launch options are needed the difference is literally make_hierarchy producing a hierarchy type and make_config producing a kernel config containing the same hierarchy.
I believe we should force users to use the kernel_config type in majority of cases and make it the only possible launch argument for below reasons:

  • It simplifies kernels that take config/dimensions as argument, before there was ambiguity which one to expect, now its consistent.
  • It simplifies that even further with the upcoming default kernel configuration, where kernel and launch site could have a mix of hierarchies or configs and it would be difficult to track what will end up being passed into the kernel. Trying to handle all of these cases in the implementation would also be a major pain
  • I think even for shorthands for config creation, I think I prefer distribute<N>(M).add(option1, option2) vs make_config(distribute<N>(M), option1, option2). I like how now user is mostly concerned about manipulating configs, instead of both hierarchies and configs and how they interact, unless they have some very specific case.
  • Operating only on configs is also closer to the equivalent launch function in python.

As a consequence of this change I also changed distribute and all of the operator& to return kernel_config so it works well with launch. I think we should keep make_hierarchy function for some special cases user would be interested only in the hierarchy type, but vast majority of cases it shouldn't be needed.

@pciolkosz pciolkosz requested a review from a team as a code owner November 28, 2024 00:33
Copy link
Contributor

🟩 CI finished in 22m 21s: Pass: 100%/54 | Total: 4h 42m | Avg: 5m 14s | Max: 16m 32s | Hits: 63%/256
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 42m | Avg: 5m 14s | Max: 16m 32s | Hits: 63%/256

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 29m | Avg:  5m 22s | Max: 16m 32s | Hits:  63%/256   
      🟩 arm64              Pass: 100%/4   | Total: 13m 38s | Avg:  3m 24s | Max:  3m 27s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 40m | Avg:  5m 17s | Max: 15m 08s | Hits:  63%/128   
      🟩 12.5               Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 44s
      🟩 12.6               Pass: 100%/33  | Total:  2h 49m | Avg:  5m 08s | Max: 16m 32s | Hits:  63%/128   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 40m | Avg:  5m 17s | Max: 15m 08s | Hits:  63%/128   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 44s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 49m | Avg:  5m 08s | Max: 16m 32s | Hits:  63%/128   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 42m | Avg:  5m 14s | Max: 16m 32s | Hits:  63%/256   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  8m 06s | Avg:  4m 03s | Max:  4m 29s
      🟩 Clang10            Pass: 100%/2   | Total:  8m 45s | Avg:  4m 22s | Max:  4m 26s
      🟩 Clang11            Pass: 100%/4   | Total: 14m 31s | Avg:  3m 37s | Max:  3m 45s
      🟩 Clang12            Pass: 100%/4   | Total: 14m 26s | Avg:  3m 36s | Max:  3m 48s
      🟩 Clang13            Pass: 100%/4   | Total: 14m 17s | Avg:  3m 34s | Max:  3m 40s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 01s | Avg:  6m 30s | Max: 14m 49s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 36s | Avg:  3m 48s | Max:  3m 52s
      🟩 Clang16            Pass: 100%/4   | Total: 14m 28s | Avg:  3m 37s | Max:  3m 52s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 04s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 33s | Avg: 10m 16s | Max: 16m 08s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 31s | Avg:  3m 45s | Max:  4m 04s
      🟩 GCC10              Pass: 100%/4   | Total: 15m 06s | Avg:  3m 46s | Max:  3m 52s
      🟩 GCC11              Pass: 100%/4   | Total: 15m 36s | Avg:  3m 54s | Max:  4m 28s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 01m | Avg:  8m 48s | Max: 16m 32s
      🟩 GCC13              Pass: 100%/3   | Total:  9m 51s | Avg:  3m 17s | Max:  3m 26s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 13m 01s | Avg: 13m 01s | Max: 13m 01s | Hits:  63%/128   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 44s | Avg: 10m 44s | Max: 10m 44s | Hits:  63%/128   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 44s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 16m | Avg:  4m 33s | Max: 16m 08s
      🟩 GCC                Pass: 100%/20  | Total:  1h 49m | Avg:  5m 29s | Max: 16m 32s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 45s | Avg: 11m 52s | Max: 13m 01s | Hits:  63%/256   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 42m | Avg:  5m 14s | Max: 16m 32s | Hits:  63%/256   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 24m | Avg:  4m 10s | Max: 13m 01s | Hits:  63%/256   
      🟩 Test               Pass: 100%/5   | Total:  1h 18m | Avg: 15m 36s | Max: 16m 32s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
      🟩 90a                Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 14m | Avg:  4m 38s | Max: 15m 23s
      🟩 20                 Pass: 100%/25  | Total:  2h 28m | Avg:  5m 55s | Max: 16m 32s | Hits:  63%/256   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we really need make_hierarchy? Wouldnt it make everything easier if a user would always create a config and then can get the hierarchy information from the config?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

2 participants