Hi @jrhemstad @gevtushenko @alliepiper, from what I learned on the NVIDIA website, distributed shared memory allows direct SM-to-SM communication for loads, stores, and atomics across the shared memory of multiple SMs, and can speed up certain operations, such as histogram collection, by up to 1.7x. I'm wondering if there is any plan to enhance the Thrust library or CUB device-wide operations with distributed shared memory. Thank you!
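
For context on the mechanism being asked about, here is a minimal sketch of a cluster-wide histogram that uses distributed shared memory via cooperative groups on Hopper (sm_90, CUDA 12+). The kernel name, the fixed cluster size of 2, and the bin layout (each block in a cluster owning a contiguous slice of bins) are illustrative assumptions, not CUB/Thrust API; `cluster.map_shared_rank()` and `cluster.sync()` are the documented cooperative-groups primitives that enable the SM-to-SM atomics.

```cuda
// Minimal sketch: cluster-wide histogram using distributed shared memory.
// Requires Hopper (sm_90) and CUDA 12+. Kernel name, cluster size, and bin
// layout are illustrative assumptions; input values are assumed non-negative.
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

constexpr int kBinsPerBlock = 256;  // each block in the cluster owns this many bins

__global__ void __cluster_dims__(2, 1, 1)  // compile-time cluster of 2 blocks
cluster_histogram(int* global_bins, const int* input, size_t n)
{
    __shared__ int smem_bins[kBinsPerBlock];
    cg::cluster_group cluster = cg::this_cluster();
    const unsigned int cluster_size = cluster.dim_blocks().x;
    const int nbins = kBinsPerBlock * cluster_size;

    // Each block zeroes its own slice of the distributed histogram.
    for (int i = threadIdx.x; i < kBinsPerBlock; i += blockDim.x)
        smem_bins[i] = 0;
    cluster.sync();  // all shared memory in the cluster is now initialized

    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
         i < n; i += stride)
    {
        const int bin = input[i] % nbins;
        const unsigned int dst_rank = bin / kBinsPerBlock;  // block owning this bin
        const int dst_offset        = bin % kBinsPerBlock;  // bin index within that block
        // map_shared_rank() returns a pointer into another block's shared memory,
        // so this atomicAdd is a direct SM-to-SM atomic (no trip through global memory).
        int* dst = cluster.map_shared_rank(smem_bins, dst_rank);
        atomicAdd(dst + dst_offset, 1);
    }
    cluster.sync();  // ensure remote updates have landed before flushing

    // Flush this block's slice of the per-cluster histogram to global memory.
    const int base = cluster.block_rank() * kBinsPerBlock;
    for (int i = threadIdx.x; i < kBinsPerBlock; i += blockDim.x)
        atomicAdd(&global_bins[base + i], smem_bins[i]);
}

// Launch as usual; the grid size just needs to be a multiple of the cluster size:
//   cluster_histogram<<<num_blocks, 256>>>(d_bins, d_input, n);
```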
Replies: 1 comment
Hey @lilohuang, thanks for your interest in CCCL!
We're definitely discussing how to best leverage and expose the functionality of clusters and distributed shared memory in CUB algorithms. We don't have any concrete features on the roadmap at the moment, but stay tuned and we'll likely have some things soon.