Hi @jrhemstad @gevtushenko @alliepiper, from what I learned on the NVIDIA website, distributed shared memory allows direct SM-to-SM communication for loads, stores, and atomics across the shared memory of multiple SMs, and can speed up certain operations, such as histogram collection, by up to 1.7x. I'm wondering if there is any plan to enhance the Thrust library or CUB device-wide operations with distributed shared memory. Thank you!
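
For context on the mechanism being asked about, here is a minimal sketch of a cluster-wide histogram that uses distributed shared memory via cooperative groups on Hopper (sm_90, CUDA 12+). The kernel name, the fixed cluster size of 2, and the bin layout (each block in a cluster owning a contiguous slice of bins) are illustrative assumptions, not CUB/Thrust API; `cluster.map_shared_rank()` and `cluster.sync()` are the documented cooperative-groups primitives that enable the SM-to-SM atomics.

```cuda
// Minimal sketch: cluster-wide histogram using distributed shared memory.
// Requires Hopper (sm_90) and CUDA 12+. Kernel name, cluster size, and bin
// layout are illustrative assumptions; input values are assumed non-negative.
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

constexpr int kBinsPerBlock = 256;  // each block in the cluster owns this many bins

__global__ void __cluster_dims__(2, 1, 1)  // compile-time cluster of 2 blocks
cluster_histogram(int* global_bins, const int* input, size_t n)
{
    __shared__ int smem_bins[kBinsPerBlock];
    cg::cluster_group cluster = cg::this_cluster();
    const unsigned int cluster_size = cluster.dim_blocks().x;
    const int nbins = kBinsPerBlock * cluster_size;

    // Each block zeroes its own slice of the distributed histogram.
    for (int i = threadIdx.x; i < kBinsPerBlock; i += blockDim.x)
        smem_bins[i] = 0;
    cluster.sync();  // all shared memory in the cluster is now initialized

    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
         i < n; i += stride)
    {
        const int bin = input[i] % nbins;
        const unsigned int dst_rank = bin / kBinsPerBlock;  // block owning this bin
        const int dst_offset        = bin % kBinsPerBlock;  // bin index within that block
        // map_shared_rank() returns a pointer into another block's shared memory,
        // so this atomicAdd is a direct SM-to-SM atomic (no trip through global memory).
        int* dst = cluster.map_shared_rank(smem_bins, dst_rank);
        atomicAdd(dst + dst_offset, 1);
    }
    cluster.sync();  // ensure remote updates have landed before flushing

    // Flush this block's slice of the per-cluster histogram to global memory.
    const int base = cluster.block_rank() * kBinsPerBlock;
    for (int i = threadIdx.x; i < kBinsPerBlock; i += blockDim.x)
        atomicAdd(&global_bins[base + i], smem_bins[i]);
}

// Launch as usual; the grid size just needs to be a multiple of the cluster size:
//   cluster_histogram<<<num_blocks, 256>>>(d_bins, d_input, n);
```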
Replies: 1 comment
Hey @lilohuang, thanks for your interest in CCCL!
We're definitely discussing how to best leverage and expose the functionality of clusters and distributed shared memory in CUB algorithms. We don't have any concrete features on the roadmap at the moment, but stay tuned and we'll likely have some things soon.