
Comparison with expandable_segments in pytorch/c10? #12

Open
YouJiacheng opened this issue Jan 2, 2024 · 3 comments

@YouJiacheng

pytorch/pytorch#96995

https://github.com/pytorch/pytorch/blob/95a86ed9ca107329151e0dc172386d50dd3471c6/c10/cuda/CUDACachingAllocator.cpp#L311-L324

The expandable_segments:True option is used to enable/disable this behavior. We
use cuda's low-level memory APIs, which are similar to mmap, to extend the
memory segments. These APIs separate the allocation of physical memory
(cuMemCreate) from the allocation of virtual address space (cuMemAddressReserve)
and the association between them (cuMemMap/cuMemSetAccess).

When we allocate a new segment, we allocate enough address space to map
basically the entire physical memory of the GPU (there is 256TiB of address
space), but we only map enough physical memory to handle the current amount of
memory needed by the program. As more is requested, we add more physical memory
to the segment. This can work at the granularity of GPU pages which are 2MiB
currently.
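
For concreteness, here is a minimal sketch of how those driver APIs compose. This is illustrative only, not PyTorch's actual allocator code; the reservation size, chunk count, and error-handling macro are made up for the example. A large virtual range is reserved once, and physical chunks at the allocation granularity (typically 2 MiB) are created and mapped into it on demand:

```c
// Sketch only: reserve a large virtual range up front, then map physical
// chunks into it as demand grows. Not the PyTorch allocator's real code.
#include <cuda.h>
#include <stdio.h>

#define CHECK(call)                                              \
    do {                                                         \
        CUresult r = (call);                                     \
        if (r != CUDA_SUCCESS) {                                 \
            const char *msg;                                     \
            cuGetErrorString(r, &msg);                           \
            fprintf(stderr, "%s failed: %s\n", #call, msg);      \
            return 1;                                            \
        }                                                        \
    } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;  CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx; CHECK(cuCtxCreate(&ctx, 0, dev));

    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t gran;  // allocation granularity, typically 2 MiB on current GPUs
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));

    // 1. Reserve a large virtual address range (no physical memory yet).
    size_t reserve_size = 64 * gran;  // arbitrary size for the example
    CUdeviceptr va;
    CHECK(cuMemAddressReserve(&va, reserve_size, 0, 0, 0));

    // 2. Allocate one physical chunk and map it at the start of the range.
    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, gran, &prop, 0));
    CHECK(cuMemMap(va, gran, 0, handle, 0));

    // 3. Make the mapped portion accessible from the device.
    CUmemAccessDesc access = {0};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = dev;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(va, gran, &access, 1));

    // Growing the "segment" later is just another cuMemCreate + cuMemMap at
    // va + mapped_size; nothing already mapped has to move or be copied.

    CHECK(cuMemUnmap(va, gran));
    CHECK(cuMemRelease(handle));
    CHECK(cuMemAddressFree(va, reserve_size));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```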

@YouJiacheng YouJiacheng changed the title Comparison with expandable_segments in pytorch/c10 Comparison with expandable_segments in pytorch/c10? Jan 2, 2024
@ruizhang1230
Collaborator

Thank you for your interest in our work. GMLake was implemented before April 2023. Our work was originally completed on PyTorch 1.13.1; after PyTorch 2.0 was released, we adapted it to the 2.0 version, and all of the experiments were conducted on PyTorch 2.0. However, expandable_segments was only introduced in version 2.1, so we have not yet conducted more detailed experiments with this feature.

In recent days, we have conducted an in-depth investigation of the implementation of expandable_segments. As mentioned in the code comments, this feature primarily addresses the issue of increasing block size, whereas we address the problem of fragmentation, which is not the same. We have since adapted our work to PyTorch 2.1 and ran a simple comparative test of this feature: on the GPT-NeoX-20B model, the memory utilization rate with expandable_segments was 87%, while with GMLake it was 95%. Expandable_segments is very good work, and we plan to conduct a detailed analysis of this feature on a variety of models.

If you would like to discuss this in more depth, please leave an email address and we will send you our contact information.

@YouJiacheng
Author

Thank you for your informative reply. I believe GMLake and expandable_segments are concurrent works, though the PR introducing expandable_segments is dated Mar 17, 2023 (it was only released in 2.1).

The purpose of increasing segment size should be to eliminate fragmentation. Theoretically there can be no fragmentation (except intra-page fragmentation) with expandable_segments: tensors can always be allocated successfully as long as there are enough spare pages, regardless of whether those pages are physically contiguous.

Stitching is naturally performed, since the allocation of physical memory is separated from the allocation of virtual address space and the mapping between them.
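
To make that concrete, here is a hypothetical sketch (not GMLake's or PyTorch's code) of what such stitching looks like at the driver-API level: two independently allocated physical chunks are mapped back-to-back into one reserved virtual range, so a single tensor can span both. The function name and the assumption that a CUDA context is already current are made up for the illustration:

```c
// Hypothetical sketch: "stitch" two independent physical chunks into one
// contiguous virtual span. Assumes a CUDA context is already current and
// `chunk` is a multiple of the allocation granularity (typically 2 MiB).
#include <cuda.h>

CUdeviceptr stitch_two_chunks(CUdevice dev, size_t chunk) {
    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    // One contiguous virtual range for both chunks.
    CUdeviceptr va;
    if (cuMemAddressReserve(&va, 2 * chunk, 0, 0, 0) != CUDA_SUCCESS)
        return 0;

    // Physically independent allocations... (error checks omitted for brevity)
    CUmemGenericAllocationHandle h0, h1;
    cuMemCreate(&h0, chunk, &prop, 0);
    cuMemCreate(&h1, chunk, &prop, 0);

    // ...mapped adjacently, so the virtual range is contiguous.
    cuMemMap(va,         chunk, 0, h0, 0);
    cuMemMap(va + chunk, chunk, 0, h1, 0);

    CUmemAccessDesc access = {0};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = dev;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, 2 * chunk, &access, 1);

    return va;  // one contiguous tensor can now live across both chunks
}
```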

@eedalong

The techniques used behind them should be roughly the same: manually managing virtual memory and the physical-memory mapping.
