Comparison with expandable_segments in pytorch/c10? #12
Thank you for your interest in our work. GMLake was implemented before April 2023. Our work was originally completed on PyTorch 1.13.1; after PyTorch 2.0 was released, we adapted it to the 2.0 version, and all of the experiments were conducted on PyTorch 2.0. However, expandable_segments was introduced in version 2.1, so we have not yet conducted more detailed experiments with this feature. In recent days, we have conducted an in-depth investigation of the implementation of expandable_segments. As mentioned in the code comments, this feature primarily addresses the issue of growing segment size, whereas we address the problem of fragmentation, which is not the same. We have adapted our work to PyTorch 2.1 and run a simple comparative test of this feature: on the GPT-NeoX-20B model, the memory utilization of expandable_segments was 87%, while for GMLake it was 95%. expandable_segments is very good work, and we plan to conduct a detailed analysis of this feature on a variety of models. If you would like to discuss in depth, please leave an email address, and we will send you our contact information.
Thank you for your informative reply. I believe GMLake and expandable_segments are concurrent works, but the purpose of increasing segment size in the mentioned PR should also be to eliminate fragmentation. Theoretically there can be no fragmentation (except intra-page fragmentation) with expandable segments: stitching is naturally performed, since the allocation of physical memory is separated from the allocation of virtual address space and the association between them.
The techniques behind both should be roughly the same: manually managing virtual memory and physical memory mapping.
pytorch/pytorch#96995
https://github.com/pytorch/pytorch/blob/95a86ed9ca107329151e0dc172386d50dd3471c6/c10/cuda/CUDACachingAllocator.cpp#L311-L324
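The "separation of physical memory from virtual address space" discussed above rests on the CUDA driver's virtual memory management API. Below is a minimal, hedged sketch of the underlying technique: reserve one contiguous virtual range, back it with two separately allocated physical chunks, and map them back to back. This is only an illustration of the mechanism; it is not GMLake's or PyTorch's actual implementation, and the chunk sizes and error handling are my own choices.

```cuda
#include <cuda.h>
#include <cstdio>

// On any driver-call failure, print the call and the error, then exit 0.
#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    const char* s_ = "?"; cuGetErrorString(r_, &s_);                     \
    printf("%s failed: %s\n", #call, s_); return 0; } } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev;  CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx; CHECK(cuCtxCreate(&ctx, 0, dev));

    // Describe physical allocations that live on this device.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity = 0;
    CHECK(cuMemGetAllocationGranularity(&granularity, &prop,
          CU_MEM_ALLOC_GRANULARITY_MINIMUM));

    // Reserve one contiguous VIRTUAL range big enough for two chunks.
    size_t chunk = granularity;
    CUdeviceptr va;
    CHECK(cuMemAddressReserve(&va, 2 * chunk, 0, 0, 0));

    // Allocate two independent PHYSICAL chunks and map them back to back:
    // this is the "stitching" of non-contiguous physical memory into one
    // contiguous virtual buffer.
    CUmemGenericAllocationHandle h1, h2;
    CHECK(cuMemCreate(&h1, chunk, &prop, 0));
    CHECK(cuMemCreate(&h2, chunk, &prop, 0));
    CHECK(cuMemMap(va,         chunk, 0, h1, 0));
    CHECK(cuMemMap(va + chunk, chunk, 0, h2, 0));

    // Mappings are inaccessible until access is granted explicitly.
    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(va, 2 * chunk, &access, 1));

    // [va, va + 2*chunk) now behaves as one contiguous device buffer.
    CHECK(cuMemsetD8(va, 0, 2 * chunk));

    CHECK(cuMemUnmap(va, 2 * chunk));
    CHECK(cuMemRelease(h1));
    CHECK(cuMemRelease(h2));
    CHECK(cuMemAddressFree(va, 2 * chunk));
    printf("stitched mapping ok\n");
    return 0;
}
```

Because the virtual range and the physical chunks are decoupled, an allocator can later unmap a chunk, remap it elsewhere, or extend the range by mapping additional chunks at the end, without ever handing the application a new base pointer. That decoupling is what makes both the segment-growing in expandable_segments and the fragment-stitching in GMLake possible.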