@juancq Just a heads up: what was formerly dev has been merged into main. The dev branch now contains changes from #74 that will be pulled out or reworked to address the speed issues you caught, but that code is currently in dev. Apologies for any confusion.
@juancq I believe #90 now fixes this and all related issues. If you want to test it, for now you need to run the code in that PR's branch and manually clone and install (via pip install -e .) this package: https://github.com/mmcdermott/nested_ragged_tensors. It handles the manipulation of the ragged tensors used here to speed things up and reduce memory costs. This version still caches files, so you may need to delete any files cached by the old version, but the new cached files should be dramatically smaller, and the actual runtime should have no memory leaks and be (at worst) competitive with the prior runtime, and likely faster on some tasks/settings at an iteration/batch level. No pressure to test for now; I'm going to push the other package to PyPI so it can be installed normally and make a few other cosmetic improvements, but that's the state of things for your information.
Using the dev branch, I have noticed that when using DataLoader with num_workers greater than 0, memory consumption grows steadily over the course of a single epoch. This problem has been documented here:
https://docs.aws.amazon.com/codeguru/detector-library/python/pytorch-data-loader-with-multiple-workers/
pytorch/pytorch#13246 (comment)
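For context, the issue discussed in the links above stems from copy-on-write semantics in forked DataLoader workers: if the dataset stores its items in a Python list, merely reading an element updates that object's refcount, which dirties the shared page, so each worker gradually copies the parent's memory. A common mitigation is to keep the data in a single contiguous numpy or torch array. A minimal sketch (the class name and sizes are hypothetical, not from this repo):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ArrayBackedDataset(Dataset):
    """Stores items in one contiguous numpy array instead of a list of
    Python objects, so forked workers do not dirty copy-on-write pages
    via per-item refcount updates (see pytorch/pytorch#13246)."""

    def __init__(self, n_items: int = 1000, dim: int = 8):
        # One contiguous buffer; indexing it creates no long-lived
        # per-item Python objects in the parent process.
        self.data = np.random.rand(n_items, dim).astype(np.float32)

    def __len__(self) -> int:
        return self.data.shape[0]

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.from_numpy(self.data[idx])


if __name__ == "__main__":
    # With array-backed storage, worker memory should stay flat
    # across the epoch even with num_workers > 0.
    loader = DataLoader(ArrayBackedDataset(), batch_size=32, num_workers=2)
    for batch in loader:
        pass
```

This is only a sketch of the general workaround from the linked issue; whether it applies here depends on how this package's cached tensors are held in the Dataset.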