memory leak related with MultiprocessFileCache? #339

Open
dshintani-pfn opened this issue Jul 19, 2024 · 0 comments

dshintani-pfn commented Jul 19, 2024

I observed a possible memory leak related to MultiprocessFileCache during training: memory usage grows by roughly 1 GB per hour.
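A growth rate such as ~1 GB/h can be estimated by periodically sampling the process's resident set size (RSS) during training and fitting a straight line through the samples. The helpers below are a hypothetical sketch, not part of pfio or PyTorch. `current_rss_bytes` assumes Linux (`/proc/self/statm`) and measures only the calling process, so memory held by DataLoader worker processes (when `num_workers > 0`) would have to be sampled separately.

```python
import os
from typing import List, Tuple


def current_rss_bytes() -> int:
    """Resident set size of the calling process (assumes Linux /proc)."""
    with open("/proc/self/statm") as f:
        # Fields are: size resident shared text lib data dt (all in pages).
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def growth_rate_gb_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (elapsed_seconds, rss_bytes) samples, in GB/h (1 GB = 1e9 bytes)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span a non-zero time interval")
    bytes_per_second = cov / var
    return bytes_per_second * 3600 / 1e9
```

In a training loop, one could append `(time.monotonic() - start, current_rss_bytes())` every few hundred iterations and call `growth_rate_gb_per_hour` on the collected list. Fitting a line rather than comparing only the first and last samples makes the estimate less sensitive to short-lived spikes.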

I defined a dataset class with a cache, following the tutorial:

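```python
from typing import Any

from pfio.cache import MultiprocessFileCache

# `File`, `TrainData`, and `__len__` are defined elsewhere (not shown in this report).


class CachedDataset:
    def __init__(
        self,
        common_config
    ) -> None:
        self._reader_dict = {
            dataset.name: File(dataset.name, mode="a") for dataset in common_config.datasets
        }
        # One cache slot per sample; values are pickled before being cached.
        self._cache = MultiprocessFileCache(len(self), do_pickle=True)

    def _load_from_disk(self, i: int) -> TrainData:
        return ...

    def __getitem__(self, i: int) -> Any:
        # Return the cached sample, or load it from disk and cache it on first access.
        return self._cache.get_and_cache(i, self._load_from_disk)
```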
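For readers unfamiliar with the pattern, `get_and_cache(i, backend_get)` returns the cached value for index `i` if one exists, and otherwise calls `backend_get(i)`, stores the result, and returns it. The in-memory `DictCache` below is a hypothetical stand-in written only to illustrate that contract. It is not pfio's implementation: according to pfio's documentation, MultiprocessFileCache keeps its data in a local file shared with forked processes rather than in a Python dict.

```python
from typing import Any, Callable, Dict


class DictCache:
    """Hypothetical in-memory cache mirroring the get_and_cache(i, backend_get) call shape."""

    def __init__(self, length: int) -> None:
        self._length = length
        self._data: Dict[int, Any] = {}

    def __len__(self) -> int:
        return self._length

    def get_and_cache(self, i: int, backend_get: Callable[[int], Any]) -> Any:
        if not 0 <= i < self._length:
            raise IndexError(i)
        if i not in self._data:
            # Cache miss: load from the backend once and remember the result.
            self._data[i] = backend_get(i)
        return self._data[i]
```

Calling `get_and_cache(1, load)` twice on this sketch invokes `load` only once. Note that an in-memory cache like this one grows until every index has been loaded at least once, so steady growth during the first epoch is expected for this sketch; that is a property of `DictCache`, not evidence about MultiprocessFileCache.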

I then used this CachedDataset as `dataset` for training:

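```python
import torch
from torch.utils.data import DataLoader

# `dataset` is the CachedDataset above; `train_set_ratio`, `train_args`, and
# `collate_fn` are defined elsewhere in the training script.
train_set, val_set = torch.utils.data.random_split(
    dataset,
    [int(len(dataset) * train_set_ratio), len(dataset) - int(len(dataset) * train_set_ratio)],
)

train_loader = DataLoader(
    train_set, batch_size=train_args.batch_size, shuffle=True, collate_fn=collate_fn
)
```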

The leak disappeared when I stopped using MultiprocessFileCache.
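To repeat the with/without-cache comparison without editing `__getitem__` each time, the cache could be made optional. The sketch below is a hypothetical restructuring, not code from this report: `ToggleableDataset`, its `loader` argument (standing in for `_load_from_disk`), and the duck-typed `cache` parameter are illustrative names. Passing `cache=None` bypasses caching entirely; passing a `MultiprocessFileCache` sized to the dataset length, as in the class above, would restore the original behavior.

```python
from typing import Any, Callable, Optional, Protocol


class SupportsGetAndCache(Protocol):
    """Anything exposing get_and_cache(i, backend_get), e.g. MultiprocessFileCache."""

    def get_and_cache(self, i: int, backend_get: Callable[[int], Any]) -> Any: ...


class ToggleableDataset:
    """Hypothetical dataset whose cache can be switched off for A/B memory tests."""

    def __init__(
        self,
        length: int,
        loader: Callable[[int], Any],
        cache: Optional[SupportsGetAndCache] = None,
    ) -> None:
        self._length = length
        self._loader = loader  # stands in for _load_from_disk
        self._cache = cache

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, i: int) -> Any:
        if self._cache is None:
            # No cache: every access reads from the backend.
            return self._loader(i)
        return self._cache.get_and_cache(i, self._loader)
```

Keeping everything else in the training script identical and changing only the `cache` argument isolates the cache as the variable under test.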

It might be caused by incorrect usage of MultiprocessFileCache on my side, but do you have any idea what could cause this leak?

dshintani-pfn changed the title from "memory leak of MultiprocessFileCache?" to "memory leak related with MultiprocessFileCache?" on Jul 19, 2024.