CUDA memory leak for Flux.Optimizer #148
I think this is the sequence of events which causes the leak:
There are a couple of ways we could address this, but I think it first raises a bigger question: why are we resetting the optimizer state at the beginning of each epoch in the first place? @lorenzoh do you remember the context for this decision?
(This issue has been moved here from FluxML/Flux.jl#2261)
I have a somewhat complicated training setup and have recently started encountering CUDA-out-of-memory issues which only show up after a number of epochs.
I have managed to construct a minimum working example here:
After about 50 epochs (~1 minute on my laptop), I get an error that CUDA cannot allocate any more memory.
This seems to be because the optimizer's state variable accumulates GPU arrays over time.
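The suspected mechanism can be sketched without a GPU. This is a hypothetical illustration, not FluxTraining.jl's actual code: Flux's legacy optimizers keep per-parameter state in an IdDict keyed by the parameter arrays themselves, so if the parameters are replaced by fresh arrays each epoch (for example by a device-transfer callback), every epoch creates new entries and the old ones are never freed.

```julia
# Sketch of the suspected leak (assumption: state is an IdDict keyed
# by array identity, as in Flux's legacy optimizers).
opt_state = IdDict()              # stands in for opt.state in Flux.Adam

function fake_epoch!(state)
    w = rand(Float32, 10)         # a "new" parameter array each epoch,
                                  # as after re-moving the model to the GPU
    get!(state, w) do             # identity lookup misses, so fresh state
        (zero(w), zero(w))        # is allocated (Adam's moment buffers)
    end
end

for _ in 1:5
    fake_epoch!(opt_state)
end

length(opt_state)                 # grows by one entry per epoch
```

On a GPU, each of those orphaned entries pins device memory, which would match the out-of-memory error appearing only after many epochs.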
The issue can be fixed by replacing
opt = Flux.Adam()
with
opt = Optimisers.Adam()
However, I think we should fix the problem for the Flux optimizer, since it seems to be "officially" supported. @DrChainsaw has suggested in the other issue that the problem is that the ToDevice callback is not applied to the optimizer state. However, I haven't looked at the specifics or at how one would fix that. Any insights?
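For reference, a minimal sketch of why the Optimisers.jl replacement avoids the accumulation, assuming a toy NamedTuple in place of a real model: its state is an explicit tree built once with setup and threaded through each update, rather than a growing identity-keyed dictionary.

```julia
# Sketch of the explicit Optimisers.jl API (assumes Optimisers.jl is
# installed; the "model" here is a toy stand-in, not a real Flux model).
using Optimisers

model = (W = rand(Float32, 3, 3), b = zeros(Float32, 3))
opt = Optimisers.Adam()
state = Optimisers.setup(opt, model)    # built once, outside the epoch loop

# Pretend gradients for one step; in real training these come from
# Zygote/Flux. update returns the new state and model explicitly.
grads = (W = ones(Float32, 3, 3), b = ones(Float32, 3))
state, model = Optimisers.update(state, model, grads)
```

Because the state tree mirrors the model's structure rather than keying on array identity, replacing the model's arrays (e.g. moving them to the GPU) before setup leaves nothing stale behind.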