Example of gradient clipping with manual optimization does not handle gradient unscaling properly #18089
I'm not sure if this is a limitation, but currently I can find no simple way to achieve this. For automatic optimization, gradient unscaling is performed right after the optimizer closure (training step, zero grad, backward) and before gradient clipping. For manual optimization, however, the calling order is different, and there seems to be no point at which the user can insert gradient unscaling. So here comes a question: why not also allow automatic gradient clipping for manual optimization? If users are expected to take care of gradient clipping themselves, most of the time they would just call the usual clipping helper anyway.
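For reference, the ordering the automatic path enforces is the canonical PyTorch AMP pattern: `unscale_` after `backward()` and before clipping, so the clip threshold applies to true gradient magnitudes. A minimal plain-PyTorch sketch of that ordering (variable names are mine; this is what one would need to reproduce by hand under manual optimization):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# enabled=False makes the scaler a no-op so this sketch also runs on CPU;
# with CUDA tensors you would pass enabled=True to get real loss scaling.
scaler = torch.cuda.amp.GradScaler(enabled=False)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

scaler.scale(loss).backward()   # gradients may be scaled after this
scaler.unscale_(optimizer)      # bring gradients back to their true scale
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip true grads
scaler.step(optimizer)          # skips the step if inf/nan was detected
scaler.update()
```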
Hi, did you manage to find the correct way to do the unscaling before the clipping for manual optimization?
@kkoutini No, I gave up and used Fabric instead.
📚 Documentation
The docs on manual optimization give an example of gradient clipping (added by #16023).
However, it seems that this example does not handle gradient unscaling properly: when using mixed precision training, the gradients should be unscaled before calling `self.clip_gradients`.
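To see why the ordering matters, here is a small demonstration that simulates loss scaling by hand instead of using `GradScaler` (the scale value and names are illustrative): clipping while the gradients are still scaled makes the same threshold roughly `SCALE` times too aggressive once the gradients are brought back to their true magnitude.

```python
import torch
from torch import nn

SCALE = 1024.0  # stand-in for the loss scale a GradScaler would apply

def clipped_grad_norm(clip_before_unscale: bool) -> float:
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    # simulate mixed-precision loss scaling by hand
    loss = model(torch.randn(8, 4)).pow(2).mean() * SCALE
    loss.backward()
    if clip_before_unscale:  # the buggy order: clip the scaled gradients
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    for p in model.parameters():
        p.grad /= SCALE      # what GradScaler.unscale_ would do
    if not clip_before_unscale:  # the correct order: unscale, then clip
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    return float(torch.cat([p.grad.flatten() for p in model.parameters()]).norm())

wrong = clipped_grad_norm(clip_before_unscale=True)
right = clipped_grad_norm(clip_before_unscale=False)
# "wrong" ends up at most 1.0 / SCALE, because the threshold was applied
# to the scaled gradients rather than the real ones.
```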
cc @carmocca @justusschock @awaelchli @Borda