[FEATURE] When saving multiple epochs, add an epoch number suffix when save_best=False #597
Comments
Example: after epoch 1 it saves checkpoint_ep01.pth, after epoch 2 it saves checkpoint_ep02.pth. When loading the model back in according to the config, it would by default load sorted(glob("checkpoint_ep*"))[-1], i.e. the last epoch, to keep the behavior the same as it currently is. Alternatively, if save_best_only=True, keep the current behavior of saving as checkpoint.pth?
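The proposal above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `save_checkpoint` and `load_latest_checkpoint`, the `save_best_only` parameter, and the use of plain `torch.save`/`torch.load` on the state dict are all assumptions for the example.

```python
import glob
import os

import torch
import torch.nn as nn


def save_checkpoint(model, epoch, ckpt_dir, save_best_only=False):
    # Hypothetical helper sketching the proposal: with save_best_only=True
    # keep the current single-file behavior, otherwise add an epoch suffix.
    if save_best_only:
        path = os.path.join(ckpt_dir, "checkpoint.pth")
    else:
        # Zero-padded so lexicographic sort matches epoch order (up to 99 epochs).
        path = os.path.join(ckpt_dir, f"checkpoint_ep{epoch:02d}.pth")
    torch.save(model.state_dict(), path)
    return path


def load_latest_checkpoint(model, ckpt_dir):
    # Load the highest-numbered epoch checkpoint, matching the issue's
    # suggested sorted(glob(...))[-1] default.
    paths = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_ep*.pth")))
    model.load_state_dict(torch.load(paths[-1]))
    return paths[-1]
```

Note the zero-padding in the suffix: without it, `checkpoint_ep10.pth` would sort before `checkpoint_ep2.pth` and the wrong file would be loaded.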
We didn't do that by default because model weights take a ton of disk space. We could theoretically make it a separate setting to additionally save all checkpoints, wdyt?
Most research papers only train for 1 epoch, sometimes 2. If the user knows what they're doing and wants to enable it, I think it's a nice option, especially since it's a simple implementation.
🚀 Feature
Save a separate .pth file at each checkpoint, instead of overwriting checkpoint.pth every time.
Motivation
It is often useful to see how the model performs at each epoch/savepoint. For example, when training an LLM, you may want to measure its generative capabilities after each epoch to see whether they are improving.