-
Notifications
You must be signed in to change notification settings - Fork 964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OOM error when training on a 220G Memory machine with 8 V100. #867
Comments
There is not. I recommend you shard your dataset like the Pile (https://the-eye.eu/public/AI/pile/train/) and then unpack and train on one shard at a time. |
Thank you for your reply. Do you mean to change the train data path from
to
? |
Is your feature request related to a problem? Please describe.
I tried to train a model with my own data about 300G after binarized. And there will be an OOM error for this. Is there any parameter that can be used for loading part of the data in a stream way?
The text was updated successfully, but these errors were encountered: