Exception: Current loss scale already at minimum - cannot decrease scale anymore #280

Open
Z-eloto opened this issue Nov 12, 2024 · 3 comments

Z-eloto commented Nov 12, 2024

Thank you for sharing your code.
When running gpt2/kd/kd_medium.sh on 2×3090 GPUs, training crashed with this error. What should I do to fix it, e.g., adjust the learning rate?

@shiboyu1999

You can use fp32 to train the model or decrease the batch size to 1.
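For reference, a minimal sketch of what disabling mixed precision in the DeepSpeed config could look like. The "fp16" block and "enabled" key are standard DeepSpeed options, but the actual contents of this repo's ds_config_zero1_fp16.json are an assumption here:

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": false
  }
}
```

With fp16 disabled, DeepSpeed does not use dynamic loss scaling at all, so this exception cannot be triggered; the trade-off is higher memory use, which is why decreasing the batch size to 1 may also be necessary on 24 GB cards.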


Z-eloto commented Nov 19, 2024

> You can use fp32 to train the model or decrease the batch size to 1.

Thanks. I will try it. :)

@t1101675 (Contributor)

You can also try bfloat16 by replacing ds_config_zero1_fp16.json with ds_config_zero1_bf16.json where the training script sets the DeepSpeed config path.
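For context, a minimal sketch of what a ZeRO stage 1 bf16 config could look like. The "bf16" block is a standard DeepSpeed option, but the exact contents of ds_config_zero1_bf16.json in this repo are an assumption:

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "bf16": {
    "enabled": true
  }
}
```

bfloat16 has the same exponent range as fp32, so DeepSpeed does not apply loss scaling in bf16 mode and this exception cannot occur; the RTX 3090 (Ampere) supports bf16 natively.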
