Exception: Current loss scale already at minimum - cannot decrease scale anymore #280

Open
Z-eloto opened this issue Nov 12, 2024 · 3 comments

Z-eloto commented Nov 12, 2024

Thank you for sharing your code.
When running gpt2/kd/kd_medium.sh on 2×3090 GPUs, training crashed with this error. What should I do to fix it, e.g., adjust the learning rate?

@shiboyu1999

You can use fp32 to train the model or decrease the batch size to 1.
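For reference, a minimal sketch of what disabling mixed precision in the DeepSpeed config could look like. The "fp16" block and "enabled" key are standard DeepSpeed options, but the actual contents of this repo's ds_config_zero1_fp16.json are an assumption here:

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": false
  }
}
```

With fp16 disabled, DeepSpeed does not use dynamic loss scaling at all, so this exception cannot be triggered; the trade-off is higher memory use, which is why decreasing the batch size to 1 may also be necessary on 24 GB cards.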


Z-eloto commented Nov 19, 2024

> You can use fp32 to train the model or decrease the batch size to 1.

Thanks. I will try it. :)

@t1101675 (Contributor)

You can also try bfloat16 by replacing ds_config_zero1_fp16.json with ds_config_zero1_bf16.json where the training script sets the DeepSpeed config path.
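For context, a minimal sketch of what a ZeRO stage 1 bf16 config could look like. The "bf16" block is a standard DeepSpeed option, but the exact contents of ds_config_zero1_bf16.json in this repo are an assumption:

```json
{
  "zero_optimization": {
    "stage": 1
  },
  "bf16": {
    "enabled": true
  }
}
```

bfloat16 has the same exponent range as fp32, so DeepSpeed does not apply loss scaling in bf16 mode and this exception cannot occur; the RTX 3090 (Ampere) supports bf16 natively.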
