System Info
Who can help?
@ArthurZucker, @muellerzr
Issue Description
After upgrading the Transformers library from version 4.47.1 to 4.48.0, I’ve observed a drastic increase in loss values during training. Under the same training script and configurations, the loss values in 4.47.1 are around 2.48-2.54, while in 4.48.0 they suddenly jump to 40+ (and sometimes even higher).

Below are sample logs from the first few training steps. The only change is the Transformers version; everything else remains identical:
Transformers v4.47.1:
Transformers v4.48.0:
In the same setup using 4.48.0, if I change only the gradient accumulation steps from 16 to 1, the loss behaves similarly to 4.47.1, as shown below:

Any insights into changes between 4.47.1 and 4.48.0 that might cause this behavior?
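For clarity, here is a minimal sketch of the configuration change described above; the only difference between the two runs is gradient_accumulation_steps. The batch size, learning rate, and output directories are illustrative placeholders, not my actual settings:

```python
from transformers import TrainingArguments

# The two configurations being compared; everything except gradient_accumulation_steps
# is identical. Batch size, learning rate, and output_dir are placeholders.
args_ga16 = TrainingArguments(
    output_dir="out-ga16",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # loss ~2-3 on 4.47.1, 40+ on 4.48.0
    learning_rate=2e-5,
    logging_steps=1,
)

args_ga1 = TrainingArguments(
    output_dir="out-ga1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,   # loss back to ~2-3, even on 4.48.0
    learning_rate=2e-5,
    logging_steps=1,
)
```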
Thank you for your time, and I appreciate any help or pointers to relevant changes or fixes!
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Environment
Steps to Reproduce

1. Install Transformers 4.47.1 and run the training script with the above parameters (gradient accumulation steps = 16) — observe normal loss values around 2-3. (A minimal stand-in for such a script is sketched after these steps.)
2. Upgrade to Transformers 4.48.0 (no other changes in code or environment) and rerun the same training script (gradient accumulation steps = 16) — notice a large increase in loss values (40+).
3. Still using Transformers 4.48.0, change gradient accumulation steps to 1 — observe that loss now returns to normal levels (around 2-3).
Expected behavior

Loss values should remain consistent (as in 4.47.1) if there are no major changes in hyperparameters, data, or environment aside from the Transformers library version.