Loss, entropy, accuracy trends #188

slala2121 · 2022-01-27T21:21:59Z

I'm trying to understand the relationships and trends among these quantities.

From some experiments, I find that

- as the training loss declines, the entropy of the distribution increases rather than decreases. This seems plausible because near convergence, the probability scores for the negative examples are all relatively low and roughly equal, while the probability score for the positive example increases.
- for a small training dataset (10^3 samples), however, I find that the accuracy declines. Why might this occur?

Thanks.

chentingpc · 2022-01-28T03:50:25Z

> as the training loss declines, the entropy of the distribution increases rather than decreases. This seems plausible because near convergence, the probability scores for all the negative examples are all relatively low and equal while the prob. score for the positive example increases.

Yes, the model becomes more certain which candidate is the positive as it trains.

> for a small training dataset (10^3) samples, I find that the accuracy declines however. Why might this occur?

Is it overfitting? Otherwise a hyperparameter may be the problem, e.g. a learning rate that is too large (if you warm up for too long, the learning rate will be too large after a certain number of epochs, and training will then get worse).
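
To make the warmup point concrete, here is a minimal sketch of a linear-warmup plus cosine-decay schedule. It is an illustration only, not the repository's schedule: the function `learning_rate`, the step counts, and `base_lr=1.0` are all hypothetical. It compares a short and a long warmup at a few points in training.

```python
import math


def learning_rate(step, total_steps, warmup_steps, base_lr):
    """Linear warmup to base_lr, then cosine decay towards zero (illustrative)."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 1000 examples, batch size 100 -> 10 steps per epoch, 100 epochs.
steps_per_epoch = 10
total_steps = 100 * steps_per_epoch

for warmup_epochs in (5, 50):
    warmup_steps = warmup_epochs * steps_per_epoch
    rates = [
        learning_rate(epoch * steps_per_epoch, total_steps, warmup_steps, base_lr=1.0)
        for epoch in (10, 50, 80)
    ]
    print(f"warmup_epochs={warmup_epochs}:", [round(r, 3) for r in rates])
```

Under these assumptions, the long warmup leaves the learning rate noticeably higher late in training than the short one does, which is one way training could degrade after a certain number of epochs. Whether that is what happened in this run cannot be told from the thread.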

slala2121 · 2022-01-28T23:51:39Z

Could you explain how the contrastive accuracy is computed? I could understand the potential for overfitting if it's measured on samples different from those used to compute the training loss. From the code, it seems that loss and accuracy are computed over similar quantities though.

chentingpc · 2022-01-29T02:11:34Z

It's the prediction accuracy of positive examples among all candidates within the mini-batch.
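
As a rough illustration of that definition, the sketch below derives loss, entropy, and accuracy from the same per-example softmax over in-batch candidates. It is not the repository's code: the function `contrastive_metrics`, the list-of-rows `logits` layout, and `positive_index` are assumptions made for the example.

```python
import math


def contrastive_metrics(logits, positive_index):
    """Return (mean loss, mean entropy, accuracy) over a mini-batch.

    logits[i] holds the similarity scores of anchor i against every
    candidate in the batch; positive_index[i] is the column of its positive.
    """
    losses, entropies, hits = [], [], []
    for row, pos in zip(logits, positive_index):
        # Numerically stable log-softmax.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        log_probs = [x - log_z for x in row]

        # Cross-entropy against the positive candidate.
        losses.append(-log_probs[pos])
        # Shannon entropy of the softmax distribution over candidates.
        entropies.append(-sum(math.exp(lp) * lp for lp in log_probs))
        # A hit means the positive has the highest score among all candidates.
        predicted = max(range(len(row)), key=row.__getitem__)
        hits.append(1.0 if predicted == pos else 0.0)

    n = len(logits)
    return sum(losses) / n, sum(entropies) / n, sum(hits) / n


# Toy batch: the first anchor ranks its positive first, the second does not.
loss, entropy, accuracy = contrastive_metrics(
    logits=[[2.0, 0.0, 0.0], [0.0, 1.0, 3.0]],
    positive_index=[0, 1],
)
print(loss, entropy, accuracy)
```

In this sketch all three quantities come from the same scores, so accuracy depends only on whether the positive is ranked first, while the loss also depends on how much probability the positive receives.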

slala2121 · 2022-02-01T17:03:50Z

Okay. Then I'm not sure why overfitting would occur since the accuracy is measured over the same samples as the training dataset.

sagi-ezri · 2023-04-29T18:27:32Z

It is possible that as the training loss declines, the model becomes more confident in its predictions. Note, however, that assigning a higher probability to the correct prediction and lower probabilities to the incorrect ones narrows the distribution, which on its own lowers the entropy rather than raising it. An observed increase in entropy would therefore need another explanation, for example the probability mass over the negatives becoming more evenly spread than it was early in training.
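
One quick way to check the relationship between confidence and entropy is a small numerical sketch. It rests on a simplifying assumption that is not taken from the thread: the negatives share the remaining probability mass equally. The function name and the choice of 9 negatives are hypothetical.

```python
import math


def entropy_with_uniform_negatives(p_positive, num_negatives):
    """Entropy (in nats) when the negatives split 1 - p_positive equally."""
    p_negative = (1.0 - p_positive) / num_negatives
    probs = [p_positive] + [p_negative] * num_negatives
    # Zero-probability terms contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# With 9 negatives, p_positive = 0.1 is the uniform distribution over 10 candidates.
for p in (0.1, 0.5, 0.9, 0.99):
    print(p, entropy_with_uniform_negatives(p, num_negatives=9))
```

Under this assumption the entropy is largest for the uniform distribution and shrinks as the positive's probability grows, so rising confidence alone does not produce rising entropy.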

Regarding the second observation, one possibility is that the model is too complex for the small training dataset, and therefore, it fails to generalize well to new examples. In this case, reducing the model's complexity or collecting more training data could potentially improve performance.
Another possibility is that the model is underfitting the training data, which can also result in poor accuracy. Underfitting occurs when the model is not complex enough to capture the underlying patterns in the data. In this case, increasing the model's complexity or changing the architecture may be helpful.
Finally, it is also possible that the accuracy measure being used is not sensitive enough to detect differences in performance. In such cases, other evaluation metrics such as precision, recall, or F1 score may be more appropriate to use.

I hope this helps clarify these issues.
