When I perform adversarial training, I find that the output of BERT is always NaN. #13546

As far as I know, training with AMP (precision=16) can sometimes be numerically unstable and produce NaN values like the ones you report.
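To illustrate one way this can happen, here is a minimal sketch using only the Python standard library. It is not taken from your training code: the helper `to_fp16` and the value 300.0 are hypothetical, chosen only to show that half precision tops out at 65504, so a moderately large intermediate value can overflow to infinity, and a later `inf - inf` then yields NaN.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value (hypothetical helper)."""
    try:
        # The "e" format code packs a float as IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the fp16 range;
        # a hardware cast would produce a signed infinity instead.
        return math.copysign(math.inf, x)


activation = to_fp16(300.0)                 # representable exactly in fp16
squared = to_fp16(activation * activation)  # 90000 exceeds FP16_MAX, so this overflows
diff = squared - squared                    # inf - inf is NaN

print(activation, squared, diff)
```

If the NaN disappears when you run the same training in full precision (for example precision=32), that would point to this kind of fp16 overflow rather than a bug in the adversarial training logic itself.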

Answer selected by Struggle-Forever