Using CR-CTC to train, decoded results are all start and end characters. #1780
-
When using CR-CTC to train from a randomly initialized model, the decoded results consistently converge to outputting only the start and end characters. Disabling the CR loss allows training to proceed normally. However, even when initializing from a model pre-trained with the CTC loss alone, training still eventually converges to outputting only the start and end characters. In my model, the time_mask and frequency_mask each mask only a single segment, with the window size drawn as a random integer. Could this be causing the issue, or might there be other factors contributing to this behavior?
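Roughly, the masking behaves like the sketch below (illustrative only, not the exact code from my model; `mask_one_segment`, `spec`, and `max_width` are placeholder names). Calling it once along the time axis and once along the frequency axis gives the single masked segments mentioned above.

```python
import torch

def mask_one_segment(spec: torch.Tensor, max_width: int, dim: int) -> torch.Tensor:
    """Zero out one contiguous block of `spec` (num_frames, num_bins).

    dim=0 masks a time segment, dim=1 masks a frequency segment.
    The window width is a random integer in [0, max_width].
    """
    size = spec.size(dim)
    width = int(torch.randint(0, max_width + 1, ()).item())
    if width == 0 or width >= size:
        return spec
    start = int(torch.randint(0, size - width + 1, ()).item())
    index = [slice(None), slice(None)]
    index[dim] = slice(start, start + width)
    spec = spec.clone()
    spec[tuple(index)] = 0.0
    return spec
```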
Replies: 1 comment 14 replies
-
@masterjade7 What dataset are you using? Could you show your training script? Which loss do you use besides the CR-CTC loss? Have you tried regular SpecAugment?
If you use `reduction="mean"` in `torch.nn.functional.ctc_loss`, you can divide the final `cr_loss` by the total number of frames over the batch; you could use `encoder_out_lens.sum().item()` for that. For the CTC loss, I would suggest using `reduction="sum"` in `torch.nn.functional.ctc_loss` and dividing it by the total number of frames over the batch. This would make the relative scale consistent with our settings. Like this (see the commented part):
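(The snippet referenced above is not reproduced here. Below is a minimal sketch of the suggested scaling, assuming `log_probs_1`/`log_probs_2` are `(T, N, C)` log-probabilities from two augmented views and that a symmetric KL term stands in for the actual CR loss; the function and variable names are illustrative, not the icefall implementation.)

```python
import torch
import torch.nn.functional as F

def compute_losses(log_probs_1, log_probs_2, targets,
                   encoder_out_lens, target_lens):
    # Total number of encoder frames in the batch, used to normalize
    # both losses to a comparable scale.
    total_frames = encoder_out_lens.sum().item()

    # CTC loss with reduction="sum", then divided by the total frame count,
    # as suggested above.
    ctc = F.ctc_loss(
        log_probs_1,
        targets,
        input_lengths=encoder_out_lens,
        target_lengths=target_lens,
        reduction="sum",
        zero_infinity=True,
    )
    ctc = ctc / total_frames

    # A simple consistency term between the two views (symmetric KL here,
    # purely illustrative; in practice one branch is usually detached).
    # It is divided by the same frame count so its scale matches the CTC loss.
    cr = 0.5 * (
        F.kl_div(log_probs_1, log_probs_2, log_target=True, reduction="sum")
        + F.kl_div(log_probs_2, log_probs_1, log_target=True, reduction="sum")
    )
    cr = cr / total_frames

    return ctc, cr
```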