am-scale and lm-scale for "Simple RNNT" loss smoothing #1494
-
There are two hyperparameters for loss smoothing, am-scale and lm-scale, used in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/train.py
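If I read the recipe correctly, these two options are passed through to k2.rnnt_loss_smoothed as lm_only_scale and am_only_scale when computing the "simple" (smoothed) RNNT loss. Here is a minimal sketch of where they enter; the shapes and values are made up for illustration and this is not the actual training code:

```python
# Sketch only: made-up shapes/values, not the icefall training code.
import torch
import k2

B, T, S, C = 2, 50, 10, 500      # batch, frames, symbols per utterance, vocab size (made up)
blank_id = 0

am = torch.randn(B, T, C)        # encoder (acoustic) output, unnormalized
lm = torch.randn(B, S + 1, C)    # decoder (label) output, unnormalized
symbols = torch.randint(1, C, (B, S))
# Each boundary row is [begin_symbol, begin_frame, num_symbols, num_frames].
boundary = torch.tensor([[0, 0, S, T]] * B, dtype=torch.int64)

simple_loss = k2.rnnt_loss_smoothed(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=blank_id,
    lm_only_scale=0.25,          # --lm-scale: weight of the "LM only" smoothing term
    am_only_scale=0.0,           # --am-scale: weight of the "AM only" smoothing term
    boundary=boundary,
    reduction="sum",
)

# In train.py the simple loss is then mixed with the pruned loss, roughly:
#   loss = simple_loss_scale * simple_loss + pruned_loss
print(simple_loss)
```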
-
We haven't tuned these two parameters on the latest Zipformer model, but I think @pkufool conducted some experiments on some early versions of the model. 🧐
-
I found some results in our weekly report; I think these results are based on our first version of the conformer (not the reworked one). The basic conclusion is: if am_scale is greater than 0, the results get worse; lm_scale helps to improve the performance and also helps to make modified_beam_search work better (max_symbol_per_frame=1), see our paper; simple_loss_scale also helps to improve the performance (as some kind of regularization, I think). We did not tune these values a lot, but here are some previous results:
simple_loss_scale: