Shallow fusion or rescoring with word-level n-grams for modified_beam_search #1253
3 comments · 4 replies
-
Hi Srikanth, we didn't test rescoring with word-level n-grams, but you're right, NN LMs are usually much more powerful. What's the order of your n-gram LM, and what beam size were you using? Rescoring usually works better with a larger beam width. Also, you might need to tune the …
-
Hi, thanks for your quick reply.
I should clarify that I adapted the code, as mentioned above, to use word-level n-grams.
We tested both 3-gram and 4-gram models, and I tried high beam sizes, up to 100. I did notice that the oracle WER improves as I keep increasing the beam size, but rescoring doesn't help.
I tried that too, basically by increasing the range of values in … Thanks,
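For reference, `modified_beam_search_lm_rescore` reranks the n-best list under a grid of LM weights, so "increasing the range of values" means widening that grid. A minimal sketch, assuming the grid lives in a list called `lm_scale_list` (a hypothetical name; the actual variable and default range in icefall may differ):

```python
import torch

# A hypothetically widened grid of LM weights for n-best rescoring; the
# default grid in icefall may be narrower than this.
lm_scale_list = [0.01 * i for i in range(1, 101)]  # 0.01 .. 1.00


def pick_best(am_scores: torch.Tensor, lm_scores: torch.Tensor) -> dict:
    """Rerank one n-best list under every LM weight in the grid."""
    best = {}
    for lm_scale in lm_scale_list:
        tot_scores = am_scores + lm_scale * lm_scores
        best[f"ngram_lm_scale_{lm_scale:.2f}"] = int(torch.argmax(tot_scores))
    return best
```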
-
Hi Srikanth, OK, maybe the n-gram is too weak to make any difference. We tested n-gram shallow fusion before, and we only saw a minor improvement with a 5-gram LM (#609). Since rescoring is weaker than shallow fusion, your findings might be expected. BTW, which …
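One way to see why rescoring is the weaker of the two: shallow fusion adds the LM score at every step inside the beam search, so it can keep hypotheses the acoustic model alone would prune, whereas rescoring only reranks the finished n-best list. A rough contrast in pseudocode, with hypothetical names; this is not icefall's actual implementation:

```python
# Shallow fusion: the LM shapes the search at every decoding step, so a
# hypothesis the AM alone would prune can survive in the beam.
def fused_step_score(hyp_score: float, log_p_am: float,
                     log_p_lm: float, lm_scale: float) -> float:
    return hyp_score + log_p_am + lm_scale * log_p_lm


# Rescoring: the LM only reorders whatever the AM-driven search kept.
def rescore(nbest: list, lm_scale: float):
    # nbest holds (hypothesis, am_score, lm_score) triples.
    return max(nbest, key=lambda h: h[1] + lm_scale * h[2])
```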
-
Hello,

I'm trying to evaluate shallow fusion of word-level n-grams for the transducer models trained with `pruned_transducer_stateless7`. I reused the code from `modified_beam_search_lm_rescore` and only changed the part where `lm_scores` is computed, so that each hypothesis is scored with the word-level n-gram instead of the neural LM.
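The snippet itself is not shown above, so as a hedged illustration, here is a minimal sketch of that kind of change, assuming a KenLM ARPA model and a SentencePiece processor; the helper `ngram_lm_scores` and the surrounding names are hypothetical, not icefall's actual API:

```python
import math
from typing import List

import kenlm  # pip install kenlm; word-level n-gram LM
import sentencepiece as spm
import torch


def ngram_lm_scores(
    hyp_tokens: List[List[int]],
    sp: spm.SentencePieceProcessor,
    lm: kenlm.Model,  # e.g. kenlm.Model("4gram.arpa")
) -> torch.Tensor:
    """Score each n-best hypothesis with a word-level n-gram LM.

    Returns one natural-log probability per hypothesis, so the result
    can be combined with the acoustic scores the same way the neural-LM
    scores are in modified_beam_search_lm_rescore.
    """
    scores = []
    for tokens in hyp_tokens:
        # Turn BPE token IDs back into a plain word sequence.
        sentence = sp.decode(tokens)
        # kenlm.Model.score() returns a log10 probability; convert to
        # natural log to match the rest of the scoring pipeline.
        scores.append(lm.score(sentence, bos=True, eos=True) * math.log(10))
    return torch.tensor(scores)


# The combined score per hypothesis then stays the same as for the
# neural LM, e.g. tot_scores = am_scores + lm_scale * lm_scores.
```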
I did not observe any improvements with this method (nor any degradation): with the pretrained GigaSpeech model and other models trained on internal datasets, the WER remains more or less the same. I also attempted overriding other methods such as `modified_beam_search_lm_rescore_LODR`, but I do not see any benefit there either.

Is there a feeling whether this is supposed to help with the transducer models (i.e. `pruned_transducer_*`)? I had a feeling that these methods are probably not supported because they simply don't help for this setup, or perhaps it is better to always use a neural LM, as reported in the RESULTS page for LibriSpeech.

Thanks,
Srikanth