Shallow fusion or rescoring with word-level n-grams for modified_beam_search #1253
3 comments · 4 replies
-
Hi Srikanth, we didn't test rescoring with word-level n-grams, but you're right, NN LMs are usually much more powerful. What's the order of your n-gram LM, and what beam size were you using? Rescoring usually works better with a larger beam width. Also, you might need to tune the …
-
Hi, thanks for your quick reply.
I should clarify that I adapted the code, as mentioned above, to use word-level n-grams.
We tested both 3-gram and 4-gram models, and I tried high beam sizes, up to 100. I did notice that the oracle WER improves as I keep increasing the beam size, but rescoring doesn't help.
I tried that too, basically by increasing the range of values in … Thanks,
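For reference, `modified_beam_search_lm_rescore` reranks the n-best list under a grid of LM weights, so "increasing the range of values" means widening that grid. A minimal sketch, assuming the grid lives in a list called `lm_scale_list` (a hypothetical name; the actual variable and default range in icefall may differ):

```python
import torch

# A hypothetically widened grid of LM weights for n-best rescoring; the
# default grid in icefall may be narrower than this.
lm_scale_list = [0.01 * i for i in range(1, 101)]  # 0.01 .. 1.00


def pick_best(am_scores: torch.Tensor, lm_scores: torch.Tensor) -> dict:
    """Rerank one n-best list under every LM weight in the grid."""
    best = {}
    for lm_scale in lm_scale_list:
        tot_scores = am_scores + lm_scale * lm_scores
        best[f"ngram_lm_scale_{lm_scale:.2f}"] = int(torch.argmax(tot_scores))
    return best
```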
-
Hi Srikanth, OK, maybe the n-gram is too weak to make any difference. We tested n-gram shallow fusion before, and we only saw a minor improvement with a 5-gram LM (#609). Since rescoring is weaker than shallow fusion, your findings might be expected. BTW, which …
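One way to see why rescoring is the weaker of the two: shallow fusion adds the LM score at every step inside the beam search, so it can keep hypotheses the acoustic model alone would prune, whereas rescoring only reranks the finished n-best list. A rough contrast in pseudocode, with hypothetical names; this is not icefall's actual implementation:

```python
# Shallow fusion: the LM shapes the search at every decoding step, so a
# hypothesis the AM alone would prune can survive in the beam.
def fused_step_score(hyp_score: float, log_p_am: float,
                     log_p_lm: float, lm_scale: float) -> float:
    return hyp_score + log_p_am + lm_scale * log_p_lm


# Rescoring: the LM only reorders whatever the AM-driven search kept.
def rescore(nbest: list, lm_scale: float):
    # nbest holds (hypothesis, am_score, lm_score) triples.
    return max(nbest, key=lambda h: h[1] + lm_scale * h[2])
```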
-
Hello,

I'm trying to evaluate shallow fusion of word-level n-grams for the transducer models trained with `pruned_transducer_stateless7`. I reused the code from `modified_beam_search_lm_rescore` and only changed the part where `lm_scores` is computed, so that each hypothesis is scored with the word-level n-gram instead of the neural LM.
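The snippet itself is not shown above, so as a hedged illustration, here is a minimal sketch of that kind of change, assuming a KenLM ARPA model and a SentencePiece processor; the helper `ngram_lm_scores` and the surrounding names are hypothetical, not icefall's actual API:

```python
import math
from typing import List

import kenlm  # pip install kenlm; word-level n-gram LM
import sentencepiece as spm
import torch


def ngram_lm_scores(
    hyp_tokens: List[List[int]],
    sp: spm.SentencePieceProcessor,
    lm: kenlm.Model,  # e.g. kenlm.Model("4gram.arpa")
) -> torch.Tensor:
    """Score each n-best hypothesis with a word-level n-gram LM.

    Returns one natural-log probability per hypothesis, so the result
    can be combined with the acoustic scores the same way the neural-LM
    scores are in modified_beam_search_lm_rescore.
    """
    scores = []
    for tokens in hyp_tokens:
        # Turn BPE token IDs back into a plain word sequence.
        sentence = sp.decode(tokens)
        # kenlm.Model.score() returns a log10 probability; convert to
        # natural log to match the rest of the scoring pipeline.
        scores.append(lm.score(sentence, bos=True, eos=True) * math.log(10))
    return torch.tensor(scores)


# The combined score per hypothesis then stays the same as for the
# neural LM, e.g. tot_scores = am_scores + lm_scale * lm_scores.
```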
I did not observe any improvements with this method (nor any degradation): with the pretrained GigaSpeech model and other models trained on internal datasets, the WER remains more or less the same. I also attempted overriding other methods such as `modified_beam_search_lm_rescore_LODR`, but I do not see any benefit there either.

Is there a feeling whether this is supposed to help with the transducer models (i.e. `pruned_transducer_*`)? I had a feeling that these methods are probably not supported because they simply don't help for this setup, or perhaps it is better to always use a neural LM, as reported in the RESULTS page for LibriSpeech.

Thanks,
Srikanth