
[WIP]: Implement token level shallow fusion #609

Closed
wants to merge 3 commits

Conversation

@csukuangfj (Collaborator) commented Oct 10, 2022

We have been trying to use a word-level G and LG for RNN-T decoding, but we have only tried this with fast_beam_search. However, a word-level G or LG cannot handle OOV words.

This PR tries to use a token-level G for shallow fusion with modified_beam_search. I am using OpenFst to manipulate the n-gram G on the CPU as it is easier to implement.
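
For reference, here is a rough sketch (not the actual code in this PR) of what token-level shallow fusion inside the beam search looks like: each hypothesis keeps a state in the token-level G, and candidate tokens are ranked by the RNN-T log-probability plus a scaled n-gram log-probability. The `score_token` helper below is only a placeholder for the FST arc lookup.

```python
# Rough sketch of token-level shallow fusion during beam search.
# Illustrative only; `score_token` stands in for the actual FST lookup
# (following the arc labeled `token`, with backoff).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    ys: List[int]             # decoded token IDs so far
    log_prob: float           # accumulated RNN-T log-probability
    lm_state: int = 0         # current state in the token-level G
    lm_log_prob: float = 0.0  # accumulated n-gram log-probability


def score_token(lm_state: int, token: int) -> Tuple[float, int]:
    """Placeholder: return (log P(token | lm_state), next_state)."""
    return -2.3, lm_state  # dummy values for illustration


def fused_score(hyp: Hypothesis, token: int, rnnt_log_prob: float,
                ngram_lm_scale: float) -> float:
    """Score used to rank/prune candidate extensions of `hyp`."""
    lm_log_prob, _ = score_token(hyp.lm_state, token)
    return hyp.log_prob + rnnt_log_prob + ngram_lm_scale * lm_log_prob
```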

@ezerhouni (Collaborator)

@csukuangfj Looks very promising. Ping me if you need an extra hand.

@csukuangfj (Collaborator, Author)

> @csukuangfj Looks very promising. Ping me if you need an extra hand.

@ezerhouni

Thanks! I will draft a version without batch-size support. If it gives promising results, we will need your help to implement a version that supports batches.

@ezerhouni (Collaborator)

@csukuangfj Do you have any update on this issue? I am very eager to try it out!

@csukuangfj (Collaborator, Author)

> @csukuangfj Do you have any update on this issue? I am very eager to try it out!

Yes, but the results are not good so far. I will post them tonight.

@csukuangfj (Collaborator, Author)

Steps for reproducing the following results:

cd egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
mkdir tmp3-3
cd tmp3-3
ln -s $PWD/../icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt epoch-99.pt
cd ..

./generate-lm.sh

for lm_scale in  0.01 0.2 0.4 ; do
./lstm_transducer_stateless2/decode.py \
  --epoch 99 \
  --avg 1 \
  --use-averaged-model 0 \
  --exp-dir ./tmp3-3 \
  --max-duration 600 \
  --num-encoder-layers 12 \
  --rnn-hidden-size 1024 \
  --decoding-method modified_beam_search2 \
  --beam 8 \
  --max-contexts 4 \
  --ngram-lm-scale $lm_scale
done

You will find the results inside ./tmp3-3/modified_beam_search2


| ngram_lm_scale | test-clean | test-other |
|----------------|------------|------------|
| 0 (baseline)   | 2.73       | 7.15       |
| -0.01          | 2.73       | 7.17       |
| 0.01           | 2.74       | 7.15       |
| -0.05          | 2.75       | 7.19       |
| 0.2            | 2.76       | 7.28       |
| -0.1            | 2.77       | 7.23       |
| -0.2            | 2.83       | 7.46       |
| -0.3            | 3.01       | 7.75       |

I am using a tri-gram LM. Note that the cost on the final state of the FST is not considered.

I will recheck the code in case it contains some bugs.
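
For completeness, one way the missing final-state cost could be added (assuming the OpenFst Python bindings, pywrapfst; the PR itself may use a different wrapper) is to look up `G.final()` for the LM state of each finished hypothesis:

```python
# Sketch only: adding the FST final cost at the end of decoding,
# assuming pywrapfst (the OpenFst Python bindings).
import pywrapfst as openfst


def final_log_prob(G: openfst.Fst, state: int) -> float:
    """Log-prob of terminating in `state`; -inf if the state is not final."""
    w = G.final(state)  # tropical weight, i.e. -log prob
    if w == openfst.Weight.zero(G.weight_type()):
        return float("-inf")
    return -float(w)


# Each finished hypothesis would then get
#   hyp.lm_log_prob += final_log_prob(G, hyp.lm_state)
# before the final ranking.
```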

@ezerhouni (Collaborator)

@csukuangfj Thanks!

@danpovey (Collaborator)

I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?)

@csukuangfj (Collaborator, Author)

> I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?)

I think Liyong was using fast_beam_search + (L or LG) in #472.

We have never tried to use a token-level G with modified beam search, I think.

@ezerhouni (Collaborator)

> > I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?)
>
> I think Liyong was using fast_beam_search + (L or LG) in #472.
>
> We have never tried to use a token-level G with modified beam search, I think.

My 2 cents is that we need a very large LM (like a 5-gram). I will try it tomorrow and let you know.

@pkufool (Collaborator) commented Oct 19, 2022

> > I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?)
>
> I think Liyong was using fast_beam_search + (L or LG) in #472.
>
> We have never tried to use a token-level G with modified beam search, I think.

@glynpu Liyong did try using a token-level G with beam search; he did not make a PR, though. The results are in our weekly meeting notes (the 20th week), as follows:

[screenshot of the results table from the week-20 meeting notes]

The results show that we cannot get an improvement from a pruned LM.

@glynpu (Collaborator) commented Oct 19, 2022

> @glynpu Liyong did try using a token-level G with beam search; he did not make a PR, though. The results are in our weekly meeting notes (the 20th week), as follows:

The results came from a word-level LM.
I was using KenLM at that time; here is the related code:
glynpu@3a9ff31
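
For readers unfamiliar with KenLM, per-token scoring with its Python bindings looks roughly like the sketch below (not the linked commit's code; the ARPA file name and token string are made up):

```python
# Minimal KenLM usage sketch; "token.5gram.arpa" is a hypothetical
# token-level ARPA LM, and the sentence is a space-joined BPE sequence.
import kenlm

model = kenlm.Model("token.5gram.arpa")

sentence = "▁HE LLO ▁WOR LD"
# full_scores yields (log10 prob, matched n-gram order, is_oov)
# for every token plus the end-of-sentence symbol.
for log10_prob, ngram_order, is_oov in model.full_scores(sentence):
    print(log10_prob, ngram_order, is_oov)
```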

@ezerhouni (Collaborator)

@csukuangfj Quick update:
I am testing with a 5-gram at the moment. I am getting
test-clean: 2.68
test-other: 7.11

I am still doing some tests and will do a more thorough review of the code.

@ezerhouni (Collaborator) commented Oct 19, 2022

N-gram: 5
Beam size: 4

| ngram_lm_scale | test-clean | test-other |
|----------------|------------|------------|
| 0 (baseline)   | 2.73       | 7.15       |
| 0.01           | 2.74       | 7.15       |
| 0.1            | 2.68       | 7.11       |
| 0.2            | 2.68       | 7.14       |

N-gram: 5
Beam size: 8

| ngram_lm_scale | test-clean | test-other |
|----------------|------------|------------|
| 0 (baseline)   | 2.72       | 7.15       |
| 0.01           | 2.71       | 7.14       |
| 0.1            | 2.71       | 7.11       |
| 0.2            | 2.68       | 7.06       |
| 0.3            | 2.74       | 7.28       |

@csukuangfj (Collaborator, Author)

@ezerhouni

Thanks! Are you using ./generate-lm.sh to generate the 5-gram LM or are you using an LM trained on an external dataset?

@ezerhouni (Collaborator)

> @ezerhouni
>
> Thanks! Are you using ./generate-lm.sh to generate the 5-gram LM or are you using an LM trained on an external dataset?

I am using ./generate-lm.sh. I am trying a 7-gram to see whether it helps or not.

@ezerhouni (Collaborator)

@csukuangfj
I tried a 7-gram and it seems to improve a bit (2.67/7.03), but I am not sure it is worth it.

@danpovey (Collaborator)

I think the main use case of this is when there is a domain mismatch from the training corpus to the target domain.
We can also try dividing the scores on the LM arcs by the corresponding scores given by a low-order LM estimated on the training data.
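
In log space that division becomes a subtraction, i.e. a density-ratio style fusion. A hedged sketch of the scoring rule (names and default scales below are illustrative, not from this PR):

```python
# Density-ratio style fusion sketch: subtract a low-order "source" LM
# (estimated on the training transcripts) from the external LM score.
# All names and default scales here are illustrative.
def density_ratio_score(rnnt_log_prob: float,
                        ext_lm_log_prob: float,
                        src_lm_log_prob: float,
                        ext_scale: float = 0.3,
                        src_scale: float = 0.1) -> float:
    # log P_fused = log P_rnnt + a * log P_ext - b * log P_src
    return (rnnt_log_prob
            + ext_scale * ext_lm_log_prob
            - src_scale * src_lm_log_prob)
```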

@csukuangfj (Collaborator, Author)

> @csukuangfj I tried a 7-gram and it seems to improve a bit (2.67/7.03), but I am not sure it is worth it.

Sorry for the late reply. I thought I had replied last night.

I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though we don't gain much from it. The next step is to use it to decode with a graph constructed from lists of specific words/phrases that we want to recognize.
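
A possible shape for such a graph (a sketch only, assuming the OpenFst Python bindings; token IDs and the bonus value are made up) is an acceptor whose arcs along each phrase's token sequence carry a negative cost, so matching hypotheses get boosted:

```python
# Sketch: a token-level boosting acceptor for a list of phrases.
# Token IDs and the bonus value are hypothetical.
from typing import List

import pywrapfst as openfst


def build_boost_fst(phrases: List[List[int]], bonus: float = 1.5) -> openfst.Fst:
    f = openfst.VectorFst()  # tropical semiring: cost = -log prob
    start = f.add_state()
    f.set_start(start)
    f.set_final(start)  # hypotheses matching no phrase are not penalized
    for tokens in phrases:
        cur = start
        for tok in tokens:
            nxt = f.add_state()
            arc = openfst.Arc(tok, tok,
                              openfst.Weight(f.weight_type(), -bonus), nxt)
            f.add_arc(cur, arc)
            cur = nxt
        f.set_final(cur)
    return f


# e.g. boost two (hypothetical) BPE token sequences
G_boost = build_boost_fst([[52, 7, 391], [12, 88]])
```

A real biasing graph would also need failure/backoff arcs so that partial matches do not accumulate the bonus, but the basic idea is the same.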

@ezerhouni (Collaborator)

> > @csukuangfj I tried a 7-gram and it seems to improve a bit (2.67/7.03), but I am not sure it is worth it.
>
> Sorry for the late reply. I thought I had replied last night.
>
> I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though we don't gain much from it. The next step is to use it to decode with a graph constructed from lists of specific words/phrases that we want to recognize.

I agree; I think a 5-gram is enough. I was thinking of using it for detecting OOV words. I will let you know once I have more results (unless you have something else in mind).

@csukuangfj (Collaborator, Author)

> > > @csukuangfj I tried a 7-gram and it seems to improve a bit (2.67/7.03), but I am not sure it is worth it.
> >
> > Sorry for the late reply. I thought I had replied last night.
> > I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though we don't gain much from it. The next step is to use it to decode with a graph constructed from lists of specific words/phrases that we want to recognize.
>
> I agree; I think a 5-gram is enough. I was thinking of using it for detecting OOV words. I will let you know once I have more results (unless you have something else in mind).

By the way, @marcoyang1998 is using the RNN-LM model that you provided for conformer CTC for shallow fusion, and he can get a WER of 2.46 on test-clean without any specific tuning.

@ezerhouni (Collaborator)

> > > > @csukuangfj I tried a 7-gram and it seems to improve a bit (2.67/7.03), but I am not sure it is worth it.
> > >
> > > Sorry for the late reply. I thought I had replied last night.
> > > I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though we don't gain much from it. The next step is to use it to decode with a graph constructed from lists of specific words/phrases that we want to recognize.
> >
> > I agree; I think a 5-gram is enough. I was thinking of using it for detecting OOV words. I will let you know once I have more results (unless you have something else in mind).
>
> By the way, @marcoyang1998 is using the RNN-LM model that you provided for conformer CTC for shallow fusion, and he can get a WER of 2.46 on test-clean without any specific tuning.

Sounds interesting! If I am not mistaken, we can't add new words on the fly to an already trained RNN-LM, can we?

@csukuangfj (Collaborator, Author)

> Sounds interesting! If I am not mistaken, we can't add new words on the fly to an already trained RNN-LM, can we?

The RNN-LM is at the token level, so as long as the new word can be represented by BPE tokens, it can be rescored by the RNN-LM, I think.
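
For example (a sketch; the model path and word are arbitrary), one can check how the BPE model segments an unseen word before scoring it with a token-level LM:

```python
# Sketch: segmenting an unseen word with the BPE model so that a
# token-level LM can score it. "bpe.model" is a placeholder path.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="bpe.model")

word = "SUPERCALIFRAGILISTIC"
print(sp.encode(word, out_type=str))  # BPE pieces, e.g. ['▁SUPER', 'CAL', ...]
print(sp.encode(word, out_type=int))  # corresponding token IDs
```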

@ezerhouni (Collaborator)

> The RNN-LM is at the token level, so as long as the new word can be represented by BPE tokens, it can be rescored by the RNN-LM, I think.

Indeed, but we can't "boost" specific words (or combinations of specific tokens).

@csukuangfj (Collaborator, Author)

> > The RNN-LM is at the token level, so as long as the new word can be represented by BPE tokens, it can be rescored by the RNN-LM, I think.
>
> Indeed, but we can't "boost" specific words (or combinations of specific tokens).

Yes, you are right. That is why we are trying to integrate FSTs into decoding.

@ezerhouni (Collaborator)

@csukuangfj I have a batch version (à la modified_beam_search). I took your commits and added mine on top of them (with a rebase). I will create a new PR if that's OK.

@csukuangfj (Collaborator, Author) commented Oct 21, 2022

> @csukuangfj I have a batch version (à la modified_beam_search). I took your commits and added mine on top of them (with a rebase). I will create a new PR if that's OK.

Yes, thanks! I will close this PR once you create the new one.

@csukuangfj (Collaborator, Author)

See #630

@csukuangfj closed this on Oct 21, 2022
@csukuangfj deleted the shallow-fusion branch on Oct 21, 2022