
Add modified beam search decoding for streaming inference with emformer model #327

Merged
235 commits merged into k2-fsa:streaming on Apr 22, 2022

Conversation

yaozengwei
Collaborator

This PR continues from the closed one, #321. I merged the k2-fsa/master branch into my current branch.
It adds modified beam search decoding for streaming inference with the Emformer model, in the transducer_emformer recipe.
The class FeatureExtractionStream in transducer_emformer/streaming_feature_extractor.py uses different attributes depending on the decoding method.
The function decoding_result() in class FeatureExtractionStream returns the current decoding result for that stream.
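The idea of keeping method-specific decoding state on each stream can be sketched roughly as follows. This is an illustrative outline only, not the real icefall API: the attribute names, the `context_size` parameter, and the hypothesis representation are assumptions, and the actual class also holds the feature extractor and Emformer memory states.

```python
from typing import List, Tuple


class FeatureExtractionStream:
    """Hypothetical sketch of per-stream decoding state.

    Greedy search keeps a single running hypothesis; modified beam
    search keeps a list of scored hypotheses per stream.
    """

    def __init__(self, decoding_method: str, context_size: int = 2):
        self.decoding_method = decoding_method
        self.context_size = context_size
        if decoding_method == "greedy_search":
            # One hypothesis: token IDs emitted so far, blank-padded
            # at the front to seed the decoder context.
            self.hyp: List[int] = [0] * context_size
        elif decoding_method == "modified_beam_search":
            # A list of (token_ids, log_prob) hypotheses.
            self.hyps: List[Tuple[List[int], float]] = [
                ([0] * context_size, 0.0)
            ]
        else:
            raise ValueError(f"Unsupported method: {decoding_method}")

    def decoding_result(self) -> List[int]:
        """Return the current best token sequence, with the
        blank/context padding stripped from the front."""
        if self.decoding_method == "greedy_search":
            return self.hyp[self.context_size:]
        # For modified beam search, pick the highest-scoring hypothesis.
        best = max(self.hyps, key=lambda h: h[1])
        return best[0][self.context_size:]
```

With this shape, the streaming loop only ever calls `decoding_result()`; the method-specific state stays encapsulated in the stream object.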

pkufool and others added 30 commits February 6, 2022 18:22
danpovey and others added 24 commits April 11, 2022 21:00
Update results; will further update this before merge
Add results for mixed precision with max-duration 300
* initial commit

* support download, data prep, and fbank

* on-the-fly feature extraction by default

* support BPE based lang

* support HLG for BPE

* small fix

* small fix

* chunked feature extraction by default

* Compute features for GigaSpeech by splitting the manifest.

* Fixes after review.

* Split manifests into 2000 pieces.

* set audio duration mismatch tolerance to 0.01

* small fix

* add conformer training recipe

* Add conformer.py without pre-commit checking

* lazy loading and use SingleCutSampler

* DynamicBucketingSampler

* use KaldifeatFbank to compute fbank for musan

* use pretrained language model and lexicon

* use 3gram to decode, 4gram to rescore

* Add decode.py

* Update .flake8

* Delete compute_fbank_gigaspeech.py

* Use BucketingSampler for valid and test dataloader

* Update params in train.py

* Use bpe_500

* update params in decode.py

* Decrease num_paths while CUDA OOM

* Added README

* Update RESULTS

* black

* Decrease num_paths while CUDA OOM

* Decode with post-processing

* Update results

* Remove lazy_load option

* Use default `storage_type`

* Keep the original tolerance

* Use split-lazy

* black

* Update pretrained model

Co-authored-by: Fangjun Kuang <[email protected]>
* Add LG decoding

* Add log weight pushing

* Minor fixes
@yaozengwei yaozengwei closed this Apr 21, 2022
@yaozengwei yaozengwei reopened this Apr 22, 2022
@csukuangfj csukuangfj merged commit b3e6bf6 into k2-fsa:streaming Apr 22, 2022