Randomly combining intermediate layers in RNN-T training #229
Conversation
BTW, this will of course break compatibility with older models, so it may be necessary to introduce an option for it.
I can test it with the multiple datasets setup, which does converge on the 100h subset.
BTW, for things like this and the diagnostics, I'd really like to have them also applied to the pruned recipe.
* Copy files for editing.
* Add random combine from #229.
* Minor fixes.
* Pass model parameters from the command line.
* Fix warnings.
* Fix warnings.
* Update readme.
* Rename to avoid conflicts.
* Update results.
* Add CI for pruned_transducer_stateless5
* Typo fixes.
* Remove random combiner.
* Update decode.py and train.py to use periodically averaged models.
* Minor fixes.
* Revert to use random combiner.
* Update results.
* Minor fixes.
This PR demonstrates how to do something like "iterated loss" using intermediate layers, but with only one
loss-function evaluation. It does this by randomly interpolating combinations of different layers, with
linear "adapter layers" for all but the last layer, and with interpolation weights that differ per frame.
There is a significant WER improvement from this: with this setup I get 7.58/20.36 on test-clean-100 with greedy search, whereas with similar setups I was more usually getting around 8.xx/22.xx.
(It is hard to give an exact baseline because the baseline did not converge on 100 hours.)
I am hoping someone could test this with a current setup somehow.