
Randomly combining intermediate layers in RNN-T training #229

Merged
danpovey merged 17 commits into k2-fsa:master on Apr 11, 2022

Conversation

danpovey (Collaborator)

This PR demonstrates how to do something like "iterated loss" using intermediate layers, but with only one loss-function evaluation. It does this by randomly interpolating different combinations of layer outputs, with linear "adapter layers" for all but the last layer, and with interpolation weights that differ per frame.

There is a significant WER improvement from this: training on the 100-hour subset, this setup gives me 7.58/20.36 with greedy search, whereas with similar setups I was more usually getting around 8.xx/22.xx.
(This is hard to compare against an exact baseline, because the baseline didn't converge on 100 hours.)

I am hoping someone could test this with a current setup somehow.

@danpovey (Collaborator, Author)

BTW, this will of course break compatibility with older models, so it may be necessary to introduce an option for it.
(The compatibility issue wouldn't actually affect decoding, though, I think, since it concerns parts that are only used during training.)

@csukuangfj (Collaborator)

I can test it with the multiple datasets setup, which does converge on the 100h subset.

@danpovey (Collaborator, Author)

BTW, for things like this and the diagnostics, I'd really like to have them also applied to the pruned recipe.
Right now I'm experimenting with smaller models, but with the non-pruned recipe I unfortunately can't reduce the batch size without memory being exhausted, which is a pain (it's too much hassle right now to move my code over to the pruned recipe).

csukuangfj added a commit to csukuangfj/icefall that referenced this pull request Mar 25, 2022
@danpovey danpovey merged commit c1063de into k2-fsa:master Apr 11, 2022
csukuangfj added a commit to csukuangfj/icefall that referenced this pull request Apr 23, 2022
csukuangfj added a commit that referenced this pull request May 23, 2022
* Copy files for editing.

* Add random combine from #229.

* Minor fixes.

* Pass model parameters from the command line.

* Fix warnings.

* Fix warnings.

* Update readme.

* Rename to avoid conflicts.

* Update results.

* Add CI for pruned_transducer_stateless5.

* Typo fixes.

* Remove random combiner.

* Update decode.py and train.py to use periodically averaged models.

* Minor fixes.

* Revert to use random combiner.

* Update results.

* Minor fixes.