Add zipformer from Dan using multi-dataset setup #675
Conversation
Is the Zipformer subsampling factor the same as the Conformer's, i.e., …?
Okay, it seems the formula is: …
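For context, this is the output-length formula the icefall Conformer recipes use for their Conv2dSubsampling (two convolutions with stride 2, an overall subsampling factor of about 4). The Zipformer formula referred to in the reply is elided above, so only the Conformer side is sketched here; the function name is illustrative:

```python
def conformer_subsampled_length(t: int) -> int:
    # Length after the Conformer's Conv2dSubsampling: two convolutions
    # with kernel size 3 and stride 2, roughly a 4x reduction in frames.
    return ((t - 1) // 2 - 1) // 2

# For example, a 100-frame utterance comes out as 24 frames:
assert conformer_subsampled_length(100) == 24
```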
Here are the final results for the PR. I have trained it for 16 epochs and the best result is at epoch 16. It looks like the WER would continue to decrease with more epochs. The tensorboard log can be found at:
To give you an idea of the greedy search results at earlier epochs:
Will merge once the CI passes.
I have trained it for 20 epochs.
Thanks!
Could you please upload the pretrained models to huggingface and update …?
Sure! Done.
The tensorboard dev log:
The Hugging Face model:
Same as #672, but it uses the multi-dataset setup.
Results so far
Training commands
We use giga-prob 0.9, which means that 90% of the time we select a batch from GigaSpeech and 10% of the time a batch from LibriSpeech. Since an epoch is one pass over the LibriSpeech data (960 hours, tripled by speed perturbation) and that data accounts for only 10% of the batches, the total amount of training data seen per epoch is roughly 960 hours x 3 / 0.1 = 28.8k hours.
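To make the selection policy concrete, here is a minimal sketch of what giga-prob amounts to, assuming the two corpora are exposed as batch iterators; the function and parameter names are hypothetical, not the recipe's actual code:

```python
import random
from typing import Any, Iterator

def multi_dataset_batches(
    giga_batches: Iterator[Any],
    libri_batches: Iterator[Any],
    giga_prob: float = 0.9,
) -> Iterator[Any]:
    # Hypothetical sketch: draw each training batch from GigaSpeech
    # with probability `giga_prob`, otherwise from LibriSpeech.
    while True:
        if random.random() < giga_prob:
            yield next(giga_batches)
        else:
            yield next(libri_batches)
```

With giga-prob = 0.9, about nine out of every ten batches come from GigaSpeech, which is why the effective data seen per epoch is roughly ten times the speed-perturbed LibriSpeech volume.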
Training time per epoch is about 9 hours 10 minutes with 8 GPUs (32 GB V100)