Begin to use multiple datasets in training #213
Conversation
"with training dataset. ", | ||
) | ||
|
||
group.add_argument( |
I wonder whether we should standardize the name? It was asr_dataloader.py in another recipe.
Yes, I reverted to the previous name.
Note: I am not going to use the changes in lhotse-speech/lhotse#565, which added support for multiplexing among CutSets, because with that method a single batch may contain utterances from different datasets.
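For illustration, here is a minimal sketch contrasting the two approaches. It assumes lhotse's `CutSet.mux` API (added in lhotse-speech/lhotse#565); the manifest paths and the `alternate` helper are hypothetical:

```python
import random

from lhotse import CutSet

# Hypothetical manifest paths, for illustration only.
libri_cuts = CutSet.from_file("data/fbank/cuts_train-clean-100.jsonl.gz")
giga_cuts = CutSet.from_file("data/fbank/cuts_gigaspeech_S.jsonl.gz")

# With CutSet.mux, cuts from the two corpora are interleaved at the cut
# level, so a single batch may contain utterances from both datasets.
mixed = CutSet.mux(libri_cuts, giga_cuts, weights=[0.5, 0.5])


# This PR instead keeps one dataloader per corpus and draws each whole
# batch from a single corpus, so every batch is dataset-pure.
def alternate(libri_dl, giga_dl, giga_prob=0.5):
    libri_iter, giga_iter = iter(libri_dl), iter(giga_dl)
    while True:
        it = giga_iter if random.random() < giga_prob else libri_iter
        try:
            yield next(it)
        except StopIteration:
            return
```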
Here is the tensorboard log for the following training command (repeated in full below).

It uses train-clean-100 plus the S subset of GigaSpeech. You can see that the model starts to converge. The transducer loss for GigaSpeech is higher than that for LibriSpeech; one possible reason may be that training sees less data from it. The following shows the model architecture: the encoder is shared between LibriSpeech and GigaSpeech, but they have separate decoder/joiner networks.
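A minimal sketch of that architecture (not the PR's actual code; module names, dictionary keys, and tensor shapes here are illustrative assumptions):

```python
import torch.nn as nn


class MultiDatasetTransducer(nn.Module):
    def __init__(self, encoder: nn.Module, decoders: dict, joiners: dict):
        super().__init__()
        self.encoder = encoder  # shared between LibriSpeech and GigaSpeech
        # One decoder/joiner pair per dataset, e.g. {"libri": ..., "giga": ...}
        self.decoders = nn.ModuleDict(decoders)
        self.joiners = nn.ModuleDict(joiners)

    def forward(self, feats, feat_lens, tokens, dataset: str):
        # Every batch comes from a single dataset, so exactly one
        # decoder/joiner pair is exercised per training step.
        encoder_out, encoder_out_lens = self.encoder(feats, feat_lens)
        decoder_out = self.decoders[dataset](tokens)
        # Broadcast encoder frames against decoder frames before the
        # joiner, as in a typical stateless transducer.
        logits = self.joiners[dataset](
            encoder_out.unsqueeze(2), decoder_out.unsqueeze(1)
        )
        return logits, encoder_out_lens
```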
Here are the results for this PR so far:
You can see that integrating the GigaSpeech dataset into the training pipeline helps to reduce the WER and results in faster convergence. The training command for this PR is given in #213 (comment) and is repeated below:

```bash
./transducer_stateless_multi_datasets/train.py \
  --world-size 2 \
  --num-epochs 40 \
  --start-epoch 0 \
  --exp-dir transducer_stateless_multi_datasets/exp-100-2 \
  --full-libri 0 \
  --max-duration 300 \
  --lr-factor 1 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --modified-transducer-prob 0.25
```

The training command for the baseline is given below:

```bash
export CUDA_VISIBLE_DEVICES="0,1"
./transducer_stateless/train.py \
  --world-size 2 \
  --num-epochs 40 \
  --start-epoch 0 \
  --exp-dir transducer_stateless/exp-100-no-shift \
  --full-libri 0 \
  --max-duration 300 \
  --lr-factor 1 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --apply-frame-shift 0 \
  --modified-transducer-prob 0.25 \
  --ctc-weight 0.0
```
Cool!!
Here are the results for using train-clean-100 + S subset of GigaSpeech (250 hours):
A pre-trained model trained with train-clean-100 is available at https://huggingface.co/csukuangfj/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21

The tensorboard log can be found at https://tensorboard.dev/experiment/qUEKzMnrTZmOz1EXPda9RA/#scalars&_smoothingWeight=0
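If you want to try the checkpoint, one possible way to fetch it is with huggingface_hub; this is just a sketch, not part of the recipe:

```python
# Downloads a snapshot of the model repo to the local HF cache and
# returns the local directory path.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    "csukuangfj/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21"
)
print(model_dir)
```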
Cool!!
I will merge it and do some experiments based on it. The results for the full LibriSpeech dataset will be posted later.
nice!
See details at lhotse-speech/lhotse#554 (comment)
TODOs