
Add zipformer from Dan using multi-dataset setup #675

Merged

Conversation

csukuangfj
Collaborator

@csukuangfj csukuangfj commented Nov 12, 2022

Same as #672, but it uses the multi-dataset setup.

Results so far

| decoding method | test-clean | test-other | comment |
|---|---|---|---|
| greedy search | 2.02 | 4.8 | epoch 11, avg 1 |
| modified_beam_search | 2.03 | 4.67 | epoch 11, avg 1 |
| modified_beam_search | 1.97 | 4.63 | epoch 11, avg 2 |
| fast_beam_search | 2.03 | 4.74 | epoch 11, avg 1 |
| fast_beam_search | 2.01 | 4.68 | epoch 11, avg 2 |
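
For readers comparing the three decoding methods above: greedy search picks, at each encoder frame, the most probable token until blank is emitted; modified_beam_search is roughly the same loop but keeps the top-k hypotheses per frame; fast_beam_search constrains the search with an FSA. Below is a minimal sketch of transducer greedy search; the `decoder` and `joiner` arguments are hypothetical stand-ins for the recipe's networks, not the exact icefall API:

```python
import torch

def greedy_search(encoder_out, decoder, joiner, blank_id=0, max_sym_per_frame=1):
    """Frame-synchronous greedy decoding for a transducer.

    encoder_out: (T, C) acoustic embeddings for a single utterance.
    decoder/joiner: hypothetical stand-ins for the stateless decoder
    (conditioned on the last 2 emitted tokens) and the joiner network.
    """
    hyp = []
    context = [blank_id, blank_id]  # assumed decoder context size of 2
    dec_out = decoder(torch.tensor([context]))  # (1, C)
    for t in range(encoder_out.size(0)):
        for _ in range(max_sym_per_frame):
            logits = joiner(encoder_out[t : t + 1], dec_out)  # (1, vocab)
            token = logits.argmax(dim=-1).item()
            if token == blank_id:
                break  # nothing more to emit at this frame
            hyp.append(token)
            context = context[1:] + [token]
            dec_out = decoder(torch.tensor([context]))
    return hyp
```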

Training commands

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless8/train.py \
  --world-size 8 \
  --num-epochs 20 \
  --full-libri 1 \
  --use-fp16 1 \
  --max-duration 750 \
  --exp-dir pruned_transducer_stateless8/exp \
  --feedforward-dims "1024,1024,2048,2048,1024" \
  --master-port 12535 \
  --giga-prob 0.9
```

We use `--giga-prob 0.9`, which means that 90% of the time we select a batch from GigaSpeech and 10% of the time a batch from LibriSpeech. Since an epoch ends when the (3x speed-perturbed) LibriSpeech data is exhausted, the total amount of training data seen in one epoch is roughly 960 hours x 3 / 0.1 = 28.8k hours.
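
Conceptually, the per-batch dataset choice looks like the sketch below. This is a simplification: `libri_iter` and `giga_iter` are hypothetical batch iterators, while the real train.py works with lhotse samplers.

```python
import random

def mixed_batches(libri_iter, giga_iter, giga_prob=0.9, seed=42):
    """Draw each batch from GigaSpeech with probability giga_prob,
    otherwise from LibriSpeech. Simplified sketch of the
    multi-dataset sampling used in this PR."""
    rng = random.Random(seed)
    while True:
        source = giga_iter if rng.random() < giga_prob else libri_iter
        try:
            yield next(source)
        except StopIteration:
            return  # the epoch ends when one source is exhausted
```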

Training time per epoch is about 9 hours 10 minutes with 8 GPUs (32 GB V100).

```
(py38) kuangfangjun:exp$ ls -lhtr epoch-*
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  8 17:40 epoch-1.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 03:10 epoch-2.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 12:24 epoch-3.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 21:35 epoch-4.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 10 06:56 epoch-5.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 10 16:22 epoch-6.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 01:37 epoch-7.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 11:22 epoch-8.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 20:50 epoch-9.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 12 06:10 epoch-10.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 12 15:39 epoch-11.pt
```

@desh2608
Collaborator

Is the Zipformer subsampling factor the same as the Conformer's, i.e., T = ((num_frames - 1) // 2 - 1) // 2?

@desh2608
Collaborator

> Is the Zipformer subsampling factor the same as the Conformer's, i.e., T = ((num_frames - 1) // 2 - 1) // 2?

Okay, it seems the formula is T -> (T - 7) // 2 for the Zipformer (reference).
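
To make the difference concrete, a quick check of the two formulas (the Zipformer one as quoted above):

```python
def conformer_T_out(num_frames: int) -> int:
    # Conv2dSubsampling in the Conformer recipes: two stride-2 stages.
    return ((num_frames - 1) // 2 - 1) // 2

def zipformer_T_out(num_frames: int) -> int:
    # Zipformer encoder-embed, per the formula quoted above.
    return (num_frames - 7) // 2

for T in (100, 500, 1000):
    print(T, conformer_T_out(T), zipformer_T_out(T))
# 100 -> 24 vs 46; 500 -> 124 vs 246; 1000 -> 249 vs 496
```

In other words, the Zipformer front-end halves the frame rate once, so it keeps roughly twice as many frames as the Conformer's subsampling for the same input.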

@csukuangfj
Collaborator Author

Here are the final results for the PR.

I have trained it for 16 epochs, and the best result is at epoch 16. It looks like the WER would continue to decrease if we trained for more epochs, but I have terminated the training. If anyone wants to continue training from epoch 16, you can find the checkpoints at
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14

The tensorboard log can be found at
https://tensorboard.dev/experiment/y6kAPnN3S3OwvQxQqKQzsQ/#scalars


| decoding method | test-clean | test-other | comment |
|---|---|---|---|
| greedy search | 1.87 | 4.38 | --epoch 16 --avg 2 |
| modified beam search | 1.81 | 4.34 | --epoch 16 --avg 2 |
| fast beam search | 1.91 | 4.33 | --epoch 16 --avg 2 |
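
For context, `--avg 2` means the parameters of the last two epoch checkpoints (here presumably epoch-15.pt and epoch-16.pt) are averaged before decoding. A minimal sketch of plain checkpoint averaging, assuming the state dict is stored under a "model" key as in icefall's checkpoints; the actual decode scripts offer more elaborate averaging options:

```python
import torch

def average_checkpoints(paths):
    """Average model parameters over several checkpoint files."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.to(torch.float64) for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].to(torch.float64)
    # Note: a real implementation should leave integer buffers alone.
    return {k: (v / len(paths)).to(torch.float32) for k, v in avg.items()}

# e.g. averaged = average_checkpoints(["exp/epoch-15.pt", "exp/epoch-16.pt"])
# model.load_state_dict(averaged)
```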

To give you an idea of the greedy search results at earlier epochs:

| test-clean | test-other | comment |
|---|---|---|
| 1.91 | 4.42 | epoch 15, avg 2 |
| 1.99 | 4.65 | epoch 12, avg 1 |
| 2.02 | 4.72 | epoch 11, avg 2 |
| 2.03 | 4.83 | epoch 10, avg 1 |
| 2.13 | 5.0 | epoch 9, avg 1 |
| 2.22 | 5.05 | epoch 8, avg 1 |
| 2.28 | 5.33 | epoch 7, avg 1 |
| 2.41 | 5.70 | epoch 6, avg 1 |
| 2.54 | 5.97 | epoch 5, avg 1 |

@csukuangfj
Collaborator Author

Will merge once the CI passes.

@csukuangfj csukuangfj merged commit 855c766 into k2-fsa:master Nov 15, 2022
@csukuangfj csukuangfj deleted the from-dan-scaled-adam-exp253-multidataset branch November 15, 2022 08:56
@zhuangweiji
Contributor

I have trained it for 20 epochs.
Here are the decoding results:

| comment | greedy search (test-clean) | greedy search (test-other) | modified beam search (test-clean) | modified beam search (test-other) | fast beam search (test-clean) | fast beam search (test-other) |
|---|---|---|---|---|---|---|
| epoch 20 avg 9 | 1.84 | 4.24 | 1.8 | 4.25 | 1.8 | 4.16 |
| epoch 20 avg 8 | 1.82 | 4.23 | 1.79 | 4.21 | 1.8 | 4.16 |
| epoch 20 avg 7 | 1.83 | 4.22 | 1.8 | 4.18 | 1.81 | 4.14 |
| epoch 20 avg 6 | 1.82 | 4.18 | 1.79 | 4.15 | 1.78 | 4.13 |
| epoch 20 avg 5 | 1.81 | 4.17 | 1.79 | 4.16 | 1.78 | 4.1 |
| epoch 20 avg 4 | 1.81 | 4.18 | 1.82 | 4.15 | 1.78 | 4.08 |
| epoch 20 avg 3 | 1.82 | 4.24 | 1.83 | 4.16 | 1.78 | 4.11 |
| epoch 20 avg 2 | 1.84 | 4.21 | 1.84 | 4.18 | 1.8 | 4.12 |
| epoch 20 avg 1 | 1.86 | 4.26 | 1.84 | 4.19 | 1.81 | 4.12 |
| epoch 19 avg 9 | 1.86 | 4.34 | 1.81 | 4.29 | 1.81 | 4.21 |
| epoch 19 avg 8 | 1.84 | 4.3 | 1.77 | 4.25 | 1.81 | 4.16 |
| epoch 19 avg 7 | 1.84 | 4.26 | 1.79 | 4.23 | 1.82 | 4.19 |
| epoch 19 avg 6 | 1.82 | 4.26 | 1.8 | 4.23 | 1.8 | 4.15 |
| epoch 19 avg 5 | 1.8 | 4.25 | 1.79 | 4.18 | 1.79 | 4.11 |
| epoch 19 avg 4 | 1.82 | 4.26 | 1.79 | 4.2 | 1.78 | 4.15 |
| epoch 19 avg 3 | 1.82 | 4.23 | 1.83 | 4.15 | 1.78 | 4.12 |
| epoch 19 avg 2 | 1.84 | 4.21 | 1.82 | 4.17 | 1.78 | 4.13 |
| epoch 19 avg 1 | 1.87 | 4.31 | 1.87 | 4.28 | 1.81 | 4.2 |
| epoch 18 avg 9 | 1.88 | 4.41 | 1.85 | 4.38 | 1.84 | 4.29 |
| epoch 18 avg 8 | 1.87 | 4.36 | 1.82 | 4.32 | 1.8 | 4.25 |
| epoch 18 avg 7 | 1.83 | 4.31 | 1.77 | 4.25 | 1.8 | 4.19 |
| epoch 18 avg 6 | 1.83 | 4.27 | 1.79 | 4.25 | 1.8 | 4.17 |
| epoch 18 avg 5 | 1.83 | 4.29 | 1.8 | 4.24 | 1.8 | 4.17 |
| epoch 18 avg 4 | 1.83 | 4.25 | 1.79 | 4.23 | 1.77 | 4.14 |
| epoch 18 avg 3 | 1.84 | 4.3 | 1.8 | 4.22 | 1.79 | 4.16 |
| epoch 18 avg 2 | 1.84 | 4.27 | 1.8 | 4.14 | 1.77 | 4.12 |
| epoch 18 avg 1 | 1.86 | 4.25 | 1.82 | 4.18 | 1.8 | 4.11 |
| epoch 17 avg 9 | 1.92 | 4.45 | 1.89 | 4.38 | 1.85 | 4.36 |
| epoch 17 avg 8 | 1.9 | 4.41 | 1.85 | 4.38 | 1.84 | 4.33 |
| epoch 17 avg 7 | 1.87 | 4.37 | 1.82 | 4.33 | 1.81 | 4.28 |
| epoch 17 avg 6 | 1.83 | 4.36 | 1.79 | 4.3 | 1.8 | 4.22 |
| epoch 17 avg 5 | 1.83 | 4.31 | 1.8 | 4.3 | 1.8 | 4.21 |
| epoch 17 avg 4 | 1.83 | 4.31 | 1.8 | 4.26 | 1.79 | 4.2 |
| epoch 17 avg 3 | 1.84 | 4.26 | 1.77 | 4.24 | 1.8 | 4.19 |
| epoch 17 avg 2 | 1.83 | 4.29 | 1.79 | 4.29 | 1.78 | 4.2 |
| epoch 17 avg 1 | 1.82 | 4.31 | 1.87 | 4.32 | 1.77 | 4.22 |
| epoch 16 avg 9 | 1.96 | 4.48 | 1.99 | 4.48 | 1.91 | 4.4 |
| epoch 16 avg 8 | 1.94 | 4.44 | 1.9 | 4.39 | 1.87 | 4.37 |
| epoch 16 avg 7 | 1.9 | 4.4 | 1.87 | 4.37 | 1.86 | 4.32 |
| epoch 16 avg 6 | 1.85 | 4.38 | 1.83 | 4.36 | 1.82 | 4.29 |
| epoch 16 avg 5 | 1.84 | 4.38 | 1.81 | 4.32 | 1.81 | 4.25 |
| epoch 16 avg 4 | 1.84 | 4.37 | 1.81 | 4.32 | 1.81 | 4.21 |
| epoch 16 avg 3 | 1.85 | 4.37 | 1.88 | 4.31 | 1.79 | 4.25 |
| epoch 16 avg 2 | 1.85 | 4.32 | 1.81 | 4.28 | 1.81 | 4.21 |
| epoch 16 avg 1 | 1.88 | 4.37 | 1.92 | 4.39 | 1.83 | 4.25 |

@csukuangfj
Collaborator Author

@zhuangweiji

Thanks!

@csukuangfj
Collaborator Author

@zhuangweiji

Could you please upload the pretrained models to huggingface and update RESULTS.md?

@zhuangweiji
Contributor

> @zhuangweiji
>
> Could you please upload the pretrained models to huggingface and update RESULTS.md?

Sure! Done.
#728

The tensorboard dev log:
https://tensorboard.dev/experiment/3e9AfOcgRwOXpLQlZvHZrQ

The Hugging Face model:
https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02
