
Add zipformer from Dan using multi-dataset setup #675

Merged

Conversation

csukuangfj
Collaborator

@csukuangfj csukuangfj commented Nov 12, 2022

Same as #672, but it uses the multi-dataset setup.

Results so far

| decoding method | test-clean | test-other | comment |
|---|---|---|---|
| greedy search | 2.02 | 4.8 | epoch 11, avg 1 |
| modified_beam_search | 2.03 | 4.67 | epoch 11, avg 1 |
| modified_beam_search | 1.97 | 4.63 | epoch 11, avg 2 |
| fast_beam_search | 2.03 | 4.74 | epoch 11, avg 1 |
| fast_beam_search | 2.01 | 4.68 | epoch 11, avg 2 |
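
For readers comparing the three decoding methods above: greedy search picks, at each encoder frame, the most probable token until blank is emitted; modified_beam_search is roughly the same loop but keeps the top-k hypotheses per frame; fast_beam_search constrains the search with an FSA. Below is a minimal sketch of transducer greedy search; the `decoder` and `joiner` arguments are hypothetical stand-ins for the recipe's networks, not the exact icefall API:

```python
import torch

def greedy_search(encoder_out, decoder, joiner, blank_id=0, max_sym_per_frame=1):
    """Frame-synchronous greedy decoding for a transducer.

    encoder_out: (T, C) acoustic embeddings for a single utterance.
    decoder/joiner: hypothetical stand-ins for the stateless decoder
    (conditioned on the last 2 emitted tokens) and the joiner network.
    """
    hyp = []
    context = [blank_id, blank_id]  # assumed decoder context size of 2
    dec_out = decoder(torch.tensor([context]))  # (1, C)
    for t in range(encoder_out.size(0)):
        for _ in range(max_sym_per_frame):
            logits = joiner(encoder_out[t : t + 1], dec_out)  # (1, vocab)
            token = logits.argmax(dim=-1).item()
            if token == blank_id:
                break  # nothing more to emit at this frame
            hyp.append(token)
            context = context[1:] + [token]
            dec_out = decoder(torch.tensor([context]))
    return hyp
```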

Training commands

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless8/train.py \
  --world-size 8 \
  --num-epochs 20 \
  --full-libri 1 \
  --use-fp16 1 \
  --max-duration 750 \
  --exp-dir pruned_transducer_stateless8/exp \
  --feedforward-dims "1024,1024,2048,2048,1024" \
  --master-port 12535 \
  --giga-prob 0.9
```

We use `--giga-prob 0.9`, which means that 90% of the time we select a batch from GigaSpeech and 10% of the time a batch from LibriSpeech. Since an epoch ends when the (3x speed-perturbed) LibriSpeech data is exhausted, the total amount of training data seen in one epoch is roughly 960 hours x 3 / 0.1 = 28.8k hours.
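
Conceptually, the per-batch dataset choice looks like the sketch below. This is a simplification: `libri_iter` and `giga_iter` are hypothetical batch iterators, while the real train.py works with lhotse samplers.

```python
import random

def mixed_batches(libri_iter, giga_iter, giga_prob=0.9, seed=42):
    """Draw each batch from GigaSpeech with probability giga_prob,
    otherwise from LibriSpeech. Simplified sketch of the
    multi-dataset sampling used in this PR."""
    rng = random.Random(seed)
    while True:
        source = giga_iter if rng.random() < giga_prob else libri_iter
        try:
            yield next(source)
        except StopIteration:
            return  # the epoch ends when one source is exhausted
```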

Training time per epoch is about 9 hours 10 minutes with 8 GPUs (32 GB V100).

```
(py38) kuangfangjun:exp$ ls -lhtr epoch-*
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  8 17:40 epoch-1.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 03:10 epoch-2.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 12:24 epoch-3.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov  9 21:35 epoch-4.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 10 06:56 epoch-5.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 10 16:22 epoch-6.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 01:37 epoch-7.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 11:22 epoch-8.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 11 20:50 epoch-9.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 12 06:10 epoch-10.pt
-rw-r--r-- 1 kuangfangjun root 1.1G Nov 12 15:39 epoch-11.pt
```

@desh2608
Collaborator

Is the Zipformer subsampling factor the same as the Conformer's, i.e., T = ((num_frames - 1) // 2 - 1) // 2?

@desh2608
Collaborator

> Is the Zipformer subsampling factor the same as the Conformer's, i.e., T = ((num_frames - 1) // 2 - 1) // 2?

Okay, it seems the formula is T -> (T - 7) // 2 for the Zipformer (reference).
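
To make the difference concrete, a quick check of the two formulas (the Zipformer one as quoted above):

```python
def conformer_T_out(num_frames: int) -> int:
    # Conv2dSubsampling in the Conformer recipes: two stride-2 stages.
    return ((num_frames - 1) // 2 - 1) // 2

def zipformer_T_out(num_frames: int) -> int:
    # Zipformer encoder-embed, per the formula quoted above.
    return (num_frames - 7) // 2

for T in (100, 500, 1000):
    print(T, conformer_T_out(T), zipformer_T_out(T))
# 100 -> 24 vs 46; 500 -> 124 vs 246; 1000 -> 249 vs 496
```

In other words, the Zipformer front-end halves the frame rate once, so it keeps roughly twice as many frames as the Conformer's subsampling for the same input.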

@csukuangfj
Collaborator Author

Here are the final results for the PR.

I have trained it for 16 epochs, and the best result is at epoch 16. It looks like the WER would continue to decrease if we trained for more epochs, but I have terminated the training. If anyone wants to continue training from epoch 16, you can find the checkpoints at
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14

The tensorboard log can be found at
https://tensorboard.dev/experiment/y6kAPnN3S3OwvQxQqKQzsQ/#scalars


| decoding method | test-clean | test-other | comment |
|---|---|---|---|
| greedy search | 1.87 | 4.38 | --epoch 16 --avg 2 |
| modified beam search | 1.81 | 4.34 | --epoch 16 --avg 2 |
| fast beam search | 1.91 | 4.33 | --epoch 16 --avg 2 |
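
For context, `--avg 2` means the parameters of the last two epoch checkpoints (here presumably epoch-15.pt and epoch-16.pt) are averaged before decoding. A minimal sketch of plain checkpoint averaging, assuming the state dict is stored under a "model" key as in icefall's checkpoints; the actual decode scripts offer more elaborate averaging options:

```python
import torch

def average_checkpoints(paths):
    """Average model parameters over several checkpoint files."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.to(torch.float64) for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].to(torch.float64)
    # Note: a real implementation should leave integer buffers alone.
    return {k: (v / len(paths)).to(torch.float32) for k, v in avg.items()}

# e.g. averaged = average_checkpoints(["exp/epoch-15.pt", "exp/epoch-16.pt"])
# model.load_state_dict(averaged)
```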

To give you an idea of the greedy search results at earlier epochs:

| test-clean | test-other | comment |
|---|---|---|
| 1.91 | 4.42 | epoch 15, avg 2 |
| 1.99 | 4.65 | epoch 12, avg 1 |
| 2.02 | 4.72 | epoch 11, avg 2 |
| 2.03 | 4.83 | epoch 10, avg 1 |
| 2.13 | 5.0 | epoch 9, avg 1 |
| 2.22 | 5.05 | epoch 8, avg 1 |
| 2.28 | 5.33 | epoch 7, avg 1 |
| 2.41 | 5.70 | epoch 6, avg 1 |
| 2.54 | 5.97 | epoch 5, avg 1 |

@csukuangfj
Collaborator Author

Will merge once the CI passes.

@csukuangfj csukuangfj merged commit 855c766 into k2-fsa:master Nov 15, 2022
@csukuangfj csukuangfj deleted the from-dan-scaled-adam-exp253-multidataset branch November 15, 2022 08:56
@zhuangweiji
Contributor

I have trained it for 20 epochs.
Here are the decoding results:

| comment | greedy search (test-clean) | greedy search (test-other) | modified beam search (test-clean) | modified beam search (test-other) | fast beam search (test-clean) | fast beam search (test-other) |
|---|---|---|---|---|---|---|
| epoch 20 avg 9 | 1.84 | 4.24 | 1.8 | 4.25 | 1.8 | 4.16 |
| epoch 20 avg 8 | 1.82 | 4.23 | 1.79 | 4.21 | 1.8 | 4.16 |
| epoch 20 avg 7 | 1.83 | 4.22 | 1.8 | 4.18 | 1.81 | 4.14 |
| epoch 20 avg 6 | 1.82 | 4.18 | 1.79 | 4.15 | 1.78 | 4.13 |
| epoch 20 avg 5 | 1.81 | 4.17 | 1.79 | 4.16 | 1.78 | 4.1 |
| epoch 20 avg 4 | 1.81 | 4.18 | 1.82 | 4.15 | 1.78 | 4.08 |
| epoch 20 avg 3 | 1.82 | 4.24 | 1.83 | 4.16 | 1.78 | 4.11 |
| epoch 20 avg 2 | 1.84 | 4.21 | 1.84 | 4.18 | 1.8 | 4.12 |
| epoch 20 avg 1 | 1.86 | 4.26 | 1.84 | 4.19 | 1.81 | 4.12 |
| epoch 19 avg 9 | 1.86 | 4.34 | 1.81 | 4.29 | 1.81 | 4.21 |
| epoch 19 avg 8 | 1.84 | 4.3 | 1.77 | 4.25 | 1.81 | 4.16 |
| epoch 19 avg 7 | 1.84 | 4.26 | 1.79 | 4.23 | 1.82 | 4.19 |
| epoch 19 avg 6 | 1.82 | 4.26 | 1.8 | 4.23 | 1.8 | 4.15 |
| epoch 19 avg 5 | 1.8 | 4.25 | 1.79 | 4.18 | 1.79 | 4.11 |
| epoch 19 avg 4 | 1.82 | 4.26 | 1.79 | 4.2 | 1.78 | 4.15 |
| epoch 19 avg 3 | 1.82 | 4.23 | 1.83 | 4.15 | 1.78 | 4.12 |
| epoch 19 avg 2 | 1.84 | 4.21 | 1.82 | 4.17 | 1.78 | 4.13 |
| epoch 19 avg 1 | 1.87 | 4.31 | 1.87 | 4.28 | 1.81 | 4.2 |
| epoch 18 avg 9 | 1.88 | 4.41 | 1.85 | 4.38 | 1.84 | 4.29 |
| epoch 18 avg 8 | 1.87 | 4.36 | 1.82 | 4.32 | 1.8 | 4.25 |
| epoch 18 avg 7 | 1.83 | 4.31 | 1.77 | 4.25 | 1.8 | 4.19 |
| epoch 18 avg 6 | 1.83 | 4.27 | 1.79 | 4.25 | 1.8 | 4.17 |
| epoch 18 avg 5 | 1.83 | 4.29 | 1.8 | 4.24 | 1.8 | 4.17 |
| epoch 18 avg 4 | 1.83 | 4.25 | 1.79 | 4.23 | 1.77 | 4.14 |
| epoch 18 avg 3 | 1.84 | 4.3 | 1.8 | 4.22 | 1.79 | 4.16 |
| epoch 18 avg 2 | 1.84 | 4.27 | 1.8 | 4.14 | 1.77 | 4.12 |
| epoch 18 avg 1 | 1.86 | 4.25 | 1.82 | 4.18 | 1.8 | 4.11 |
| epoch 17 avg 9 | 1.92 | 4.45 | 1.89 | 4.38 | 1.85 | 4.36 |
| epoch 17 avg 8 | 1.9 | 4.41 | 1.85 | 4.38 | 1.84 | 4.33 |
| epoch 17 avg 7 | 1.87 | 4.37 | 1.82 | 4.33 | 1.81 | 4.28 |
| epoch 17 avg 6 | 1.83 | 4.36 | 1.79 | 4.3 | 1.8 | 4.22 |
| epoch 17 avg 5 | 1.83 | 4.31 | 1.8 | 4.3 | 1.8 | 4.21 |
| epoch 17 avg 4 | 1.83 | 4.31 | 1.8 | 4.26 | 1.79 | 4.2 |
| epoch 17 avg 3 | 1.84 | 4.26 | 1.77 | 4.24 | 1.8 | 4.19 |
| epoch 17 avg 2 | 1.83 | 4.29 | 1.79 | 4.29 | 1.78 | 4.2 |
| epoch 17 avg 1 | 1.82 | 4.31 | 1.87 | 4.32 | 1.77 | 4.22 |
| epoch 16 avg 9 | 1.96 | 4.48 | 1.99 | 4.48 | 1.91 | 4.4 |
| epoch 16 avg 8 | 1.94 | 4.44 | 1.9 | 4.39 | 1.87 | 4.37 |
| epoch 16 avg 7 | 1.9 | 4.4 | 1.87 | 4.37 | 1.86 | 4.32 |
| epoch 16 avg 6 | 1.85 | 4.38 | 1.83 | 4.36 | 1.82 | 4.29 |
| epoch 16 avg 5 | 1.84 | 4.38 | 1.81 | 4.32 | 1.81 | 4.25 |
| epoch 16 avg 4 | 1.84 | 4.37 | 1.81 | 4.32 | 1.81 | 4.21 |
| epoch 16 avg 3 | 1.85 | 4.37 | 1.88 | 4.31 | 1.79 | 4.25 |
| epoch 16 avg 2 | 1.85 | 4.32 | 1.81 | 4.28 | 1.81 | 4.21 |
| epoch 16 avg 1 | 1.88 | 4.37 | 1.92 | 4.39 | 1.83 | 4.25 |

@csukuangfj
Collaborator Author

@zhuangweiji

Thanks!

@csukuangfj
Collaborator Author

@zhuangweiji

Could you please upload the pretrained models to huggingface and update RESULTS.md?

@zhuangweiji
Contributor

> @zhuangweiji
>
> Could you please upload the pretrained models to huggingface and update RESULTS.md?

Sure! Done.
#728

The tensorboard dev log:
https://tensorboard.dev/experiment/3e9AfOcgRwOXpLQlZvHZrQ

The Hugging Face model:
https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02
