[Not for merge] Different scale for codebook indexes from masked/unmasked area #395

Open · wants to merge 6 commits into master
Conversation

@glynpu (Collaborator) commented Jun 4, 2022

@danpovey Conclusion: codebook indexes in the un-masked area seem quite necessary for training.

All of the following results are decoded with --epoch 20, --avg 10, --max-duration 200.
Note: in theory, Exp B and Exp C are the same; the only difference is the SpecAugment class. Exp B used the one from lhotse, while Exp C used a modified version in pruned_transducer_stateless6/aug.py that can also return the masked area.

For Exp D/E/F, which do not learn codebook indexes in the un-masked area, WERs are even worse than the baseline Exp A.

For Exp G, after using un-masked-area codebook indexes again, results are better than the baseline Exp A, but still worse than Exp B/C, which treat masked and un-masked codebook indexes equally.

| exp_id | intro | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| D | weighted masked area | 1.0 | 0.0 | default (0.15) | 7.25 | 19.42 |
| E | weighted masked area | 10.0 | 0.0 | default (0.15) | 7.58 | 20.31 |
| F | weighted masked area | 10.0 | 0.0 | 0.3 | 7.44 | 19.96 |
| G | weighted masked area | 10.0 | 0.5 | default (0.15) | 6.57 | 17.76 |
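
For reference, a minimal sketch of how the masked/unmasked scales above could be applied to a per-frame codebook-index loss. The names here (weighted_codebook_loss, loss_per_frame, mask) are illustrative assumptions, not the actual code in this PR:

```python
import torch


def weighted_codebook_loss(
    loss_per_frame: torch.Tensor,  # (N, T): per-frame codebook-index loss
    mask: torch.Tensor,            # (N, T) bool: True where SpecAugment masked
    masked_scale: float = 1.0,
    unmasked_scale: float = 1.0,
) -> torch.Tensor:
    # Weight frames in the masked area by masked_scale and the remaining
    # frames by unmasked_scale, then average over all frames.
    scale = torch.where(
        mask,
        torch.full_like(loss_per_frame, masked_scale),
        torch.full_like(loss_per_frame, unmasked_scale),
    )
    return (loss_per_frame * scale).mean()
```

With masked_scale=1.0 and unmasked_scale=0.0 this corresponds to the Exp D setting (masked frames only); with both set to 1.0 it reduces to the plain mean used in Exp B/C.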

@danpovey (Collaborator) commented Jun 4, 2022

OK, interesting, thanks!
Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

@glynpu (Collaborator, Author) commented Jun 4, 2022

> OK, interesting, thanks!
> Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

Sure, will try this after the server recovers.

@glynpu (Collaborator, Author) commented Jun 6, 2022

> OK, interesting, thanks!
> Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

Conclusion: the impact of changing the masked scale or spec-aug-max-frames-mask-fraction is smaller than training noise, since WER of Exp C < Exp H/I < Exp B.

Note: in theory, Exp B and Exp C are the same; the only difference is the SpecAugment class. Exp B used the one from lhotse, while Exp C used a modified version in pruned_transducer_stateless6/aug.py that can also return the masked area (sketched below).
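
For context, a rough sketch of a time-masking step that also records the masked area, in the spirit of the modification described above; this is an illustrative assumption, not the actual aug.py code:

```python
import torch


def time_mask_with_record(
    features: torch.Tensor,  # (T, F): features of one utterance
    max_mask_frames: int,
) -> "tuple[torch.Tensor, torch.Tensor]":
    # Apply a single time mask, and also return a boolean vector that
    # records which frames were masked, so the distillation loss can
    # weight masked and un-masked frames differently.
    T = features.size(0)
    num_frames = int(torch.randint(0, max_mask_frames + 1, (1,)))
    start = int(torch.randint(0, max(1, T - num_frames), (1,)))
    mask = torch.zeros(T, dtype=torch.bool)
    mask[start : start + num_frames] = True
    features = features.clone()
    features[mask] = 0.0
    return features, mask
```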

Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |

@danpovey (Collaborator) commented Jun 6, 2022

OK, cool. Perhaps you can try a scale of 4.0 instead of 2.0 on the masked area.

@glynpu (Collaborator, Author) commented Jun 6, 2022

> OK, cool. Perhaps you can try a scale of 4.0 instead of 2.0 on the masked area.

Exp G/K get slightly worse results with the higher scale, though not significantly. @danpovey
Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |
| G | weighted masked area | 20(10) | 4.0 | 1.0 | default (0.15) | 5.83 | 16.37 |
| K | weighted masked area | 20(10) | 4.0 | 1.0 | 0.3 | 5.92 | 16.47 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |
| G | weighted masked area | 30(10) | 4.0 | 1.0 | default (0.15) | 5.57 | 15.24 |
| K | weighted masked area | 30(10) | 4.0 | 1.0 | 0.3 | 5.63 | 15.32 |

@danpovey (Collaborator) commented Jun 6, 2022

ok, cool.

@glynpu (Collaborator, Author) commented Jun 22, 2022

Exp L/M are trained with the masked-area scale == 0.5. @danpovey
There seem to be no significant differences compared to baselines B/C.
Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |
| G | weighted masked area | 20(10) | 4.0 | 1.0 | default (0.15) | 5.83 | 16.37 |
| K | weighted masked area | 20(10) | 4.0 | 1.0 | 0.3 | 5.92 | 16.47 |
| L | weighted masked area | 20(10) | 0.5 | 1.0 | default (0.15) | 5.70 | 15.68 |
| M | weighted masked area | 20(10) | 0.5 | 1.0 | 0.3 | 5.66 | 15.89 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |
| G | weighted masked area | 30(10) | 4.0 | 1.0 | default (0.15) | 5.57 | 15.24 |
| K | weighted masked area | 30(10) | 4.0 | 1.0 | 0.3 | 5.63 | 15.32 |
| L | weighted masked area | 30(10) | 0.5 | 1.0 | default (0.15) | 5.40 | 14.88 |
| M | weighted masked area | 30(10) | 0.5 | 1.0 | 0.3 | 5.48 | 15.04 |

Review comments on this snippet:

```python
    type=float,
    default=0.15,
)

group.add_argument(
    "--spec-aug-time-warp-factor",
    type=int,
```
Collaborator:

I notice the default time_warp_factor is not -1, i.e. there is time warping by default. Are you disabling this via the command line by setting --spec-aug-time-warp-factor==1? Otherwise, it seems to me that the codebook loss will be wrong, because we're not time-warping the codebook indexes (which would be tricky to do anyway, since they can't really be interpolated).

Collaborator:

See:

```
--spec-aug-time-warp-factor -1 \
```

@csukuangfj (Collaborator) commented Jun 22, 2022

I would recommend asserting that its value is -1 in train.py, in case users forget to set it.
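
For illustration, a minimal sketch of such a guard in train.py, assuming the parsed options are available as args and that a distillation flag exists (the names here are hypothetical):

```python
# Hypothetical guard in train.py: codebook indexes are not time-warped,
# so time warping must stay disabled when distillation is enabled.
if args.enable_distillation:
    assert args.spec_aug_time_warp_factor == -1, (
        "Codebook-index distillation requires "
        "--spec-aug-time-warp-factor -1, since warped features would "
        "no longer align with the (un-warped) codebook indexes."
    )
```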

Collaborator (Author):

Now added, via #469.
