[Not for merge] Different scale for codebook indexes from masked/unmasked area #395

Open · wants to merge 6 commits into master
Conversation

@glynpu (Collaborator) commented Jun 4, 2022

@danpovey Conclusion: codebook indexes in the un-masked area seem quite necessary for training.

All of the following results are decoded with --epoch 20, --avg 10, --max-duration 200.
Note: in theory, Exp B and Exp C are the same; the only difference is the SpecAugment class. Exp B used the one from lhotse, while Exp C used a modified version in pruned_transducer_stateless6/aug.py that can also return the masked area.

For Exp D/E/F, which do not learn codebook indexes in the un-masked area, WERs are even worse than the baseline Exp A.

For Exp G, after using un-masked-area codebook indexes again, results are better than the baseline Exp A, but still worse than Exp B/C, which treat masked and un-masked codebook indexes equally.

| exp_id | intro | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| D | weighted masked area | 1.0 | 0.0 | default (0.15) | 7.25 | 19.42 |
| E | weighted masked area | 10.0 | 0.0 | default (0.15) | 7.58 | 20.31 |
| F | weighted masked area | 10.0 | 0.0 | 0.3 | 7.44 | 19.96 |
| G | weighted masked area | 10.0 | 0.5 | default (0.15) | 6.57 | 17.76 |
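
For reference, a minimal sketch of how the masked/unmasked scales above could be applied to a per-frame codebook-index loss. The names here (weighted_codebook_loss, loss_per_frame, mask) are illustrative assumptions, not the actual code in this PR:

```python
import torch


def weighted_codebook_loss(
    loss_per_frame: torch.Tensor,  # (N, T): per-frame codebook-index loss
    mask: torch.Tensor,            # (N, T) bool: True where SpecAugment masked
    masked_scale: float = 1.0,
    unmasked_scale: float = 1.0,
) -> torch.Tensor:
    # Weight frames in the masked area by masked_scale and the remaining
    # frames by unmasked_scale, then average over all frames.
    scale = torch.where(
        mask,
        torch.full_like(loss_per_frame, masked_scale),
        torch.full_like(loss_per_frame, unmasked_scale),
    )
    return (loss_per_frame * scale).mean()
```

With masked_scale=1.0 and unmasked_scale=0.0 this corresponds to the Exp D setting (masked frames only); with both set to 1.0 it reduces to the plain mean used in Exp B/C.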

@danpovey (Collaborator) commented Jun 4, 2022

OK, interesting, thanks!
Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

@glynpu (Collaborator, Author) commented Jun 4, 2022

> OK, interesting, thanks!
> Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

Sure, will try this after the server recovers.

@glynpu (Collaborator, Author) commented Jun 6, 2022

> OK, interesting, thanks!
> Perhaps before we forget this, you could try 2.0, 1.0, i.e. 2 for masked, 1 for unmasked.

Conclusion: the impact of changing the masked scale or spec-aug-max-frames-mask-fraction is smaller than training noise, since WER of Exp C < Exp H/I < Exp B.

Note: in theory, Exp B and Exp C are the same; the only difference is the SpecAugment class. Exp B used the one from lhotse, while Exp C used a modified version in pruned_transducer_stateless6/aug.py that can also return the masked area (sketched below).
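
For context, a rough sketch of a time-masking step that also records the masked area, in the spirit of the modification described above; this is an illustrative assumption, not the actual aug.py code:

```python
import torch


def time_mask_with_record(
    features: torch.Tensor,  # (T, F): features of one utterance
    max_mask_frames: int,
) -> "tuple[torch.Tensor, torch.Tensor]":
    # Apply a single time mask, and also return a boolean vector that
    # records which frames were masked, so the distillation loss can
    # weight masked and un-masked frames differently.
    T = features.size(0)
    num_frames = int(torch.randint(0, max_mask_frames + 1, (1,)))
    start = int(torch.randint(0, max(1, T - num_frames), (1,)))
    mask = torch.zeros(T, dtype=torch.bool)
    mask[start : start + num_frames] = True
    features = features.clone()
    features[mask] = 0.0
    return features, mask
```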

Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |

@danpovey (Collaborator) commented Jun 6, 2022

OK, cool. Perhaps you can try a scale of 4.0 instead of 2.0 on the masked area.

@glynpu (Collaborator, Author) commented Jun 6, 2022

> OK, cool. Perhaps you can try a scale of 4.0 instead of 2.0 on the masked area.

Exp G/K get slightly worse results with the higher scale, though not significantly. @danpovey
Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |
| G | weighted masked area | 20(10) | 4.0 | 1.0 | default (0.15) | 5.83 | 16.37 |
| K | weighted masked area | 20(10) | 4.0 | 1.0 | 0.3 | 5.92 | 16.47 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |
| G | weighted masked area | 30(10) | 4.0 | 1.0 | default (0.15) | 5.57 | 15.24 |
| K | weighted masked area | 30(10) | 4.0 | 1.0 | 0.3 | 5.63 | 15.32 |

@danpovey (Collaborator) commented Jun 6, 2022

ok, cool.

@glynpu (Collaborator, Author) commented Jun 22, 2022

Exp L/M are trained with the masked-area scale == 0.5. @danpovey
There seem to be no significant differences compared to baselines B/C.
Results of epoch-20-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 20(10) | * | * | default (0.15) | 7.09 | 18.88 |
| B | distillation with HuBERT; SpecAugment from lhotse | 20(10) | 1.0 | 1.0 | default (0.15) | 5.82 | 15.98 |
| C | weighted masked area | 20(10) | 1.0 | 1.0 | default (0.15) | 5.68 | 15.83 |
| H | weighted masked area | 20(10) | 2.0 | 1.0 | default (0.15) | 5.68 | 16.01 |
| I | weighted masked area | 20(10) | 2.0 | 1.0 | 0.3 | 5.76 | 15.94 |
| G | weighted masked area | 20(10) | 4.0 | 1.0 | default (0.15) | 5.83 | 16.37 |
| K | weighted masked area | 20(10) | 4.0 | 1.0 | 0.3 | 5.92 | 16.47 |
| L | weighted masked area | 20(10) | 0.5 | 1.0 | default (0.15) | 5.70 | 15.68 |
| M | weighted masked area | 20(10) | 0.5 | 1.0 | 0.3 | 5.66 | 15.89 |

Results of epoch-30-avg-10 decoding with modified_beam_search

| exp_id | intro | epoch(avg) | masked scale | unmasked scale | spec-aug-max-frames-mask-fraction | test-clean | test-other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | baseline, no vq distillation | 30(10) | * | * | default (0.15) | 6.03 | 18.19 |
| B | distillation with HuBERT; SpecAugment from lhotse | 30(10) | 1.0 | 1.0 | default (0.15) | 5.52 | 15.15 |
| C | weighted masked area | 30(10) | 1.0 | 1.0 | default (0.15) | 5.28 | 14.90 |
| H | weighted masked area | 30(10) | 2.0 | 1.0 | default (0.15) | 5.38 | 14.81 |
| I | weighted masked area | 30(10) | 2.0 | 1.0 | 0.3 | 5.45 | 14.90 |
| G | weighted masked area | 30(10) | 4.0 | 1.0 | default (0.15) | 5.57 | 15.24 |
| K | weighted masked area | 30(10) | 4.0 | 1.0 | 0.3 | 5.63 | 15.32 |
| L | weighted masked area | 30(10) | 0.5 | 1.0 | default (0.15) | 5.40 | 14.88 |
| M | weighted masked area | 30(10) | 0.5 | 1.0 | 0.3 | 5.48 | 15.04 |

Review comments on this snippet:

```python
    type=float,
    default=0.15,
)

group.add_argument(
    "--spec-aug-time-warp-factor",
    type=int,
```
Collaborator:

I notice the default time_warp_factor is not -1, i.e. there is time warping by default. Are you disabling this via the command line by setting --spec-aug-time-warp-factor==1? Otherwise, it seems to me that the codebook loss will be wrong, because we're not time-warping the codebook indexes (which would be tricky to do anyway, since they can't really be interpolated).

Collaborator:

See:

```
--spec-aug-time-warp-factor -1 \
```

@csukuangfj (Collaborator) commented Jun 22, 2022

I would recommend asserting that its value is -1 in train.py, in case users forget to set it.
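
For illustration, a minimal sketch of such a guard in train.py, assuming the parsed options are available as args and that a distillation flag exists (the names here are hypothetical):

```python
# Hypothetical guard in train.py: codebook indexes are not time-warped,
# so time warping must stay disabled when distillation is enabled.
if args.enable_distillation:
    assert args.spec_aug_time_warp_factor == -1, (
        "Codebook-index distillation requires "
        "--spec-aug-time-warp-factor -1, since warped features would "
        "no longer align with the (un-warped) codebook indexes."
    )
```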

Collaborator (Author):

Now added, via #469.
