[Not for merge] Different scale for codebook indexes from masked/unmasked area #395
base: master
Conversation
OK, interesting, thanks!
Sure, will try this after the server recovers.
Conclusions: The impact of changing "masked scale" or "spec-aug-max-frames-max-fraction" is smaller than training noise, because the WERs rank Exp C < Exp H/I < Exp B.
Results of epoch-20-avg-10 decoding with modified_beam_search:
Results of epoch-30-avg-10 decoding with modified_beam_search:
OK, cool. Perhaps you can try a scale of 4.0 instead of 2.0 on the masked area.
Exp G/K get slightly worse results with the higher scale, but not significantly. @danpovey
Results of epoch-30-avg-10 decoding with modified_beam_search:
ok, cool.
Exp L/M are trained with the scale for the masked area set to 0.5. @danpovey
Results of epoch-30-avg-10 decoding with modified_beam_search:
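For context, a minimal sketch of how such a per-region scale might be applied, assuming a per-frame codebook loss of shape (N, T) and a boolean mask marking the frames that SpecAugment zeroed out (the function and argument names here are hypothetical, not the PR's actual code):

```python
import torch


def scale_codebook_loss(
    codebook_loss: torch.Tensor,  # per-frame codebook loss, shape (N, T)
    time_mask: torch.Tensor,      # bool, shape (N, T); True on SpecAug-masked frames
    masked_scale: float = 2.0,
    unmasked_scale: float = 1.0,
) -> torch.Tensor:
    """Return the total codebook loss with region-dependent scaling."""
    scale = torch.where(
        time_mask,
        torch.full_like(codebook_loss, masked_scale),
        torch.full_like(codebook_loss, unmasked_scale),
    )
    return (scale * codebook_loss).sum()
```

Setting masked_scale to 1.0 recovers the equal-weighting baseline (Exp B/C); values like 0.5, 2.0, or 4.0 correspond to the experiments discussed above.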
    type=float,
    default=0.15,
)

group.add_argument(
    "--spec-aug-time-warp-factor",
    type=int,
I notice the default time_warp_factor is not -1, i.e. there is time warping by default. Are you disabling this via the command line by setting --spec-aug-time-warp-factor=-1? Otherwise, it seems to me that the codebook loss will be wrong, because we are not time-warping the codebook indexes (which would be tricky to do anyway, since they can't really be interpolated).
See:
--spec-aug-time-warp-factor -1 \
I would recommend asserting in train.py that its value is -1, in case users forget to set it.
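A minimal sketch of such a check, assuming the parsed argument is available as params.spec_aug_time_warp_factor in train.py (names are illustrative, not the exact code added later):

```python
def validate_spec_aug_args(params) -> None:
    """Refuse to train with codebook indexes if time warping is enabled,
    because the codebook indexes are not time-warped and would be
    misaligned with the warped features."""
    assert params.spec_aug_time_warp_factor < 1, (
        "Codebook-index training requires disabling time warping; "
        "please pass --spec-aug-time-warp-factor -1"
    )
```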
Now added, since this: #469
@danpovey Conclusion: codebook indexes in the un-masked area seem quite necessary for training.
All of the following results are decoded with --epoch 20 --avg 10 --max-duration 200.
Note: in theory, Exp B and Exp C are the same. The only difference is the SpecAugment class: Exp B used the one in lhotse, while Exp C used a modified version in pruned_transducer_stateless6/aug.py which can also return the masked area.
For Exp D/E/F, which do not learn codebook indexes in the un-masked area, WERs are even worse than the baseline Exp A.
For Exp G, after using the un-masked-area codebook indexes again, results are better than the baseline Exp A.
But it is still worse than Exp B/C, which treat masked/un-masked codebook indexes equally.
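For reference, a rough sketch (not the actual pruned_transducer_stateless6/aug.py implementation) of a SpecAugment-style module that, besides the augmented features, also returns the time mask, so the codebook loss can distinguish masked from un-masked frames as discussed above:

```python
import torch


class SpecAugmentWithMask(torch.nn.Module):
    """Applies simple time masking and returns which frames were masked."""

    def __init__(self, max_frames: int = 100, num_masks: int = 2):
        super().__init__()
        self.max_frames = max_frames
        self.num_masks = num_masks

    def forward(self, features: torch.Tensor):
        # features: (N, T, C).  Returns (augmented, time_mask), where
        # time_mask has shape (N, T) and is True on masked frames.
        N, T, _ = features.shape
        time_mask = torch.zeros(N, T, dtype=torch.bool, device=features.device)
        for n in range(N):
            for _ in range(self.num_masks):
                width = int(torch.randint(0, self.max_frames + 1, (1,)))
                start = int(torch.randint(0, max(T - width, 1), (1,)))
                time_mask[n, start : start + width] = True
        augmented = features.masked_fill(time_mask.unsqueeze(-1), 0.0)
        return augmented, time_mask
```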