Add MMI training with word pieces as modelling unit. #6
Conversation
IIRC the alignments from alimdl did not help before when we checked them in snowfall; do you expect different results with the current setup?
We are thinking they might help training get started in cases where that is a problem, e.g., for MMI with BPE.
I just started the MMI training with pre-computed alignments. The TensorBoard logs are:
Without attention decoder
It throws the following warnings at some point (after several hundred batches):
At some other point, it stops printing the above warnings and the MMI loss starts to decrease:
You can see that the pre-computed alignments are helpful in making the training converge.
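For context, one plausible way pre-computed framewise alignments can help MMI training get started is as an auxiliary frame-level loss during the first few epochs. The sketch below is illustrative only; the names total_loss, ali, ali_scale and use_ali_until_epoch are assumptions, not the code in this PR.

import torch
import torch.nn.functional as F


def total_loss(nnet_output: torch.Tensor,
               ali: torch.Tensor,
               mmi_loss: torch.Tensor,
               cur_epoch: int,
               ali_scale: float = 0.5,
               use_ali_until_epoch: int = 2) -> torch.Tensor:
    """Combine the MMI loss with an auxiliary alignment loss.

    Args:
      nnet_output: (N, T, C) log-probabilities from the encoder.
      ali: (N, T) framewise token IDs from the pre-computed alignment;
        -1 marks frames without a label.
      mmi_loss: scalar LF-MMI loss for the batch.
      cur_epoch: current training epoch.
    """
    if cur_epoch >= use_ali_until_epoch:
        # Later epochs rely on the MMI objective alone.
        return mmi_loss
    # Frame-level negative log-likelihood against the alignment.
    ce = F.nll_loss(
        nnet_output.reshape(-1, nnet_output.size(-1)),
        ali.reshape(-1),
        ignore_index=-1,
    )
    return mmi_loss + ali_scale * ce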
The best WER I get for this pull request is:
Training without attention decoder (decoding using whole-lattice rescoring, i.e., HLG 1-best decoding + 4-gram LM rescoring):
Training with attention decoder (decoding using the attention decoder for rescoring):
LF-MMI + attention decoder seems not as good as CTC + attention decoder. Let's merge it first since it contains code for integrating framewise alignment information into training, which can
@@ -0,0 +1,356 @@
# Copyright 2021 Piotr Żelasko
Are there any changes in this file? Was it supposed to be a symlink like the other asr_datamodule.py in conformer_ctc?
It is the same as the one in conformer_ctc and tdnn_lstm. I should have placed a symlink here.
@@ -142,69 +205,66 @@ def tokens(self) -> List[int]:
        return ans


-class BpeLexicon(Lexicon):
+class UniqLexicon(Lexicon):
Why is it named UniqLexicon? Not sure how to interpret it.
Uniq here means each word in the lexicon has only one pronunciation, i.e., a unique pronunciation.
In BPE-based lexicons, each word can be decomposed in a deterministic way.
In phone-based lexicons, if a word has more than one pronunciation, there are scripts to keep only the first one.
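As an illustration of that last point, a minimal sketch of such a "keep only the first pronunciation" step could look like the function below; the function name and the assumed "word phone1 phone2 ..." line format are assumptions, not the actual script in the recipe.

def keep_first_pronunciation(lexicon_in: str, lexicon_out: str) -> None:
    """Copy a lexicon file, keeping only the first pronunciation per word."""
    seen = set()
    with open(lexicon_in) as fin, open(lexicon_out, "w") as fout:
        for line in fin:
            fields = line.split()
            if not fields:
                continue
            word = fields[0]
            if word in seen:
                continue  # drop alternative pronunciations of this word
            seen.add(word)
            fout.write(line)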
        func = _compute_mmi_loss_pruned
    else:
        func = _compute_mmi_loss_exact_non_optimized
        # func = _compute_mmi_loss_exact_optimized
Is this intended to be commented out?
Yes, the non_optimized version is easier to understand and consumes less memory.
Will post the results once they are available.
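For readers who have not seen the MMI code, the sketch below paraphrases what the exact and pruned branches roughly compute with k2; the graph arguments, beam values and den_scale are assumptions, not the exact code in this PR.

import k2
import torch


def mmi_loss(num_graphs: k2.Fsa,
             den_graphs: k2.Fsa,
             dense_fsa_vec: k2.DenseFsaVec,
             use_pruned: bool = False,
             den_scale: float = 1.0) -> torch.Tensor:
    # Numerator lattices: intersect the per-utterance numerator graphs
    # with the network output.
    num_lats = k2.intersect_dense(num_graphs, dense_fsa_vec, output_beam=10.0)

    if use_pruned:
        # Pruned intersection for the denominator: cheaper, approximate.
        den_lats = k2.intersect_dense_pruned(
            den_graphs,
            dense_fsa_vec,
            search_beam=20.0,
            output_beam=8.0,
            min_active_states=30,
            max_active_states=10000,
        )
    else:
        # Exact (non-optimized) intersection: simpler to follow, and
        # reported in this thread to use less memory than the
        # "optimized" exact variant.
        den_lats = k2.intersect_dense(den_graphs, dense_fsa_vec,
                                      output_beam=10.0)

    num_tot = num_lats.get_tot_scores(log_semiring=True,
                                      use_double_scores=True)
    den_tot = den_lats.get_tot_scores(log_semiring=True,
                                      use_double_scores=True)
    # LF-MMI objective: maximize numerator minus (scaled) denominator.
    return -(num_tot - den_scale * den_tot).sum()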