MFA Validate Inconsistent Output #782

Open
shreeshailgan opened this issue Mar 19, 2024 · 2 comments


shreeshailgan commented Mar 19, 2024

I am running mfa validate on the LibriTTS-train-clean-460 dataset using an IPA dictionary I have. The output contains:

WARNING  288196 total OOV tokens

However, in the generated oov_counts.txt file (see the excerpt below), the counts in the second column sum to 32,905. Shouldn't these two numbers be equal? If not, what does 288,196 represent?

--and	151
phoenix	104
--the	99
--a	88
--i	77
--but	67
ion	65
...
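
For anyone who wants to reproduce the column sum, here is a minimal sketch. It assumes oov_counts.txt holds one tab-separated `token<TAB>count` pair per line, as in the excerpt above; the function name `sum_oov_counts` is hypothetical and not part of MFA.

```python
import sys


def sum_oov_counts(path):
    """Sum the count column of a tab-separated `token<TAB>count` file.

    Assumption: the layout matches the excerpt above (token, tab, integer count).
    Lines without a tab or with a non-integer count are skipped.
    """
    total = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split on the last tab so tokens containing other whitespace stay intact
            token, sep, count = line.rstrip("\n").rpartition("\t")
            if not sep:
                continue  # no tab on this line, e.g. blank or truncated
            try:
                total += int(count)
            except ValueError:
                continue  # count column is not an integer
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sum_oov.py /path/to/oov_counts.txt
    print(sum_oov_counts(sys.argv[1]))
```

Run on only the seven rows shown above, this gives 651; the 32,905 figure is the sum over the whole file.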
@mmcauliffe
Member

Are you passing configuration options that remove punctuation symbols? What's the full command you're running and what version are you on?

@shreeshailgan
Author

MFA version
montreal-forced-aligner 3.0.1 pyhd8ed1ab_0 conda-forge

Full command
mfa validate /path/to/data/ /path/to/lexicon --ignore_acoustics --num_jobs 48
