Unintelligible Voice after Training Malagasy Corpus #239

Open
Tiana-Andria opened this issue Oct 16, 2024 · 0 comments

Hello,
I have been training a FastSpeech2 model for the Malagasy language and have run into output-quality issues: the synthesized voice is unintelligible even though training completed successfully. Below is an outline of the steps I took and the model configuration.

Steps Taken:

  • Created a corpus of Malagasy (~19 hours of audio).
  • Aligned the data using the Montreal Forced Aligner (MFA).
  • Used a custom text cleaner for the Malagasy language (a rough sketch of the normalization is shown after this list).
  • Ran the prepare_align and preprocess steps successfully.
  • Modified the pinyin.py and cmudict.py files to add Malagasy phonemes.
  • Trained the model for 21,000 steps.
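
For context, here is a minimal sketch of the kind of Latin-script normalization the cleaner performs; the alphabet, rules, and function name below are illustrative placeholders rather than my exact code:

```python
import re
import unicodedata

# Letters of the Malagasy Latin alphabet (illustrative; adjust to your corpus).
_MALAGASY_LETTERS = "abdefghijklmnoprstvyz"
_KEEP = re.compile(rf"[^{_MALAGASY_LETTERS}' ]")
_SPACES = re.compile(r"\s+")

def malagasy_cleaner(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = text.lower()
    # Decompose accented characters (e.g. "ô") and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = _KEEP.sub(" ", text)
    return _SPACES.sub(" ", text).strip()
```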

I am using HiFi-GAN as the vocoder with the universal speaker setting.
Pitch and energy features are configured at the phoneme level with normalization set to true.
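
To make the feature configuration concrete, here is a small check that reads those settings back; the config path and key layout below assume a FastSpeech2-style preprocess.yaml (preprocessing.pitch / preprocessing.energy) and are only illustrative:

```python
import yaml  # PyYAML

# Illustrative check of the pitch/energy settings described above.
# The path and key names assume a FastSpeech2-style preprocess.yaml.
with open("config/Malagasy/preprocess.yaml") as f:
    cfg = yaml.safe_load(f)

for feat in ("pitch", "energy"):
    section = cfg["preprocessing"][feat]
    print(feat, "feature:", section["feature"], "| normalization:", section["normalization"])
```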

Pitch loss ranged from 1.1 to 5.17.
Energy loss ranged from 0.55 to 0.9.

Could the unintelligibility be caused by high pitch loss during training? If so, what would be the best way to address this in terms of configuration or data preparation?
