Unintelligible Voice after Training Malagasy Corpus #239

Open
Tiana-Andria opened this issue Oct 16, 2024 · 0 comments

Hello,
I have been training a FastSpeech2 model for the Malagasy language and have run into output-quality issues: the synthesized voice is unintelligible even though training completed successfully. Below is an outline of the steps I took and the model configuration.

Steps Taken:

  • Created a corpus of Malagasy (~19 hours of audio).
  • Aligned the data using the Montreal Forced Aligner (MFA).
  • Used a custom text cleaner for the Malagasy language (a rough sketch of the normalization is shown after this list).
  • Ran the prepare_align and preprocess steps successfully.
  • Modified the pinyin.py and cmudict.py files to add Malagasy phonemes.
  • Trained the model for 21,000 steps.
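
For context, here is a minimal sketch of the kind of Latin-script normalization the cleaner performs; the alphabet, rules, and function name below are illustrative placeholders rather than my exact code:

```python
import re
import unicodedata

# Letters of the Malagasy Latin alphabet (illustrative; adjust to your corpus).
_MALAGASY_LETTERS = "abdefghijklmnoprstvyz"
_KEEP = re.compile(rf"[^{_MALAGASY_LETTERS}' ]")
_SPACES = re.compile(r"\s+")

def malagasy_cleaner(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = text.lower()
    # Decompose accented characters (e.g. "ô") and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = _KEEP.sub(" ", text)
    return _SPACES.sub(" ", text).strip()
```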

I am using HiFi-GAN as the vocoder with the universal speaker setting.
Pitch and energy features are configured at the phoneme level with normalization set to true.
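
To make the feature configuration concrete, here is a small check that reads those settings back; the config path and key layout below assume a FastSpeech2-style preprocess.yaml (preprocessing.pitch / preprocessing.energy) and are only illustrative:

```python
import yaml  # PyYAML

# Illustrative check of the pitch/energy settings described above.
# The path and key names assume a FastSpeech2-style preprocess.yaml.
with open("config/Malagasy/preprocess.yaml") as f:
    cfg = yaml.safe_load(f)

for feat in ("pitch", "energy"):
    section = cfg["preprocessing"][feat]
    print(feat, "feature:", section["feature"], "| normalization:", section["normalization"])
```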

Pitch loss ranged from 1.1 to 5.17.
Energy loss ranged from 0.55 to 0.9.

Could the unintelligibility be caused by high pitch loss during training? If so, what would be the best way to address this in terms of configuration or data preparation?
