Hello,
I have been working on training a FastSpeech2 model for the Malagasy language and encountered issues with the output quality. The synthesized voice is unintelligible despite successfully completing the training process. Below is an outline of the steps I've taken and the model configuration.
Steps Taken:
Created a corpus of Malagasy (~19 hours of audio).
Aligned the data using the Montreal Forced Aligner (MFA).
Used a custom text cleaner for the Malagasy language.
Ran the prepare_align and preprocess steps successfully.
Modified the pinyin.py and cmudict.py files to add Malagasy phonemes.
Trained the model for 21,000 steps.
Used HiFi-GAN as the vocoder with the universal speaker setting.
Configured pitch and energy features at the phoneme level, with normalization set to true.
Pitch loss ranged from 1.1 to 5.17 during training.
Energy loss ranged from 0.55 to 0.9 during training.
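Because the steps above combine MFA alignments, a custom cleaner, and a hand-edited symbol list, one quick check is whether every phoneme in the aligned data is present in the model's symbol set. The following is only a minimal sketch of that check. The function name `find_unknown_phonemes` and the example symbols are hypothetical and are not taken from the FastSpeech2 repository or from the actual Malagasy phoneme set.

```python
from collections import Counter


def find_unknown_phonemes(utterances, symbol_set):
    """Count phonemes that occur in the aligned data but not in the symbol set.

    utterances: iterable of phoneme lists, one list per utterance
                (for example, read from the MFA output).
    symbol_set: set of phoneme strings the model was configured with.
    """
    unknown = Counter()
    for phones in utterances:
        for phone in phones:
            if phone not in symbol_set:
                unknown[phone] += 1
    return unknown


# Hypothetical data for illustration only, not the real Malagasy inventory.
symbols = {"a", "e", "i", "o", "m", "n", "l", "s", "sp"}
aligned = [["m", "a", "n", "a", "o"], ["ts", "a", "r", "a"]]

unknown = find_unknown_phonemes(aligned, symbols)
print(dict(unknown))  # phonemes missing from the symbol set, with counts
```

An empty result would suggest the symbol list covers the aligned data. A non-empty one points at phonemes that may be dropped or mapped incorrectly before they reach the model.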
Could the unintelligibility be caused by high pitch loss during training? If so, what would be the best way to address this in terms of configuration or data preparation?
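To make the configuration concrete, here is a rough sketch of what phoneme-level pitch with normalization usually involves: frame-level F0 is averaged over each phoneme's duration and then z-scored with corpus-wide statistics. This is an assumption about the general technique, not a copy of the repository's preprocessing code, and the names `phoneme_level_pitch` and `normalize` are made up for illustration. It does not settle whether the pitch loss is the cause. It only shows where a mismatch between durations and frames, or poor normalization statistics, would show up.

```python
from statistics import mean, pstdev


def phoneme_level_pitch(f0_frames, durations):
    """Average frame-level F0 over each phoneme, ignoring unvoiced (0) frames."""
    if sum(durations) != len(f0_frames):
        # Durations from the aligner must add up to the number of frames.
        raise ValueError("durations do not match the number of F0 frames")
    averaged, start = [], 0
    for d in durations:
        voiced = [f for f in f0_frames[start:start + d] if f > 0]
        averaged.append(mean(voiced) if voiced else 0.0)
        start += d
    return averaged


def normalize(values, mu, sigma):
    """Z-score values with statistics computed over the whole corpus."""
    return [(v - mu) / sigma for v in values]


# Hypothetical utterance: 7 frames of F0 in Hz, 3 phonemes.
f0 = [0, 0, 100, 110, 120, 200, 0]
durations = [2, 3, 2]
pitch = phoneme_level_pitch(f0, durations)

# Hypothetical corpus statistics, computed here from two values only.
corpus_values = [100, 200]
mu, sigma = mean(corpus_values), pstdev(corpus_values)
normalized = normalize(corpus_values, mu, sigma)
```

If the duration sums and frame counts disagree for many utterances, the `ValueError` above would fire, which would point at the alignment or hop-size settings rather than at the pitch predictor itself.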