Document how to build speech synthesis system for new languages #18
Hi, thank you for this work.
Yes. I think you will need to annotate the accent information (rising/falling tones) in the HTS-style labels.
Hi, Yamamoto-san. I'm trying to synthesize Mandarin using your tool. As far as I know, I need to do forced alignment beforehand and then write a frontend adapted to the language to extract linguistic features. So does that mean I only need to replace the frontend part? And could I use other forced alignment tools such as "montreal" at an alignment level that is neither 'state' nor 'phone', for example 'syllable'?
Hello, @attitudechunfeng!
Yes, you can reuse the other parts. You can also reuse part of the frontend (https://r9y9.github.io/nnmnkwii/latest/references/frontend.html#frontend) to convert your linguistic features to their numeric representation at phone, state, or frame level, provided you use the HTS-style label format.
You could use syllable-level alignment, but then you cannot reuse https://r9y9.github.io/nnmnkwii/latest/references/frontend.html#frontend, since it assumes state-level or phone-level alignment.
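To make the idea of converting HTS-style labels to numeric features concrete, here is a minimal, standard-library-only sketch. It is not nnmnkwii's API: the function names, the simplified label layout (a quinphone plus a hypothetical `/T:` tone field), and the three example questions are all assumptions made for illustration. It relies only on two conventions of the HTS format: label times are given in 100 ns units, and question patterns use `*` and `?` as wildcards. The 5 ms frame shift is a common choice, assumed here.

```python
import re

# Assumed 5 ms frame shift; HTS label times are in units of 100 ns.
FRAME_SHIFT_100NS = 50000

# Hypothetical, heavily simplified phone-level labels:
# "start end left2^left1-current+right1=right2/T:tone"
LABELS = """
0 1000000 x^sil-n+i=h/T:3
1000000 2500000 sil^n-i+h=ao/T:3
"""

# Hypothetical questions; each maps a name to a list of wildcard patterns.
QUESTIONS = {
    "C-Nasal": ["*-n+*"],
    "C-Vowel": ["*-i+*", "*-a+*"],
    "Tone==3": ["*/T:3"],
}


def parse_hts_label(text):
    """Parse lines of the form 'start end context' into tuples."""
    entries = []
    for line in text.strip().splitlines():
        start, end, context = line.split()
        entries.append((int(start), int(end), context))
    return entries


def compile_question(patterns):
    """Turn wildcard patterns ('*' and '?') into anchored regexes."""
    compiled = []
    for pattern in patterns:
        escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
        compiled.append(re.compile("^" + escaped + "$"))
    return compiled


def phone_level_features(entries, questions):
    """One binary vector per phone: 1 if any pattern of a question matches."""
    compiled = [compile_question(p) for p in questions.values()]
    return [
        [int(any(rx.match(context) for rx in regexes)) for regexes in compiled]
        for _, _, context in entries
    ]


def frame_level_features(entries, questions, frame_shift=FRAME_SHIFT_100NS):
    """Repeat each phone-level vector once per frame covered by the phone."""
    vectors = phone_level_features(entries, questions)
    rows = []
    for (start, end, _), vector in zip(entries, vectors):
        n_frames = (end - start) // frame_shift
        rows.extend(list(vector) for _ in range(n_frames))
    return rows
```

Calling `phone_level_features(parse_hts_label(LABELS), QUESTIONS)` gives one row per phone, and `frame_level_features` expands those rows using the aligned durations. This also shows why the alignment level matters: the expansion assumes each label line is one phone (or state), so syllable-level alignments would need a different scheme.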
Thank you very much! I'll try it.
Alternatively, you could consider an end-to-end approach, which requires neither alignment nor linguistic feature extraction (the hard part of TTS!). See https://github.com/r9y9/deepvoice3_pytorch if you are interested.
Thank you. In fact, I'm also following your other excellent TTS projects. However, I'm currently working on offline usage: end-to-end models are not easy to port to mobile devices, and their speed on CPU can't be guaranteed. So I have to use the traditional method.
I see. I hope you find something useful. Let me know if you find something that should be improved.
Okay, if there's something interesting, I'll report it.
I have WAV files in the Punjabi language. Please guide me on how to generate full-context labels and an HTS-style question file.
All you need is that
With those all prepared, it should be very straightforward to implement.