TTS re-write #72

Merged
Kostis-S-Z merged 43 commits into mozilla-ai:main on Jan 10, 2025

Kostis-S-Z
Member

@Kostis-S-Z Kostis-S-Z commented Dec 18, 2024

What's changing

The goal of this PR is to rewrite the TTS component to enable easily adding support for more models.

In the process, the following changes / additions were made:

  • Add unified TTS model interface to hide away complexity of the different tts models: TTSModel
  • Create registry for tts model loading functions: TTS_LOADERS
  • Create registry for tts model inference functions: TTS_INFERENCE
  • Validate config.yaml's text_to_speech_model with validate_text_to_speech_model
  • Add support for the Bark models and the parler multi-lingual and indic models
  • Re-write test_load_tts_model to use parametrization
  • Remove support for parler and bark models
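As a rough illustration of the registry pattern listed above, here is a minimal sketch. Only the names `TTSModel`, `TTS_LOADERS` and `TTS_INFERENCE` come from this PR; the dataclass fields, the `dummy/model` id, the two dummy functions and the `text_to_speech` helper are hypothetical stand-ins, not the repository's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TTSModel:
    """Unified wrapper so callers never touch model-specific objects directly.

    The fields here are assumptions made for this sketch.
    """

    model: Any
    model_id: str
    sample_rate: int
    custom_args: Dict[str, Any] = field(default_factory=dict)


def _load_dummy(model_id: str, **kwargs: Any) -> TTSModel:
    # Hypothetical loader: a real one would download and initialise the weights.
    return TTSModel(model=object(), model_id=model_id, sample_rate=24_000, custom_args=kwargs)


def _dummy_inference(text: str, model: TTSModel, voice_profile: str) -> List[float]:
    # Hypothetical inference: returns one silent sample per input character.
    return [0.0] * len(text)


# One registry per concern, both keyed by model id.
TTS_LOADERS: Dict[str, Callable[..., TTSModel]] = {"dummy/model": _load_dummy}
TTS_INFERENCE: Dict[str, Callable[..., List[float]]] = {"dummy/model": _dummy_inference}


def load_tts_model(model_id: str, **kwargs: Any) -> TTSModel:
    """Look up the loader registered for this model id and call it."""
    return TTS_LOADERS[model_id](model_id, **kwargs)


def text_to_speech(text: str, model: TTSModel, voice_profile: str) -> List[float]:
    """Dispatch to the inference function registered for the loaded model."""
    return TTS_INFERENCE[model.model_id](text, model, voice_profile)
```

Under this layout, supporting another model would mean writing one loader and one inference function and adding an entry to each dictionary, with no changes to calling code.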

Closes #29

How to test it

Steps to test the changes:

  1. Clone the repository: git clone https://github.com/Kostis-S-Z/document-to-podcast.git
  2. Move to the directory: cd document-to-podcast
  3. Change branch: git checkout multilingual-support
  4. Install the package: pip install -e .
  5. Pick a model and languages from the list below (under Model IDs / Languages tested:)
  6. Edit example_data/config.yaml (or create a copy) and change
  • input_file: Use a file that has text in a language of your choice
  • text_to_speech_model: Use one of the model ids, defined below, based on the language you are testing
  • text_to_text_prompt: Re-write it / Translate it in the testing language
  • speaker/description: Re-write it / Translate it in the testing language
  • voice_profile: Use one of the pre-defined profiles based on the testing language, from the list below
  7. Run: document-to-podcast --from_config example_data/config.yaml
  8. If you don't want to wait for the whole podcast to be generated, you can stop it midway by pressing Ctrl+C in the terminal where the process is running. It will stop and save the script and audio up until that point!
  9. Verify by checking podcast.txt and podcast.wav

Model IDs / Languages tested:

  • parler-tts/parler-tts-mini-multilingual-v1.1
    • Portuguese (Sophia & Nicholas)
    • Dutch (Mark & Jessica)
    • French (Daniel & Christine)
    • German (Nicole & Michelle)
    • Italian (Julia & Richard)
    • Polish (Alex & Natalie)
    • Spanish (Steven & Olivia)
  • ai4bharat/indic-parler-tts
    • Hindi (Rohit & Divya)
    • Telugu (Prakash & Lalitha)
  • suno/bark
    • Spanish (v2/es_speaker_0 & v2/es_speaker_8)
  • OuteTTS-0.2-500M
    • Korean (female_1 & male_1)

Additional notes for reviewers

It's expected that some languages will work better than others. It's also a common issue that a speaker's voice may not stay consistent across turns (for example, Speaker 1 might sound one way at first, and then their voice might change).

Full list of languages supported:

  • OuteTTS-0.2-500M
    • English, Korean, Japanese, Chinese
  • suno/bark
    • English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Chinese
  • ai4bharat/indic-parler-tts
    • Assamese, Bengali, Bodo, Chhattisgarhi, Dogri, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu
  • parler-tts/parler-tts-mini-multilingual-v1.1
    • Dutch, French, German, Italian, Polish, Portuguese, Spanish

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and under /docs)

@Kostis-S-Z Kostis-S-Z self-assigned this Dec 18, 2024
@Kostis-S-Z Kostis-S-Z linked an issue Dec 18, 2024 that may be closed by this pull request
@Kostis-S-Z Kostis-S-Z marked this pull request as ready for review December 19, 2024 13:40
@stefanfrench
Contributor

@Kostis-S-Z Did some testing on this, summarized below:

  • French: This worked well. Locally I did get an error: ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
    However, running pip install sentencepiece resolved it. This may have been due to how my parler-tts was installed locally. I tried to see if this error also exists on Codespaces, but the codespace instance was terminated before it got to that stage, so I wasn't able to confirm.

  • Spanish: Tested this with suno/bark - worked with no issues.

  • Hindi: I tried changing the prompt to Hindi, but it still generated in English, so wasn't able to test the audio part. I think this could be a limitation of the Olmoe model. Perhaps we drop the ai4bharat/indic-parler-tts model unless you were able to get it working.

  • Korean: I got the following error with the OuteTTS-0.2-500M:
    Value error, Model OuteTTS-0.2-500M is missing a loading function. Please define it under model_loaders.py [type=value_error, input_value='OuteTTS-0.2-500M', input_type=str]

@Kostis-S-Z
Member Author

> @Kostis-S-Z Did some testing on this, summarized below:

Thanks for testing @stefanfrench !

> * **French**: This worked well. Locally I did get an error: `ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.`
>   However, running `pip install sentencepiece` resolved it. This may have been due to how my parler-tts was installed locally. I tried to see if this error also exists on Codespaces, but the codespace instance was terminated before it got to that stage, so I wasn't able to confirm.

Good that you documented this! However, I didn't manage to reproduce it...

> * **Spanish**: Tested this with suno/bark - worked with no issues.

💯

> * **Hindi**: I tried changing the prompt to Hindi, but it still generated in English, so wasn't able to test the audio part. I think this could be a limitation of the Olmoe model. Perhaps we drop the `ai4bharat/indic-parler-tts` model unless you were able to get it working.

Yeah, the limitation here is that Olmo is not able to generate Hindi script. I would still keep the model in case someone else experiments with another LLM that does work with Hindi, though.

> * **Korean**: I got the following error with the `OuteTTS-0.2-500M`:
>   `Value error, Model OuteTTS-0.2-500M is missing a loading function. Please define it under model_loaders.py [type=value_error, input_value='OuteTTS-0.2-500M', input_type=str]`

Right, you had to use the complete model id: `OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf` instead of just `OuteTTS-0.2-500M`. I should have made it clearer in the PR description. However, I was still unable to generate Korean audio with this model, even though Korean script worked fine with `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf`...
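The error discussed above is consistent with a validator that accepts only exact model ids present in the loader registry. The following is a minimal sketch of that idea, assuming a plain dictionary lookup. The function name `validate_text_to_speech_model`, the registry name `TTS_LOADERS`, the error text and the full OuteTTS id are taken from this PR; the placeholder loader and the registry contents are hypothetical, and the real check may be wired into the config schema differently.

```python
from typing import Callable, Dict


def _placeholder_loader(model_id: str) -> None:
    # Hypothetical stand-in for a real model loading function.
    return None


# Stand-in registry for this sketch; keys must match the config value exactly.
TTS_LOADERS: Dict[str, Callable[[str], None]] = {
    "OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf": _placeholder_loader,
    "suno/bark": _placeholder_loader,
}


def validate_text_to_speech_model(value: str) -> str:
    """Reject any model id that has no registered loading function."""
    if value not in TTS_LOADERS:
        raise ValueError(
            f"Model {value} is missing a loading function. "
            "Please define it under model_loaders.py"
        )
    return value
```

With exact matching like this, the short name `OuteTTS-0.2-500M` fails while the complete id passes, which matches the behaviour reported in the thread.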

@Kostis-S-Z Kostis-S-Z requested review from a team and removed request for a team January 9, 2025 13:05
@Kostis-S-Z Kostis-S-Z changed the title Multilingual support TTS re-write Jan 9, 2025
@Kostis-S-Z Kostis-S-Z merged commit 528bb24 into mozilla-ai:main Jan 10, 2025
3 checks passed
@Kostis-S-Z Kostis-S-Z deleted the multilingual-support branch January 10, 2025 13:13
Successfully merging this pull request may close these issues.

Add multi-langage support