TTS re-write #72

Kostis-S-Z · 2024-12-18T13:05:32Z

What's changing

The goal of this PR is to rewrite the TTS component to enable easily adding support for more models.

In the process, the following changes / additions were made:

Add a unified TTS model interface to hide the complexity of the different TTS models: TTSModel
Create a registry for TTS model loading functions: TTS_LOADERS
Create a registry for TTS model inference functions: TTS_INFERENCE
Validate config.yaml's text_to_speech_model with validate_text_to_speech_model
Add support for the Bark models and the parler multilingual and indic models
Rewrite test_load_tts_model to use parametrization
Remove support for the parler and bark models (done later in this PR, see the commit "Remove support for bark and parler models")
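
A minimal sketch of how the two registries and TTSModel could fit together. Only the names TTSModel, TTS_LOADERS and TTS_INFERENCE come from this PR; the dataclass fields, the "dummy/model" id and the function signatures are assumptions for illustration, not the actual code in model_loaders.py or text_to_speech.py.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TTSModel:
    """Uniform wrapper around a loaded TTS model (fields are assumed)."""

    model: Any
    model_id: str
    sample_rate: int
    custom_args: Dict[str, Any] = field(default_factory=dict)


def _load_dummy(model_id: str, **kwargs: Any) -> TTSModel:
    # Hypothetical loader: a real one would download and initialise the model.
    return TTSModel(
        model=object(), model_id=model_id, sample_rate=24000, custom_args=kwargs
    )


def _dummy_text_to_speech(
    text: str, tts_model: TTSModel, voice_profile: str
) -> List[float]:
    # Hypothetical inference function: one silent sample per input character.
    return [0.0] * len(text)


# Registries keyed by model id. Supporting a new model means adding
# one entry to each dictionary.
TTS_LOADERS: Dict[str, Callable[..., TTSModel]] = {"dummy/model": _load_dummy}
TTS_INFERENCE: Dict[str, Callable[..., List[float]]] = {
    "dummy/model": _dummy_text_to_speech
}


def load_tts_model(model_id: str, **kwargs: Any) -> TTSModel:
    # Model-specific options travel as **kwargs and end up in custom_args.
    return TTS_LOADERS[model_id](model_id, **kwargs)


def text_to_speech(text: str, tts_model: TTSModel, voice_profile: str) -> List[float]:
    # Dispatch on the model id stored in the wrapper.
    return TTS_INFERENCE[tts_model.model_id](text, tts_model, voice_profile)
```

With this layout the CLI and the demo only ever call load_tts_model and text_to_speech, so they do not need to know which model sits behind the wrapper.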

Closes #29

How to test it

Steps to test the changes:

Clone the repository: git clone https://github.com/Kostis-S-Z/document-to-podcast.git
Move to the directory: cd document-to-podcast
Change branch: git checkout multilingual-support
Install the package: pip install -e .
Pick a model and a language from the list under "Model IDs / Languages tested" below
Edit example_data/config.yaml (or create a copy) and change the following fields:

input_file: Use a file that contains text in the language of your choice
text_to_speech_model: Use one of the model IDs listed below, based on the language you are testing
text_to_text_prompt: Rewrite or translate it into the language you are testing
speaker/description: Rewrite or translate it into the language you are testing
voice_profile: Use one of the predefined profiles for the language you are testing, from the list below

document-to-podcast --from_config example_data/config.yaml
If you don't want to wait for the whole podcast to be generated, you can stop it midway by pressing Ctrl+C in the terminal where the process is running. It will stop and save the script and audio generated up to that point!
Verify by checking podcast.txt and podcast.wav
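
The Ctrl+C behaviour described in the steps can be sketched roughly as below. This is an illustration under assumptions, not the CLI's actual code: generate_podcast and the synthesize callback are hypothetical names, and audio is modelled as a plain list of samples.

```python
from typing import Callable, Iterable, List, Tuple


def generate_podcast(
    lines: Iterable[str],
    synthesize: Callable[[str], List[float]],
) -> Tuple[List[str], List[float]]:
    """Synthesize line by line; on Ctrl+C, stop and return what is done so far."""
    script: List[str] = []
    audio: List[float] = []
    try:
        for line in lines:
            samples = synthesize(line)  # Ctrl+C may interrupt this call
            # Only record a line once its audio is complete.
            script.append(line)
            audio.extend(samples)
    except KeyboardInterrupt:
        # Swallow the interrupt so the caller can still save partial output.
        print("Interrupted, keeping the partial podcast.")
    return script, audio
```

The caller would then write the returned script and audio to podcast.txt and podcast.wav whether or not generation ran to the end.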

Model IDs / Languages tested:

Additional notes for reviewers

It's expected that some languages will work better than others. It's also a common issue that a speaker's voice may not stay consistent across turns (for example, Speaker 1 may sound one way at first and differently later on).

Full list of languages supported:

OuteTTS-0.2-500M
- English, Korean, Japanese, Chinese
suno/bark
- English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Chinese
ai4bharat/indic-parler-tts
- Assamese, Bengali, Bodo, Chhattisgarhi, Dogri, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu
parler-tts/parler-tts-mini-multilingual-v1.1
- Dutch, French, German, Italian, Polish, Portuguese, Spanish

I already...

Tested the changes in a working environment to ensure they work as expected
Added some tests for any new functionality
Updated the documentation (both comments in code and under /docs)


stefanfrench · 2025-01-02T16:28:24Z

@Kostis-S-Z Did some testing on this, summarized below:

French: This worked well. Locally I did get an error: ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
However, running pip install sentencepiece resolved it. This may have been due to how my parler-tts was installed locally. I tried to see if this error also exists on codespaces, but the codespace instance was terminated before it got to that stage, so I wasn't able to confirm.
Spanish: Tested this with suno/bark - worked with no issues.
Hindi: I tried changing the prompt to Hindi, but it still generated in English, so wasn't able to test the audio part. I think this could be a limitation of the Olmoe model. Perhaps we drop the ai4bharat/indic-parler-tts model unless you were able to get it working.
Korean: I got the following error with the OuteTTS-0.2-500M:
Value error, Model OuteTTS-0.2-500M is missing a loading function. Please define it under model_loaders.py [type=value_error, input_value='OuteTTS-0.2-500M', input_type=str]
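
For context, the Korean error above is what a registry lookup on the configured id would produce. A minimal sketch, assuming validate_text_to_speech_model simply checks membership in TTS_LOADERS; the registry contents and the placeholder loader here are hypothetical, and the real validator runs inside the config model rather than as a bare function.

```python
from typing import Callable, Dict

# Hypothetical registry with a placeholder loader; the key is the complete model id.
TTS_LOADERS: Dict[str, Callable[[str], None]] = {
    "OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf": lambda model_id: None,
}


def validate_text_to_speech_model(value: str) -> str:
    """Return the id unchanged if a loader is registered, else raise ValueError."""
    if value not in TTS_LOADERS:
        raise ValueError(
            f"Model {value} is missing a loading function. "
            "Please define it under model_loaders.py"
        )
    return value
```

Under this assumption a shortened id such as OuteTTS-0.2-500M fails validation because it is not a key of the registry, even though the model itself is supported.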

Kostis-S-Z · 2025-01-07T11:01:08Z

> @Kostis-S-Z Did some testing on this, summarized below:

Thanks for testing @stefanfrench!

> * **French**: This worked well. Locally I did get an error: `ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.`
>   However, running `pip install sentencepiece` resolved it. This may have been due to how my parler-tts was installed locally. I tried to see if this error also exists on codespaces, but the codespace instance was terminated before it got to that stage, so I wasn't able to confirm.

Good that you documented this! However, I didn't manage to reproduce it...

> * **Spanish**: Tested this with suno/bark - worked with no issues.

💯

> * **Hindi**: I tried changing the prompt to Hindi, but it still generated in English, so wasn't able to test the audio part. I think this could be a limitation of the Olmoe model. Perhaps we drop the `ai4bharat/indic-parler-tts` model unless you were able to get it working.

Yeah, the limitation here is that Olmo isn't able to generate Hindi script. I would still keep the model in case someone else experiments with another LLM that does work with Hindi, though.

> * **Korean**: I got the following error with the `OuteTTS-0.2-500M`:
>   `Value error, Model OuteTTS-0.2-500M is missing a loading function. Please define it under model_loaders.py [type=value_error, input_value='OuteTTS-0.2-500M', input_type=str]`

Right, you had to use the complete model id: `OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf` instead of just `OuteTTS-0.2-500M`. I should have made that clearer in the PR description. However, I was still unable to generate Korean audio with this model, even though generating Korean script worked fine with `bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf`...
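
Both complete ids quoted above have the shape org/repo/filename. A small sketch of splitting such an id into its repository and file parts, assuming that three-part shape always holds; split_model_id is a hypothetical helper, not a function from this repository.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'org/repo/filename' into ('org/repo', 'filename')."""
    parts = model_id.split("/")
    if len(parts) != 3 or not all(parts):
        # A bare model name such as 'OuteTTS-0.2-500M' ends up here.
        raise ValueError(f"Expected 'org/repo/filename', got {model_id!r}")
    org, repo, filename = parts
    return f"{org}/{repo}", filename
```

The resulting pair matches the repo_id and filename arguments that huggingface_hub's hf_hub_download expects, which is one plausible way to fetch the GGUF file.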


Kostis-S-Z added 13 commits December 17, 2024 16:09

[WIP] Add bark and parler multi support

286a93c

Add config files for other models to easily test across models

14b69bf

Use model loading wrapper function for download_models.py

20ab8e9

Make sure transformers>4.31.0 (required for bark model)

ee38e10

Add parler dependency

890c684

Use TTSModelWrapper for demo code

8cc7b0d

Use TTSModelWrapper for cli

dcbb254

Add outetts_language attribute

b0d40bc

Add TTSModelWrapper

5e47b1e

Update text_to_speech.py

945c44f

Pass model-specific variables as **kwargs

4565fb8

Rename TTSModelWrapper to TTSInterface

01d0e7a

Update language argument to kwargs

5af3e72

daavoo reviewed Dec 18, 2024

pyproject.toml (outdated)

src/document_to_podcast/inference/text_to_speech.py (outdated)

example_data/config_bark.yaml (outdated)

daavoo reviewed Dec 18, 2024

example_data/config_bark.yaml (outdated)

Kostis-S-Z and others added 2 commits December 18, 2024 14:33

Remove parler from dependencies

e3a3f17

Co-authored-by: David de la Iglesia Castro <[email protected]>

Merge branch 'mozilla-ai:main' into multilingual-support

a918574

Kostis-S-Z self-assigned this Dec 18, 2024

Kostis-S-Z linked an issue Dec 18, 2024 that may be closed by this pull request

Add multi-langage support #29

Closed

Kostis-S-Z mentioned this pull request Dec 18, 2024

Enable user to exit podcast generation gracefully #74

Merged


Kostis-S-Z added 9 commits December 19, 2024 11:55

Separate inference from TTSModel

fb814fa

Make sure config model is properly registered

672c0e0

Decouple loading & inference of TTS model

28b02b8

Decouple loading & inference of TTS model

b489e0d

Enable user to exit podcast generation gracefully

dc89668

Add Q2 Oute version to TTS_LOADERS

0d143eb

Add comment for support in TTS_INFERENCE

e9ca498

Update test_model_loaders.py

47112a0

Update test_text_to_speech.py

ec0fe5a

Kostis-S-Z marked this pull request as ready for review December 19, 2024 13:40

Kostis-S-Z requested a review from daavoo December 30, 2024 12:25

Kostis-S-Z mentioned this pull request Dec 30, 2024

Multilingual support Kostis-S-Z/document-to-podcast#1

Merged

Kostis-S-Z added 6 commits January 9, 2025 14:59

Change default model to 500M

8c1291f

Remove support for bark and parler models

a8ead71

Update docs

80586cf

Remove unused code

cb69c74

Remove parler dep from tests

33a240c

Update docs

b45cd1b

Kostis-S-Z requested review from a team and removed request for a team January 9, 2025 13:05

Kostis-S-Z changed the title ~~Multilingual support~~ TTS re-write Jan 9, 2025

Kostis-S-Z and others added 5 commits January 9, 2025 16:28

Update from upstream main

9b8451b

Lint

7af94d6

Merge branch 'main' into multilingual-support

424d013

Lint

c907b96

Merge remote-tracking branch 'origin/multilingual-support' into multilingual-support

14d6aa0

daavoo reviewed Jan 9, 2025

demo/app.py (outdated)

Lint

9aee09a

daavoo reviewed Jan 9, 2025

pyproject.toml (outdated)

daavoo reviewed Jan 9, 2025

src/document_to_podcast/inference/model_loaders.py

daavoo approved these changes Jan 9, 2025

Kostis-S-Z added 2 commits January 9, 2025 17:49

Remove transformers dependency

eadb84d

Update from upstream

e93ad38

stefanfrench approved these changes Jan 9, 2025

docs/step-by-step-guide.md (outdated)

Kostis-S-Z added 2 commits January 9, 2025 19:14

Remove parler reference from docs

fc41b8a

Update from upstream/main

0aacda7

Kostis-S-Z merged commit 528bb24 into mozilla-ai:main Jan 10, 2025
3 checks passed

Kostis-S-Z deleted the multilingual-support branch January 10, 2025 13:13
