
add XTTSv2 #4673

Merged (19 commits into dev, Nov 21, 2023)

Conversation

kanttouchthis
Contributor

@kanttouchthis kanttouchthis commented Nov 20, 2023

Description

Adds XTTSv2 for multilingual TTS with voice cloning.
Installation needs further testing but seems to work on Windows. Dependencies may cause conflicts.
Edit: example
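
For reference, a minimal sketch of the underlying Coqui TTS call this extension builds on; the reference clip and output path below are placeholders, not part of this PR:

```python
# Minimal XTTSv2 usage via the Coqui TTS API (a sketch; file paths
# are placeholders).
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)
tts.tts_to_file(
    text="Hello, this is a test of XTTS version two.",
    speaker_wav="voice.wav",  # short reference clip to clone the voice from
    language="en",
    file_path="out.wav",
)
```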

@TeuMasaki

It seems that this implementation fails with a ZeroDivisionError when the generation contains unpronounceable sequences:

```
['She pauses, watching you make your way over to the chair and collapse into it with relief.']
 > Processing time: 1.938103199005127
 > Real-time factor: 0.2928350478159128
 > Text splitted to sentences.
 > Processing time: 0.0
Traceback (most recent call last):
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "D:\oobabooga\text-generation-webui\modules\chat.py", line 342, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True)):
  File "D:\oobabooga\text-generation-webui\modules\chat.py", line 310, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
  File "D:\oobabooga\text-generation-webui\modules\chat.py", line 278, in chatbot_wrapper
    output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\modules\extensions.py", line 224, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\modules\extensions.py", line 82, in _apply_string_extensions
    text = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\extensions\XTTSv2\script.py", line 153, in output_modifier
    return tts_narrator(string)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\extensions\XTTSv2\script.py", line 135, in tts_narrator
    tts.tts_to_file(text=turn,
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\TTS\api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\TTS\api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "D:\oobabooga\text-generation-webui\installer_files\env\Lib\site-packages\TTS\utils\synthesizer.py", line 492, in tts
    print(f" > Real-time factor: {process_time / audio_time}")
                                  ~~~~~~~~~~~~~^~~~~~~~~~~~
ZeroDivisionError: float division by zero
```
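
The division that fails is `process_time / audio_time` with `audio_time == 0`, i.e. the sentence produced no audio at all. Until the library guards against that, the extension could skip fragments with nothing pronounceable in them. A minimal sketch, assuming a simple letters-or-digits heuristic:

```python
import re

def speakable(text: str) -> bool:
    # Heuristic: strip '</s>'-style stop tokens, then require at least
    # one letter or digit. Fragments like '*' or '...' are rejected, so
    # they never reach the TTS call that divides by audio_time.
    text = re.sub(r"</?\w+>", "", text)
    return bool(re.search(r"[A-Za-z0-9]", text))

print([s for s in ["She pauses.", "*", "</s>"] if speakable(s)])
# -> ['She pauses.']
```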

@kanttouchthis
Contributor Author

Do you know what the text was?

@Dampfinchen

Dampfinchen commented Nov 20, 2023

Nice job! I've noticed XTTSv2 also supports streaming. Do you think it's possible to use it in conjunction with token streaming, or to have audio generated immediately after each sentence is finished? Since the TTS model stays loaded in VRAM, using it simultaneously with text generation should be possible.
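
A rough sketch of the sentence-buffered idea, with a stand-in `synthesize` in place of the real `tts_to_file` call and a fake token stream; this is an illustration, not the extension's code:

```python
import queue
import re
import threading

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def synthesize(sentence: str) -> None:
    # Stand-in for the real call, e.g. tts.tts_to_file(text=sentence, ...).
    print("TTS:", sentence)

def stream_tts(token_stream):
    """Buffer streamed tokens and hand each completed sentence to a
    background TTS worker, so synthesis overlaps with text generation."""
    q: queue.Queue = queue.Queue()

    def worker():
        while True:
            s = q.get()
            if s is None:
                break
            synthesize(s)

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    buf = ""
    for token in token_stream:
        buf += token
        parts = SENTENCE_END.split(buf)
        for sentence in parts[:-1]:   # complete sentences
            q.put(sentence)
        buf = parts[-1]               # keep the trailing fragment
    if buf.strip():
        q.put(buf)
    q.put(None)                       # signal the worker to stop
    t.join()

# Usage with a fake token stream:
stream_tts(iter(["Hello ", "there. ", "How ", "are ", "you?"]))
```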

@TeuMasaki

TeuMasaki commented Nov 20, 2023

> Do you know what the text was?

It was the stop token '</s>' after the asterisk '*' that caused the problem. It works normally when the stop token is not preceded by an asterisk.

`*Mishka explains her understanding of the Chinese city based on your description.*</s>`

```
 > Text splitted to sentences.
['Mishka explains her understanding of the Chinese city based on your description.']
 > Processing time: 1.7906074523925781
 > Real-time factor: 0.3192755719146748
 > Text splitted to sentences.
 > Processing time: 0.0
Traceback (most recent call last):
...
```
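
One possible guard in the extension's `output_modifier` would be to strip stop tokens and bare narration markers before the text reaches the sentence splitter. A sketch, assuming '</s>' is the only stop token that needs handling:

```python
import re

def clean_for_tts(text: str) -> str:
    text = text.replace("</s>", "")  # drop the stop token seen above
    text = text.replace("*", "")     # drop narration markers so a bare
                                     # '*' never becomes its own sentence
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_tts('*Mishka explains her understanding of the Chinese city based on your description.*</s>'))
# -> 'Mishka explains her understanding of the Chinese city based on your description.'
```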

@oobabooga
Owner

I made the structure more similar to silero_tts and made various fixes. I think this looks pretty good now, and it's working reliably.

@kanttouchthis I ended up removing the narrator feature for simplicity and will accept your PR to text-generation-webui-extensions for people who want to try it.


The only remaining issue is that the TTS library apparently re-downloads the model every time instead of using the existing cache. I'll merge this PR and try to find a solution to that in a future one.
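
As a stopgap, the TTS API can also load the model from a local copy instead of by name, which avoids the download path entirely. A sketch; the paths are placeholders for wherever the model files actually live:

```python
from TTS.api import TTS

# Load XTTSv2 from an already-downloaded copy so the library never
# hits the network. Both paths below are placeholders.
tts = TTS(
    model_path="models/xtts_v2",
    config_path="models/xtts_v2/config.json",
).to("cuda")
```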

@oobabooga oobabooga merged commit 8dc9ec3 into oobabooga:dev Nov 21, 2023
@kanttouchthis
Contributor Author

The model cache issue was fixed in TTS 0.20.6.

@erew123
Contributor

erew123 commented Nov 22, 2023

I'm seeing some oddity related to the asterisk issue mentioned above. It causes the TTS to generate 2-4 seconds of strange sounds, or sometimes to cut out some of the speech before restarting a sentence or two later.

What you see in the web interface:
`*This is a narrative description.* "This is the character speaking."`

What you see in the command prompt/terminal:
`"*This is a narrative description.", '*', '"This is the character speaking."'`

I've listened to quite a few generations now and looked at a lot of the command prompt/terminal output, and as best I can tell, it happens when that asterisk gets split out on its own. I'm not sure if it's specific to some models or a general issue.

I also suspect it's hurting generation time: generations that suffer from this issue seem to take a bit longer to process, even though the actual audio output isn't any longer. A sketch of a possible fix follows.
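
If the cause is the sentence splitter emitting the bare '*' as its own item, filtering those fragments out after the split might avoid both the noise and the wasted processing time. A sketch, with `sentences` standing in for the splitter's actual output:

```python
def drop_bare_markers(sentences):
    # Remove fragments that are only asterisks/quotes/whitespace, like
    # the lone '*' in the terminal output above; those produce a few
    # seconds of garbled audio instead of speech.
    return [s for s in sentences if s.strip('*"\' \t')]

sentences = ['*This is a narrative description.', '*',
             '"This is the character speaking."']
print(drop_bare_markers(sentences))
# -> ['*This is a narrative description.', '"This is the character speaking."']
```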

I'm on the current build of the coqui_tts extension (at time of writing).

@ElhamAhmedian

Which loader should be used in the extension?

[screenshot]

Thanks

@allenhs

allenhs commented Nov 22, 2023

Coqui also supports using different voices for the narrator, etc. Can this feature be added? It already exists in the extension located here: https://github.com/kanttouchthis/text_generation_webui_xtts
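
A rough sketch of how a two-voice dispatch could look; the regex, role names, and reference clips are illustrative assumptions, not the linked extension's actual code:

```python
import re

# '*...*' chunks are narration, '"..."' chunks are dialogue; route each
# to its own (hypothetical) reference clip.
VOICES = {"narrator": "narrator.wav", "character": "character.wav"}
CHUNK = re.compile(r'\*([^*]+)\*|"([^"]+)"')

def split_roles(reply: str):
    for m in CHUNK.finditer(reply):
        if m.group(1):
            yield "narrator", m.group(1).strip()
        else:
            yield "character", m.group(2).strip()

for role, text in split_roles('*She waves.* "Hello there!"'):
    print(role, "->", VOICES[role], ":", text)
    # e.g. tts.tts_to_file(text=text, speaker_wav=VOICES[role], ...)
```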

@aios-ai

aios-ai commented Nov 22, 2023

> Nice job! I've noticed XTTSv2 also supports streaming. Do you think it's possible to use it in conjunction with token streaming, or to have audio generated immediately after each sentence is finished? Since the TTS model stays loaded in VRAM, using it simultaneously with text generation should be possible.

I'd also love to see that, but I think there is more to it than just calling the TTS engine's streaming mode. I've opened a feature request on this topic that also describes the differences between text-generation streaming and TTS streaming that would need to be reconciled: #4706

Maybe the text-generation-webui team can comment on it, but I think it makes sense to have a dedicated issue for this topic.

@erew123
Contributor

erew123 commented Nov 24, 2023

@oobabooga @kanttouchthis Could you please have a look at #4712? I've found a way to speed up speech generation for people in low-VRAM situations. I've written some code (badly) that works, but since I'm not actually a coder, someone would need to integrate it properly into script.py (and tidy up the code). A sketch of the general idea is below.
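
The general technique is to keep the TTS model in system RAM while the LLM is generating and borrow VRAM only for the synthesis call. A sketch of that idea; see #4712 for the actual proposal:

```python
import torch

def synthesize_low_vram(tts_model, text, **kwargs):
    # Move the TTS model to the GPU only for the synthesis call, then
    # hand the VRAM back to the LLM. A sketch of the general low-VRAM
    # technique, not necessarily what #4712 implements.
    tts_model.to("cuda")
    try:
        return tts_model.tts_to_file(text=text, **kwargs)
    finally:
        tts_model.to("cpu")
        torch.cuda.empty_cache()  # release cached GPU allocations
```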

Thanks

@morozig

morozig commented Mar 23, 2024

Hi guys! Can you please tell me where those voices came from? Are they Creative Commons licensed in any way? I'm wondering if I can use them in a video game.
[screenshot]

@101100

101100 commented Apr 30, 2024

@oobabooga I'm also curious about the source of the voice files.
