Coqui AI and Tortoise TTS #1106
Conversation
# Conflicts: # .gitignore
Tortoise needs more testing by someone other than me; I don't have the hardware to run it alongside any of the models I have. As per discussion in #885, this has 3 Tortoise implementations: official, fast, and MRQ
@da3dsoul I used Coqui AI for some time but I'm not super impressed with the quality so far. Maybe I didn't use it in the right way. Tortoise TTS' quality is unmatched so far. Even by 11Labs. I have an M2 Max with 96GB RAM and I'd be happy to test anything on this hardware as long as support for Mac is improved. (I use Tortoise TTS fast, but only in a Colab instance because on a Mac, even with my SOTA hardware, it's excruciatingly slow.)
I can't do anything about Mac support, since a lot of this stuff uses CUDA (nvidia GPU acceleration). I found the quality of Coqui varies quite a lot depending on which model you use, as it has several choices. The speed and minimum requirements vary as well.
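For context, device selection in this kind of code usually follows a CUDA-first pattern, which is why Mac support lags. Here's a minimal sketch with plain booleans (a hypothetical helper, not the actual extension code; with PyTorch you would feed it `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (NVIDIA), fall back to Apple's MPS, then CPU.

    Hypothetical helper for illustration: most of these TTS extensions
    simply assume CUDA, which is what breaks on Macs.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# An M2 Max has no CUDA but does have MPS; a GPU-less box falls back to CPU.
print(pick_device(False, True))   # mps
print(pick_device(False, False))  # cpu
```

Even when MPS is picked, ops not implemented for that backend can silently fall back to CPU, which matches the "excruciatingly slow" experience above.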
Coqui is good enough.. it generates an ok-ish voice in a few seconds. Tortoise would need its own GPU.. it's not even about memory, it just takes a while.
I wonder if Coqui updated and moved something
Huh, which model is that? It does tell you what to check: BigVGAN is missing. I've never heard of that, but that's what it says
Can confirm this is not the case. I installed fresh and it worked
That's just what it says out of the box when trying to load the addon in the webui. I have no idea what it is referencing, but it's hardcoded in vocoder.py?
I downloaded pytorch_model.bin from here and threw it in the models folder, but that didn't change anything either.
There should, yes, and that's why I'm confused
Okay, so I've done a complete reinstall of ooba, and it's working much better now; it even grabbed the model by itself, which is nice, and correctly installed those dependencies. I'm now getting the following error when I generate text, though, despite the extension actually loading in the UI and everything else seeming like it's playing nice. It does indeed generate text, but no audio.
I'll see if I can make that happen and let you know. Can you show what extensions you have loaded?
I created a brand new ooba specifically just for coqui also, and I've run that separately (not both at once) in their own little conda environments, and that says this
Yeah, so for Tortoise_tts_fast it's literally just that, and I think I had gallery checked also. So to reiterate, these are brand new installs of the oobabooga webui with nothing in them except a small model for testing and the relevant extension
update: This seems really janky, man. Are you trying to merge this pull request? This is absolutely not ready for prime time. Have you only tested this on your own machine? Does this literally only work in your environment?
Ok, I'll take a look
As per the issue linked, quite a few people have used it with success. Is the current state still good? By your experience, the answer is no. This PR has been open for 2.5 months, and things change a lot very quickly in this space, so it might need a whole new round of testing since it was last called "ready".
Yeah, that makes a lot of sense
Any joy? I'm at a loss now, sadly
Well, that error message is indicative of an improperly installed Tortoise, probably due to Windows and it not being in the right venv.
# Conflicts: # extensions/silero_tts/script.py
I updated the code based on changes to silero. Coqui fucked my conda environment, so... probably don't try that atm. Due to my conda env being fucked, I'm not testing Tortoise right now. I'll probably do a full reinstall later. @Ph0rk0z if you'd like to test, be my guest.
I have been using bark, and I have coqui installed via pip to use in audio-webui; I use the same env for it, textgen, and most other "nvidia" things. Thankfully it hasn't broken anything yet. For all of these I have been setting up the environments manually so that they don't install anything I don't want. I think users want the opposite experience, where the script does it all for them :P
I think that it would be really nice if the code here could be abstracted into pipelines similar to what the multimodal extension does. The resulting structure would be something similar to:
The one-click-installer tries to automatically install the requirements for every built-in extension, and it would fail for tortoise. By compartmentalizing it like this, we could leave only the requirements for silero and elevenlabs in
Additional TTS extensions like edge_tts in #3199 and bark (external) could be adapted for this framework. The caveat is that for TTS, additional UI elements are required by each pipeline, like the API key for elevenlabs. So the framework would have to have its own
I'll try to do it eventually, but if @da3dsoul wants to beat me to it, that would be helpful.
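A minimal sketch of what such a pipeline abstraction might look like (all class, attribute, and method names here are assumptions for illustration, not the actual multimodal extension API):

```python
from abc import ABC, abstractmethod

class TTSPipeline(ABC):
    """Hypothetical base class each TTS backend would implement."""

    name: str  # e.g. "silero", "elevenlabs", "tortoise"

    @abstractmethod
    def load(self) -> None:
        """Load models; called only if the user enabled this pipeline,
        so tortoise's heavy requirements stay out of the default install."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio bytes (e.g. WAV) for the given text."""

    def ui(self):
        """Optional per-pipeline UI elements (e.g. an API key box for
        elevenlabs); the default is no extra UI."""
        return None

class SileroPipeline(TTSPipeline):
    """Placeholder implementation to show the shape of a backend."""
    name = "silero"

    def load(self) -> None:
        self.ready = True  # a real backend would load its model here

    def synthesize(self, text: str) -> bytes:
        return b""  # a real backend would run inference and return audio
```

The point of the design is that each backend's requirements and UI stay inside its own pipeline, so the one-click-installer never has to install all of them.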
Maybe. I'll look at the multi-modal pipeline thing
Thought I would test it out, but I haven't had much luck with llama 2 models and this working. Updated some dependencies and it says it makes a message, but all I see is it saying
No error that I can see, but no output. Added in some debug printing and it ran into
Followed by the GPT keys.

Edit: Appears to be a problem with transformers; 4.31.0 seems to have a regression that breaks the tortoise models and prevents loading, but it is required for llama V2 and later models. Using the tortoise part currently locks you out of the new models, but using any other tts locks you out of voice flexibility.
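Assuming the regression report above is accurate (treating every transformers release from 4.31.0 onward as incompatible is an assumption based on this comment, not a tested boundary), a guard before loading tortoise could look like:

```python
def tortoise_compatible(transformers_version: str) -> bool:
    """Return True if this transformers version predates the reported
    4.31.0 regression that breaks tortoise model loading.

    Hypothetical check for illustration; 4.31.0 is the version named
    in the report above.
    """
    major, minor = (int(p) for p in transformers_version.split(".")[:2])
    return (major, minor) < (4, 31)

print(tortoise_compatible("4.30.2"))  # True: safe for tortoise
print(tortoise_compatible("4.31.0"))  # False: needed for llama 2, breaks tortoise
```

This captures the trade-off described above: an environment pinned below 4.31.0 keeps tortoise working but locks out llama V2 and later models, so the two effectively need separate environments.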
Good to know it doesn't work. I need to reinstall my whole workspace to fix things, as I'm currently stuck in dependency hell
I'm closing this in favor of #4673. I don't want to include tortoise, as XTTSv2 seems to be better overall. I don't know if the preprocessing code here applies to the new model; if so, a new PR would be welcome.
That's fair. We can maybe re-evaluate Coqui later. Considering Coqui is what messed up my build, I'm pretty eh on it.
As per #885, Coqui has Pros and Cons. It's already written, so PRing it. More soon (tm). When this is accepted, I'll try to write up some more info into the wiki....and figure out how to do that.