Coqui AI and Tortoise TTS #1106
Conversation
# Conflicts: # .gitignore
Tortoise needs more testing by someone other than me; I don't have the hardware to run it alongside any of the models I have. As per discussion in #885, this has 3 Tortoise implementations: official, fast, and MRQ
@da3dsoul I used Coqui AI for some time but I'm not super impressed with the quality so far. Maybe I didn't use it in the right way. Tortoise TTS' quality is unmatched so far. Even by 11Labs. I have an M2 Max with 96GB RAM and I'd be happy to test anything on this hardware as long as support for Mac is improved. (I use Tortoise TTS fast, but only in a Colab instance because on a Mac, even with my SOTA hardware, it's excruciatingly slow.)
I can't do anything about Mac support, since a lot of this stuff uses CUDA (nvidia GPU acceleration). I found the quality of Coqui varies quite a lot depending on which model you use, as it has several choices. The speed and minimum requirements vary as well.
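For context, device selection in this kind of code usually follows a CUDA-first pattern, which is why Mac support lags. Here's a minimal sketch with plain booleans (a hypothetical helper, not the actual extension code; with PyTorch you would feed it `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (NVIDIA), fall back to Apple's MPS, then CPU.

    Hypothetical helper for illustration: most of these TTS extensions
    simply assume CUDA, which is what breaks on Macs.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# An M2 Max has no CUDA but does have MPS; a GPU-less box falls back to CPU.
print(pick_device(False, True))   # mps
print(pick_device(False, False))  # cpu
```

Even when MPS is picked, ops not implemented for that backend can silently fall back to CPU, which matches the "excruciatingly slow" experience above.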
Coqui is good enough.. it generates an ok-ish voice in a few seconds. Tortoise would need its own GPU.. it's not even about memory, it just takes a while.
I wonder if Coqui updated and moved something
Huh, which model is that? It does tell you what to check: BigVGAN is missing. I've never heard of that, but that's what it says
Can confirm this is not the case. I installed fresh and it worked
That's just what it says out of the box when trying to load the addon in the webui. I have no idea what it is referencing, but it's hardcoded in vocoder.py?
I downloaded pytorch_model.bin from here and threw it in the models folder, but that didn't change anything either.
There should, yes, and that's why I'm confused
Okay, so I've done a complete reinstall of ooba, and it's working much better now; it even grabbed the model by itself, which is nice, and correctly installed those dependencies. I'm now getting the following error when I generate text, though, despite the extension actually loading in the UI and everything else seeming like it's playing nice. It does indeed generate text, but no audio.
I'll see if I can make that happen and let you know. Can you show what extensions you have loaded?
I created a brand new ooba specifically just for coqui also, and I've run that separately (not both at once) in their own little conda environments, and that says this
Yeah, so for Tortoise_tts_fast it's literally just that, and I think I had gallery checked also. So to reiterate, these are brand new installs of the oobabooga webui with nothing in them except a small model for testing and the relevant extension
update: This seems really janky, man. Are you trying to merge this pull request? This is absolutely not ready for prime time. Have you only tested this on your own machine? Does this literally only work in your environment?
Ok, I'll take a look
As per the issue linked, quite a few people have used it with success. Is the current state still good? By your experience, the answer is no. This PR has been open for 2.5 months, and things change a lot very quickly in this space, so it might need a whole new round of testing since it was last called "ready".
Yeah, that makes a lot of sense
Any joy? I'm at a loss now, sadly
Well, that error message is indicative of an improperly installed Tortoise, probably due to Windows and it not being in the right venv.
# Conflicts: # extensions/silero_tts/script.py
I updated the code based on changes to silero. Coqui fucked my conda environment, so... probably don't try that atm. Due to my conda env being fucked, I'm not testing Tortoise right now. I'll probably do a full reinstall later. @Ph0rk0z if you'd like to test, be my guest.
I have been using bark, and I have coqui installed via pip to use in audio-webui; I use the same env for it, textgen, and most other "nvidia" things. Thankfully it hasn't broken anything yet. For all of these I have been setting up the environments manually so that they don't install anything I don't want. I think users want the opposite experience, where the script does it all for them :P
I think that it would be really nice if the code here could be abstracted into pipelines similar to what the multimodal extension does. The resulting structure would be something similar to:
The one-click-installer tries to automatically install the requirements for every built-in extension, and it would fail for tortoise. By compartmentalizing it like this, we could leave only the requirements for silero and elevenlabs in
Additional TTS extensions like edge_tts in #3199 and bark (external) could be adapted for this framework. The caveat is that for TTS, additional UI elements are required by each pipeline, like the API key for elevenlabs. So the framework would have to have its own
I'll try to do it eventually, but if @da3dsoul wants to beat me to it, that would be helpful.
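A minimal sketch of what such a pipeline abstraction might look like (all class, attribute, and method names here are assumptions for illustration, not the actual multimodal extension API):

```python
from abc import ABC, abstractmethod

class TTSPipeline(ABC):
    """Hypothetical base class each TTS backend would implement."""

    name: str  # e.g. "silero", "elevenlabs", "tortoise"

    @abstractmethod
    def load(self) -> None:
        """Load models; called only if the user enabled this pipeline,
        so tortoise's heavy requirements stay out of the default install."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio bytes (e.g. WAV) for the given text."""

    def ui(self):
        """Optional per-pipeline UI elements (e.g. an API key box for
        elevenlabs); the default is no extra UI."""
        return None

class SileroPipeline(TTSPipeline):
    """Placeholder implementation to show the shape of a backend."""
    name = "silero"

    def load(self) -> None:
        self.ready = True  # a real backend would load its model here

    def synthesize(self, text: str) -> bytes:
        return b""  # a real backend would run inference and return audio
```

The point of the design is that each backend's requirements and UI stay inside its own pipeline, so the one-click-installer never has to install all of them.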
Maybe. I'll look at the multi-modal pipeline thing
Thought I would test it out, but I haven't had much luck with llama 2 models and this working. Updated some dependencies and it says it makes a message, but all I see is it saying
No error that I can see, but no output. Added in some debug printing and it ran into
Followed by the GPT keys.

Edit: Appears to be a problem with transformers; 4.31.0 seems to have a regression that breaks the tortoise models and prevents loading, but it is required for llama V2 and later models. Using the tortoise part currently locks you out of the new models, but using any other tts locks you out of voice flexibility.
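Assuming the regression report above is accurate (treating every transformers release from 4.31.0 onward as incompatible is an assumption based on this comment, not a tested boundary), a guard before loading tortoise could look like:

```python
def tortoise_compatible(transformers_version: str) -> bool:
    """Return True if this transformers version predates the reported
    4.31.0 regression that breaks tortoise model loading.

    Hypothetical check for illustration; 4.31.0 is the version named
    in the report above.
    """
    major, minor = (int(p) for p in transformers_version.split(".")[:2])
    return (major, minor) < (4, 31)

print(tortoise_compatible("4.30.2"))  # True: safe for tortoise
print(tortoise_compatible("4.31.0"))  # False: needed for llama 2, breaks tortoise
```

This captures the trade-off described above: an environment pinned below 4.31.0 keeps tortoise working but locks out llama V2 and later models, so the two effectively need separate environments.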
Good to know it doesn't work. I need to reinstall my whole workspace to fix things, as I'm currently stuck in dependency hell
I'm closing this in favor of #4673. I don't want to include tortoise, as XTTSv2 seems to be better overall. I don't know if the preprocessing code here applies to the new model; if so, a new PR would be welcome.
That's fair. We can maybe re-evaluate Coqui later. Considering Coqui is what messed up my build, I'm pretty eh on it.
As per #885, Coqui has Pros and Cons. It's already written, so PRing it. More soon (tm). When this is accepted, I'll try to write up some more info into the wiki....and figure out how to do that.