Extension: Stable Diffusion Api integration #309
Conversation
Lets the bot answer you with a picture!
I couldn't test the extension so far, probably because I don't have the 'NeverEndingDream' model installed. I will try again later.
:D
It's gonna use the last model something was generated with by default; you don't even need to have any particular one present.
I haven't really implemented the ability to specify a model yet because it's a separate API call.
Yeass!! I was able to get this to work, but I had to remove the "modules.py" file and the "modules-1.0.0.dist-info" folder from my textgen environment for it to work. I'm running on Windows without WSL.
Yeah, modules is listed in the extension's requirements, but it conflicts with the modules/ folder in the textgen webui directory. Please consider removing it in a commit.
This sounds great! We just need a little bit more information to avoid guessing at how to get them to communicate. Here's my Stable Diffusion launch line: […] And here's my Ooba text gen launch: […] I don't think this will make them talk. Both programs are running on the same machine in the same browser in two different tabs. How should those lines read to allow textgen to utilize Stable Diffusion? And if I need to know my local machine's IP, how do I do that? If you can answer those questions, maybe we could put the answers in the wiki so people don't bug you about it.
Yeah, VRAM probably is the problem; you can't really host two VRAM eaters on the same consumer machine. That's another reason for moving the chatting AI to CPU (supported by AVX2) like the llama.cpp (https://github.com/ggerganov/llama.cpp) / alpaca.cpp (https://github.com/antimatter15/alpaca.cpp) projects, so we consume RAM instead of VRAM. But text-generation-webui doesn't seem to support that yet, and some people are working on the integration: #447 If that's done, I guess this extension would be more usable on average consumer machines.
You don't; if you're running them on the same machine you can use a special address (127.0.0.1, a.k.a. localhost).
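Concretely, the simplest working combination looks something like this (a sketch; the port is the Auto1111 default and must match whatever you actually launch with):

```
# webui-user.bat (Automatic1111) -- add the API flag; port stays at the default 7860
set COMMANDLINE_ARGS=--api

# Extension's "Stable Diffusion host address" field in text-generation-webui:
127.0.0.1:7860
```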
OK, so my problem seems to be with Auto having SSL, as I am getting a "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate" error. Any suggestions?
@JohnWJarrett
My TGWUI params are (when trying to use SD alongside):
and my WUI params are:
And yes, I use the SSL addon for WUI, so yeah, through HTTPS.
Thanks! I haven't yet tested whether the API works correctly when used over HTTPS, and that is probably the root cause of the issue. I'll try to look for a fix for HTTPS in the meantime.
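For anyone else hitting the certificate error before a proper fix lands, a minimal sketch of the usual stopgap, assuming the extension talks to SD through the requests library (the address is illustrative):

```python
import requests

# WARNING: verify=False disables TLS certificate checking. That is tolerable
# for a self-signed certificate on your own machine, but unsafe for any
# address reachable from an untrusted network.
response = requests.post(
    'https://127.0.0.1:7861/sdapi/v1/txt2img',  # illustrative HTTPS address
    json={'prompt': 'test'},
    verify=False,
)
print(response.status_code)
```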
Are we able to use this when we're not in cai mode?
I am currently working on integrating some multimodal models like MM-CoT or NVIDIA's Prismer. Maybe it would be possible to have a common interface for picture handling? Both receiving and sending.
@Brawlence, yeah, it gets past the cert error if I disable the SSL, but then I got a different error, one that I actually have a solution for... So, seeing as I am using a different port than WUI's default, I just copied and pasted the new URL (http://127.0.0.1:8880/) into the settings on TGW, which, I am guessing, you might see the issue with, or you might not; I didn't for about an hour until I was looking into the log and tried, on a whim, to do "localhost:8880", which gave me this error
which is when I noticed the "8880//sdapi", so I think you should truncate the trailing "/" in the IP if the user accidentally leaves it there. It was a thing I overlooked and I'm sure I won't be the only one; it's a stupid user error, sure, but I'd imagine it'd be an easy fix? I don't know, I hate Python with a passion so I never bothered learning it that much. But other than that, yeah, it works fine, even on my 8 GB GFX. I am not gonna try to push it for anything over 256 images, but then again, I don't really need to; it's more just for the extra fun than anything. EDIT: Also, while playing around, and this is just some general info for anyone who was wondering, you can put a LoRA into the "Prompt Prefix" and it will work, which would be good for getting a very consistent character.
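For reference, the normalization that fixes the "8880//sdapi" case is tiny. A sketch (function and variable names are illustrative, not the extension's actual ones):

```python
def normalize_address(address: str) -> str:
    """Tolerate common user input: trailing slashes and a missing scheme."""
    address = address.strip().rstrip('/')          # '127.0.0.1:8880/' -> '127.0.0.1:8880'
    if not address.startswith(('http://', 'https://')):
        address = 'http://' + address              # 'localhost:8880' -> 'http://localhost:8880'
    return address

# normalize_address('http://127.0.0.1:8880/') + '/sdapi/v1/txt2img'
# -> 'http://127.0.0.1:8880/sdapi/v1/txt2img'  (no more '//sdapi')
```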
@Brawlence I've made some updates that I'd be happy to share! Now one can optionally use 'subject' and 'pronoun' placeholders that will replace […]. Also added a […]. What I'd like to do next is actually read this information out of the […]. I'm also able to get SD models working, but unfortunately I can't find where in the SD API it allows you to set the model.
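Regarding setting the model: the Auto1111 API does expose it, just indirectly, through the shared options endpoint rather than a dedicated call. A sketch (the address is illustrative; the checkpoint title must match what /sdapi/v1/sd-models reports):

```python
import requests

host = 'http://127.0.0.1:7860'  # illustrative

# Each entry in the list has a 'title' like 'sd-v1-4.ckpt [fe4efff1e1]'
models = requests.get(f'{host}/sdapi/v1/sd-models').json()

# Posting sd_model_checkpoint to the options endpoint makes the server
# load that checkpoint before the call returns
requests.post(f'{host}/sdapi/v1/options',
              json={'sd_model_checkpoint': models[0]['title']})
```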
Ooh yes please, I'd like to try your updates out 😁
Thanks for your feedback! Here's a preview for the upcoming change: it's gonna strip the trailing "/" from the address.
But yes, one can! I have already tested the memory juggling feature (see #471 and AUTOMATIC1111/stable-diffusion-webui/pull/8780), and if both of those patches are accepted then it would be possible to […], all at the cost of ~20 additional seconds spent on shuffling models around. I've already tested it on my machine and it works. Demo
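For the curious, the juggling boils down to something like the sketch below. It assumes the unload/reload checkpoint endpoints proposed in the linked Auto1111 PR and a model object that can be moved between devices; none of these names are the extension's actual ones:

```python
import requests
import torch

host = 'http://127.0.0.1:7860'  # illustrative

def generate_with_juggling(llm, payload):
    # 1. Evict the language model from VRAM (a real implementation also
    #    has to move its caches before this pays off)
    llm.to('cpu')
    torch.cuda.empty_cache()

    # 2. Let SD load its checkpoint into the freed VRAM and generate
    requests.post(f'{host}/sdapi/v1/reload-checkpoint')
    images = requests.post(f'{host}/sdapi/v1/txt2img', json=payload).json()['images']

    # 3. Evict the SD checkpoint and bring the language model back
    requests.post(f'{host}/sdapi/v1/unload-checkpoint')
    llm.to('cuda')
    return images
```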
Of course, I'd be more than happy to have llama.cpp implemented as well; more options are always better.
My current opinion: llama.cpp uses the CPU and RAM, while SD uses the GPU and VRAM, so the two will not conflict with each other. For now, llama cares more about the amount of RAM/VRAM than about GPU acceleration, and in most PCs RAM is much larger than VRAM.
Hi, I am having some trouble getting this extension to work. I always get the same error: File "C:\Users\user\text-generation-webui\extensions\sd_api_pictures\script.py", line 85, in get_SD_pictures It seems that the key "images" in the dictionary r does not exist. How can I fix this? (I am new to GitHub, sorry if I posted this in the wrong place; I didn't find the same issue under Issues.) Thank you for your answer.
@Andy-Goodheart it looks like the SD API is not responding to your requests. Make sure that the IP and port under "Stable Diffusion host address" are correct and that SD is started with the --api flag.
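For context, the crash happens because an error response carries no 'images' key. A guarded version of the call would surface the real problem instead of a KeyError; a sketch with an illustrative address:

```python
import requests

response = requests.post('http://127.0.0.1:7860/sdapi/v1/txt2img',
                         json={'prompt': 'test'})
response.raise_for_status()  # turn an HTTP 404/500 into a readable exception
r = response.json()
if 'images' not in r:
    raise RuntimeError(f'SD API returned no images: {r}')
```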
@oobabooga Thanks a lot! =) That solved it for me. I didn't have the --api argument in the webui-user.bat file.
@DerAlo Hmmmmm. What SD model do you use? Try generating the description verbatim in Auto1111's interface; what do you get there? For me, such pictures are usually generated either when the model tries to do something it was not trained on OR when CFG_scale is set too high.
It's strange: in 1111's interface everything is fine. The model is 'SD_model': 'sd-v1-4' and CFG is at 7... I really don't get it ^^ But thanks for your reply :)
Could anyone write a tutorial for this extension? I can't start this without errors (RTX 3070) :(
@francoisatt what's the error, what are your launch parameters, what models do you use, and how much VRAM have you got?
I have this extension running and it seems like it is working as intended. However, the output .PNG images do not have any Stable Diffusion metadata, which is very unfortunate.
@ItsOkayItsOfficial
Try reordering the Google translation extension after the sd-api one or, if that does not help, removing it (even for a while, to test). I have a hunch that it's messing with the output text which is needed for the SD API to work.
Could you open this as an issue? I'll look into whether it's possible to get the metadata sent via the API too.
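One possible client-side route while the API side is investigated: the txt2img response carries an 'info' field with the generation parameters, and those can be written back into the PNG the same way Auto1111 does. A sketch (assumes the usual base64 decoding the extension already performs; the 'infotexts' layout is what current Auto1111 builds return):

```python
import base64
import io
import json

import requests
from PIL import Image
from PIL.PngImagePlugin import PngInfo

r = requests.post('http://127.0.0.1:7860/sdapi/v1/txt2img',
                  json={'prompt': 'test'}).json()

image = Image.open(io.BytesIO(base64.b64decode(r['images'][0])))

# r['info'] is a JSON string; its 'infotexts' entry holds the same
# parameters block Auto1111 writes into its own PNGs
metadata = PngInfo()
metadata.add_text('parameters', json.loads(r['info'])['infotexts'][0])
image.save('output.png', pnginfo=metadata)
```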
@Brawlence I would like to understand a little better how the prompting works, though. I can't seem to get the extension to output images that correspond to the context/theme of the conversation. E.g.: I assume it just sent “Alright. Let's see if you are impressed.” as a prompt to SD, without using any of the prompt prefixes or negative prompts that I configured in the settings below the chat interface. Is there any way I can actually see the full prompts that are being communicated? Is it logged somewhere (either in SD or in the text-generation-webui) where I can see what it does (and does not do) in order to understand what it's trying to do and how to get a better result?
@tandpastatester the easiest way right now is to open Auto1111's WebUI (even with the model currently unloaded) and click the 'show previous generation parameters' button (the leftmost one under 'Generate/Stop' on the txt2img tab). I'm currently figuring out how to solve #920, which would shed light on this issue as well. In the meantime, try changing the prefix to include more tags for the character, OR force the generation on something you predict to be very descriptive; it usually does way better on long outputs than on short ones.
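If you don't mind touching the code, another option is a temporary debug print in the extension right before the request goes out (variable names here are illustrative; the real ones in script.py may differ):

```python
# somewhere in get_SD_pictures(), just before the API call:
print(f"Prompt sent to SD: {payload['prompt']}")
print(f"Negative prompt:   {payload['negative_prompt']}")
```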
@tandpastatester That looks like you are not using the correct VAE in Stable Diffusion. You can see what is being sent to SD underneath the picture that is returned in oobabooga. It is a combination of: […]
Unfortunately, you can't really get detailed generations like the ones you're looking for. I was trying to work on something that could make them a little more customizable but, like @Brawlence, I cannot seem to figure out how to get the name of the currently used bot from anywhere within this project. EDIT: Wow... I just checked ffd102e and everything I was doing will have to be reworked. I can't keep up with this stuff.
Here's a trick: you can use a LoRA of your character, if you have one, and put it in the prompt. That will give you a stable character when you ask it to send photos of itself. You can also use the PNG information to get the prompt which generated the character, so you'll have a good start on what your character should look like.
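For example, a Prompt Prefix along these lines (the LoRA name and weight are placeholders; the <lora:...> syntax requires the LoRA to be installed in Auto1111):

```
photo of mychar, <lora:mychar_v1:0.8>, detailed face, best quality
```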
This is awesome... wish I could use it though :-( I'm using a 7.16 GB model on an 8 GB GPU.
That's literally already implemented
It is? Great news!
Is anyone aware of an equivalent that uses ComfyUI for the backend?
If you do this, don't let your computer access the internet; I don't think this is very secure. But I like to use the webui on my mobile devices on my network, and I need to run the site as https://ip:port so I can use things like the microphone on the mobile device. The best web browser I've found to work is Opera. You need to use the --ssl-keyfile and --ssl-certfile flags in the CMD_FLAGS.txt file (https://github.com/oobabooga/text-generation-webui?tab=readme-ov-file#gradio). I used this website, https://regery.com/en/security/ssl-tools/self-signed-certificate-generator, to create the keys and certs, downloaded them, and put the location of the files after the appropriate flags. These keys and certs are self-signed; they normally come from a trusted external source, so when you try to access via a web browser you will get a warning that the certs are not recognized.
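For reference, the flags end up looking something like this (the file names are whatever you saved the downloaded key and cert as):

```
# CMD_FLAGS.txt
--ssl-keyfile private.key --ssl-certfile certificate.crt
```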
Hi. Did you end up figuring out how to fix this issue?
Description:
Lets the bot answer you with a picture!
Load it in the --cai-chat mode with --extension sd_api_pictures alongside send_pictures (it's not really required, but completes the picture). If enabled, the image generation is triggered either […] or when 'send | mail | me' are detected simultaneously with 'image | pic | picture | photo'.
One needs an available instance of Automatic1111's webui running with the --api flag. It ain't tested with a notebook / cloud-hosted one, but it should be possible. I'm running it locally, in parallel, on the same machine as the textgen webui. One also needs to specify a custom --listen-port if they're gonna run everything locally. For the record, 12 GB VRAM is barely enough to run NeverEndingDream 512×512 fp16 and LLaMA-7b in 4-bit precision.
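As a concrete example, a local pair of launch configurations could look like this (ports and exact flag spellings are illustrative and vary between versions; the point is that the two UIs must sit on different ports and the extension must be pointed at SD's one):

```
# Automatic1111 (webui-user.bat), default port 7860, API enabled:
set COMMANDLINE_ARGS=--api

# text-generation-webui, moved off 7860 so the two don't collide:
python server.py --cai-chat --extension sd_api_pictures send_pictures --listen-port 7861
```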
TODO: We should really think about a way to juggle models around RAM and VRAM for this project to work on lower VRAM cards.
Extension interface
Don't mind the Windranger Arcana key in the Prompt Prefix; that's just the name of an embedding I trained beforehand.
Demonstrations:
Conversation 1
Conversation 2