
Compatibility with Playground v2.5 #5

Open
edixiong opened this issue Jul 9, 2024 · 4 comments

Comments

@edixiong

edixiong commented Jul 9, 2024

Thanks for the great repo. I am working on integrating this pipeline with Playground v2.5 (https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic), which appears to use the same pipeline as SDXL (StableDiffusionXLPipeline). Directly loading the model with .from_pretrained does not raise an error, but the output image has an artifact: it comes out gray. Do you have any idea what modifications need to be made here?

[attached image: gray output with artifacts]

@RoyiRa
Owner

RoyiRa commented Jul 9, 2024 via email

@edixiong
Author

edixiong commented Jul 9, 2024

Hi Roy, thanks for the reply. The output image is the result of the P2P pipeline. Using the pipeline for normal generation with pipe = DiffusionPipeline.from_pretrained() works fine.

However, with the P2P pipeline (https://github.com/RoyiRa/prompt-to-prompt-with-sdxl/blob/main/prompt_to_prompt_pipeline.py), the output image comes out gray. For example, with the following code:

import torch
from prompt_to_prompt_pipeline import Prompt2PromptPipeline

seed = 10002
g_cpu = torch.Generator().manual_seed(seed)

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
if device.type != "cuda":
    raise RuntimeError("CUDA device required for fp16 inference")

pipe = Prompt2PromptPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)

prompts = ["a pink bear riding a bicycle on the beach",
           "a pink dragon riding a bicycle on the beach"]
cross_attention_kwargs = {
    "edit_type": "replace",
    "n_self_replace": 0.4,
    "n_cross_replace": {"default_": 1.0, "dragon": 0.4},
}

image = pipe(prompts, cross_attention_kwargs=cross_attention_kwargs, generator=g_cpu)
print(f"Num images: {len(image['images'])}")

from IPython.display import display
for img in image["images"]:
    display(img)

The output image comes out gray, like the one I shared above.

The normal SDXL pipeline produces a normal image like the one below:
[attached image: normal output from DiffusionPipeline]

@RoyiRa
Owner

RoyiRa commented Jul 9, 2024

I didn't come across this when I worked on the pipeline, but my intuition is that it's related to the VAE, and maybe the use of float16. Did you try looking into these?
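(A minimal sketch of the float16 failure mode mentioned above, not from the thread: fp16 saturates at ~65504, so any VAE-decoder activation past that range becomes inf, and the NaN/inf pixels it propagates typically render as flat gray or black. The tensor values here are illustrative only.)

```python
import torch

# fp16 has a max finite value of ~65504; activations beyond it overflow.
x = torch.tensor([70000.0, 30000.0])
x_fp16 = x.half()

print(torch.isinf(x_fp16))  # first element overflows to inf, second fits
# inf propagates through arithmetic and becomes NaN after inf - inf,
# which is why a single overflowing layer can blank out the whole image
print(x_fp16[0] - x_fp16[0])  # nan
```

This is why upcasting just the VAE to float32 for decoding is a common workaround for SDXL-family models, even when the rest of the pipeline runs in fp16.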

@edixiong
Author

edixiong commented Jul 9, 2024

Thanks for your suggestion. However, I don't think the VAE or float16 is causing the issue: the VAE is loaded correctly (AutoencoderKL), and the normal pipeline also uses float16.

The code that uses the normal pipeline to create the normal image is

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a pink bear riding a bicycle on the beach"
image = pipe(prompt=prompt, num_inference_steps=50).images[0]

and it uses float16 as well.
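(A hedged guess not confirmed in this thread: Playground v2.5's VAE config carries latents_mean and latents_std, and recent StableDiffusionXLPipeline versions de-normalize latents with them before decoding, whereas plain SDXL only divides by scaling_factor. A P2P pipeline copied from an older SDXL pipeline would skip the mean/std step and decode mis-scaled latents, which can look washed-out or gray. The sketch below uses dummy tensors and illustrative values to show how the two decode paths diverge.)

```python
import torch

def denormalize_latents(latents, scaling_factor, latents_mean=None, latents_std=None):
    """De-normalize latents before VAE decode.

    SDXL-style: latents / scaling_factor
    Playground v2.5-style (when the VAE config has latents_mean/latents_std):
        latents * latents_std / scaling_factor + latents_mean
    """
    if latents_mean is not None and latents_std is not None:
        return latents * latents_std / scaling_factor + latents_mean
    return latents / scaling_factor

# dummy latents; 0.13025 is SDXL's scaling_factor, mean/std are illustrative
latents = torch.ones(1, 4, 2, 2)
mean = torch.full((1, 4, 1, 1), 0.5)
std = torch.full((1, 4, 1, 1), 2.0)

sdxl_style = denormalize_latents(latents, 0.13025)
pg25_style = denormalize_latents(latents, 0.13025, mean, std)

# the two paths produce different inputs to the VAE decoder
print(torch.allclose(sdxl_style, pg25_style))  # False
```

If this is the cause, porting the latents_mean/latents_std handling from the current StableDiffusionXLPipeline decode step into prompt_to_prompt_pipeline.py would be the fix to try.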
