Using an image as visual prompt for generation #118

Open: wants to merge 5 commits into base: main

3 changes: 2 additions & 1 deletion .gitignore
@@ -165,4 +165,5 @@ models/
 # Cog
 .cog
 
-gradio_cached_examples
+gradio_cached_examples
+*.ipynb
28 changes: 28 additions & 0 deletions README.md
@@ -40,6 +40,11 @@ InstantID is a new state-of-the-art tuning-free method to achieve ID-Preserving
<img src="assets/0.png">
</p>

With visual prompt images
<p align="center">
<img src="assets/visual_prompts_example.png">
</p>

### Comparison with Previous Works

<p align="center">
@@ -167,6 +172,29 @@ To save VRAM, you can enable CPU offloading
pipe.enable_model_cpu_offload()
```

## Using a visual prompt
A visual prompt is a reference image that helps determine the overall color palette and style of the generated image.
```python
# Load the visual prompt image and resize it so it has the same size as face_image
visual_prompt = load_image("./examples/visual_prompts/boke.jpg")
face_w, face_h = face_image.size
visual_prompt = visual_prompt.resize((face_w, face_h))

# Generate the image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image_embeds=face_emb,
    image=face_kps,
    visual_prompt=visual_prompt,
    visual_prompt_strength=0.1,  # values in the range 0.05-0.3 tend to give the best results
    num_inference_steps=30,
    num_images_per_prompt=1,
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]
```
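The resize step and the recommended strength range above can be wrapped in a small helper. The following is a minimal sketch, assuming Pillow images such as those returned by `load_image`; `prepare_visual_prompt`, `check_visual_prompt_strength` and `RECOMMENDED_STRENGTH_RANGE` are hypothetical names, not taken from the InstantID codebase, and the range check only warns rather than changing what the pipeline does.

```python
import warnings

from PIL import Image

# Hypothetical constant: the range suggested in the README comment above.
# The pipeline itself is not assumed to enforce it.
RECOMMENDED_STRENGTH_RANGE = (0.05, 0.3)


def prepare_visual_prompt(visual_prompt: Image.Image, face_image: Image.Image) -> Image.Image:
    """Hypothetical helper: return an RGB copy of `visual_prompt` resized to `face_image`'s size.

    Mirrors the manual resize in the README snippet; converting to RGB is a
    defensive step in case the prompt image has an alpha channel.
    """
    face_w, face_h = face_image.size
    return visual_prompt.convert("RGB").resize((face_w, face_h))


def check_visual_prompt_strength(strength: float) -> float:
    """Hypothetical helper: warn if `strength` is outside the suggested range, then return it unchanged."""
    low, high = RECOMMENDED_STRENGTH_RANGE
    if not low <= strength <= high:
        warnings.warn(
            f"visual_prompt_strength={strength} is outside the suggested range {low}-{high}",
            stacklevel=2,
        )
    return strength
```

With these helpers, the preparation step becomes `visual_prompt = prepare_visual_prompt(load_image("./examples/visual_prompts/boke.jpg"), face_image)`, and the keyword argument can be passed as `visual_prompt_strength=check_visual_prompt_strength(0.1)`. Keeping the check as a warning leaves room to experiment outside the suggested range.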

## Speed Up with LCM-LoRA

Our work is compatible with [LCM-LoRA](https://github.com/luosiallen/latent-consistency-model). First, download the model.
Binary file added assets/visual_prompts_example.png
Binary file added examples/visual_prompts/aquarelle.jpg
Binary file added examples/visual_prompts/boke.jpg