[Feature] Support AnimateDiff, a popular text2animation method #1980

Merged 60 commits on Sep 20, 2023
15d09e4
first commit for animatediff
ElliotQi Aug 15, 2023
a39aa96
fix lint errors
ElliotQi Aug 15, 2023
955248f
modify readme file and add readme_zh-CN.md
ElliotQi Aug 15, 2023
b46ec12
fix some typos in readme
ElliotQi Aug 15, 2023
016c86e
delete test_animatediff.py
ElliotQi Aug 15, 2023
127c99a
add some docstring
ElliotQi Aug 20, 2023
0aa9655
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Aug 21, 2023
c562050
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Aug 23, 2023
3609ac3
fix cross attention for 512*512 animation quality
ElliotQi Aug 24, 2023
800faae
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Aug 24, 2023
8a092b3
fix some initial setting for cpu load
ElliotQi Sep 2, 2023
6dd1920
add unittest samples
ElliotQi Sep 2, 2023
ec7e191
modify unittest codes
ElliotQi Sep 2, 2023
60cd955
remove duplicated unittest files
ElliotQi Sep 2, 2023
116f42a
modify unittest codes for minimum memory
ElliotQi Sep 2, 2023
e0994cf
modify test_unet3d resolution for minimum memory unittest
ElliotQi Sep 2, 2023
dda8716
modify test_unet_blocks3d input resolution for minimum memory unittest
ElliotQi Sep 2, 2023
87bb203
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Sep 3, 2023
9e0b432
modify animatediff.py for gradio
ElliotQi Sep 3, 2023
55d381c
add gradio app for animatediff
ElliotQi Sep 3, 2023
a10ede2
skip test with large memory
ElliotQi Sep 4, 2023
cde60c6
Merge branch 'main' into animatediff
ElliotQi Sep 4, 2023
276e051
Merge branch 'main' into animatediff
liuwenran Sep 4, 2023
4f54924
fix environment building
ElliotQi Sep 6, 2023
1de46df
Merge branch 'animatediff' of github.com:ElliotQi/mmagic into animate…
ElliotQi Sep 6, 2023
b645f0c
Merge branch 'main' into animatediff
ElliotQi Sep 9, 2023
485fdd2
Merge branch 'main' into animatediff
ElliotQi Sep 11, 2023
76cb637
fix merging conflict
ElliotQi Sep 11, 2023
d67f61a
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Sep 11, 2023
541c2e9
Add different style ckpt
ElliotQi Sep 11, 2023
b476929
Merge branch 'animatediff' of github.com:ElliotQi/mmagic into animate…
ElliotQi Sep 11, 2023
eaeb9a0
Merge branch 'main' into animatediff
liuwenran Sep 11, 2023
3a1de39
fix environment building
ElliotQi Sep 11, 2023
7168ee2
Merge branch 'animatediff' of github.com:ElliotQi/mmagic into animate…
ElliotQi Sep 11, 2023
3502e09
add new motion module
ElliotQi Sep 13, 2023
cf0e50c
Merge branch 'open-mmlab:main' into animatediff
ElliotQi Sep 18, 2023
71102cf
add prompts for all config files in README
ElliotQi Sep 18, 2023
e407e79
add image in README
ElliotQi Sep 18, 2023
ec15c1b
fix sd ckpt auto downloading
ElliotQi Sep 18, 2023
7b51631
remove unused import in test code
ElliotQi Sep 18, 2023
55a2b15
align README_zh and README
ElliotQi Sep 18, 2023
8e33acc
fix building error
ElliotQi Sep 18, 2023
e89b7a4
delete unused comments
ElliotQi Sep 18, 2023
f9b6f2a
fix test memory
ElliotQi Sep 18, 2023
fa16d66
Merge branch 'main' into animatediff
ElliotQi Sep 18, 2023
c7a90e8
fix text_model error for later transformer version
ElliotQi Sep 19, 2023
df00be0
Merge branch 'main' into animatediff
ElliotQi Sep 19, 2023
67c29ca
fix comment copyright
ElliotQi Sep 19, 2023
8cece90
add animatediff gradio README
ElliotQi Sep 19, 2023
db8023d
modify some copyright in motion_module.py
ElliotQi Sep 19, 2023
178a2e8
modify README for better test guidance
ElliotQi Sep 19, 2023
cacfd29
fix inference without xformers and mimsave for higher version of imageio
ElliotQi Sep 19, 2023
28a3437
fix errors in different versions of imageio
ElliotQi Sep 20, 2023
7aed840
Merge branch 'main' into animatediff
ElliotQi Sep 20, 2023
9e30b43
add train tutorial and pretrained models
ElliotQi Sep 20, 2023
e5d611b
fix some comments in README
ElliotQi Sep 20, 2023
08bf694
delete personal information
ElliotQi Sep 20, 2023
cda517c
fix gradio sd selection
ElliotQi Sep 20, 2023
fbf49ec
add some tips for run gradio
ElliotQi Sep 20, 2023
9bad0b9
add pretrained links
ElliotQi Sep 20, 2023
200 changes: 200 additions & 0 deletions configs/animatediff/README.md
@@ -0,0 +1,200 @@
# AnimateDiff (2023)

> [AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725)

> **Task**: Text2Video

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs.

<!-- [IMAGE] -->

![512](https://github.com/ElliotQi/mmagic/assets/46469021/54d92aca-dfa9-4eeb-ba38-3f6c981e5399)

## Pretrained models

We use the Stable Diffusion weights provided by HuggingFace Diffusers. If you use the Diffusers wrapper, the weights are downloaded automatically, so you do not have to download them manually.

The model uses several sets of weights, including the VAE, UNet and CLIP. To load them from a local directory instead, download the weights from [stable-diffusion-1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and set `pretrained_model_path` in the config to that directory.
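
Before pointing `pretrained_model_path` at a local directory, it can be useful to check that the directory holds the components mentioned above. The sketch below assumes the Diffusers repository layout of stable-diffusion-v1-5 (sub-folders `vae`, `unet`, `text_encoder` and `tokenizer`); the helper `missing_components` and the example path are hypothetical and not part of MMagic.

```python
from pathlib import Path

# Sub-folders assumed from the Diffusers layout of stable-diffusion-v1-5.
EXPECTED_COMPONENTS = ('vae', 'unet', 'text_encoder', 'tokenizer')


def missing_components(weights_dir):
    """Return the expected sub-folders that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]


if __name__ == '__main__':
    # Hypothetical local path; replace it with your own weights directory.
    missing = missing_components('./stable-diffusion-v1-5')
    print('missing components:', missing or 'none')
```

An empty list means every assumed component folder is present.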

| Model | Dataset | Download |
| :-------------------------------------------------------: | :-----: | :-------------------------------------------------------------------------------: |
| [ToonYou](./animatediff_ToonYou.py) | - | [model](https://civitai.com/api/download/models/78775) |
| [Lyriel](./animatediff_Lyriel.py) | - | [model](https://civitai.com/api/download/models/72396) |
| [RcnzCartoon](./animatediff_RcnzCartoon.py) | - | [model](https://civitai.com/api/download/models/71009) |
| [MajicMix](./animatediff_MajicMix.py) | - | [model](https://civitai.com/api/download/models/79068) |
| [RealisticVision](./animatediff_RealisticVision.py) | - | [model](https://civitai.com/api/download/models/29460) |
| [RealisticVision_v2](./animatediff_RealisticVision_v2.py) | - | [model](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt) |

## Quick Start

Run the following code to generate an animation from text.

1. Download the [ToonYou](https://civitai.com/api/download/models/78775) and MotionModule checkpoints

```bash
#!/bin/bash

# Create the checkpoint folders, then work inside models/
mkdir -p models/Motion_Module models/DreamBooth_LoRA
cd models
# Motion module checkpoints (Google Drive file IDs)
gdown 1RqkQuGPaCO5sGZ6V6KZ-jUWmsRu48Kdq -O Motion_Module/
gdown 1ql0g_Ys4UCz2RnokYlBjyOYPbttbIpbu -O Motion_Module/
# ToonYou DreamBooth/LoRA checkpoint
wget https://civitai.com/api/download/models/78775 -P DreamBooth_LoRA/ --content-disposition --no-check-certificate
```
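
After the downloads finish, it can help to confirm that the files landed where the config will look for them. This is a minimal sketch, assuming the `models/Motion_Module` and `models/DreamBooth_LoRA` folders from the step above and checkpoint files ending in `.ckpt` or `.safetensors`; `list_checkpoints` is a hypothetical helper.

```python
from pathlib import Path

# Extensions assumed for checkpoint files.
CHECKPOINT_SUFFIXES = {'.ckpt', '.safetensors'}


def list_checkpoints(models_dir='models'):
    """Map each checkpoint sub-folder to the sorted checkpoint names in it."""
    found = {}
    for sub in ('Motion_Module', 'DreamBooth_LoRA'):
        folder = Path(models_dir) / sub
        names = []
        if folder.is_dir():
            names = sorted(
                p.name for p in folder.iterdir()
                if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES)
        found[sub] = names
    return found


if __name__ == '__main__':
    for sub, names in list_checkpoints().items():
        print(f'{sub}: {names if names else "no checkpoints found"}')
```

Run it from the directory that contains `models/`; an empty list for either folder suggests a download did not complete.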

2. Modify the config file in `configs/animatediff/animatediff_ToonYou.py`

```python
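# Replace each quoted placeholder with a real path.
models_path = '{Your Checkpoints Path}'

model = dict(
    # ... keep the other fields of `model` unchanged ...
    motion_module_cfg=dict(path='{Your MotionModule path}'),
    dream_booth_lora_cfg=dict(
        type='ToonYou',
        path='{Your Dreambooth_Lora path}',
        steps=25,
        guidance_scale=7.5))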
```
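
One way to fill in the placeholders is to derive both checkpoint paths from a single root directory. This is only a sketch: `build_checkpoint_cfg` is a hypothetical helper, and the file names in the usage lines are stand-ins, so substitute the names of the files you actually downloaded.

```python
import os


def build_checkpoint_cfg(models_path, motion_module_file, lora_file):
    """Build the two path-bearing config dicts from one root directory."""
    return dict(
        motion_module_cfg=dict(
            path=os.path.join(models_path, 'Motion_Module',
                              motion_module_file)),
        dream_booth_lora_cfg=dict(
            type='ToonYou',
            path=os.path.join(models_path, 'DreamBooth_LoRA', lora_file),
            steps=25,
            guidance_scale=7.5))


if __name__ == '__main__':
    # Stand-in file names, not the real checkpoint names.
    cfg = build_checkpoint_cfg('models', 'motion_module.ckpt',
                               'toonyou.safetensors')
    print(cfg['motion_module_cfg']['path'])
    print(cfg['dream_booth_lora_cfg']['path'])
```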

3. Enjoy the Text2Video world

```python
import datetime
import os
from pathlib import Path

import torch
from mmengine import Config

from mmagic.models.editors.animatediff import save_videos_grid
from mmagic.registry import MODELS
from mmagic.utils import register_all_modules

register_all_modules()

# Build the model from the config edited in step 2.
cfg = Config.fromfile('configs/animatediff/animatediff_ToonYou.py')
animatediff = MODELS.build(cfg.model).cuda()

prompts = [
    "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress",

    "masterpiece, best quality, 1girl, solo, cherry blossoms, hanami, pink flower, white flower, spring season, wisteria, petals, flower, plum blossoms, outdoors, falling petals, white hair, black eyes,",

    "best quality, masterpiece, 1boy, formal, abstract, looking at viewer, masculine, marble pattern",

    "best quality, masterpiece, 1girl, cloudy sky, dandelion, contrapposto, alternate hairstyle,"
]

negative_prompts = [
    "",
    "badhandv4,easynegative,ng_deepnegative_v1_75t,verybadimagenegative_v1.3, bad-artist, bad_prompt_version2-neg, teeth",
    "",
    "",
]

sample_idx = 0
# The config may hold a single seed or one seed per prompt.
random_seeds = cfg.randomness['seed']
random_seeds = [random_seeds] if isinstance(random_seeds, int) else list(random_seeds)
samples = []
time_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
savedir = f"samples/{Path(cfg.model['dream_booth_lora_cfg']['type']).stem}-{time_str}"
os.makedirs(savedir)
for prompt_idx, (prompt, n_prompt, random_seed) in enumerate(zip(prompts, negative_prompts, random_seeds)):
    output_dict = animatediff.infer(
        prompt,
        negative_prompt=n_prompt,
        video_length=16,
        height=512,
        width=512,
        seed=random_seed,
        num_inference_steps=cfg.model['dream_booth_lora_cfg']['steps'])
    sample = output_dict['samples']
    # Use the first 10 words of the prompt as the file name.
    prompt = "-".join((prompt.replace("/", "").split(" ")[:10]))
    save_videos_grid(sample, f"{savedir}/sample/{sample_idx}-{prompt}.gif")
    print(f"save to {savedir}/sample/{sample_idx}-{prompt}.gif")
    samples.append(sample)
    sample_idx += 1

# Save all samples together in one grid.
samples = torch.concat(samples)
save_videos_grid(samples, f"{savedir}/sample.gif", n_rows=4)
```
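
Two details of the script above are easy to miss: each GIF name is built from the first ten space-separated words of its prompt, and `zip` stops at the shortest input, so a single integer seed in the config yields only one animation. The sketch below isolates both steps in plain Python; `prompt_slug` and `as_seed_list` are hypothetical names that mirror the inline expressions in the script.

```python
def prompt_slug(prompt, max_words=10):
    """Join the first max_words space-separated tokens with '-', dropping '/'."""
    return '-'.join(prompt.replace('/', '').split(' ')[:max_words])


def as_seed_list(seed):
    """Wrap a single int seed in a list; copy any other iterable to a list."""
    return [seed] if isinstance(seed, int) else list(seed)


if __name__ == '__main__':
    print(prompt_slug('best quality, masterpiece, 1girl, looking at viewer'))
    # zip() stops at the shortest input, so one seed gives one job.
    prompts = ['first prompt', 'second prompt', 'third prompt']
    jobs = list(zip(prompts, as_seed_list(42)))
    print(len(jobs))
```

To render every prompt, the seed entry in the config therefore needs at least as many values as there are prompts.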

### Prompts for other configs

- Lyriel

```yaml
prompt:
- "dark shot, epic realistic, portrait of halo, sunglasses, blue eyes, tartan scarf, white hair by atey ghailan, by greg rutkowski, by greg tocchini, by james gilleard, by joe fenton, by kaethe butcher, gradient yellow, black, brown and magenta color scheme, grunge aesthetic!!! graffiti tag wall background, art by greg rutkowski and artgerm, soft cinematic light, adobe lightroom, photolab, hdr, intricate, highly detailed, depth of field, faded, neutral colors, hdr, muted colors, hyperdetailed, artstation, cinematic, warm lights, dramatic light, intricate details, complex background, rutkowski, teal and orange"
- "A forbidden castle high up in the mountains, pixel art, intricate details2, hdr, intricate details, hyperdetailed5, natural skin texture, hyperrealism, soft light, sharp, game art, key visual, surreal"
- "dark theme, medieval portrait of a man sharp features, grim, cold stare, dark colors, Volumetric lighting, baroque oil painting by Greg Rutkowski, Artgerm, WLOP, Alphonse Mucha dynamic lighting hyperdetailed intricately detailed, hdr, muted colors, complex background, hyperrealism, hyperdetailed, amandine van ray"
- "As I have gone alone in there and with my treasures bold, I can keep my secret where and hint of riches new and old. Begin it where warm waters halt and take it in a canyon down, not far but too far to walk, put in below the home of brown."

n_prompt:
- "3d, cartoon, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, young, loli, elf, 3d, illustration"
- "3d, cartoon, anime, sketches, worst quality, low quality, normal quality, lowres, normal quality, monochrome, grayscale, skin spots, acnes, skin blemishes, bad anatomy, girl, loli, young, large breasts, red eyes, muscular"
- "dof, grayscale, black and white, bw, 3d, cartoon, anime, sketches, worst quality, low quality, normal quality, lowres, normal quality, monochrome, grayscale, skin spots, acnes, skin blemishes, bad anatomy, girl, loli, young, large breasts, red eyes, muscular,badhandsv5-neg, By bad artist -neg 1, monochrome"
- "holding an item, cowboy, hat, cartoon, 3d, disfigured, bad art, deformed,extra limbs,close up,b&w, weird colors, blurry, duplicate, morbid, mutilated, [out of frame], extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, ugly, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, out of frame, ugly, extra limbs, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, too many fingers, long neck, Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render"
```

- RcnzCartoon

```yaml
prompt:
- "Jane Eyre with headphones, natural skin texture,4mm,k textures, soft cinematic light, adobe lightroom, photolab, hdr, intricate, elegant, highly detailed, sharp focus, cinematic look, soothing tones, insane details, intricate details, hyperdetailed, low contrast, soft cinematic light, dim colors, exposure blend, hdr, faded"
- "close up Portrait photo of muscular bearded guy in a worn mech suit, light bokeh, intricate, steel metal [rust], elegant, sharp focus, photo by greg rutkowski, soft lighting, vibrant colors, masterpiece, streets, detailed face"
- "absurdres, photorealistic, masterpiece, a 30 year old man with gold framed, aviator reading glasses and a black hooded jacket and a beard, professional photo, a character portrait, altermodern, detailed eyes, detailed lips, detailed face, grey eyes"
- "a golden labrador, warm vibrant colours, natural lighting, dappled lighting, diffused lighting, absurdres, highres,k, uhd, hdr, rtx, unreal, octane render, RAW photo, photorealistic, global illumination, subsurface scattering"

n_prompt:
- "deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, mutated hands and fingers, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation"
- "nude, cross eyed, tongue, open mouth, inside, 3d, cartoon, anime, sketches, worst quality, low quality, normal quality, lowres, normal quality, monochrome, grayscale, skin spots, acnes, skin blemishes, bad anatomy, red eyes, muscular"
- "easynegative, cartoon, anime, sketches, necklace, earrings worst quality, low quality, normal quality, bad anatomy, bad hands, shiny skin, error, missing fingers, extra digit, fewer digits, jpeg artifacts, signature, watermark, username, blurry, chubby, anorectic, bad eyes, old, wrinkled skin, red skin, photograph By bad artist -neg, big eyes, muscular face,"
- "beard, EasyNegative, lowres, chromatic aberration, depth of field, motion blur, blurry, bokeh, bad quality, worst quality, multiple arms, badhand"

```

- MajicMix

```yaml
prompt:
- "1girl, offshoulder, light smile, shiny skin best quality, masterpiece, photorealistic"
- "best quality, masterpiece, photorealistic, 1boy, 50 years old beard, dramatic lighting"
- "best quality, masterpiece, photorealistic, 1girl, light smile, shirt with collars, waist up, dramatic lighting, from below"
- "male, man, beard, bodybuilder, skinhead,cold face, tough guy, cowboyshot, tattoo, french windows, luxury hotel masterpiece, best quality, photorealistic"

n_prompt:
- "ng_deepnegative_v1_75t, badhandv4, worst quality, low quality, normal quality, lowres, bad anatomy, bad hands, watermark, moles"
- "nsfw, ng_deepnegative_v1_75t,badhandv4, worst quality, low quality, normal quality, lowres,watermark, monochrome"
- "nsfw, ng_deepnegative_v1_75t,badhandv4, worst quality, low quality, normal quality, lowres,watermark, monochrome"
- "nude, nsfw, ng_deepnegative_v1_75t, badhandv4, worst quality, low quality, normal quality, lowres, bad anatomy, bad hands, monochrome, grayscale watermark, moles, people"
```

- RealisticVision & RealisticVision_v2 (same prompts with different random seeds; see their config files for details)

```yaml
prompt:
- "b&w photo of 42 y.o man in black clothes, bald, face, half body, body, high detailed skin, skin pores, coastline, overcast weather, wind, waves, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
- "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
- "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
- "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"

n_prompt:
- "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
- "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
- "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
- "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"

```
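
The prompt lists above are assumed to be used the same way as in the Quick Start loop, where each `prompt` is paired by position with the `n_prompt` at the same index. Because `zip` silently drops unmatched entries, a length check before inference can catch a mis-edited list. `pair_prompts` is a hypothetical helper, and the shortened prompts in the usage lines are illustrative only.

```python
def pair_prompts(prompts, negative_prompts):
    """Pair prompts with negative prompts by index, rejecting unequal lengths."""
    if len(prompts) != len(negative_prompts):
        raise ValueError(
            f'{len(prompts)} prompts but '
            f'{len(negative_prompts)} negative prompts')
    return list(zip(prompts, negative_prompts))


if __name__ == '__main__':
    pairs = pair_prompts(
        ['close up photo of a rabbit, forest', 'photo of coastline, rocks'],
        ['semi-realistic, cgi, 3d', 'blur, haze, deformed iris'])
    for prompt, n_prompt in pairs:
        print(f'{prompt!r} / negative: {n_prompt!r}')
```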

## Citation

```bibtex
@article{guo2023animatediff,
title={AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning},
author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Wang, Yaohui and Qiao, Yu and Lin, Dahua and Dai, Bo},
journal={arXiv preprint arXiv:2307.04725},
year={2023}
}
```