[Feature] Support disco-diffusion text-2-image #1234

Merged
merged 67 commits into from
Dec 2, 2022
Changes from 30 commits
67 commits
0a0616d
going through adm unconditional sampling
plyfager Sep 14, 2022
1a88411
add config and code for test
plyfager Sep 16, 2022
261cc60
resolve conflict
plyfager Sep 16, 2022
601391a
fix lint
plyfager Sep 16, 2022
4613727
modify unet
plyfager Sep 19, 2022
b35e77c
format adm
plyfager Sep 20, 2022
a4690a8
support adm 512
plyfager Sep 22, 2022
2554a57
fix lint
plyfager Sep 26, 2022
1955304
support cls-g sampling
plyfager Sep 26, 2022
b85c93d
support ddim sampling
plyfager Sep 26, 2022
48a1382
init disco
plyfager Sep 27, 2022
08cd338
support disco-diffusion text2image
plyfager Oct 26, 2022
6a5c2b3
support secondary model in disco-diffusion (#1368)
yanniangu Oct 27, 2022
6073e19
support init_image as input
plyfager Nov 11, 2022
cb31610
init docstring
plyfager Nov 15, 2022
8da2eb2
solve conflict
plyfager Nov 16, 2022
9226e6d
Merge branch 'dev-1.x' of https://github.com/open-mmlab/mmediting int…
plyfager Nov 17, 2022
7d4d99a
refactor disco
plyfager Nov 21, 2022
1dab8ab
fix lint
plyfager Nov 22, 2022
6432fd2
resolve conflict
plyfager Nov 22, 2022
30911e3
Merge branch 'plyfager/disco-diffusion' of github.com:open-mmlab/mmed…
plyfager Nov 22, 2022
65e66fc
fix lint
plyfager Nov 22, 2022
3a47b31
remove disco bug
plyfager Nov 22, 2022
7de125e
remove data_preprocessor
plyfager Nov 23, 2022
8438b29
complete docstring of disco and partial guider
plyfager Nov 23, 2022
ee06f02
complete docstring for guider
plyfager Nov 23, 2022
e487cce
refine secondary model
plyfager Nov 23, 2022
8d8ef6f
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 23, 2022
f555e88
fix lint
plyfager Nov 23, 2022
7109bbc
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 23, 2022
33f5923
move cutter and loss config to infer
plyfager Nov 24, 2022
47cb257
fix adm and unet
plyfager Nov 25, 2022
a3e2ed8
rename config
plyfager Nov 25, 2022
3b7ae3e
support portrait generator config
plyfager Nov 28, 2022
8a93dd0
fix clip wrapper
plyfager Nov 28, 2022
ab24126
move unet to DDPM
plyfager Nov 28, 2022
c71ac35
rename clip_ext
plyfager Nov 28, 2022
67abcad
adjust requirements
plyfager Nov 28, 2022
b62d317
try_import
plyfager Nov 28, 2022
472bbd7
add dist.get_rank() == 0 as additional condition
plyfager Nov 28, 2022
515b121
add resize_right to requirements
plyfager Nov 28, 2022
4cf2fba
remove disco_baseline
plyfager Nov 28, 2022
44160ed
update url
plyfager Nov 29, 2022
ac935bd
fix a disco typo
plyfager Nov 29, 2022
d420b8c
add imagenet 256 config
plyfager Nov 29, 2022
eb45aae
Make Disco's readme simple
plyfager Nov 29, 2022
d9f0661
rename disco to disco_diffusion
plyfager Nov 29, 2022
62c49ac
add adm readme
plyfager Nov 30, 2022
b2ef637
fix lint
plyfager Nov 30, 2022
61b04af
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 30, 2022
881b17a
support directly init disco with instance module
plyfager Nov 30, 2022
0d853fc
add ut of disco
plyfager Nov 30, 2022
ffee5e9
fix init
plyfager Nov 30, 2022
957a193
fix lint
plyfager Nov 30, 2022
07890b3
improve docstring coverage
plyfager Nov 30, 2022
678d579
fix lint
plyfager Nov 30, 2022
d83a3ba
fix docstring
plyfager Dec 2, 2022
ee74f0f
fix lint
plyfager Dec 2, 2022
3410381
add credits
plyfager Dec 2, 2022
e9d8c4f
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Dec 2, 2022
432da88
mv losses
plyfager Dec 2, 2022
d39386b
fix lint
plyfager Dec 2, 2022
224e184
rename diffuser
plyfager Dec 2, 2022
de08313
fix lint
plyfager Dec 2, 2022
707148a
delete null
plyfager Dec 2, 2022
6b6c087
rm raise error
plyfager Dec 2, 2022
369ee89
fix comment
plyfager Dec 2, 2022
45 changes: 45 additions & 0 deletions configs/_base_/datasets/imagenet_512.py
@@ -0,0 +1,45 @@
# dataset settings
dataset_type = 'ImageNet'

# different from mmcls, we adopt the setting used in BigGAN.
# We use `RandomCropLongEdge` in training and `CenterCropLongEdge` in testing.
train_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='RandomCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(512, 512), keys=['img'], backend='pillow'),
    dict(type='Flip', flip_ratio=0.5, direction='horizontal'),
    dict(type='PackEditInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='CenterCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(512, 512), backend='pillow'),
    dict(type='PackEditInputs')
]

train_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
    persistent_workers=True)

val_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
    persistent_workers=True)

test_dataloader = val_dataloader
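
As an illustration of the comment in this config, cropping along the long edge can be sketched as below. This is a minimal sketch of the idea, assuming both transforms cut a square whose side equals the shorter image edge. `long_edge_crop_box` is a hypothetical helper written for this note, not the transform registered in the library.

```python
import random


def long_edge_crop_box(height, width, rng=None):
    """Return (left, top, right, bottom) of a square crop along the long edge.

    Hypothetical helper for illustration only. With ``rng=None`` the crop is
    centered (the test-time case); otherwise the offset along the long edge
    is drawn at random (the train-time case).
    """
    size = min(height, width)
    # Free space along each axis; at least one of the two is always zero.
    free_y, free_x = height - size, width - size
    if rng is None:
        top, left = free_y // 2, free_x // 2
    else:
        top, left = rng.randint(0, free_y), rng.randint(0, free_x)
    return left, top, left + size, top + size


center_box = long_edge_crop_box(300, 500)
random_box = long_edge_crop_box(300, 500, rng=random.Random(0))
```

For a 300x500 image the centered box is a 300x300 square starting 100 pixels from the left; the random box has the same size but a random horizontal offset.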
45 changes: 45 additions & 0 deletions configs/_base_/datasets/imagenet_64.py
@@ -0,0 +1,45 @@
# dataset settings
dataset_type = 'ImageNet'

# different from mmcls, we adopt the setting used in BigGAN.
# We use `RandomCropLongEdge` in training and `CenterCropLongEdge` in testing.
train_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='RandomCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(64, 64), keys=['img'], backend='pillow'),
    dict(type='Flip', flip_ratio=0.5, direction='horizontal'),
    dict(type='PackEditInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='CenterCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(64, 64), backend='pillow'),
    dict(type='PackEditInputs')
]

train_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
    persistent_workers=True)

val_dataloader = dict(
    batch_size=64,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
    persistent_workers=True)

test_dataloader = val_dataloader
149 changes: 149 additions & 0 deletions configs/disco/README.md
@@ -0,0 +1,149 @@
# Disco Diffusion (Google Colab)

> [](<>)

> **Task**: Text2Image

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI Image generating technique called CLIP-Guided Diffusion to allow you to create compelling and beautiful images from just text inputs. Created by Somnai, augmented by Gandamu, and building on the work of RiversHaveWings, nshepperd, and many others.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/22982797/201001789-7ef108a0-f607-401e-98dc-4e16d6be384f.png"/>
</div>

## Models Card

## Quick Start

To get started, here is the simplest way to generate an image, in six statements.

```python
from mmengine import Config, MODELS
from mmedit.utils import register_all_modules
register_all_modules()

disco = MODELS.build(Config.fromfile('configs/disco/disco-baseline.py').model).cuda().eval()
text_prompts = {
0: ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.", "yellow color scheme"]
}
image = disco.infer(height=768, width=1280, text_prompts=text_prompts, show_progress=True, num_inference_steps=250, eta=0.8)['samples']
```

## Advanced Tutorials

This section gives a detailed description and covers advanced usage.

### Overall Architecture (Under Construction)

### Infer Settings

For a fixed Disco Diffusion model, several settings can be changed at inference time.

1. Image resolution.
   Within the memory limits of your device, you can set the `height` and `width` of the image as you like.

Running the following code,

```python
from mmengine import Config, MODELS
from mmedit.utils import register_all_modules
register_all_modules()

disco = MODELS.build(Config.fromfile('configs/disco/disco-baseline.py').model).cuda().eval()
text_prompts = {
0: ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.", "yellow color scheme"]
}
image = disco.infer(height=512, width=1024, text_prompts=text_prompts, show_progress=True, num_inference_steps=250, eta=0.8)['samples']
```

produces the following image:

<div align=center>
<img src="https://user-images.githubusercontent.com/22982797/201041058-b47a897c-852e-4b78-9627-48706dade1d5.png"/>
</div>
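
U-Nets that downsample several times generally need side lengths divisible by some power of two. The helper below is a sketch under the assumption that 64 is a safe multiple; every size used in this README (512, 768, 1024, 1280) is a multiple of 64. Both the name `snap_to_multiple` and the value 64 are assumptions for illustration, not documented requirements of `infer`.

```python
def snap_to_multiple(size, multiple=64):
    """Round ``size`` down to the nearest positive multiple of ``multiple``."""
    if size < multiple:
        raise ValueError(f'size must be at least {multiple}, got {size}')
    return (size // multiple) * multiple


# A requested 768 x 1300 canvas becomes 768 x 1280.
height, width = snap_to_multiple(768), snap_to_multiple(1300)
```

The snapped values can then be passed as the `height` and `width` arguments shown above.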

2. Initial image.
   You can start from an existing image by setting `init_image` to its path. Setting `init_scale` adjusts how similar the result is to the initial image.

**Note**: If you use an init image, set `skip_steps` to roughly 50% of your steps.

For example, take this picture as the initial image:

<div align="center">
<br/>
<img src="https://user-images.githubusercontent.com/22982797/201272831-81f2b1f4-3e28-4468-8e84-b7c52ad74e11.jpg" width="800"/>
</div>

Note that `init_scale` must be set in the config; it is a field of `loss_cfg`.

```python
from mmengine import Config, MODELS
from mmedit.utils import register_all_modules

register_all_modules()
config = 'configs/disco/disco-init_scale20.py'
disco = MODELS.build(Config.fromfile(config).model).cuda().eval()
text_prompts = {
0: ["a huge dragon, human like, flying with flame, and two big wings"]
}
image_path = 'PATH/TO/INIT_IMAGE'
image = disco.infer(width=1280, height=768, init_image=image_path, text_prompts=text_prompts, show_progress=True, num_inference_steps=250, skip_steps=150, eta=0.8)['samples']
```

This gives:

<div align="center">
<br/>
<img src="https://user-images.githubusercontent.com/22982797/201273268-ce775eeb-fb9d-4997-a3f6-b93835593f36.png" width="800"/>
</div>

Next, use the default `init_scale=1000`:

```python
from mmengine import Config, MODELS
from mmedit.utils import register_all_modules

register_all_modules()
config = 'configs/disco/disco-baseline.py'
disco = MODELS.build(Config.fromfile(config).model).cuda().eval()
text_prompts = {
0: ["a huge dragon, human like, flying with flame, and two big wings"]
}
image_path = 'PATH/TO/INIT_IMAGE'
image = disco.infer(width=1280, height=768, init_image=image_path, text_prompts=text_prompts, show_progress=True, num_inference_steps=250, skip_steps=150, eta=0.8)['samples']
```

This gives:

<div align="center">
<br/>
<img src="https://user-images.githubusercontent.com/22982797/201273252-3e9d1293-5a83-4ca1-a177-b9fa2639ba14.png" width="800"/>
</div>
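
The note on `skip_steps` can be made concrete with a small helper. This is a sketch only: `suggest_skip_steps` is a hypothetical function, and its 50% default simply restates the rule of thumb from the note.

```python
def suggest_skip_steps(num_inference_steps, fraction=0.5):
    """Return a ``skip_steps`` value as a fraction of the total step count."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError('fraction must be in [0, 1)')
    return round(num_inference_steps * fraction)


# Rule of thumb from the note: skip about half of the 250 steps.
skip_steps = suggest_skip_steps(250)
# The examples above pass skip_steps=150 with 250 steps, i.e. a fraction of 0.6.
example_fraction = 150 / 250
```

The returned value would be passed as `skip_steps` together with `init_image`.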

### UNet Settings (Under Construction)

### CLIP Model Settings (Under Construction)

### Cutter Settings (Under Construction)

### Diffuser Settings (Under Construction)

### Loss Settings (Under Construction)

## Citation

```bibtex
@misc{github,
author={alembics},
title={disco-diffusion},
year={2022},
url={https://github.com/alembics/disco-diffusion},
}
```
59 changes: 59 additions & 0 deletions configs/disco/disco-baseline.py
@@ -0,0 +1,59 @@
unet = dict(
    type='DenoisingUnet',
    image_size=512,
    in_channels=3,
    base_channels=256,
    resblocks_per_downsample=2,
    attention_res=(32, 16, 8),
    norm_cfg=dict(type='GN32', num_groups=32),
    dropout=0.0,
    num_classes=0,
    use_fp16=True,
    resblock_updown=True,
    attention_cfg=dict(
        type='MultiHeadAttentionBlock',
        num_heads=4,
        num_head_channels=64,
        use_new_attention_order=False),
    use_scale_shift_norm=True)

unet_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u_finetuned_imagenet-512x512-ab471d70.pth'  # noqa
secondary_model_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/secondary_model_imagenet_2.pth'  # noqa
pretrained_cfgs = dict(
    unet=dict(ckpt_path=unet_ckpt_path, prefix='unet'),
    secondary_model=dict(ckpt_path=secondary_model_ckpt_path, prefix=''))

secondary_model = dict(type='SecondaryDiffusionImageNet2')

diffuser = dict(
    type='DDIMScheduler',
    variance_type='learned_range',
    beta_schedule='linear',
    clip_sample=False)

clip_models_cfg = [
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/32', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/16', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='RN50', jit=False)
]

# pretrained_cfgs = None
cutter_cfg = dict(
    cut_overview=eval('[12]*400+[4]*600'),
    cut_innercut=eval('[4]*400+[12]*600'),
    cut_ic_pow=eval('[1]*1000'),
    cut_icgray_p=eval('[0.2]*400+[0]*600'),
    cutn_batches=4)

loss_cfg = dict(tv_scale=0, range_scale=150, sat_scale=0, init_scale=1000)

model = dict(
    type='DiscoDiffusion',
    unet=unet,
    diffuser=diffuser,
    secondary_model=secondary_model,
    cutter_cfg=cutter_cfg,
    loss_cfg=loss_cfg,
    clip_models_cfg=clip_models_cfg,
    use_fp16=True,
    pretrained_cfgs=pretrained_cfgs)
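
The `cutter_cfg` schedules in this config are written as strings passed to `eval`. The sketch below expands the same expressions as plain list arithmetic to show their shape: each one is a 1000-entry list. How the model indexes these lists is not visible in this diff, so `schedule_at` is a hypothetical lookup that assumes one entry per position in 0..999.

```python
# Same expressions as in cutter_cfg, written without eval().
cut_overview = [12] * 400 + [4] * 600
cut_innercut = [4] * 400 + [12] * 600
cut_ic_pow = [1] * 1000
cut_icgray_p = [0.2] * 400 + [0] * 600


def schedule_at(schedule, position):
    """Hypothetical lookup: value of a schedule at ``position`` in [0, 1000)."""
    if not 0 <= position < len(schedule):
        raise IndexError(f'position {position} is outside the schedule')
    return schedule[position]


# Overview cuts dominate the first 400 entries, inner cuts the remaining 600.
early = (schedule_at(cut_overview, 0), schedule_at(cut_innercut, 0))
late = (schedule_at(cut_overview, 999), schedule_at(cut_innercut, 999))
```

With these values, every position uses 16 cuts in total; only the split between overview cuts and inner cuts changes at entry 400.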
59 changes: 59 additions & 0 deletions configs/disco/disco-init_scale20.py
@@ -0,0 +1,59 @@
unet = dict(
    type='DenoisingUnet',
    image_size=512,
    in_channels=3,
    base_channels=256,
    resblocks_per_downsample=2,
    attention_res=(32, 16, 8),
    norm_cfg=dict(type='GN32', num_groups=32),
    dropout=0.0,
    num_classes=0,
    use_fp16=True,
    resblock_updown=True,
    attention_cfg=dict(
        type='MultiHeadAttentionBlock',
        num_heads=4,
        num_head_channels=64,
        use_new_attention_order=False),
    use_scale_shift_norm=True)

unet_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u_finetuned_imagenet-512x512-ab471d70.pth'  # noqa
secondary_model_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/secondary_model_imagenet_2.pth'  # noqa
pretrained_cfgs = dict(
    unet=dict(ckpt_path=unet_ckpt_path, prefix='unet'),
    secondary_model=dict(ckpt_path=secondary_model_ckpt_path, prefix=''))

secondary_model = dict(type='SecondaryDiffusionImageNet2')

diffuser = dict(
    type='DDIMScheduler',
    variance_type='learned_range',
    beta_schedule='linear',
    clip_sample=False)

clip_models_cfg = [
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/32', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/16', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='RN50', jit=False)
]

# pretrained_cfgs = None
cutter_cfg = dict(
    cut_overview=eval('[12]*400+[4]*600'),
    cut_innercut=eval('[4]*400+[12]*600'),
    cut_ic_pow=eval('[1]*1000'),
    cut_icgray_p=eval('[0.2]*400+[0]*600'),
    cutn_batches=4)

loss_cfg = dict(tv_scale=0, range_scale=150, sat_scale=0, init_scale=20)

model = dict(
    type='DiscoDiffusion',
    unet=unet,
    diffuser=diffuser,
    secondary_model=secondary_model,
    cutter_cfg=cutter_cfg,
    loss_cfg=loss_cfg,
    clip_models_cfg=clip_models_cfg,
    use_fp16=True,
    pretrained_cfgs=pretrained_cfgs)
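
As shown in this diff, `disco-init_scale20.py` differs from `disco-baseline.py` only in the `init_scale` field of `loss_cfg`. The sketch below expresses that relationship with plain dictionaries. `override` is a hypothetical helper for illustration and is not part of either config.

```python
import copy

# loss_cfg as defined in disco-baseline.py
baseline_loss_cfg = dict(
    tv_scale=0, range_scale=150, sat_scale=0, init_scale=1000)


def override(cfg, **changes):
    """Return a copy of ``cfg`` with ``changes`` applied; ``cfg`` is untouched."""
    merged = copy.deepcopy(cfg)
    merged.update(changes)
    return merged


# Equivalent to the loss_cfg in disco-init_scale20.py
init_scale20_loss_cfg = override(baseline_loss_cfg, init_scale=20)
```

MMEngine configs also support `_base_` inheritance, which may be a way to avoid duplicating the whole file; whether that suits this repository's config layout is not something the diff settles.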
9 changes: 9 additions & 0 deletions configs/disco/metafile.yml
@@ -0,0 +1,9 @@
Collections:
- Metadata:
    Architecture:
    - Disco Diffusion
  Name: Disco Diffusion
  Paper:
  - <>
  README: configs/disco/README.md
Models: []