Linear probes found controllable representations of scene attributes in a text-to-image diffusion model
Project page of "Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model"
Paper arXiv link: https://arxiv.org/abs/2306.05720
[NeurIPS link] [Poster link]
How to generate a short video of a moving foreground object using a pretrained text-to-image generative model?
See application_of_intervention.ipynb for how to use our intervention technique to generate a short video of moving objects.
The GIFs are sampled from the original text-to-image diffusion model without any fine-tuning. All frames are generated with the same prompt, random seed (initial latent vectors), and model. We edited the intermediate activations of the latent diffusion model while it generated the images so that its internal representation of the foreground matched our reference mask. See the notebook for implementation details.
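The sketch below is a rough illustration of the idea, not the paper's exact procedure: it hooks one intermediate UNet block of a Hugging Face `diffusers` StableDiffusionPipeline and nudges its activations along a linear-probe direction toward a reference foreground mask. The probe weights, layer choice, and update rule are placeholders; see application_of_intervention.ipynb for the actual intervention.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image latent diffusion model (no fine-tuning).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# Placeholder linear probe: maps a C-channel intermediate activation to a
# per-pixel foreground logit. In practice, load the released probe weights.
probe_weight = torch.randn(1, 1280)          # NOT the paper's checkpoint
reference_mask = torch.zeros(1, 64, 64)      # target foreground mask for this frame
reference_mask[:, 20:40, 20:40] = 1.0

def edit_activations(module, inputs, output):
    """Push intermediate activations along the probe direction wherever the
    probe's foreground prediction disagrees with the reference mask."""
    h = output                                           # (B, C, H, W)
    B, C, H, W = h.shape
    w = probe_weight.to(h.device, h.dtype)               # (1, C)
    mask = torch.nn.functional.interpolate(
        reference_mask[None].to(h.device, h.dtype), size=(H, W)
    )[0]                                                 # (1, H, W)
    logits = torch.einsum("oc,bchw->bohw", w, h)         # probe prediction
    step = 0.1 * (mask - logits.sigmoid())               # simple, gradient-free update
    return h + step * w.view(1, C, 1, 1)

# Attach the hook to the UNet mid block (the layer choice is illustrative).
hook = pipe.unet.mid_block.register_forward_hook(edit_activations)
image = pipe(
    "a photo of a red ball on a table",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
hook.remove()
```

Generating one frame per reference mask position with the same prompt and seed, then stitching the frames together, yields the kind of moving-object GIF shown above.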
Unzip probe_checkpoints.zip to obtain all the probe weights we trained. The weights in the unzipped folder are sufficient to run all experiments shown in the paper.
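A minimal sketch of unzipping the archive and loading one probe checkpoint with PyTorch; the file name and state-dict layout shown here are hypothetical, so check the unzipped folder and the notebooks for the actual names.

```python
import zipfile
import torch

# Extract the released probe checkpoints (paths are illustrative).
with zipfile.ZipFile("probe_checkpoints.zip") as zf:
    zf.extractall("probe_checkpoints")

# Load one probe's weights; replace the file name with an actual checkpoint.
state_dict = torch.load("probe_checkpoints/foreground_probe.pt", map_location="cpu")
probe = torch.nn.Linear(
    in_features=state_dict["weight"].shape[1],
    out_features=state_dict["weight"].shape[0],
)
probe.load_state_dict(state_dict)
probe.eval()
```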
If you find the source code in this repo helpful, please cite:
@article{chen2023beyond,
title={Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model},
author={Chen, Yida and Vi{\'e}gas, Fernanda and Wattenberg, Martin},
journal={arXiv preprint arXiv:2306.05720},
year={2023}
}