unsafe-diffusion

This repository provides the data and code for the paper Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models, accepted at ACM CCS 2023.

Paper: https://arxiv.org/pdf/2305.13873.pdf

Unsafe Image Generation

1. Collecting Prompts

We use three harmful prompt datasets and one harmless prompt dataset. Request the prompt datasets here: https://zenodo.org/record/8255664

  • 4chan prompts (harmful)
  • Lexica prompts (harmful)
  • Template prompts (harmful)
  • COCO prompts (harmless)

2. Generating Images

We use four open-sourced Text-to-Image models:

3. Unsafe Image Classification

We labeled 800 generated images. Request the image dataset here: https://zenodo.org/record/8255664

Prerequisite

pip install -r requirements.txt
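
Before training, it can help to confirm that the downloaded data sits where the example commands in this README look for it. The following is a minimal sketch, assuming the layout implied by those commands: an image directory at ./data/images and a label spreadsheet at ./data/labels.xlsx. The function name `missing_inputs` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def missing_inputs(data_root: str = "./data") -> List[str]:
    """Return a list of problems with the expected data layout (empty if none).

    Assumed layout, taken from the example commands in this README:
      <data_root>/images       directory of generated images
      <data_root>/labels.xlsx  spreadsheet with the image labels
    """
    root = Path(data_root)
    images = root / "images"
    labels = root / "labels.xlsx"

    problems: List[str] = []
    if not images.is_dir():
        problems.append(f"missing image directory: {images}")
    if not labels.is_file():
        problems.append(f"missing label file: {labels}")
    return problems


if __name__ == "__main__":
    # Print one line per missing input; print nothing when the layout is complete.
    for problem in missing_inputs():
        print(problem)
```

If the data lives elsewhere, pass a different root to the check and adjust the paths given to the scripts accordingly.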

Train the Multi-headed Safety Classifier

python train.py \
  --images_dir ./data/images \
  --labels_dir ./data/labels.xlsx \
  --output_dir ./checkpoints/multi-headed
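
The name "multi-headed" suggests one shared image representation feeding several per-category outputs. The following is a minimal, dependency-free sketch of that general idea and is not the code in train.py: the category names, the toy feature vectors, and the choice of logistic-regression heads are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class MultiHeadClassifier:
    """One binary logistic head per category, all reading the same features."""

    def __init__(self, num_features: int, heads: Sequence[str]) -> None:
        # Each head keeps its own weights and bias; the input features are shared.
        self.weights: Dict[str, List[float]] = {h: [0.0] * num_features for h in heads}
        self.bias: Dict[str, float] = {h: 0.0 for h in heads}

    def predict_proba(self, features: Sequence[float]) -> Dict[str, float]:
        """Return an independent probability for every head."""
        return {
            h: sigmoid(sum(w * x for w, x in zip(self.weights[h], features)) + self.bias[h])
            for h in self.weights
        }

    def train_step(self, features: Sequence[float], labels: Dict[str, int], lr: float = 0.1) -> None:
        """One gradient step of binary cross-entropy, applied to each head separately."""
        probs = self.predict_proba(features)
        for h, p in probs.items():
            err = p - labels[h]
            self.weights[h] = [w - lr * err * x for w, x in zip(self.weights[h], features)]
            self.bias[h] -= lr * err


if __name__ == "__main__":
    # Hypothetical categories and toy 2-dimensional "embeddings".
    model = MultiHeadClassifier(num_features=2, heads=["category_a", "category_b"])
    data = [
        ([1.0, 0.0], {"category_a": 1, "category_b": 0}),
        ([0.0, 1.0], {"category_a": 0, "category_b": 1}),
    ]
    for _ in range(200):
        for features, labels in data:
            model.train_step(features, labels)
    print(model.predict_proba([1.0, 0.0]))
```

Because each head produces its own probability, a single image can be flagged under several categories at once, which a single softmax over categories would not allow.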

Evaluate the Classifier and Other Baselines

python evaluate.py \
  --images_dir ./data/images \
  --labels_dir ./data/labels.xlsx \
  --checkpoints_dir ./checkpoints

Directly Use the Classifier to Detect Unsafe Images

python inference.py \
  --images_dir ./data/images \
  --output_dir ./results
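
To run the three steps from a script rather than typing each command, the argument lists can be assembled in Python. The following is a sketch only: the script names, flags, and paths are copied from the commands in this README, while the `STEPS` table and the `build_command` helper are hypothetical names that do not exist in the repository.

```python
import shlex
import sys
from typing import Dict, List

# Script names, flags, and example paths as they appear in this README.
STEPS: Dict[str, Dict[str, str]] = {
    "train.py": {
        "--images_dir": "./data/images",
        "--labels_dir": "./data/labels.xlsx",
        "--output_dir": "./checkpoints/multi-headed",
    },
    "evaluate.py": {
        "--images_dir": "./data/images",
        "--labels_dir": "./data/labels.xlsx",
        "--checkpoints_dir": "./checkpoints",
    },
    "inference.py": {
        "--images_dir": "./data/images",
        "--output_dir": "./results",
    },
}


def build_command(script: str, python: str = sys.executable) -> List[str]:
    """Return the argv list for one step, e.g. for subprocess.run(cmd, check=True)."""
    cmd = [python, script]
    for flag, value in STEPS[script].items():
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Print each command instead of executing it, so the sketch has no side effects.
    for script in STEPS:
        print(shlex.join(build_command(script)))
```

Passing each list to `subprocess.run(..., check=True)` from the repository root would execute the steps in order and stop at the first failure.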

Hateful Meme Generation

We employ three image editing techniques on top of Stable Diffusion:

Reference

If you find this helpful, please cite the following work:

@inproceedings{QSHBZZ23,
author = {Yiting Qu and Xinyue Shen and Xinlei He and Michael Backes and Savvas Zannettou and Yang Zhang},
title = {{Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models}},
booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
publisher = {ACM},
year = {2023}
}