[CVPR 2021] Multi-Modal-CelebA-HQ: A Large-Scale Text-Driven Face Generation and Understanding Dataset

IIGROUP/MM-CelebA-HQ-Dataset

Multi-Modal-CelebA-HQ

Multi-Modal-CelebA-HQ (MM-CelebA-HQ) is a dataset containing 30,000 high-resolution face images selected from CelebA, following CelebA-HQ. Each image in the dataset is accompanied by a semantic mask, sketch, descriptive text, and an image with a transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for a range of face generation and understanding tasks, including text-to-image generation, sketch-to-image generation, text-guided image editing, image captioning, and visual question answering. This dataset is introduced and employed in TediGAN.

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation.
Weihao Xia, Yujiu Yang, Jing-Hao Xue, and Baoyuan Wu.
CVPR 2021.

Updates 🚩

  • [07/10/2023] 3DMM coefficients and corresponding rendered images have been added to the repository.
  • [04/10/2023] The scripts for text and sketch generation have been added to the repository.
  • [06/12/2020] The paper is released on arXiv.
  • [11/13/2020] The multi-modal-celeba-hq dataset has been released.

Data Generation

Description

  • The textual descriptions are generated from the given attributes using a probabilistic context-free grammar (PCFG). Following the format of the popular CUB and COCO datasets, we create ten unique single-sentence descriptions per image to obtain more training data. A previous study proposed CelebTD-HQ, but it is not publicly available.
  • For semantic labels, we use the CelebAMask-HQ dataset, which contains manually annotated semantic masks of facial attributes corresponding to CelebA-HQ.
  • For sketches, we follow the same data generation pipeline as DeepFaceDrawing. We first apply the Photocopy filter in Photoshop to extract edges, which preserves facial details but introduces excessive noise, and then apply sketch simplification to obtain edge maps resembling hand-drawn sketches.
  • For background removal, we use the open-source tool Rembg and the commercial software removebg. Different backgrounds can then be added using image composition or harmonization methods such as DoveNet.
  • For 3DMM coefficients and the corresponding rendered image, we use Deep3DFaceReconstruction. Please follow the instructions for data generation. We also provide the Cleaned Face Datasets, the "cleaned" version of two popular face datasets, CelebAHQ and FFHQ, made by removing instances with extreme poses, occlusions, blurriness, and the presence of multiple individuals in the frame.
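To make the PCFG idea in the first bullet concrete, here is a minimal sketch of sampling sentences from a tiny probabilistic grammar. The grammar, the function names (`expand`, `make_captions`), and the attribute-to-phrase mapping are all hypothetical illustrations and are not taken from create_caption.py. Unlike the real pipeline, this toy version does not guarantee that the sampled sentences are unique.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (probability, expansion)
# pairs whose probabilities sum to 1. "ATTRS" is a placeholder terminal that
# is later replaced by the image's attribute phrases.
GRAMMAR = {
    "S": [(0.6, ["SUBJECT", "VERB", "ATTRS"]),
          (0.4, ["This is a", "NOUN", "with", "ATTRS"])],
    "SUBJECT": [(0.5, ["This", "NOUN"]), (0.5, ["The", "NOUN"])],
    "NOUN": [(0.5, ["person"]), (0.5, ["face"])],
    "VERB": [(0.7, ["has"]), (0.3, ["is shown with"])],
}


def expand(symbol, rng):
    """Recursively expand a symbol, sampling productions by their weights."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    weights = [p for p, _ in GRAMMAR[symbol]]
    bodies = [body for _, body in GRAMMAR[symbol]]
    body = rng.choices(bodies, weights=weights, k=1)[0]
    words = []
    for child in body:
        words.extend(expand(child, rng))
    return words


def join_phrases(phrases):
    """Join phrases as 'a', 'a and b', or 'a, b and c'."""
    if len(phrases) == 1:
        return phrases[0]
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]


def make_captions(attributes, n=10, seed=0):
    """Sample n one-sentence captions for a non-empty list of attribute names."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    rng = random.Random(seed)
    # Assumption: attribute names such as "Black_Hair" read as noun phrases.
    phrases = [a.replace("_", " ").lower() for a in attributes]
    captions = []
    for _ in range(n):
        rng.shuffle(phrases)  # vary the attribute order between sentences
        sentence = " ".join(expand("S", rng))
        captions.append(sentence.replace("ATTRS", join_phrases(phrases)) + ".")
    return captions


for caption in make_captions(["Black_Hair", "Eyeglasses", "Wavy_Hair"], n=3):
    print(caption)
```

Attributes that are not noun phrases (for example "Smiling" or "Young") would need their own productions; the toy grammar above ignores that case.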

Usage

This section outlines the process of generating the data for our task.

The scripts provided here are not restricted to the CelebA-HQ dataset and can be utilized to preprocess any dataset that includes attribute annotations, be it image, video, or 3D shape data. This flexibility enables the creation of custom datasets that meet specific requirements. For example, the create_caption.py script can be applied to generate diverse descriptions for each video by using video facial attributes (e.g., those provided by CelebV-HQ), leading to a text-video dataset, similar to CelebV-Text.

Text

Please download celeba-hq-attribute.txt (CelebAMask-HQ-attribute-anno.txt) and run the following script.

python create_caption.py

The generated textual descriptions can be found at ./celeba_caption.
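Before captions can be generated, the attribute annotations have to be read. The sketch below parses an annotation file under the assumption that it follows the usual CelebA-style layout: a first line with the number of images, a second line with the attribute names, and then one row per image holding the filename followed by 1/-1 flags. The function name `parse_attribute_file` is hypothetical, and the layout should be checked against the downloaded file.

```python
def parse_attribute_file(text):
    """Return {filename: [names of attributes flagged 1]}.

    Assumed layout (not confirmed by this README):
      line 1: number of images
      line 2: attribute names separated by whitespace
      rest:   <filename> followed by one 1/-1 flag per attribute
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    names = lines[1].split()
    positives = {}
    for row in lines[2:2 + count]:
        fields = row.split()
        filename, values = fields[0], fields[1:]
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} flags, got {len(values)}")
        positives[filename] = [n for n, v in zip(names, values) if int(v) == 1]
    return positives


# Tiny inline sample in the assumed layout (made-up values).
sample = """2
Black_Hair Eyeglasses Smiling
0.jpg  1 -1 1
1.jpg  -1 -1 -1
"""
print(parse_attribute_file(sample))
```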

Please fill out the form to request the processing script. If feasible, please send me a follow-up email after submitting the form to remind me.

Sketch

If Photoshop is available to you, please apply its Photocopy filter to extract edges. Photoshop supports batch processing, so you don't have to manually process each image. The Sobel operator is an alternative way to extract edges when Photoshop is unavailable or a simpler approach is preferred. Edge extraction preserves facial details but introduces excessive noise, so the sketch-simplification model is then applied to get edge maps resembling hand-drawn sketches.

The sketch simplification model requires torch==0.4.1 and torchvision==0.2.1.

python create_sketch.py

The generated sketches can be found at ./celeba_sketch.
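As an illustration of the Sobel alternative mentioned above, the following is a minimal pure-Python sketch of the operator on a grayscale image stored as a list of rows. It is not the code in create_sketch.py; the function name `sobel_edges` is hypothetical, and in practice an image-processing library would be the faster choice. The output is the raw gradient magnitude, which would still need sketch simplification afterwards.

```python
import math


def sobel_edges(gray):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    gray is a list of equal-length rows of numbers. Border pixels are left
    at 0 because the 3x3 kernels do not fit there.
    """
    h, w = len(gray), len(gray[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = gray[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * p
                    gy += ky[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# A 5x6 test image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_edges(image)
print([round(v) for v in edges[2]])
```

The response is nonzero only in the two columns next to the step, which is why flat regions of a face photo stay empty while contours and fine texture (the source of the noise) light up.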

Overview

Note: Upon request, the download links of raw data and annotations have been removed from this repo. Please refer to the original sites for the raw data, and email me for the post-processing scripts. The scripts for text and sketch generation have been added to the repository.

All data is hosted on Google Drive (not available).

| Path | Size | Files | Format | Description |
|---|---|---|---|---|
| multi-modal-celeba | ~20 GB | 420,002 | | Main folder |
| ├  train | 347 KB | 1 | PKL | filenames of training images |
| ├  test | 81 KB | 1 | PKL | filenames of test images |
| ├  image | 2 GB | 30,000 | JPG | images from CelebA-HQ of size 512×512 |
| ├  text | 11 MB | 300,000 | TXT | 10 descriptions of each image in CelebA-HQ |
| ├  coeff | 115 MB | 29,437 | MAT | 3DMM coefficients of each image in CelebA-HQ |
| ├  rendered | 834 MB | 29,437 | PNG | rendered image of each image in CelebA-HQ of size 256×256 |

For 3DMM coefficients and rendered images of each image in the FFHQ dataset, please refer to cleaned-celebahq-ffhq.
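Once the data is available locally, the split files and descriptions listed above could be read along the following lines. This is a sketch under several assumptions that the README does not confirm: that each PKL file holds a pickled list of filename stems, and that the descriptions of an image sit in one text file named after the image, one description per line. The helper names `load_split` and `load_captions` are hypothetical; adjust them if the archive is organized differently.

```python
import pickle
from pathlib import Path


def load_split(pkl_path):
    """Load the filenames of one split (assumed: a pickled list of names).

    Only unpickle files from a source you trust; pickle can run arbitrary code.
    """
    with open(pkl_path, "rb") as f:
        return list(pickle.load(f))


def load_captions(text_dir, stem):
    """Read the descriptions of one image (assumed: <stem>.txt, one per line)."""
    path = Path(text_dir) / f"{stem}.txt"
    lines = path.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip()]


# Example usage with hypothetical paths (left commented out because the
# actual file names inside the archive are not given here):
#   train_names = load_split("multi-modal-celeba/train/filenames.pkl")
#   captions = load_captions("multi-modal-celeba/text", train_names[0])
```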

Pretrained Models

We provide the pretrained models of AttnGAN, ControlGAN, DM-GAN, DF-GAN, and ManiGAN. Please consider citing our paper if you use these pretrained models. Feel free to open a pull request if you have any updates.

| Method | FID | LPIPS | Download |
|---|---|---|---|
| AttnGAN | 125.98 | 0.512 | Google Drive |
| ControlGAN | 116.32 | 0.522 | Google Drive |
| DF-GAN | 137.60 | 0.581 | Google Drive |
| DM-GAN | 131.05 | 0.544 | Google Drive |
| TediGAN | 106.37 | 0.456 | Google Drive |
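For readers unfamiliar with the FID column, the standard definition of the Fréchet Inception Distance is given below, where (μ_r, Σ_r) and (μ_g, Σ_g) are the mean and covariance of Inception features for real and generated images. The README does not state the evaluation settings used for the numbers above, so this is the textbook formula rather than a description of the authors' exact protocol. Lower FID indicates that generated images are statistically closer to real ones.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```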

The pretrained model of ManiGAN is here. The training scripts and pretrained models on faces for sketch-to-image and label-to-image can be found here. Those with problems accessing Google Drive can refer to an alternative link at Baidu Cloud (code: b273) for the dataset and pretrained models.

Related Works

  • CelebA dataset:
    Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015
  • CelebA-HQ was collected from CelebA and further post-processed by the following paper:
    Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in International Conference on Learning Representations (ICLR), 2018
  • CelebAMask-HQ provides manually annotated masks of size 512 x 512 with 19 classes, including all facial components and accessories such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. It was collected by the following paper:
    Lee et al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in Computer Vision and Pattern Recognition (CVPR), 2020

Citation

If you find the dataset, processing scripts, and pretrained models useful for your research, please consider citing our paper:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021towards,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2104.08910},
  year={2021}
}

If you use images and masks, please cite:

@inproceedings{liu2015faceattributes,
 title = {Deep Learning Face Attributes in the Wild},
 author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
 booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
 year = {2015} 
}

@inproceedings{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

License

The use of this software is RESTRICTED to non-commercial research and educational purposes. The license is the same as in CelebAMask-HQ.
