Pytorch implementation for Word-Level Fine-Grained Story Visualization.

mrlibw/Word-Level-Story-Visualization

Word-Level Fine-Grained Story Visualization

PyTorch implementation of Word-Level Fine-Grained Story Visualization. The goal is to generate a sequence of images, one for each sentence of a multi-sentence story, with global consistency across dynamic scenes and characters.

Overview

Word-Level Fine-Grained Story Visualization.
Bowen Li, Thomas Lukasiewicz.
University of Oxford, TU Wien
ECCV 2022

Data

  1. Download the Pororo dataset and extract the folder to data/pororo.
  2. Download the Abstract Scenes dataset and extract the folder to data/abstract.
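
Before training, it can help to confirm that both folders are where the steps above place them. The following is a minimal sketch, assuming the default data/pororo and data/abstract locations; the helper name missing_dataset_dirs is hypothetical and not part of this repository.

```python
from pathlib import Path

# Default dataset locations named in the steps above.
EXPECTED_DIRS = ("data/pororo", "data/abstract")


def missing_dataset_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected dataset folders that do not exist under root.

    Hypothetical helper for illustration; not part of this repository.
    """
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Both dataset folders found.")
```

Run it from the repository root, or pass a different root if the data lives elsewhere.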

Training

All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.

Text Encoder Pretraining

  • Please refer to ControlGAN for details on pretraining the text encoder. The pretraining is based on DAMSM, which maximizes the cosine similarity between the text and image pairs provided by the corresponding dataset.
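
To make the matching objective concrete, here is a minimal pure-Python sketch of the cosine similarity that a DAMSM-style objective tries to raise for matched text and image features. It is an illustration under simplifying assumptions only: the feature vectors are made-up toy values, the function name is hypothetical, and the actual pretraining code (see ControlGAN) operates on batched PyTorch tensors rather than single Python lists.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy features (made-up values): a sentence feature, its paired image
# feature, and the image feature of an unrelated sample.
text_feat = [0.9, 0.1, 0.3]
matched_image_feat = [0.8, 0.2, 0.4]
mismatched_image_feat = [-0.5, 0.9, -0.2]

# Pretraining aims to make the first score high relative to the second.
print(cosine_similarity(text_feat, matched_image_feat))
print(cosine_similarity(text_feat, mismatched_image_feat))
```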

Our Model

  • Train the model on the Pororo dataset:
python main_pororo.py --cfg cfg/pororo.yml
  • Train the model on the Abstract Scenes dataset:
python main_abstract.py --cfg cfg/abstract.yml

The *.yml files contain the configuration for training and testing. If you store the datasets somewhere else, modify DATA_DIR to point to that location.
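
As a rough illustration of that change, the sketch below rewrites the DATA_DIR entry in the text of a configuration file. It assumes DATA_DIR appears as a key on its own line (possibly indented) followed by a single value; the function name, the sample configuration text, and the example path are all hypothetical, so check the actual layout of cfg/pororo.yml or cfg/abstract.yml before relying on it. Editing the file by hand works equally well.

```python
import re


def set_data_dir(yaml_text, new_dir):
    """Return yaml_text with the value of the DATA_DIR key replaced.

    Hypothetical helper: assumes 'DATA_DIR: <value>' sits on one line.
    """
    pattern = re.compile(r"^([ \t]*DATA_DIR[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function is used as the replacement so that backslashes in
    # new_dir are inserted literally instead of being treated as escapes.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + "'" + new_dir + "'", yaml_text
    )
    if count == 0:
        raise ValueError("no DATA_DIR entry found")
    return new_text


# Made-up configuration text, used only to exercise the helper.
sample = "CONFIG_NAME: 'example'\nDATA_DIR: 'data/pororo'\n"
print(set_data_dir(sample, "/mnt/datasets/pororo"))
```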

Note that we evaluate our approach at a resolution of 64 × 64 on Pororo and 256 × 256 on Abstract Scenes, as Abstract Scenes provides larger-scale ground-truth images. To work on images at a resolution of 256 × 256, we repeat the same upsampling blocks in the generator and downsampling blocks in the discriminator.
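
How many repeated blocks this requires follows from the resolution ratio. As a small worked example, assuming each upsampling block doubles the spatial resolution (a common design, but an assumption here; check model.py for the actual factor), going from 64 to 256 takes two extra blocks. The function name is hypothetical.

```python
def extra_blocks(base_res, target_res, factor=2):
    """Count the blocks needed to scale base_res up to target_res.

    Assumes every block multiplies the resolution by `factor`.
    """
    if base_res <= 0 or factor < 2:
        raise ValueError("base_res must be positive and factor at least 2")
    blocks, res = 0, base_res
    while res < target_res:
        res *= factor
        blocks += 1
    if res != target_res:
        raise ValueError("target_res is not base_res times a power of factor")
    return blocks


print(extra_blocks(64, 256))  # 64 -> 128 -> 256, so 2
```

The same count applies in reverse to the downsampling blocks of the discriminator, under the matching assumption that each one halves the resolution.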

Pretrained Text Encoder

Pretrained Model

  • Pororo. Download and save it to models/.

  • Abstract. Download and save it to models/.

Evaluation

  • Run the following commands to evaluate our approach on the Pororo and Abstract Scenes test sets. Each command generates images for all stories in the test set and computes both the FID and FSD scores:
python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True
python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True

FID and FSD results will be saved in a .csv file.
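
The file name and column layout of that .csv are not documented here, so the following is only a generic sketch for inspecting it with Python's standard csv module. The function name is hypothetical, and the path you pass in depends on where the evaluation run writes its output.

```python
import csv


def read_rows(csv_path):
    """Load every row of a CSV file as a list of strings."""
    with open(csv_path, newline="") as handle:
        return list(csv.reader(handle))
```

Calling read_rows with the path of the generated file returns its rows in order, which is enough to print or compare the reported scores.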

Code Structure

  • cfg/: contains *.yml files.
  • datasets/: dataloader.
  • main_pororo.py: the entry point for training and testing on Pororo.
  • main_abstract.py: the entry point for training and testing on Abstract Scenes.
  • trainer.py: creates the networks, harnesses and reports the progress of training.
  • model.py: defines the architecture.
  • inference.py: functions for evaluation.
  • miscc/utils.py: loss functions and additional helper functions.
  • miscc/config.py: creates the option list.

Citation

If you find this useful for your research, please cite the following.

@article{li2022word,
  title={Word-Level Fine-Grained Story Visualization},
  author={Li, Bowen and Lukasiewicz, Thomas},
  journal={arXiv preprint arXiv:2208.02341},
  year={2022}
}

Acknowledgements

This code borrows from the StoryGAN and ControlGAN repositories. Many thanks.
