This repository contains the code for VPEval, a novel interpretable/explainable evaluation framework for text-to-image (T2I) generation models based on visual programming, as described in the paper:
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho, Abhay Zala, Mohit Bansal
[Project Page] [Paper] [Code for VPGen]
See our change log here.
# Evaluate Source Code
src/
# Data Files
data/
# Data Download and Code Run Scripts
scripts/
# Create a conda environment
conda create -n vpeval python=3.8
conda activate vpeval
# Install requirements
pip install -r requirements.txt
# Install the second set of requirements (these must be installed after the first set)
pip install -r requirements_2.txt
Then follow the GroundingDINO installation instructions: https://github.com/IDEA-Research/GroundingDINO
You also need to download the GroundingDINO weights and place them in the weights directory.
You can do this by running
bash scripts/download_grounding_dino_weights.sh
Then download and extract all the model-generated images by running
bash scripts/download_images.sh
To run skill-based evaluation, run
bash scripts/evaluate_skill_based.sh
Note: In the paper, we use the first 1000 IDs in the data/skill_based/random_ids_{skill}.json
file, where skill is one of object, count, spatial, etc.
- This is already implemented in the code, so no extra filtering is needed.
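For readers who want to reproduce the same subset outside the provided scripts, the selection described in the note can be sketched as below. This is a minimal sketch, not the repository's own implementation: the helper name `load_eval_ids` is hypothetical, and it assumes each `random_ids_{skill}.json` file holds a flat JSON list of IDs.

```python
import json
from pathlib import Path


def load_eval_ids(skill, data_dir="data/skill_based", limit=1000):
    """Return the first `limit` IDs for one skill (hypothetical helper).

    Assumption: random_ids_{skill}.json contains a flat JSON list of IDs.
    """
    path = Path(data_dir) / f"random_ids_{skill}.json"
    with path.open() as f:
        ids = json.load(f)
    # Slicing keeps the original order, so this is "the first 1000 IDs".
    return ids[:limit]


# Example usage from the repository root (not executed here):
# count_ids = load_eval_ids("count")
```

If the files turn out to use a different layout (for example, a dictionary keyed by ID), the slicing step would need to be adapted accordingly.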
Example outputs of our open-ended evaluation process.
To run open-ended evaluation, run
bash scripts/evaluate_open_ended.sh
Then run the following to get the scores
python src/utils/score_open_ended.py
When running a script, pass the --visualization_savepath
argument to choose where to save the explanations.
The visual explanations (bounding boxes) will be saved in the ../images/
directory, and a JSON file will also be saved in the root path. This file contains the text explanations along with the path to the corresponding image, when one is available.
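Once the JSON file has been written, it can be inspected with a few lines of Python. The following is only a sketch under stated assumptions: the helper `load_explanations` is hypothetical, the file is assumed to be a JSON list of objects, and the key names `explanation` and `image_path` are placeholders rather than the repository's actual schema. Check the real file and pass the correct key names.

```python
import json
from pathlib import Path


def load_explanations(json_path, text_key="explanation", image_key="image_path"):
    """Pair each text explanation with its image path, if any (hypothetical helper).

    Assumptions: the file is a JSON list of objects; `text_key` and
    `image_key` are placeholder names, not the confirmed schema.
    """
    with open(json_path) as f:
        records = json.load(f)

    pairs = []
    for record in records:
        image = record.get(image_key)
        # Entries without a visual explanation get None instead of a path.
        pairs.append((record.get(text_key), Path(image) if image else None))
    return pairs
```

Returning `None` for entries without an image mirrors the statement above that an image path is included only when one is available.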
We've released a Llama 2 7B model fine-tuned on ChatGPT outputs. If you do not want to use ChatGPT, you can use this model instead. Please refer to this code file.
If you find our project useful in your research, please cite the following paper:
@inproceedings{Cho2023VPT2I,
author = {Jaemin Cho and Abhay Zala and Mohit Bansal},
title = {Visual Programming for Text-to-Image Generation and Evaluation},
booktitle = {NeurIPS},
year = {2023},
}