- The VSDv2 dataset is now available.
This repository contains code and data for our paper *Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation*.
**Note:** Please go into VLT5 and follow the README there for Pretrained Models and Feature Extraction.
```bash
# Create python environment (optional)
conda create -n vsd python=3.7
source activate vsd

# Install python dependencies
pip install -r requirements.txt

# For captioning evaluation
python -c "import language_evaluation; language_evaluation.download('coco')"
```
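As a quick sanity check that the captioning metrics were installed correctly, you can score a toy prediction. This sketch assumes the `CocoEvaluator` interface of the `language_evaluation` package; it is not part of this repository's training or evaluation scripts:

```python
import language_evaluation

# Toy prediction/reference pair, only to confirm the metric backends load.
predicts = ["a man riding a horse on the beach"]
answers = ["a man rides a horse along the beach"]

evaluator = language_evaluation.CocoEvaluator()
results = evaluator.run_evaluation(predicts, answers)
print(results)  # dict of BLEU / METEOR / ROUGE_L / CIDEr scores
```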
```
# Store images, features, and annotations
./datasets

# Image feature extraction
./feature_extraction

# Train VL-T5
./VL-T5/
    src/
        modeling_t5.py modeling_bart.py   <= VL-T5/VL-BART model classes
        caption_sp.py, vrd_caption.py     <= fine-tuning
        param.py                          <= (argparse) configuration
        tokenization.py                   <= custom tokenizer
        utils.py, dist_utils.py           <= utility functions
    snap/                                 <= store weight checkpoints
```
- Pretrained VL-BART and VL-T5 are provided by [1].
- Download `snap/` from Google Drive:

```bash
gdrive download 1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph --recursive
```
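To verify the download, you can load one of the checkpoints and inspect its keys. The path below is only an illustration; point it at whatever file actually sits under `snap/` on your machine, and note this sketch assumes the checkpoint is a plain PyTorch state dict:

```python
import torch

# Hypothetical checkpoint path; replace with an actual file under snap/.
ckpt_path = "snap/pretrain/VLT5/Epoch30.pth"

state_dict = torch.load(ckpt_path, map_location="cpu")
print(f"{len(state_dict)} tensors; first keys: {list(state_dict)[:5]}")
```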
```bash
bash ./baseline.sh gpu_num
bash ./end2end.sh gpu_num
```
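Here `gpu_num` is presumably the number of GPUs to use for training; for example, to train the end-to-end model on 2 GPUs:

```bash
bash ./end2end.sh 2
```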
This repo is adapted from VLT5.
Please cite our paper if you use our models or data in your project.
```bibtex
@inproceedings{zhao2022vsd,
  title     = {Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation},
  author    = {Yu Zhao and Jianguo Wei and Zhichao Lin and Yueheng Sun and Meishan Zhang and Min Zhang},
  booktitle = {EMNLP},
  year      = {2022}
}
```