True Visual Grounding

Repository for the paper "Uncovering the Full Potential of Visual Grounding Methods in VQA" by Daniel Reich and Tanja Schultz.


Setup

Environment

Create and activate conda environment:

conda create -n truevg python=3.6
conda activate truevg

Install the dependencies with:

pip install -r requirements.txt

Setup data and project directory

  • Inside scripts/common.sh, set the PROJ_DIR variable to your project path (see the sketch after this list).
  • Most of the data required to run the experiments from the paper is hosted here (the setup scripts below will download files from there, but it can also be downloaded manually): https://zenodo.org/records/10357278
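
As a minimal sketch, the edit in scripts/common.sh could look like the following (the actual file may contain additional variables; /path/to/truevg is a placeholder for your local clone of this repository):

PROJ_DIR=/path/to/truevg    # absolute path to the root of this repository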

For GQA

Download data for GQA:

./scripts/download/download_truevg_gqa.sh

Preprocess the data:

./scripts/preprocessing/preprocessing_truevg_gqa.sh

For VQA-HAT

Download data for VQA-HAT (make sure to also use the files included under files/ in this repo):

./scripts/download/download_truevg_vqa.sh

Preprocess the data:

./scripts/preprocessing/preprocessing_truevg_vqa.sh

Training and Testing

  • Run the scripts in scripts/GQA/ and scripts/VQAHAT/ to train models. Tests and evaluations run automatically after training. The first argument to these scripts is the dataset name (gqacp or hatcp); the second is your GPU number. Example:
./scripts/GQA/updn_baseline_DET.sh gqacp 0
./scripts/VQAHAT/updn_visFIS_INF_semantic.sh hatcp 1

Script names indicate which model and method they train (in correspondence with the paper). Evaluation results (accuracy and FPVG) are printed after training and testing have finished.
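
Put together, a full pipeline run for the UpDn baseline on GQA (GPU 0) simply chains the steps above, assuming the environment and PROJ_DIR have already been set up:

./scripts/download/download_truevg_gqa.sh
./scripts/preprocessing/preprocessing_truevg_gqa.sh
./scripts/GQA/updn_baseline_DET.sh gqacp 0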

Acknowledgement

This code builds on VisFIS, which itself used resources from negative analysis of grounding, ramen, and bottom-up-attention-vqa. Code from FPVG is also used.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{reich2024truevg,
  title={Uncovering the Full Potential of Visual Grounding Methods in VQA},
  author={Reich, Daniel and Schultz, Tanja},
  booktitle={arXiv},
  year={2024}
}