Repo for the paper Uncovering the Full Potential of Visual Grounding Methods in VQA (Daniel Reich and Tanja Schultz)
Create and activate a conda environment:
conda create -n truevg python=3.6
conda activate truevg
Install the dependencies with:
pip install -r requirements.txt
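To verify the environment before downloading any data, a quick import check can be run (a minimal sketch; it assumes the pinned requirements install PyTorch, as in the VisFIS codebase this repo builds on):
# optional sanity check: prints the installed PyTorch version and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"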
- Inside scripts/common.sh, edit the PROJ_DIR variable by assigning it the project path (see the example below).
- Most required data to run experiments from the paper can be downloaded manually here (the setup scripts below will download files from there): https://zenodo.org/records/10357278
Download data for GQA:
./scripts/download/download_truevg_gqa.sh
Preprocess the data:
./scripts/preprocessing/preprocessing_truevg_gqa.sh
Download data for VQA-HAT (make sure to also use included files under files/ in this repo):
./scripts/download/download_truevg_vqa.sh
Preprocess the data:
./scripts/preprocessing/preprocessing_truevg_vqa.sh
- Run the scripts in scripts/GQA/ and scripts/VQAHAT/ to train models. Tests and evaluations run automatically after training. The first argument to these scripts is the dataset name (gqacp, hatcp); the second is your GPU's number. Example:
./scripts/GQA/updn_baseline_DET.sh gqacp 0
./scripts/VQAHAT/updn_visFIS_INF_semantic.sh hatcp 1
Script names indicate which model they train, in correspondence with the paper. Evaluation results (accuracy and FPVG) are printed after training and testing have finished.
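For longer sessions, the two example runs above can be queued back-to-back and their output kept for later inspection (a minimal sketch; the log file names are arbitrary choices, and both runs are placed on GPU 0 here):
# run the GQA baseline, then the VQA-HAT VisFIS model, logging each run to a file
./scripts/GQA/updn_baseline_DET.sh gqacp 0 2>&1 | tee updn_baseline_DET_gqacp.log
./scripts/VQAHAT/updn_visFIS_INF_semantic.sh hatcp 0 2>&1 | tee updn_visFIS_INF_semantic_hatcp.log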
This code builds on VisFIS, which itself used resources from negative analysis of grounding, ramen, and bottom-up-attention-vqa. Code from FPVG is also used.
If you find this code useful for your research, please consider citing:
@inproceedings{reich2024truevg,
  title={Uncovering the Full Potential of Visual Grounding Methods in VQA},
  author={Reich, Daniel and Schultz, Tanja},
  booktitle={arXiv},
  year={2024}
}