Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper
- Use Python >= 3.8.5. Conda is recommended: https://docs.anaconda.com/anaconda/install/linux/
- Use PyTorch 1.7.0 with CUDA 10.2.
- Install the other requirements from 'requirements.txt'.
To set up the environment:
# create new env vrc
$ conda create -n vrc python=3.8.5
# activate vrc
$ conda activate vrc
# install pytorch, torchvision
$ conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
# install other dependencies
$ pip install -r requirements.txt
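After installation, a quick check that the active environment matches the versions listed above can save debugging time later. The following is a minimal sketch, not part of the repository: the names `check_environment` and `version_tuple` are hypothetical, and it only reports what it finds rather than enforcing anything.

```python
import re
import sys

REQUIRED_PYTHON = (3, 8, 5)   # minimum Python version stated in this README
EXPECTED_TORCH = "1.7.0"      # PyTorch version stated in this README


def version_tuple(version):
    """Turn a string such as '1.7.0+cu102' into (1, 7, 0).

    Returns an empty tuple if the string does not start with X.Y.Z.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    return tuple(int(part) for part in match.groups()) if match else ()


def check_environment():
    """Collect basic facts about the active environment without failing hard."""
    python_version = tuple(sys.version_info[:3])
    report = {
        "python": python_version,
        "python_ok": python_version >= REQUIRED_PYTHON,
        "torch": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its shared libraries failed to load.
        return report
    report["torch"] = torch.__version__
    report["torch_ok"] = version_tuple(torch.__version__) == version_tuple(EXPECTED_TORCH)
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_ok` or `cuda_available` is reported as False inside the `vrc` environment, re-running the conda install command above is a reasonable first step.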
- Download VG images from https://visualgenome.org/
- Extract Faster R-CNN features of the VG images using data_preparation/vrc_extract_frcnn_feats.py. Please follow instructions here.
- Download the VrR-VG dataset from http://vrr-vg.com/ or Google Drive Link
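Before running feature extraction, it may help to confirm that the image download finished. The sketch below is not part of the repository: `count_images` is a hypothetical helper, and the directory you pass to it is whatever location you chose for the downloaded VG images.

```python
from pathlib import Path

# File extensions treated as images; adjust if your download differs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files under `root`, including nested subdirectories."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Image directory not found: {root}")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

A count far below the number of images listed on the Visual Genome site usually points to an interrupted download or an archive that was only partly unpacked.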
Check and update the training, model, and dataset parameters in VR_Encoder/configs, then run:
$ python train_vr_encoder.py
Check and update the training, testing, model, and dataset parameters in VR_SimilarityNetwork/configs, then run:
$ python SimilarityNetworkTrain.py
$ python ConcatplusSimilarityNetworkTrain.py
To evaluate, set the evaluation settings in test_config.yaml, then run:
$ python FullModelTest.py
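The commands above can also be chained from a small driver script. This is a rough sketch under stated assumptions, not the authors' workflow: the script names come from this README, but the order shown, the idea that every script must run, and the assumption that all of them are launched from the same working directory are guesses (the two similarity-network scripts may well be alternatives). `run_pipeline` and `build_commands` are hypothetical names.

```python
import subprocess
import sys

# Script names are taken from this README. The order, and running them all
# from one working directory, are assumptions; edit the list as needed.
PIPELINE = [
    "train_vr_encoder.py",
    "SimilarityNetworkTrain.py",
    "ConcatplusSimilarityNetworkTrain.py",
    "FullModelTest.py",
]


def build_commands(scripts, python=sys.executable):
    """Build one command per script, using the current Python interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts, dry_run=True):
    """Run each script in order and stop at the first failure.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    executed = []
    for command in build_commands(scripts):
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
        executed.append(command)
    return executed


if __name__ == "__main__":
    for command in run_pipeline(PIPELINE, dry_run=True):
        print(" ".join(command))
```

Calling `run_pipeline(PIPELINE, dry_run=False)` launches the scripts for real; keeping the dry run as the default avoids starting a long training job by accident.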
If you find this code or paper useful for your research, please consider citing:
@InProceedings{teotiaMMM2021,
author = "Teotia, Revant and Mishra, Vaibhav and Maheshwari, Mayank and Mishra, Anand",
title = "Few-shot Visual Relationship Co-Localization",
booktitle = "ICCV",
year = "2021",
}
This repo uses https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark and scripts from https://github.com/facebookresearch/mmf for Faster R-CNN feature extraction.
The code provided by https://github.com/zawlin/cvpr17_vtranse and https://github.com/yangxuntu/vrd helped in implementing the VR encoder.
For any clarification, comment, or suggestion, please create an issue or contact Revant, Vaibhav, or Mayank.