first commit

yuleiniu · Jun 4, 2019 · commit a677b2f

Showing 29 changed files with 2,983 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,41 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+detectron/
+*.ipynb
+
+# Datasets, pretrained models, checkpoints and preprocessed files
+data/
+!visdialch/data/
+checkpoints/
+logs/
+
+# IPython Notebook
+.ipynb_checkpoints
+
+# virtualenv
+venv/
+ENV/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,13 @@
+repos:
+-   repo: https://github.com/ambv/black
+    rev: 19.3b0
+    hooks:
+    - id: black
+      language_version: python3.6
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.1.0
+    hooks:
+    - id: flake8
+    - id: trailing-whitespace
+    - id: check-added-large-files
+    - id: end-of-file-fixer
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,60 @@
+BSD 3-Clause License
+
+Copyright (c) 2018, Yulei Niu
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+* Neither the name of the copyright holder nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+BSD 3-Clause License
+
+Copyright (c) 2018, Karan Desai
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+* Neither the name of the copyright holder nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/README.md b/README.md
@@ -0,0 +1,137 @@
+Recursive Visual Attention in Visual Dialog
+====================================
+
+This repository contains the code for the following paper:
+
+* Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen, *Recursive Visual Attention in Visual Dialog*. In CVPR, 2019. ([PDF](https://arxiv.org/pdf/1812.02664.pdf))
+
+```
+@InProceedings{Niu_2019_CVPR,
+    author = {Niu, Yulei and Zhang, Hanwang and Zhang, Manli and Zhang, Jianhong and Lu, Zhiwu and Wen, Ji-Rong},
+    title = {Recursive Visual Attention in Visual Dialog},
+    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+    month = {June},
+    year = {2019}
+}
+
+```
+
+This code is reimplemented as a fork of [batra-mlp-lab/visdial-challenge-starter-pytorch][6].
+
+
+Setup and Dependencies
+----------------------
+
+This code is implemented using PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and CuDNN 7. Anaconda or Miniconda is the recommended way to set up this codebase:
+
+### Anaconda or Miniconda
+
+1. Install the Anaconda or Miniconda distribution based on Python 3+ from their [downloads site][1].
+2. Clone this repository and create an environment:
+
+```shell
+git clone https://www.github.com/yuleiniu/rva
+conda create -n visdial-ch python=3.6
+
+# activate the environment and install all dependencies
+conda activate visdial-ch
+cd rva/
+pip install -r requirements.txt
+
+# install this codebase as a package in development version
+python setup.py develop
+```
+
+
+Download Data
+-------------
+
+1. Download the VisDial v1.0 dialog JSON files from [here][3] and keep them under the `$PROJECT_ROOT/data` directory so that the default arguments work.
+
+2. Get the word counts for VisDial v1.0 train split [here][4]. They are used to build the vocabulary.
+
+3. [batra-mlp-lab][6] provides pre-extracted image features of VisDial v1.0 images, using a Faster-RCNN pre-trained on Visual Genome. If you wish to extract your own image features, skip this step and download VisDial v1.0 images from [here][3] instead. Extracted features for v1.0 train, val and test are available for download at these links. Note that these files do not contain the bounding box information.
+
+  * [`features_faster_rcnn_x101_train.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_faster_rcnn_x101_train.h5): Bottom-up features of 36 proposals from images of `train` split.
+  * [`features_faster_rcnn_x101_val.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_faster_rcnn_x101_val.h5): Bottom-up features of 36 proposals from images of `val` split.
+  * [`features_faster_rcnn_x101_test.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_faster_rcnn_x101_test.h5): Bottom-up features of 36 proposals from images of `test` split.
+
+4. [batra-mlp-lab][6] also provides pre-extracted FC7 features from VGG16.
+
+  * [`features_vgg16_fc7_train.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_vgg16_fc7_train.h5): VGG16 FC7 features from images of `train` split.
+  * [`features_vgg16_fc7_val.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_vgg16_fc7_val.h5): VGG16 FC7 features from images of `val` split.
+  * [`features_vgg16_fc7_test.h5`](https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_vgg16_fc7_test.h5): VGG16 FC7 features from images of `test` split.
+
+5. Download the GloVe pretrained word vectors from [here][12], and keep `glove.6B.300d.txt` under the `$PROJECT_ROOT/data` directory.
+
+Extracting Features (Optional)
+-------------
+
+### With Docker (Optional)
+For Dockerfile, please refer to [batra-mlp-lab/visdial-challenge-starter-pytorch][8].
+
+### Without Docker (Optional)
+
+0. Set up opencv, [cocoapi][9] and [Detectron][10].
+
+1. Prepare the [MSCOCO][11] and [Flickr][3] images.
+
+2. Extract visual features.
+```shell
+python ./data/extract_features_detectron.py --image-root /path/to/MSCOCO/train2014/ /path/to/MSCOCO/val2014/ --save-path /path/to/feature --split train # Bottom-up features of 36 proposals from images of train split.
+python ./data/extract_features_detectron.py --image-root /path/to/Flickr/VisualDialog_val2018 --save-path /path/to/feature --split val # Bottom-up features of 36 proposals from images of val split.
+python ./data/extract_features_detectron.py --image-root /path/to/Flickr/VisualDialog_test2018 --save-path /path/to/feature --split test # Bottom-up features of 36 proposals from images of test split.
+```
+
+Initializing GloVe Word Embeddings
+--------------
+Simply run 
+```shell
+python data/init_glove.py
+```
+
+
+Training
+--------
+
+Train the model provided in this repository as:
+
+```shell
+python train.py --config-yml configs/rva.yml --gpu-ids 0 # provide more ids for multi-GPU execution; other args...
+```
+
+### Saving model checkpoints
+
+This script saves a model checkpoint at every epoch to the path specified by `--save-dirpath`. Refer to [visdialch/utils/checkpointing.py][7] for more details on how checkpointing is managed.
+
+### Logging
+
+We use [Tensorboard][2] for logging training progress. Recommended: execute `tensorboard --logdir /path/to/save_dir --port 8008` and visit `localhost:8008` in the browser.
+
+
+Evaluation
+----------
+
+Evaluation of a trained model checkpoint can be done as follows:
+
+```shell
+python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0
+```
+
+This will generate an EvalAI submission file, and report metrics from the [Visual Dialog paper][5] (Mean reciprocal rank, R@{1, 5, 10}, Mean rank), and Normalized Discounted Cumulative Gain (NDCG), introduced in the first Visual Dialog Challenge (in 2018).
+
+The metrics reported here would be the same as those reported through EvalAI by making a submission in `val` phase. To generate a submission file for `test-std` or `test-challenge` phase, replace `--split val` with `--split test`.
+
+
+[1]: https://conda.io/docs/user-guide/install/download.html
+[2]: https://www.github.com/lanpa/tensorboardX
+[3]: https://visualdialog.org/data
+[4]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/visdial_1.0_word_counts_train.json
+[5]: https://arxiv.org/abs/1611.08669
+[6]: https://www.github.com/batra-mlp-lab/visdial-challenge-starter-pytorch
+[7]: https://www.github.com/yuleiniu/rva/blob/master/visdialch/utils/checkpointing.py
+[8]: https://www.github.com/batra-mlp-lab/visdial-challenge-starter-pytorch#docker
+[9]: https://www.github.com/cocodataset/cocoapi
+[10]: https://www.github.com/facebookresearch/Detectron
+[11]: http://cocodataset.org/#download
+[12]: http://nlp.stanford.edu/data/glove.6B.zip
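
Note on the Evaluation section of the README above: the sparse metrics it lists (mean reciprocal rank, R@{1, 5, 10}, mean rank) can all be derived from the 1-indexed rank that a model assigns to the ground-truth answer in each dialog round. The following is a minimal sketch of those standard definitions, not the code in `evaluate.py`. The function name `sparse_metrics` and the sample ranks are hypothetical, and NDCG is not covered.

```python
from typing import Dict, Sequence


def sparse_metrics(gt_ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute R@k, mean rank and MRR from ground-truth answer ranks.

    gt_ranks holds one 1-indexed rank per dialog round (1 = best).
    This helper is illustrative only and is not part of the repository.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must not be empty")
    n = len(gt_ranks)
    # R@k: fraction of rounds where the ground truth is ranked in the top k.
    metrics = {f"r@{k}": sum(rank <= k for rank in gt_ranks) / n for k in ks}
    # Mean rank: average position of the ground truth (lower is better).
    metrics["mean"] = sum(gt_ranks) / n
    # MRR: average of the reciprocal ranks (higher is better).
    metrics["mrr"] = sum(1.0 / rank for rank in gt_ranks) / n
    return metrics


if __name__ == "__main__":
    # Hypothetical ranks for four rounds.
    for name, value in sparse_metrics([1, 2, 4, 10]).items():
        print(f"{name}: {value:.4f}")
```

Because R@k and MRR reward low ranks while mean rank penalizes high ones, a single badly ranked round moves mean rank much more than it moves MRR, which is one reason the metrics are usually reported together.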
diff --git a/configs/rva.yml b/configs/rva.yml
@@ -0,0 +1,40 @@
+# Dataset reader arguments
+dataset:
+  image_features_train_h5: 'data/features_faster_rcnn_x101_train.h5'
+  image_features_val_h5: 'data/features_faster_rcnn_x101_val.h5'
+  image_features_test_h5: 'data/features_faster_rcnn_x101_test.h5'
+  word_counts_json: 'data/visdial_1.0_word_counts_train.json'
+  glove_npy: 'data/glove.npy'
+
+  img_norm: 1
+  concat_history: false
+  max_sequence_length: 20
+  vocab_min_count: 5
+
+
+# Model related arguments
+model:
+  encoder: 'rva'
+  decoder: 'disc'
+
+  img_feature_size: 2048
+  word_embedding_size: 300
+  lstm_hidden_size: 512
+  lstm_num_layers: 2
+  dropout: 0.5
+  dropout_fc: 0.3
+
+  relu: 'ReLU'
+
+# Optimization related arguments
+solver:
+  batch_size: 24 # 32 x num_gpus is a good rule of thumb
+  num_epochs: 15
+  initial_lr: 0.01
+  training_splits: "train"  # "trainval"
+  lr_gamma: 0.1
+  lr_milestones: # epochs when lr -> lr * lr_gamma
+    - 5
+    - 10
+  warmup_factor: 0.2
+  warmup_epochs: 1
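
Note on the `solver` block above: taken together, `initial_lr`, `warmup_factor`, `warmup_epochs`, `lr_milestones` and `lr_gamma` describe a warmup phase followed by step decay. The sketch below shows one plausible reading of those values. It assumes the warmup interpolates linearly from `warmup_factor` to 1 over `warmup_epochs`; that interpolation and the function name `lr_multiplier` are assumptions, since `train.py` is not shown in this part of the diff.

```python
from bisect import bisect_right
from typing import Sequence


def lr_multiplier(
    epoch: float,
    milestones: Sequence[int] = (5, 10),
    gamma: float = 0.1,
    warmup_factor: float = 0.2,
    warmup_epochs: float = 1,
) -> float:
    """Return the factor applied to initial_lr at a (possibly fractional) epoch.

    Defaults mirror configs/rva.yml. The linear warmup is an assumption.
    """
    if epoch < warmup_epochs:
        # Linear ramp: warmup_factor at epoch 0, approaching 1.0 at warmup_epochs.
        alpha = epoch / warmup_epochs
        return warmup_factor * (1.0 - alpha) + alpha
    # Step decay: multiply by gamma once for every milestone already reached.
    return gamma ** bisect_right(milestones, epoch)


if __name__ == "__main__":
    initial_lr = 0.01  # solver.initial_lr in configs/rva.yml
    for epoch in (0, 0.5, 1, 4, 5, 9, 10, 14):
        print(f"epoch {epoch}: lr = {initial_lr * lr_multiplier(epoch):.6f}")
```

A function of this shape could be passed to a scheduler such as `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor; whether the repository wires it up that way is not confirmed by the files shown here.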