This is the PyTorch implementation for our paper:
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
British Machine Vision Conference (BMVC), 2019
Oral Presentation
Visit the main website for more details.
If you use our code for your research, please cite our paper (BMVC 2019 oral):
@inproceedings{landi2019embodied,
title={Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters},
author={Landi, Federico and Baraldi, Lorenzo and Corsini, Massimiliano and Cucchiara, Rita},
booktitle={Proceedings of the British Machine Vision Conference},
year={2019}
}
Clone the repository:
# Make sure to clone with --recursive
git clone --recursive https://github.com/fdlandi/DynamicConv-agent.git
cd DynamicConv-agent
If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:
git submodule update --init --recursive
Python 3.6 is required to run our code. You can install the other modules via:
cd speaksee
pip install -e .
cd ..
pip install -r requirements.txt
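Before installing, it can help to confirm that the active interpreter is the required one. The following is a minimal sketch, not part of the repository: the helper name `is_supported` is hypothetical, and the strict major.minor match is an assumption based on the requirement stated above.

```python
import sys

# Version named in the requirement above; other interpreters are untested here.
REQUIRED = (3, 6)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if is_supported():
        print("Python %d.%d detected: OK" % REQUIRED)
    else:
        # Report both the expected and the detected major.minor versions.
        print("Expected Python %d.%d, found %d.%d"
              % (REQUIRED + tuple(sys.version_info[:2])))
```

Run it with the interpreter of the environment you plan to use for training.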
Please follow the instructions in the Matterport3DSimulator repository to install the simulator via Docker.
The simulator can also be built outside of a Docker container using the cmake build commands described below. However, this is not the recommended approach, as all dependencies will need to be installed locally and may conflict with existing libraries. The main requirements are:
- Ubuntu >= 14.04
- Nvidia-driver with CUDA installed
- C++ compiler with C++11 support
- CMake >= 3.10
- OpenCV >= 2.4 including 3.x
- OpenGL
- GLM
- Numpy
Optional dependencies (depending on the cmake rendering options):
Build the simulator and run the unit tests:
cd DynamicConv-agent
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
cd ../
./build/tests ~Timing
If you use a conda environment for your experiments, you should specify the python path in the cmake options:
cmake -DEGL_RENDERING=ON -DPYTHON_EXECUTABLE:FILEPATH='path_to_your_python_bin' ..
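One way to find the value for 'path_to_your_python_bin' is to ask the interpreter of the activated conda environment for its own path. The sketch below only prints the cmake command from above with that path filled in. The helper name `cmake_command` is hypothetical, and it is an assumption that `sys.executable` points at the interpreter you want the simulator bindings built against.

```python
import shlex
import sys


def cmake_command(python_executable=sys.executable, egl_rendering=True):
    """Build the cmake argument list shown above for a given interpreter."""
    return [
        "cmake",
        "-DEGL_RENDERING=%s" % ("ON" if egl_rendering else "OFF"),
        "-DPYTHON_EXECUTABLE:FILEPATH=%s" % python_executable,
        "..",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command to copy and run from the build directory.
    print(" ".join(shlex.quote(arg) for arg in cmake_command()))
```

Run the script with the environment activated, then paste the printed command inside the build directory.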
Alternatively, skip the feature generation and just download and extract our tsv files into the img_features directory.
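To inspect the extracted files, a reader along the following lines may be useful. This is a sketch under stated assumptions, not the loader used by the repository: it assumes each row has the columns scanId, viewpointId, image_w, image_h, vfov and features, with features stored as a base64-encoded array of little-endian float32 values, as in the feature files commonly distributed for the Matterport3DSimulator. Check the actual files before relying on it.

```python
import base64
import csv
import io
import struct

# Assumed column layout; this README does not confirm it.
TSV_FIELDNAMES = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]

# Feature fields are very long, so raise the csv module's per-field limit.
csv.field_size_limit(2 ** 31 - 1)


def read_features(tsv_file):
    """Map 'scanId_viewpointId' to a flat list of float32 feature values."""
    reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=TSV_FIELDNAMES)
    features = {}
    for row in reader:
        raw = base64.b64decode(row["features"])
        # Four bytes per float32 value, assumed little-endian.
        values = struct.unpack("<%df" % (len(raw) // 4), raw)
        features["%s_%s" % (row["scanId"], row["viewpointId"])] = list(values)
    return features


if __name__ == "__main__":
    # Synthetic one-row file standing in for a real tsv from img_features.
    payload = base64.b64encode(struct.pack("<3f", 0.5, 1.0, 2.0)).decode("ascii")
    demo = io.StringIO("scanA\tviewB\t640\t480\t60\t%s\n" % payload)
    print(read_features(demo))
```

For a real file, pass a handle opened in text mode with `newline=""` instead of the in-memory demo. The flat list can then be reshaped to whatever views-by-channels layout the features use.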
You can train our agent by running:
python tasks/R2R/main.py
The number of dynamic filters can be set with the --num_heads parameter:
python tasks/R2R/main.py --num_heads=4
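To compare several filter counts, the training command can be generated programmatically. The sketch below only builds and prints the commands. The helper name `train_command` is hypothetical, and the values 1, 2 and 4 are illustrative choices rather than settings taken from the paper.

```python
import sys


def train_command(num_heads, python=sys.executable):
    """Return the training command from above for one --num_heads value."""
    return [python, "tasks/R2R/main.py", "--num_heads=%d" % num_heads]


if __name__ == "__main__":
    # Illustrative values only; adjust them to the sweep you want to run.
    for heads in (1, 2, 4):
        print(" ".join(train_command(heads)))
```

Each printed line can be run as is from the repository root, or the list returned by `train_command` can be passed to `subprocess.run`.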
Results in our paper were obtained with version v0.1 of the Matterport3DSimulator. If you build a different version of the simulator, your results may differ from those reported in the paper. Using different GPUs for training, as well as different random seeds, may also affect results.
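To reduce run-to-run variation from random seeds, the random number generators can be seeded before training starts. The helper below is a hypothetical sketch: this README does not say whether tasks/R2R/main.py already fixes a seed, and seeding alone does not guarantee identical results across different GPUs or library versions.

```python
import random


def set_seed(seed):
    """Seed the RNGs that are importable in the current environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed.


if __name__ == "__main__":
    set_seed(0)
    first = random.random()
    set_seed(0)
    # The same seed yields the same first draw from Python's RNG.
    print(first == random.random())
```

Calling `set_seed` once at the top of a training script, before any model or data loader is created, is the usual placement.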
We provide the weights obtained with our training. To reproduce results from the paper, run:
python tasks/R2R/main.py --name=normal_data --num_heads=4 --eval_only
or:
python tasks/R2R/main.py --name=data_augmentation --num_heads=4 --eval_only
The Matterport3D dataset, and data derived from it, is released under the Matterport3D Terms of Use. Our code is released under the MIT license.