This repository contains the fine-tuning code for HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation. The code is based on Recurrent-VLN-BERT. Thanks to Yicong Hong for releasing the Recurrent-VLN-BERT code.
- Install Docker. Please check here to install Docker.
- Create container
To pull the image:
docker pull starrychiao/vlnbert-2022-3090:1.0
If your CUDA version is 11.3, you can pull this image instead:
docker pull starrychiao/hop-recurrent:v1
To create the container:
docker run -it --ipc host --shm-size=1024m --gpus all --name your_name --volume "your_directory":/root/mount/Matterport3DSimulator starrychiao/vlnbert-2022-3090:1.0
or (if you pulled the image for CUDA 11.3):
docker run -it --ipc host --shm-size=1024m --gpus all --name your_name --volume "your_directory":/root/mount/Matterport3DSimulator starrychiao/hop-recurrent:v1
- Set up
docker start "your container id or name"
docker exec -it "your container id or name" /bin/bash
cd /root/mount/Matterport3DSimulator
- Download the trained models.
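The `docker run` command in the "Create container" step has two placeholders, `your_name` and `your_directory`. As a minimal sketch, the helper below assembles the same argument list from those two values so the flags stay identical between runs. The function name `docker_run_command` and the example values are hypothetical and not part of this repository; the flags and image tags are copied from the commands above.

```python
import shlex


def docker_run_command(name, host_dir, image="starrychiao/vlnbert-2022-3090:1.0"):
    """Build the `docker run` argument list shown in the setup steps.

    name     -- container name (the `your_name` placeholder)
    host_dir -- host directory to mount (the `your_directory` placeholder)
    image    -- image tag; pass "starrychiao/hop-recurrent:v1" for CUDA 11.3
    """
    return [
        "docker", "run", "-it",
        "--ipc", "host",
        "--shm-size=1024m",
        "--gpus", "all",
        "--name", name,
        "--volume", f"{host_dir}:/root/mount/Matterport3DSimulator",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = docker_run_command("hop", "/data/hop")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch free of side effects; the printed line can be reviewed and then run by hand.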
cd finetune_r2r
Please follow the instructions below to prepare the data in directories:
- MP3D navigability graphs: connectivity
  - Download the connectivity maps.
- MP3D image features: img_features
  - Download the Scene features (ResNet-152-Places365).
- R2R data: data
  - Download the R2R data [5.8MB].
- Augmented data: data/prevalent
  - Download the collected triplets in PREVALENT [1.5GB] (pre-processed for easy use).
- Pre-trained HOP weights: load_model/checkpoint
  - Download the pytorch_model.bin from here.
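Before starting a training run, it can help to confirm that every directory in the list above is in place. The following is a minimal sketch of such a check, not part of the repository: the helper name `missing_dirs` is hypothetical, and it assumes the directories are relative to `finetune_r2r`, as the list suggests.

```python
from pathlib import Path

# Directory names taken from the data list above, relative to finetune_r2r.
R2R_DIRS = [
    "connectivity",
    "img_features",
    "data",
    "data/prevalent",
    "load_model/checkpoint",
]


def missing_dirs(root, expected=R2R_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from inside finetune_r2r.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All data directories are present.")
```

The check only tests that the directories exist; it does not verify the downloaded files themselves.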
bash run/train_agent.bash
bash run/test_agent.bash
cd finetune_ndh
Please follow the instructions below to prepare the data in directories:
- MP3D navigability graphs: connectivity
  - Download the connectivity maps.
- MP3D image features: img_features
  - Download the Scene features (ResNet-152-Places365).
- Pre-trained HOP weights for NDH: load/model
  - Download the pytorch_model.bin from here.
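A truncated or misplaced weights file is a common reason for a failed first run, so a quick look at the download can save time. The sketch below is an illustration under assumptions, not repository code: the helper name `find_weights` is hypothetical, and it assumes `pytorch_model.bin` is placed directly inside `load/model` under `finetune_ndh`.

```python
from pathlib import Path


def find_weights(root, weights_dir="load/model", filename="pytorch_model.bin"):
    """Return the path to the weights file if it exists and is non-empty.

    Returns None when the file is missing or has zero size (for example,
    after an interrupted download that left an empty file behind).
    """
    path = Path(root) / weights_dir / filename
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


if __name__ == "__main__":
    # Run from inside finetune_ndh.
    weights = find_weights(".")
    if weights is None:
        print("pytorch_model.bin not found (or empty) in load/model")
    else:
        print(f"Found weights: {weights} ({weights.stat().st_size} bytes)")
```

A non-zero size does not guarantee the file is complete, so this is only a first sanity check.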
bash run/train.bash
bash run/test.bash
If you use or discuss our HOP, please cite our paper:
@InProceedings{Qiao2022HOP,
author = {Qiao, Yanyuan and Qi, Yuankai and Hong, Yicong and Yu, Zheng and Wang, Peng and Wu, Qi},
title = {HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {15418-15427}
}