This repo provides the PyTorch implementation of our paper: 'Detecting Attended Visual Targets in Video' [paper]
We present a state-of-the-art method for predicting attention targets from the third-person point of view. The model takes the head bounding box of a person of interest and outputs an attention heatmap for that person.
We release our new dataset, training/evaluation code, demo code, and pre-trained models for the two main experiments reported in our paper. Please refer to the paper for details.
The code has been verified on Python 3.5 and PyTorch 0.4. We provide a conda environment.yml file which you can use to re-create the environment we used. Instructions on how to create an environment from an environment.yml file can be found here.
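For reference, re-creating the environment typically looks like the following (the environment name is whatever the `name` field inside environment.yml specifies):

```shell
# Re-create the conda environment from the provided file
conda env create -f environment.yml
# Activate it, substituting the name defined in environment.yml
conda activate <env-name>
```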
Download our model weights using:
sh download_models.sh
You can try out our demo using the sample data included in this repo by running:
python demo.py
We use the extended GazeFollow annotation prepared by Chong et al. ECCV 2018, which adds an annotation to the original GazeFollow dataset indicating whether each gaze target is inside or outside the frame. You can download the extended dataset from here (image and label) or here (label only).
Please adjust the dataset path accordingly in config.py.
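As a rough illustration only (the actual variable names in config.py may differ; check the file itself), the path settings might look like:

```python
# config.py -- illustrative dataset path settings; variable names are hypothetical
gazefollow_train_data = "/path/to/gazefollow_extended"       # image directory
gazefollow_train_label = "/path/to/train_annotations.txt"    # extended annotation file
```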
Run:
python eval_on_gazefollow.py
to get the model's performance on the GazeFollow test set.
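GazeFollow evaluation is typically reported in terms of AUC and L2 distance between the predicted and annotated gaze points (see the paper for the exact protocol). A minimal sketch of the distance computation, assuming the heatmap argmax is taken as the predicted point (function name and details are illustrative, not the repo's evaluation code):

```python
import numpy as np

def l2_dist(heatmap, gt_xy):
    """L2 distance between the heatmap argmax and a ground-truth point,
    both expressed in normalized [0, 1] image coordinates."""
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    pred = np.array([x / float(w), y / float(h)])
    return float(np.linalg.norm(pred - np.asarray(gt_xy, dtype=float)))
```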
Run:
python train_on_gazefollow.py
to train the model. You can expect to see similar learning curves to ours.
We created a new dataset, VideoAttentionTarget, with fully annotated attention targets in video for this experiment. Dataset details can be found in our paper. Download the VideoAttentionTarget dataset from here.
Please adjust the dataset path accordingly in config.py.
Run:
python eval_on_videoatttarget.py
to get the model's performance on the VideoAttentionTarget test set.
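Besides the heatmap metrics, the VideoAttentionTarget evaluation also involves deciding whether the attention target is out of frame, which is a binary classification commonly scored with average precision. A minimal, generic AP sketch (not the repo's exact evaluation code):

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision for binary labels: rank by score (descending),
    then average the precision measured at each positive."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return float(precision[labels == 1].mean())
```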
Run:
python train_on_videoatttarget.py
to run the temporal training.
To monitor training progress, run:
# pip install tensorflow (if TensorBoard is not installed)
tensorboard --logdir=[yourlogdir]
If you use our dataset and/or code, please cite:
@inproceedings{Chong_2020_CVPR,
  title = {Detecting Attended Visual Targets in Video},
  author = {Chong, Eunji and Wang, Yongxin and Ruiz, Nataniel and Rehg, James M.},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
If you only use the extended GazeFollow annotations, please cite:
@inproceedings{Chong_2018_ECCV,
  title = {Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency},
  author = {Chong, Eunji and Ruiz, Nataniel and Wang, Yongxin and Zhang, Yun and Rozga, Agata and Rehg, James M.},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {September},
  year = {2018}
}
We make use of the PyTorch ConvLSTM implementation provided by https://github.com/kamo-naoyuki/pytorch_convolutional_rnn.
If you have any questions, please email Eunji Chong at [email protected].