GitHub URL: https://github.com/mbbuehler/iPercept
This repository contains the code for DenseBag, a neural network architecture for eye gaze estimation.
Our model ranked first in the in-class Kaggle competition (Machine Perception, ETH Zurich, 2018).
DenseBag is an ensemble of Densely Connected Convolutional Networks (DenseNets): each base model is trained independently on a bootstrap sample of the original data, and their predictions are averaged to produce the final estimate. This bagging step reduces the variance and overfitting exhibited by the individual estimators. See the report for more information or try out our code.
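To illustrate the idea, here is a minimal sketch of the bagging step (not the repository code; the `predict` interface of the base models is an assumption made for illustration):

```python
import numpy as np

def densebag_predict(base_models, eye_images):
    """Average the gaze predictions of independently trained base models.

    `base_models` is a list of trained estimators, each assumed to expose a
    `predict(images) -> (N, 2)` method returning gaze angles; this interface
    is illustrative, not the repository API.
    """
    # Stack per-model predictions: shape (B, N, 2)
    predictions = np.stack([m.predict(eye_images) for m in base_models])
    # Bagging: the ensemble output is the mean over the B base models
    return predictions.mean(axis=0)
```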
- Download `densebag_code.zip` and extract it:

```bash
wget http://mbuehler.ch/public_downloads/densebag/densebag_code.zip
unzip densebag_code.zip
cd densebag_code
```
- Make sure you have installed the required Python modules:

```bash
python setup.py install
```
- Create a folder and add the dataset:

```bash
mkdir datasets
cd datasets
wget http://mbuehler.ch/public_downloads/densebag/MPIIGaze_kaggle_students.h5
cd ..
```
- Create the folder for the outputs:

```bash
mkdir outputs/
```
- Now you can run the training script from the source folder:

```bash
cd src/
python train_densebag.py -B_start 10 -B 13
```
The next section describes in more detail how to train and make predictions using DenseBag.
Training the DenseBag model consists of two steps, explained in detail below.
To retrain the DenseBag model, you start by training a number B of base models (`DenseNetFixed`) using the script `train_densebag.py`. For example, you can train 10 models with `train_densebag.py -B 10`.
The model index is used as the random seed, so make sure every base model is trained with a different seed. For example, if you have already trained 10 models with random seeds B = 0, ..., 9, you can train more models using the optional parameter `B_start`, i.e. `train_densebag.py -B_start 10 -B 15`. This will train another 5 models, using the random seeds 10, 11, ..., 14.
The weights of the trained base models are saved in the outputs folder `../outputs/DenseBag_RS<B>_<timestamp>`. This folder also contains a CSV file `to_submit_to_kaggle_<timestamp>.csv` with the predictions of this base model on the test set. In the second step, we will average these predictions.
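Based on the naming scheme above, the outputs directory will look roughly like this after training a few models (seeds and timestamps are illustrative):

```
outputs/
├── DenseBag_RS000_1529041234/   # weights + to_submit_to_kaggle_1529041234.csv
├── DenseBag_RS001_1529043456/
└── DenseBag_RS002_1529045678/
```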
We recommend training several models in parallel on different machines or GPUs, e.g. seeds 0-9 on one machine (`train_densebag.py -B 10`) and seeds 10-19 on another (`train_densebag.py -B_start 10 -B 20`). Again, please make sure to use different random seeds; otherwise the final model might not perform optimally.
In our experiments, we started obtaining good results with B = 10 models. For 10 < B < 30, the results kept improving. For B > 30, we no longer saw a significant improvement in the Kaggle score (although the variance might be reduced further for larger B). See below for possible training issues.
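These diminishing returns match the standard bagging variance argument: assuming B identically distributed base estimators with variance σ² and pairwise correlation ρ, the variance of their average is

```
Var(average) = ρ·σ² + (1 − ρ)·σ²/B
```

so the second term shrinks like 1/B, while the correlation term ρ·σ² remains as a floor that no amount of additional base models can remove.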
Note: We trained the models on the Azure Ubuntu Data Science Virtual Machine (1 NVIDIA Tesla K80 GPU). Training a single model took between 30 and 40 minutes.
After training the base models, you can average their predictions using the script `utils/densebag_bag_predictions.py`. For a clean file hierarchy, we recommend storing the trained models and predictions in a new folder. Here is how to do it:

- Create a new folder `outputs/DenseBag`.
- Move or copy all output folders for your models (e.g. `DenseBag_RS001_12345`) to the newly created folder `outputs/DenseBag`.
- Go to the directory `src/util` and run `densebag_bag_predictions.py`. This will produce a Kaggle submission file in the `outputs/DenseBag` folder, e.g. `to_submit_to_kaggle_B_98_1529048062.csv`. A sketch of what this averaging amounts to follows below.
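For intuition, this is roughly what the averaging step does. It is a minimal sketch, assuming each per-model CSV has an `Id` column plus numeric prediction columns; the exact column layout is an assumption, and `densebag_bag_predictions.py` remains the authoritative implementation:

```python
import glob
import pandas as pd

# Collect the per-model Kaggle prediction files (path pattern assumed
# from the folder layout described above).
files = glob.glob("../outputs/DenseBag/DenseBag_RS*/to_submit_to_kaggle_*.csv")
frames = [pd.read_csv(f) for f in files]

# Average the prediction columns across all base models, keeping the
# Id column from the first file ("Id" is an assumed column name).
averaged = frames[0].copy()
pred_cols = [c for c in averaged.columns if c != "Id"]
averaged[pred_cols] = sum(df[pred_cols] for df in frames) / len(frames)

averaged.to_csv("../outputs/DenseBag/to_submit_to_kaggle_bagged.csv", index=False)
```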
We noticed that when training several models using `train_densebag.py`, the Azure virtual machine freezes after training 5 models in a row. We have not solved this problem (we suspect an issue with the VM or with TensorFlow). To mitigate it, you can train the models in batches of 5, e.g. as sketched below. If you find out how to fix this issue, please let us know. Thank you.
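One way to script batches of 5 is a small Python wrapper; the `-B_start`/`-B` flags follow the usage shown above, while the batch size and total count are illustrative. If the machine itself freezes rather than the process, you may instead need to run each batch manually after a restart:

```python
import subprocess

TOTAL_MODELS = 30  # illustrative total number of base models
BATCH_SIZE = 5     # work around the freeze observed after 5 models in a row

for start in range(0, TOTAL_MODELS, BATCH_SIZE):
    end = start + BATCH_SIZE
    # Each call trains the models with random seeds start, ..., end - 1
    # in a fresh process.
    subprocess.run(
        ["python", "train_densebag.py", "-B_start", str(start), "-B", str(end)],
        check=True,
    )
```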
You might not want to retrain the model. We provide a collection of 30 pre-trained base models here. These models were each trained on a bootstrap sample of a combined dataset of the MPII training and validation sets.
We experimented with a number of different architectures. The code for building and training these models has been moved to archive folders (e.g. src/archive, src/models/archive,...).
Appearance-based Gaze Estimation in the Wild. X. Zhang, Y. Sugano, M. Fritz and A. Bulling. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 4511-4520.
We would like to thank the ETH AIT Lab for organising this challenge and writing the skeleton code. A big thanks goes to Microsoft Azure for providing us with the resources to train DenseBag. Lastly, thanks to Yixuan Li for the implementation of the DenseNet model, which we adapted to our needs.