[CVPR2023] TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

This repository is an official implementation of TokenHPE.

Overview


We propose a novel critical minority relationship-aware method based on the Transformer architecture that learns the relationships between facial parts. Specifically, we design several orientation tokens that explicitly encode the basic orientation regions. In addition, a novel token-guide multi-loss function guides the orientation tokens as they learn the desired regional similarities and relationships.
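The overview does not spell out how the orientation tokens enter the Transformer. Assuming they are handled like ViT's class token (prepended to the patch tokens and read back from the encoder output), the sequence bookkeeping looks roughly like the sketch below. The number of orientation tokens is a hypothetical parameter, not a value taken from the paper; the 224 image size and 16 patch size match the ViT-B/16, 224*224 configuration named in the weights file.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TokenLayout:
    # Hypothetical: the real token count is defined by the TokenHPE code/paper.
    num_orientation_tokens: int
    image_size: int = 224  # input resolution (224*224)
    patch_size: int = 16   # ViT-B/16 patch size

    @property
    def num_patch_tokens(self) -> int:
        """Number of non-overlapping patches the image is split into."""
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    @property
    def sequence_length(self) -> int:
        """Orientation tokens plus patch tokens fed to the encoder."""
        return self.num_orientation_tokens + self.num_patch_tokens


def build_sequence(orientation_tokens: Sequence[T], patch_tokens: Sequence[T]) -> List[T]:
    """Assumed layout: orientation tokens first, then the patch tokens."""
    return list(orientation_tokens) + list(patch_tokens)


def split_outputs(encoded: Sequence[T], num_orientation_tokens: int) -> Tuple[List[T], List[T]]:
    """Separate encoder outputs back into orientation-token and patch-token parts."""
    return list(encoded[:num_orientation_tokens]), list(encoded[num_orientation_tokens:])
```

For example, TokenLayout(num_orientation_tokens=9).sequence_length evaluates to 205 (9 hypothetical orientation tokens plus 14 * 14 = 196 patches).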

Preparation

Environments

Python == 3.9, torch >= 1.10.1, CUDA == 11.2
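A quick way to confirm the environment before training is a small version check. This is a hypothetical helper, not part of the repository. Note that torch.version.cuda reports the CUDA version PyTorch was built against, which is not necessarily the system CUDA 11.2 toolkit, so the sketch only checks that a CUDA-enabled build is installed.

```python
import re
import sys
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.10.1+cu113' -> (1, 10, 1); the '+...' suffix and pre-release tags are ignored."""
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(found: str, required: str) -> bool:
    """True if version `found` >= `required`, padding the shorter tuple with zeros."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != (3, 9):
        problems.append(f"expected Python 3.9, found {sys.version.split()[0]}")
    try:
        import torch
    except ImportError:
        problems.append("torch is not installed")
        return problems
    if not at_least(str(torch.__version__), "1.10.1"):
        problems.append(f"expected torch >= 1.10.1, found {torch.__version__}")
    # CUDA version torch was compiled with; None for CPU-only builds.
    if torch.version.cuda is None:
        problems.append("torch was built without CUDA support")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```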

Datasets

Follow the 6DRepnet instructions to prepare the datasets:

  • 300W-LP, AFLW2000 from here.

  • BIWI (Biwi Kinect Head Pose Database) from here.

Store them in the datasets directory.

For 300W-LP and AFLW2000, we need to create a filename list:

python create_filename_list.py --root_dir datasets/300W_LP
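The contents of create_filename_list.py are not shown in this README. As a rough sketch, assuming (as in 6DRepnet-style loaders) that each sample is a .jpg image with a same-named .mat pose annotation and that files.txt lists sample paths relative to the dataset root without extensions, a filename list could be built as follows. Both function names are hypothetical.

```python
import os
from typing import List


def collect_sample_names(root_dir: str) -> List[str]:
    """Sorted sample paths (relative to root_dir, no extension) for every .jpg with a matching .mat."""
    names = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for fname in filenames:
            stem, ext = os.path.splitext(fname)
            if ext.lower() != ".jpg":
                continue
            if not os.path.exists(os.path.join(dirpath, stem + ".mat")):
                continue  # skip images that have no pose annotation
            rel = os.path.relpath(os.path.join(dirpath, stem), root_dir)
            names.append(rel.replace(os.sep, "/"))  # forward slashes on every OS
    return sorted(names)


def write_filename_list(root_dir: str, out_name: str = "files.txt") -> str:
    """Write one sample name per line into root_dir/out_name and return the file path."""
    names = collect_sample_names(root_dir)
    out_path = os.path.join(root_dir, out_name)
    with open(out_path, "w") as f:
        f.write("\n".join(names) + ("\n" if names else ""))
    return out_path
```

Usage: write_filename_list("datasets/300W_LP"), and the same call with the AFLW2000 directory.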

The BIWI dataset needs to be preprocessed by a face detector that crops the faces out of the images. You can use the script provided here. For a 7:3 split of the BIWI dataset, you can use the equivalent script here. The cropped image size is set to 256.
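The linked scripts are the reference. The sketch below only illustrates the two steps under stated assumptions: the face detector returns a box as (x_min, y_min, x_max, y_max), the box is enlarged by a hypothetical margin before cropping, and the 7:3 split is made per recording sequence (the first path component, e.g. "01/frame_00003_rgb") so frames of one sequence never appear in both splits. Whether the linked scripts use the same margin or split rule is an assumption.

```python
from typing import List, Sequence, Tuple


def square_crop_box(x_min: float, y_min: float, x_max: float, y_max: float,
                    img_w: int, img_h: int, margin: float = 0.2) -> Tuple[int, int, int, int]:
    """Enlarge a face box by `margin` of its size on each side, make it square, clamp to the image.

    The box is square before clamping; near image borders the clamped box can be narrower.
    """
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    side = max(x_max - x_min, y_max - y_min) * (1 + 2 * margin)
    left = max(0, int(round(cx - side / 2)))
    top = max(0, int(round(cy - side / 2)))
    right = min(img_w, int(round(cx + side / 2)))
    bottom = min(img_h, int(round(cy + side / 2)))
    return left, top, right, bottom


def split_by_sequence(names: Sequence[str], train_ratio: float = 0.7) -> Tuple[List[str], List[str]]:
    """Assign whole sequences to train/test; the first ~70% of sequences (sorted) go to training."""
    sequences = sorted({name.split("/")[0] for name in names})
    n_train = int(len(sequences) * train_ratio)  # rounded down
    train_seqs = set(sequences[:n_train])
    train = [n for n in names if n.split("/")[0] in train_seqs]
    test = [n for n in names if n.split("/")[0] not in train_seqs]
    return train, test
```

With Pillow, the returned box can be passed directly to Image.crop(box) and the crop resized with .resize((256, 256)).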

Download weights

Download the trained weights from gdrive. Optionally, you can also use the pretrained ViT-B/16 weights for the feature extractor.

Directory structure

  • After preparation, you will be able to see the following directory structure:
    TokenHPE
    ├── datasets
    │   ├── 300W_LP
    │   │   ├── files.txt
    │   │   ├── ...
    │   ├── AFLW2000
    │   │   ├── files.txt
    │   │   ├── ...
    │   ├── ...
    ├── weights
    │   ├── TokenHPEv1-ViTB-224_224-lyr3.tar
    ├── figs
    ├── create_filename_list.py
    ├── datasets.py
    ├── README.md
    ├── ...
    
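Before running any of the commands below, it can help to confirm that the key files from the directory tree are in place. This hypothetical helper checks only the paths listed in that tree; add entries (for example a BIWI file list) as needed.

```python
import os
from typing import Iterable, List

# Paths taken from the directory tree; extend for other datasets.
EXPECTED_PATHS = (
    "datasets/300W_LP/files.txt",
    "datasets/AFLW2000/files.txt",
    "weights/TokenHPEv1-ViTB-224_224-lyr3.tar",
)


def missing_paths(repo_root: str = ".", expected: Iterable[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under repo_root."""
    missing = []
    for rel in expected:
        full = os.path.join(repo_root, *rel.split("/"))  # OS-independent join
        if not os.path.exists(full):
            missing.append(rel)
    return missing
```

Run it from the TokenHPE root; an empty list means everything in the tree is present.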

Training & Evaluation

Download the trained weights from gdrive, then evaluate the model as follows:

python test.py  --batch_size 64 \
                --dataset AFLW2000 \
                --data_dir datasets/AFLW2000 \
                --filename_list datasets/AFLW2000/files.txt \
                --model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
                --show_viz False 
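The MAE in the results table follows the usual head pose convention: the mean of the yaw, pitch and roll absolute errors, in degrees. A minimal sketch of that metric is below. Handling angle wrap-around (so that 179° vs. -179° counts as 2°, not 358°) is an assumption about how test.py computes the error, and the function names are hypothetical; VMAE is not covered here.

```python
from typing import Sequence, Tuple

Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


def angular_abs_error(pred_deg: float, gt_deg: float) -> float:
    """Absolute angle difference in degrees, taking the shorter way around the circle."""
    diff = abs(pred_deg - gt_deg) % 360.0
    return min(diff, 360.0 - diff)


def pose_mae(preds: Sequence[Pose], gts: Sequence[Pose]) -> Tuple[float, float, float, float]:
    """Return (yaw_mae, pitch_mae, roll_mae, mean_of_the_three)."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")
    sums = [0.0, 0.0, 0.0]
    for pred, gt in zip(preds, gts):
        for i in range(3):
            sums[i] += angular_abs_error(pred[i], gt[i])
    yaw, pitch, roll = (s / len(preds) for s in sums)
    return yaw, pitch, roll, (yaw + pitch + roll) / 3
```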

To train the model, run:

python train.py --batch_size 64 \
                --num_epochs 60 \
                --lr 0.00001 \
                --dataset Pose_300W_LP \
                --data_dir datasets/300W_LP \
                --filename_list datasets/300W_LP/files.txt
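To estimate how many optimizer steps the training command performs, the defaults below mirror --batch_size 64 and --num_epochs 60. Whether train.py drops the last incomplete batch is not stated here, so both cases are shown; num_samples is the number of entries in the 300W-LP files.txt. This helper is hypothetical and not part of the repository.

```python
import math
from typing import Tuple


def training_steps(num_samples: int, batch_size: int = 64, num_epochs: int = 60,
                   drop_last: bool = False) -> Tuple[int, int]:
    """Return (steps_per_epoch, total_steps) for the given dataset size and flags."""
    if num_samples <= 0 or batch_size <= 0 or num_epochs <= 0:
        raise ValueError("num_samples, batch_size and num_epochs must be positive")
    if drop_last:
        per_epoch = num_samples // batch_size  # incomplete final batch is discarded
    else:
        per_epoch = math.ceil(num_samples / batch_size)  # final batch may be smaller
    return per_epoch, per_epoch * num_epochs
```

For a hypothetical list of 1,000 samples, training_steps(1000) gives 16 steps per epoch and 960 steps in total.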

Inference & Visualization

To get the visualizations, run:

python inference.py  --model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
                     --image_path img_path_here
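Head pose visualizations are commonly drawn as three projected head axes. The sketch below follows the projection used by the draw_axis helper found in Hopenet/6DRepnet-style code; whether inference.py draws exactly this way is an assumption, and the function name is hypothetical. Angles are in degrees and (tdx, tdy) is the axis origin in image coordinates, where y grows downward.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def pose_axis_endpoints(yaw: float, pitch: float, roll: float,
                        tdx: float, tdy: float, size: float = 100.0) -> Tuple[Point, Point, Point]:
    """Return 2D endpoints of the projected X, Y and Z head axes, all starting at (tdx, tdy)."""
    p = math.radians(pitch)
    y = -math.radians(yaw)  # yaw sign flipped, as in the draw_axis convention
    r = math.radians(roll)
    # X axis: points right at zero pose.
    x_axis = (size * (math.cos(y) * math.cos(r)) + tdx,
              size * (math.cos(p) * math.sin(r) + math.cos(r) * math.sin(p) * math.sin(y)) + tdy)
    # Y axis: points down (image coordinates) at zero pose.
    y_axis = (size * (-math.cos(y) * math.sin(r)) + tdx,
              size * (math.cos(p) * math.cos(r) - math.sin(p) * math.sin(y) * math.sin(r)) + tdy)
    # Z axis: out of the image plane, so it collapses to the origin at zero pose.
    z_axis = (size * math.sin(y) + tdx,
              size * (-math.cos(y) * math.sin(p)) + tdy)
    return x_axis, y_axis, z_axis
```

To draw an axis, one option is OpenCV's cv2.line, e.g. cv2.line(img, (int(tdx), int(tdy)), (int(x), int(y)), (0, 0, 255), 2).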

Main results

We report results on AFLW2000 for models trained on 300W_LP. The models were trained on a single TITAN V GPU.

| Config | MAE (°) | VMAE (°) | Training time | Download |
|---|---|---|---|---|
| TokenHPEv1-ViT/B-224*224-lyr3 | 4.81 | 6.09 | ~24 hours | gdrive |

Acknowledgement

Many thanks to the authors of 6DRepnet. We reuse their code for data preprocessing and evaluation, which greatly reduced redundant work.

Citation

If you find our work useful, please cite the paper:

@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Cheng and Liu, Hai and Deng, Yongjian and Xie, Bochen and Li, Youfu},
    title     = {TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {8897-8906}
}