Is 2D Heatmap Even Necessary for Human Pose Estimation?

NOTE: SimDR is the old name of this work; the paper now officially uses the name SimCC. For simplicity, we keep the name SimDR in the code, since many people already use it.

The 2D heatmap representation has dominated human pose estimation for years due to its high performance. However, heatmap-based approaches suffer from several shortcomings:

    1. Performance drops dramatically on low-resolution images, which are frequently encountered in real-world scenarios.
    2. To improve localization precision, multiple upsampling layers may be needed to recover the feature map resolution from low to high, which is computationally expensive.
    3. Extra coordinate refinement is usually necessary to reduce the quantization error of downscaled heatmaps.

Given the shortcomings listed above, we do not think the 2D heatmap is the final solution for keypoint coordinate representation in this field. By contrast, SimDR is a simple yet effective scheme that removes the extra post-processing and reduces the quantization error through the design of the coordinate representation itself. For the first time, SimDR brings heatmap-free methods to the competitive performance level of heatmap-based methods, and it outperforms the latter by a large margin at low input resolutions. Additionally, SimDR allows one to directly remove the time-consuming upsampling module of some methods, which may inspire new research on lightweight models for human pose estimation.

We hope the proposed SimDR will motivate the community to rethink the design of coordinate representation for 2D human pose estimation.
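As a rough illustration of the quantization argument, the sketch below compares the round-trip error of a single x-coordinate when it is stored as the index of a downscaled heatmap cell versus the index of a 1D bin that is finer than one pixel. The stride of 4, the splitting factor of 2 (suggested only by the `split2` in the config names), and the function names are assumptions made for illustration; this is not the repository's implementation.

```python
def heatmap_roundtrip(x, stride=4):
    """Quantize x to the nearest cell of a heatmap downscaled by `stride`, then decode."""
    cell = round(x / stride)
    return cell * stride


def bin_roundtrip(x, split=2):
    """Quantize x to the nearest of `split` bins per pixel (1D classification), then decode."""
    bin_index = round(x * split)
    return bin_index / split


# Hypothetical ground-truth x-coordinate in input-image pixels.
x_true = 101.3
heatmap_error = abs(heatmap_roundtrip(x_true) - x_true)
bin_error = abs(bin_roundtrip(x_true) - x_true)
print(f"heatmap error: {heatmap_error:.2f} px, 1D-bin error: {bin_error:.2f} px")
```

With nearest-index rounding, the worst-case error is half a cell: 2 px for a stride-4 heatmap, but 0.25 px for two bins per pixel, which is why the heatmap side typically needs a refinement step and the bin side may not.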

For details see SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation by Yanjie Li, Sen Yang, Peidong Liu, Shoukui Zhang, Yunxiao Wang, Zhicheng Wang, Wankou Yang and Shu-Tao Xia.


News!

  • [2022.07.17] Our paper "SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation" has been accepted by ECCV'2022 as an oral presentation (acceptance rate: 2.7%). If you find this repository useful, please give it a star 🌟.
  • [2021.08.17] The pretrained models are released in Google Drive!
  • [2021.07.09] The code for SimDR and SimDR* (space-aware SimDR) is released!

Experiments

Results on COCO test-dev set

| Method | Representation | Input size | GFLOPs | AP | AR |
|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 384x288 | 20.0 | 71.5 | 76.9 |
| SimBa-Res50 | SimDR* | 384x288 | 20.2 | 72.7 | 78.0 |
| HRNet-W48 | heatmap | 256x192 | 14.6 | 74.2 | 79.5 |
| HRNet-W48 | SimDR* | 256x192 | 14.6 | 75.4 | 80.5 |
| HRNet-W48 | heatmap | 384x288 | 32.9 | 75.5 | 80.5 |
| HRNet-W48 | SimDR* | 384x288 | 32.9 | 76.0 | 81.1 |

Note:

  • Flip test is used.
  • The person detector has a person AP of 60.9 on the COCO test-dev2017 dataset.
  • GFLOPs are counted for convolution and linear layers only.

Results on COCO validation set

| Method | Representation | Input size | #Params | GFLOPs | Extra post. | AP | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 64x64 | 34.0M | 0.7 | Y | 34.4 | 43.7 |
| | heatmap | 64x64 | 34.0M | 0.7 | N | 25.8 | 36.0 |
| | SimDR (ours) | 64x64 | 34.1M | 0.7 | N | 40.8 | 49.6 |
| | heatmap | 128x128 | 34.0M | 3.0 | Y | 60.3 | 67.6 |
| | heatmap | 128x128 | 34.0M | 3.0 | N | 55.4 | 63.6 |
| | SimDR (ours) | 128x128 | 34.8M | 3.0 | N | 62.6 | 69.5 |
| | heatmap | 256x192 | 34.0M | 8.9 | Y | 70.4 | 76.3 |
| | heatmap | 256x192 | 34.0M | 8.9 | N | 68.5 | 74.8 |
| | SimDR (ours) | 256x192 | 36.8M | 9.0 | N | 71.4 | 77.4 |
| TokenPose-S | heatmap | 64x64 | 4.9M | 1.4 | Y | 57.1 | 64.8 |
| | heatmap | 64x64 | 4.9M | 1.4 | N | 35.9 | 47.0 |
| | SimDR (ours) | 64x64 | 4.9M | 1.4 | N | 62.8 | 70.1 |
| | heatmap | 128x128 | 5.2M | 1.6 | Y | 65.4 | 71.6 |
| | heatmap | 128x128 | 5.2M | 1.6 | N | 57.6 | 64.9 |
| | SimDR (ours) | 128x128 | 5.1M | 1.6 | N | 71.4 | 76.4 |
| | heatmap | 256x192 | 6.6M | 2.2 | Y | 72.5 | 78.0 |
| | heatmap | 256x192 | 6.6M | 2.2 | N | 69.9 | 75.8 |
| | SimDR (ours) | 256x192 | 5.5M | 2.2 | N | 73.6 | 78.9 |
| SimBa-Res101 | heatmap | 64x64 | 53.0M | 1.0 | Y | 34.1 | 43.5 |
| | heatmap | 64x64 | 53.0M | 1.0 | N | 25.7 | 36.1 |
| | SimDR (ours) | 64x64 | 53.1M | 1.0 | N | 39.6 | 48.9 |
| | heatmap | 128x128 | 53.0M | 4.1 | Y | 59.2 | 66.7 |
| | heatmap | 128x128 | 53.0M | 4.1 | N | 54.4 | 62.5 |
| | SimDR (ours) | 128x128 | 53.5M | 4.1 | N | 63.1 | 70.1 |
| | heatmap | 256x192 | 53.0M | 12.4 | Y | 71.4 | 77.1 |
| | heatmap | 256x192 | 53.0M | 12.4 | N | 69.5 | 75.6 |
| | SimDR (ours) | 256x192 | 53.7M | 12.4 | N | 72.3 | 78.0 |
| HRNet-W32 | heatmap | 64x64 | 28.5M | 0.6 | Y | 45.8 | 55.3 |
| | heatmap | 64x64 | 28.5M | 0.6 | N | 34.6 | 45.6 |
| | SimDR (ours) | 64x64 | 28.6M | 0.6 | N | 56.4 | 64.9 |
| | heatmap | 128x128 | 28.5M | 2.4 | Y | 67.2 | 74.1 |
| | heatmap | 128x128 | 28.5M | 2.4 | N | 61.9 | 69.4 |
| | SimDR (ours) | 128x128 | 29.1M | 2.4 | N | 70.7 | 76.7 |
| | heatmap | 256x192 | 28.5M | 7.1 | Y | 74.4 | 79.8 |
| | heatmap | 256x192 | 28.5M | 7.1 | N | 72.3 | 78.2 |
| | SimDR | 256x192 | 31.3M | 7.1 | N | 75.3 | 80.8 |
| HRNet-W48 | heatmap | 64x64 | 63.6M | 1.2 | Y | 48.5 | 57.8 |
| | heatmap | 64x64 | 63.6M | 1.2 | N | 36.9 | 47.8 |
| | SimDR (ours) | 64x64 | 63.7M | 1.2 | N | 59.7 | 67.5 |
| | heatmap | 128x128 | 63.6M | 4.9 | Y | 68.9 | 75.3 |
| | heatmap | 128x128 | 63.6M | 4.9 | N | 63.3 | 70.5 |
| | SimDR (ours) | 128x128 | 64.1M | 4.9 | N | 72.0 | 77.9 |
| | heatmap | 256x192 | 63.6M | 14.6 | Y | 75.1 | 80.4 |
| | heatmap | 256x192 | 63.6M | 14.6 | N | 73.1 | 78.7 |
| | SimDR (ours) | 256x192 | 66.3M | 14.6 | N | 75.9 | 81.2 |

Note:

  • Flip test is used.
  • The person detector has a person AP of 56.4 on the COCO val2017 dataset.
  • GFLOPs are counted for convolution and linear layers only.
  • Extra post. = extra post-processing that refines the predicted keypoint coordinates.
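For readers unfamiliar with such refinement, the sketch below shows one widely used form in heatmap-based pipelines: take the argmax, then shift it a quarter of a cell toward the higher-scoring neighbor. It is written in 1D for brevity (real heatmaps are 2D and apply the same shift per axis). Whether this is exactly the step that "Extra post." denotes in the table is an assumption, and `refine_1d` is a hypothetical name, not a function from this repository.

```python
def refine_1d(scores, shift=0.25):
    """Return the argmax of `scores`, nudged by `shift` toward the larger neighbor."""
    peak = max(range(len(scores)), key=scores.__getitem__)
    # Only interior peaks have two neighbors to compare.
    if 0 < peak < len(scores) - 1:
        diff = scores[peak + 1] - scores[peak - 1]
        if diff > 0:
            return peak + shift
        if diff < 0:
            return peak - shift
    return float(peak)


# Hypothetical heatmap row: the true peak lies slightly right of index 3.
row = [0.0, 0.1, 0.6, 0.9, 0.7, 0.2]
refined = refine_1d(row)
print(refined)
```

The refined index is still in heatmap cells and has to be multiplied by the heatmap stride to return to input-image pixels.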

Results on higher input resolution

Results on the COCO validation set with the input size of 384×288.

| Method | Representation | AP | AP_50 | AP_75 | AP_M | AP_L | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 72.2 | 89.3 | 78.9 | 68.1 | 79.7 | 77.6 |
| | SimDR (ours) | 73.0 | 89.3 | 79.7 | 69.5 | 79.9 | 78.6 |
| | SimDR* (ours) | 73.4 | 89.2 | 80.0 | 69.7 | 80.6 | 78.8 |
| SimBa-Res101 | heatmap | 73.6 | 89.6 | 80.3 | 69.9 | 81.1 | 79.1 |
| | SimDR (ours) | 74.2 | 89.6 | 80.9 | 70.7 | 80.9 | 79.8 |
| SimBa-Res152 | heatmap | 74.3 | 89.6 | 81.1 | 70.5 | 81.6 | 79.7 |
| | SimDR (ours) | 74.9 | 89.9 | 81.5 | 71.4 | 81.7 | 80.4 |
| HRNet-W48 | heatmap | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 |
| | SimDR* (ours) | 76.9 | 90.9 | 83.2 | 73.2 | 83.8 | 82.0 |

Note:

  • Flip test is used.
  • The person detector has a person AP of 56.4 on the COCO val2017 dataset.

Results on MPII val set

| Metric | Method | Representation | Input size | Hea | Sho | Elb | Wri | Hip | Kne | Ank | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PCKh@0.5 | HRNet-W32 | heatmap | 64x64 | 89.7 | 86.6 | 75.1 | 65.7 | 77.2 | 69.2 | 63.6 | 76.4 |
| | | SimDR (ours) | 64x64 | 96.5 | 89.5 | 77.5 | 67.6 | 79.8 | 71.5 | 65.0 | 78.7 |
| | | heatmap | 256x256 | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 |
| | | SimDR (ours) | 256x256 | 96.8 | 95.9 | 90.0 | 85.0 | 89.1 | 85.4 | 81.3 | 89.6 |
| | | SimDR* (ours) | 256x256 | 97.2 | 96.0 | 90.4 | 85.6 | 89.5 | 85.8 | 81.8 | 90.0 |
| PCKh@0.1 | HRNet-W32 | heatmap | 64x64 | 12.9 | 11.7 | 9.7 | 7.1 | 7.2 | 7.2 | 6.6 | 9.2 |
| | | SimDR (ours) | 64x64 | 30.9 | 23.3 | 18.1 | 15.0 | 10.5 | 13.1 | 12.8 | 18.5 |
| | | heatmap | 256x256 | 44.5 | 37.3 | 37.5 | 36.9 | 15.1 | 25.9 | 27.2 | 33.1 |
| | | SimDR (ours) | 256x256 | 50.1 | 41.0 | 45.3 | 42.4 | 16.6 | 29.7 | 30.3 | 37.8 |

Note:

  • Flip test is used.
  • There appears to be a bug in how the original code computes PCKh@0.1; it is fixed in this repo.
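For reference, PCKh at a given threshold counts a keypoint as correct when its distance to the ground truth is within that fraction of the head size. The sketch below is a minimal version under simplifying assumptions: a single person, all joints visible, and a head size that is already given. The function name and sample data are hypothetical, and MPII's exact head-size normalization and visibility masking are omitted, so this is not the repository's evaluation code.

```python
import math


def pckh(pred, gt, head_size, thr=0.5):
    """Fraction of keypoints whose error is within thr * head_size."""
    hits = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) <= thr * head_size:
            hits += 1
    return hits / len(gt)


# Hypothetical predictions and ground truth for three joints (pixels).
pred = [(10.0, 10.0), (20.0, 24.0), (30.0, 30.5)]
gt = [(10.0, 12.0), (20.0, 20.0), (30.0, 30.0)]
head = 6.0

loose = pckh(pred, gt, head, thr=0.5)   # tolerance of 3.0 px
strict = pckh(pred, gt, head, thr=0.1)  # tolerance of 0.6 px
print(loose, strict)
```

The stricter threshold can only keep or lower the score, which is why results at a small threshold are far more sensitive to quantization error than results at 0.5.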

Results on CrowdPose

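| Method | Representation | Input size | AP | AP_50 | AP_75 | AP_E | AP_M | AP_H |
|---|---|---|---|---|---|---|---|---|
| HRNet-W32 | heatmap | 64x64 | 42.4 | 69.6 | 45.5 | 51.2 | 43.1 | 31.8 |
| | SimDR (ours) | 64x64 | 46.5 | 70.9 | 50.0 | 56.0 | 47.5 | 34.7 |
| | heatmap | 256x192 | 66.4 | 81.1 | 71.5 | 74.0 | 67.4 | 55.6 |
| | SimDR (ours) | 256x192 | 66.7 | 82.1 | 72.0 | 74.1 | 67.8 | 56.2 |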

Getting started

1. Dependencies installation & data preparation

Please refer to THIS to prepare the environment step by step.

2. Model Zoo

Pretrained models are provided in our model zoo.

3. Training

Training on COCO train2017 dataset

To train with SimDR as the keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml

To train with SimDR* as the keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml

*Note: With SimDR, the deconvolution layers of SimpleBaseline can be either kept or removed.

Training on MPII dataset

To train with SimDR as the keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml

To train with SimDR* as the keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/mpii/hrnet/sa_simdr/w32_256x256_adam_lr1e-3_split2_sigma6.yaml

4. Testing

Testing on COCO val2017 dataset using model zoo's models

python tools/test.py \
    --cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
    TEST.USE_GT_BBOX False
python tools/test.py \
    --cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
    TEST.USE_GT_BBOX False

Testing on MPII dataset using model zoo's models

python tools/test.py \
    --cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ TEST.PCKH_THRE 0.5

Citations

If you use our code or models in your research, please cite with:

@misc{li20212d,
      title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?}, 
      author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
      year={2021},
      eprint={2107.03332},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Thanks for the open-source HRNet.