NOTE: SimDR is the old name of this work; we now officially use SimCC in our paper. For simplicity, we keep the old name in the code, since it is already used by many people.
The 2D heatmap representation has dominated human pose estimation for years due to its high performance. However, heatmap-based approaches suffer from several shortcomings:
- Performance drops dramatically on low-resolution images, which are frequently encountered in real-world scenarios.
- To improve localization precision, multiple upsampling layers may be needed to recover the feature-map resolution from low to high, which is computationally expensive.
- Extra coordinate refinement is usually necessary to reduce the quantization error of downscaled heatmaps.
Given the shortcomings above, we do not think the 2D heatmap is the final answer for keypoint coordinate representation in this field. By contrast, SimDR is a simple yet effective scheme that gets rid of extra post-processing and reduces quantization error through its coordinate representation design. For the first time, SimDR brings heatmap-free methods to a performance level competitive with heatmap-based methods, and it outperforms them by a large margin at low input resolutions. Additionally, SimDR allows one to directly remove the time-consuming upsampling modules of some methods, which may inspire new research on lightweight models for human pose estimation.
We hope the proposed SimDR will motivate the community to rethink the design of coordinate representation for 2D human pose estimation.
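As a rough illustration of the coordinate-representation idea described above (a minimal sketch with made-up values, not code from this repository), each coordinate is treated as a 1-D classification problem over bins that are finer than the input resolution by a splitting ratio k, so sub-pixel precision comes from the representation itself rather than from extra post-processing:

```python
import numpy as np

# Minimal sketch (hypothetical values): the x coordinate of one keypoint is encoded as a
# 1-D classification target over k * W bins instead of a column of a 2D heatmap.
W, k = 192, 2.0                     # input width and splitting ratio (k > 1 gives sub-pixel bins)
x_gt = 57.3                         # ground-truth x coordinate in input-image pixels

target = np.zeros(int(W * k))
target[int(round(x_gt * k))] = 1.0  # one-hot encoding; SimDR* smooths this with a 1-D Gaussian

x_pred = np.argmax(target) / k      # decoding: argmax over bins, then divide by k
print(x_pred)                       # 57.5 -> quantization error is at most 1/(2k) pixels
```

The y coordinate is handled the same way with k * H bins, so the network predicts one such vector per keypoint and per axis.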
For details see SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation by Yanjie Li, Sen Yang, Peidong Liu, Shoukui Zhang, Yunxiao Wang, Zhicheng Wang, Wankou Yang and Shu-Tao Xia.
- [2022.07.17] Our paper "SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation" has been accepted by ECCV'2022 as an Oral presentation (acceptance rate: 2.7%). If you find this repository useful, please give it a star 🌟.
- [2021.08.17] The pretrained models are released in Google Drive!
- [2021.07.09] The codes for SimDR and SimDR* (space-aware SimDR) are released!
Method | Representation | Input size | GFLOPs | AP | AR |
---|---|---|---|---|---|
SimBa-Res50 | heatmap | 384x288 | 20.0 | 71.5 | 76.9 |
SimBa-Res50 | SimDR* | 384x288 | 20.2 | 72.7 | 78.0 |
HRNet-W48 | heatmap | 256x192 | 14.6 | 74.2 | 79.5 |
HRNet-W48 | SimDR* | 256x192 | 14.6 | 75.4 | 80.5 |
HRNet-W48 | heatmap | 384x288 | 32.9 | 75.5 | 80.5 |
HRNet-W48 | SimDR* | 384x288 | 32.9 | 76.0 | 81.1 |
- Flip test is used.
- Person detector has person AP of 60.9 on COCO test-dev2017 dataset.
- GFLOPs is for convolution and linear layers only.
| Method | Representation | Input size | #Params | GFLOPs | Extra post. | AP | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 64x64 | 34.0M | 0.7 | Y | 34.4 | 43.7 |
| | heatmap | 64x64 | 34.0M | 0.7 | N | 25.8 | 36.0 |
| | SimDR (ours) | 64x64 | 34.1M | 0.7 | N | 40.8 | 49.6 |
| | heatmap | 128x128 | 34.0M | 3.0 | Y | 60.3 | 67.6 |
| | heatmap | 128x128 | 34.0M | 3.0 | N | 55.4 | 63.6 |
| | SimDR (ours) | 128x128 | 34.8M | 3.0 | N | 62.6 | 69.5 |
| | heatmap | 256x192 | 34.0M | 8.9 | Y | 70.4 | 76.3 |
| | heatmap | 256x192 | 34.0M | 8.9 | N | 68.5 | 74.8 |
| | SimDR (ours) | 256x192 | 36.8M | 9.0 | N | 71.4 | 77.4 |
| TokenPose-S | heatmap | 64x64 | 4.9M | 1.4 | Y | 57.1 | 64.8 |
| | heatmap | 64x64 | 4.9M | 1.4 | N | 35.9 | 47.0 |
| | SimDR (ours) | 64x64 | 4.9M | 1.4 | N | 62.8 | 70.1 |
| | heatmap | 128x128 | 5.2M | 1.6 | Y | 65.4 | 71.6 |
| | heatmap | 128x128 | 5.2M | 1.6 | N | 57.6 | 64.9 |
| | SimDR (ours) | 128x128 | 5.1M | 1.6 | N | 71.4 | 76.4 |
| | heatmap | 256x192 | 6.6M | 2.2 | Y | 72.5 | 78.0 |
| | heatmap | 256x192 | 6.6M | 2.2 | N | 69.9 | 75.8 |
| | SimDR (ours) | 256x192 | 5.5M | 2.2 | N | 73.6 | 78.9 |
| SimBa-Res101 | heatmap | 64x64 | 53.0M | 1.0 | Y | 34.1 | 43.5 |
| | heatmap | 64x64 | 53.0M | 1.0 | N | 25.7 | 36.1 |
| | SimDR (ours) | 64x64 | 53.1M | 1.0 | N | 39.6 | 48.9 |
| | heatmap | 128x128 | 53.0M | 4.1 | Y | 59.2 | 66.7 |
| | heatmap | 128x128 | 53.0M | 4.1 | N | 54.4 | 62.5 |
| | SimDR (ours) | 128x128 | 53.5M | 4.1 | N | 63.1 | 70.1 |
| | heatmap | 256x192 | 53.0M | 12.4 | Y | 71.4 | 77.1 |
| | heatmap | 256x192 | 53.0M | 12.4 | N | 69.5 | 75.6 |
| | SimDR (ours) | 256x192 | 53.7M | 12.4 | N | 72.3 | 78.0 |
| HRNet-W32 | heatmap | 64x64 | 28.5M | 0.6 | Y | 45.8 | 55.3 |
| | heatmap | 64x64 | 28.5M | 0.6 | N | 34.6 | 45.6 |
| | SimDR (ours) | 64x64 | 28.6M | 0.6 | N | 56.4 | 64.9 |
| | heatmap | 128x128 | 28.5M | 2.4 | Y | 67.2 | 74.1 |
| | heatmap | 128x128 | 28.5M | 2.4 | N | 61.9 | 69.4 |
| | SimDR (ours) | 128x128 | 29.1M | 2.4 | N | 70.7 | 76.7 |
| | heatmap | 256x192 | 28.5M | 7.1 | Y | 74.4 | 79.8 |
| | heatmap | 256x192 | 28.5M | 7.1 | N | 72.3 | 78.2 |
| | SimDR (ours) | 256x192 | 31.3M | 7.1 | N | 75.3 | 80.8 |
| HRNet-W48 | heatmap | 64x64 | 63.6M | 1.2 | Y | 48.5 | 57.8 |
| | heatmap | 64x64 | 63.6M | 1.2 | N | 36.9 | 47.8 |
| | SimDR (ours) | 64x64 | 63.7M | 1.2 | N | 59.7 | 67.5 |
| | heatmap | 128x128 | 63.6M | 4.9 | Y | 68.9 | 75.3 |
| | heatmap | 128x128 | 63.6M | 4.9 | N | 63.3 | 70.5 |
| | SimDR (ours) | 128x128 | 64.1M | 4.9 | N | 72.0 | 77.9 |
| | heatmap | 256x192 | 63.6M | 14.6 | Y | 75.1 | 80.4 |
| | heatmap | 256x192 | 63.6M | 14.6 | N | 73.1 | 78.7 |
| | SimDR (ours) | 256x192 | 66.3M | 14.6 | N | 75.9 | 81.2 |
- Flip test is used.
- Person detector has person AP of 56.4 on COCO val2017 dataset.
- GFLOPs is for convolution and linear layers only.
- Extra post. = extra post-processing for refining the predicted keypoint coordinates (illustrated by the sketch below).
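For reference, the extra post-processing referred to in the table is the sub-pixel refinement commonly applied to heatmap predictions, such as the quarter-pixel shift toward the higher-valued neighbor of the argmax. The sketch below only illustrates that kind of refinement step; the function name and array layout are ours, not taken from this repository:

```python
import numpy as np

def refine_heatmap_coord(heatmap, x, y):
    """Illustrative quarter-pixel refinement of an argmax location (x, y) on a single heatmap."""
    h, w = heatmap.shape
    if 1 < x < w - 1 and 1 < y < h - 1:
        # Shift by 0.25 pixel toward the neighbor with the larger response on each axis.
        dx = np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
        dy = np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
        return x + 0.25 * dx, y + 0.25 * dy
    return float(x), float(y)
```

Rows with Extra post. = N skip this refinement, which is where the heatmap baselines lose the most accuracy at low input resolutions.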
Results on the COCO validation set with the input size of 384×288.
| Method | Representation | AP | AP_50 | AP_75 | AP_M | AP_L | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 72.2 | 89.3 | 78.9 | 68.1 | 79.7 | 77.6 |
| | SimDR (ours) | 73.0 | 89.3 | 79.7 | 69.5 | 79.9 | 78.6 |
| | SimDR* (ours) | 73.4 | 89.2 | 80.0 | 69.7 | 80.6 | 78.8 |
| SimBa-Res101 | heatmap | 73.6 | 89.6 | 80.3 | 69.9 | 81.1 | 79.1 |
| | SimDR (ours) | 74.2 | 89.6 | 80.9 | 70.7 | 80.9 | 79.8 |
| SimBa-Res152 | heatmap | 74.3 | 89.6 | 81.1 | 70.5 | 81.6 | 79.7 |
| | SimDR (ours) | 74.9 | 89.9 | 81.5 | 71.4 | 81.7 | 80.4 |
| HRNet-W48 | heatmap | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 |
| | SimDR* (ours) | 76.9 | 90.9 | 83.2 | 73.2 | 83.8 | 82.0 |
- Flip test is used.
- Person detector has person AP of 56.4 on COCO val2017 dataset.
| Method | Representation | Input size | Hea | Sho | Elb | Wri | Hip | Kne | Ank | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| PCKh@0.5 | | | | | | | | | | |
| HRNet-W32 | heatmap | 64x64 | 89.7 | 86.6 | 75.1 | 65.7 | 77.2 | 69.2 | 63.6 | 76.4 |
| | SimDR (ours) | 64x64 | 96.5 | 89.5 | 77.5 | 67.6 | 79.8 | 71.5 | 65.0 | 78.7 |
| | heatmap | 256x256 | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 |
| | SimDR (ours) | 256x256 | 96.8 | 95.9 | 90.0 | 85.0 | 89.1 | 85.4 | 81.3 | 89.6 |
| | SimDR* (ours) | 256x256 | 97.2 | 96.0 | 90.4 | 85.6 | 89.5 | 85.8 | 81.8 | 90.0 |
| PCKh@0.1 | | | | | | | | | | |
| HRNet-W32 | heatmap | 64x64 | 12.9 | 11.7 | 9.7 | 7.1 | 7.2 | 7.2 | 6.6 | 9.2 |
| | SimDR (ours) | 64x64 | 30.9 | 23.3 | 18.1 | 15.0 | 10.5 | 13.1 | 12.8 | 18.5 |
| | heatmap | 256x256 | 44.5 | 37.3 | 37.5 | 36.9 | 15.1 | 25.9 | 27.2 | 33.1 |
| | SimDR (ours) | 256x256 | 50.1 | 41.0 | 45.3 | 42.4 | 16.6 | 29.7 | 30.3 | 37.8 |
- Flip test is used.
- It seems there is a bug in the original code when computing PCKh@0.1; we have fixed it in this repo.
| Method | Representation | Input size | AP | AP_50 | AP_75 | AP_E | AP_M | AP_H |
|---|---|---|---|---|---|---|---|---|
| HRNet-W32 | heatmap | 64x64 | 42.4 | 69.6 | 45.5 | 51.2 | 43.1 | 31.8 |
| | SimDR (ours) | 64x64 | 46.5 | 70.9 | 50.0 | 56.0 | 47.5 | 34.7 |
| | heatmap | 256x192 | 66.4 | 81.1 | 71.5 | 74.0 | 67.4 | 55.6 |
| | SimDR (ours) | 256x192 | 66.7 | 82.1 | 72.0 | 74.1 | 67.8 | 56.2 |
Please refer to THIS to prepare the environment step by step.
Pretrained models are provided in our model zoo.
To train with SimDR as the keypoint coordinate representation:
python tools/train.py \
--cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml
To train with SimDR* as the keypoint coordinate representation:
python tools/train.py \
--cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml
*Note: When using SimDR, the deconvolution layers of SimpleBaseline can be either kept or removed.*
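As a rough sketch of why the deconvolution layers become optional (the module and variable names below are illustrative, not the exact layers in this repo): a SimDR-style head only needs a flattened feature per joint, so it can work directly on the low-resolution backbone output instead of first upsampling it back to heatmap resolution.

```python
import torch.nn as nn

class FlattenSimDRHead(nn.Module):
    """Illustrative SimDR-style head on a low-resolution feature map (no deconvolution)."""

    def __init__(self, feat_channels, feat_h, feat_w, num_joints, img_w, img_h, k=2):
        super().__init__()
        # One channel per joint, then per-joint 1-D classification along x and y.
        self.joint_conv = nn.Conv2d(feat_channels, num_joints, kernel_size=1)
        self.mlp_x = nn.Linear(feat_h * feat_w, int(img_w * k))
        self.mlp_y = nn.Linear(feat_h * feat_w, int(img_h * k))

    def forward(self, feat):
        # feat: (B, feat_channels, feat_h, feat_w), e.g. the stride-32 output of a ResNet backbone.
        joint_feat = self.joint_conv(feat).flatten(start_dim=2)  # (B, num_joints, feat_h * feat_w)
        return self.mlp_x(joint_feat), self.mlp_y(joint_feat)    # per-axis bin logits
```

Keeping the deconvolution layers simply means attaching the same kind of head to the upsampled feature map instead.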
To train with SimDR as the keypoint coordinate representation:
python tools/train.py \
--cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml
To train with SimDR* as the keypoint coordinate representation:
python tools/train.py \
--cfg experiments/mpii/hrnet/sa_simdr/w32_256x256_adam_lr1e-3_split2_sigma6.yaml
python tools/test.py \
--cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
TEST.USE_GT_BBOX False
python tools/test.py \
--cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
TEST.USE_GT_BBOX False
python tools/test.py \
--cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ TEST.PCKH_THRE 0.5
If you use our code or models in your research, please cite with:
@misc{li20212d,
title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?},
author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
year={2021},
eprint={2107.03332},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Thanks to the open-source HRNet project.