[Refactor] InterNet for 3D hand pose estimation (#2632)
Showing 27 changed files with 2,248 additions and 26 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image | ||
|
||
## Results and Models | ||
|
||
### InterHand2.6M 3D Dataset | ||
|
||
| Arch | Set | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log | Details and Download | | ||
| :------------------------------- | :-------: | :----------: | :---------------: | :-------: | :---: | :--: | :------------------------------: | :-----------------------------: | :-----------------------------------------------: | | ||
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | test(H+M) | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) | | ||
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | val(M) | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) | |
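The MPJPE columns above report a mean per-joint position error in millimetres. As a minimal sketch, assuming the usual definition (Euclidean distance per joint after aligning each hand to its own root joint), the metric could be computed as follows. The function name `mpjpe` and the `root_index` default are hypothetical and are not taken from MMPose, whose `InterHandMetric` may differ in detail.

```python
import math


def mpjpe(pred_joints, gt_joints, root_index=0):
    """Mean per-joint position error after root alignment (illustrative only).

    pred_joints / gt_joints: sequences of (x, y, z) tuples in millimetres.
    root_index: index of the root (wrist) joint; the default is an assumption.
    """

    def root_relative(joints):
        # Subtract the root joint so that global translation is ignored.
        rx, ry, rz = joints[root_index]
        return [(x - rx, y - ry, z - rz) for x, y, z in joints]

    pred_rel = root_relative(pred_joints)
    gt_rel = root_relative(gt_joints)
    errors = [math.dist(p, g) for p, g in zip(pred_rel, gt_rel)]
    return sum(errors) / len(errors)


# Two-joint toy example: the second joint is off by a 3-4-5 triangle.
toy_pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
toy_gt = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
toy_error = mpjpe(toy_pred, toy_gt)
```

Because both poses are made root-relative first, a constant offset between prediction and ground truth does not affect the score.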
59 changes: 59 additions & 0 deletions
configs/hand_3d_keypoint/internet/interhand3d/internet_interhand3d.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
<!-- [ALGORITHM] --> | ||
|
||
<details> | ||
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterNet (ECCV'2020)</a></summary> | ||
|
||
```bibtex | ||
@InProceedings{Moon_2020_ECCV_InterHand2.6M, | ||
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu}, | ||
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image}, | ||
booktitle = {European Conference on Computer Vision (ECCV)}, | ||
year = {2020} | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
<!-- [BACKBONE] --> | ||
|
||
<details> | ||
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary> | ||
|
||
```bibtex | ||
@inproceedings{he2016deep, | ||
title={Deep residual learning for image recognition}, | ||
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, | ||
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition}, | ||
pages={770--778}, | ||
year={2016} | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
<!-- [DATASET] --> | ||
|
||
<details> | ||
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterHand2.6M (ECCV'2020)</a></summary> | ||
|
||
```bibtex | ||
@InProceedings{Moon_2020_ECCV_InterHand2.6M, | ||
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu}, | ||
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image}, | ||
booktitle = {European Conference on Computer Vision (ECCV)}, | ||
year = {2020} | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
Results on InterHand2.6M val & test set | ||
|
||
| Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log | | ||
| :-------- | :-------- | :----------------------------------------: | :--------: | :----------: | :---------------: | :-------: | :---: | :--: | :----------------------------------------: | :---------------------------------------: | | ||
| All | test(H+M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 9.69 | 13.72 | 11.86 | 29.27 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) | | ||
| All | val(M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 11.30 | 15.57 | 13.36 | 32.15 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) | | ||
| All | test(H+M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | | ||
| All | val(M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | | ||
|
||
*Models with \* are trained in [MMPose 0.x](https://github.com/open-mmlab/mmpose/tree/0.x). The checkpoints and logs are provided for validation only.* |
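The MRRPE column measures how well the relative placement of the two hands is recovered. A minimal sketch, assuming the definition from the InterHand2.6M paper (Euclidean distance between the predicted and ground-truth position of the left-hand root relative to the right-hand root), is given below for a single sample. The function name `mrrpe` and its argument names are hypothetical, not MMPose API.

```python
import math


def mrrpe(pred_right_root, pred_left_root, gt_right_root, gt_left_root):
    """Relative-root position error for one two-hand sample (illustrative only).

    Each argument is an (x, y, z) tuple in millimetres giving a root (wrist)
    position in camera space.
    """
    # Position of the left root expressed relative to the right root.
    pred_rel = tuple(l - r for l, r in zip(pred_left_root, pred_right_root))
    gt_rel = tuple(l - r for l, r in zip(gt_left_root, gt_right_root))
    return math.dist(pred_rel, gt_rel)


# Toy sample: the predicted left hand is 30 mm too shallow relative to the right.
toy_mrrpe = mrrpe(
    pred_right_root=(0.0, 0.0, 0.0),
    pred_left_root=(100.0, 0.0, 0.0),
    gt_right_root=(10.0, 10.0, 10.0),
    gt_left_root=(110.0, 10.0, 40.0),
)
```

A dataset-level score would average this value over the samples in which both hands are annotated.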
34 changes: 34 additions & 0 deletions
configs/hand_3d_keypoint/internet/interhand3d/internet_interhand3d.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
Collections: | ||
- Name: InterNet | ||
  Paper: | ||
    Title: 'InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation | ||
      from a Single RGB Image' | ||
    URL: https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf | ||
  README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/algorithms/internet.md | ||
Models: | ||
- Config: configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py | ||
  In Collection: InterNet | ||
  Metadata: | ||
    Architecture: &id001 | ||
    - InterNet | ||
    - ResNet | ||
    Training Data: InterHand2.6M | ||
  Name: internet_res50_4xb16-20e_interhand3d-256x256 | ||
  Results: | ||
  - Dataset: InterHand2.6M (H+M) | ||
    Metrics: | ||
      APh: 0.99 | ||
      MPJPE-all: 11.86 | ||
      MPJPE-interacting: 13.72 | ||
      MPJPE-single: 9.69 | ||
      MRRPE: 29.27 | ||
    Task: Hand 3D Keypoint | ||
  - Dataset: InterHand2.6M (M) | ||
    Metrics: | ||
      APh: 0.98 | ||
      MPJPE-all: 13.36 | ||
      MPJPE-interacting: 15.57 | ||
      MPJPE-single: 11.30 | ||
      MRRPE: 32.15 | ||
    Task: Hand 3D Keypoint | ||
  Weights: https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth |
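The checkpoint file names used in this commit carry an eight-character hexadecimal tag followed by a date. Assuming that tag is the first eight hex digits of the file's SHA-256 digest (a common OpenMMLab publishing convention, but an assumption here), a downloaded checkpoint could be sanity-checked with the sketch below. The helper names `expected_hash_prefix` and `file_matches_name` are hypothetical.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern: "<name>-<8 hex digits>_<YYYYMMDD>.pth".
HASH_IN_NAME = re.compile(r"-([0-9a-f]{8})_\d{8}\.pth$")


def expected_hash_prefix(url_or_name: str) -> Optional[str]:
    """Return the 8-hex-digit tag embedded in a checkpoint name, if any."""
    match = HASH_IN_NAME.search(url_or_name)
    return match.group(1) if match else None


def file_matches_name(path: Path) -> bool:
    """Check that a file's SHA-256 digest starts with the tag in its name."""
    prefix = expected_hash_prefix(path.name)
    if prefix is None:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(prefix)
```

For example, `expected_hash_prefix` applied to a URL ending in `internet_res50_interhand3d-d6ff20d6_20230913.pth` yields `d6ff20d6`, which can then be compared against the digest of the downloaded file.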
182 changes: 182 additions & 0 deletions
...igs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
_base_ = ['../../../_base_/default_runtime.py'] | ||
|
||
# visualization | ||
vis_backends = [ | ||
    dict(type='LocalVisBackend'), | ||
] | ||
visualizer = dict( | ||
    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') | ||
|
||
# runtime | ||
train_cfg = dict(max_epochs=20, val_interval=1) | ||
|
||
# optimizer | ||
optim_wrapper = dict(optimizer=dict(type='Adam', lr=0.0002)) | ||
|
||
# learning policy | ||
param_scheduler = [ | ||
    dict( | ||
        type='MultiStepLR', | ||
        begin=0, | ||
        end=20, | ||
        milestones=[15, 17], | ||
        gamma=0.1, | ||
        by_epoch=True) | ||
] | ||
|
||
auto_scale_lr = dict(base_batch_size=128) | ||
|
||
# hooks | ||
default_hooks = dict( | ||
    checkpoint=dict( | ||
        type='CheckpointHook', | ||
        interval=1, | ||
        save_best='MPJPE_all', | ||
        rule='less', | ||
        max_keep_ckpts=1), | ||
    logger=dict(type='LoggerHook', interval=20), | ||
) | ||
|
||
# codec settings | ||
codec = dict( | ||
    type='Hand3DHeatmap', | ||
    image_size=[256, 256], | ||
    root_heatmap_size=64, | ||
    heatmap_size=[64, 64, 64], | ||
    sigma=2.5, | ||
    max_bound=255, | ||
    depth_size=64) | ||
|
||
# model settings | ||
model = dict( | ||
    type='TopdownPoseEstimator', | ||
    data_preprocessor=dict( | ||
        type='PoseDataPreprocessor', | ||
        mean=[123.675, 116.28, 103.53], | ||
        std=[58.395, 57.12, 57.375], | ||
        bgr_to_rgb=True), | ||
    backbone=dict( | ||
        type='ResNet', | ||
        depth=50, | ||
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), | ||
    head=dict( | ||
        type='InternetHead', | ||
        keypoint_head_cfg=dict( | ||
            in_channels=2048, | ||
            out_channels=21 * 64, | ||
            depth_size=codec['depth_size'], | ||
            deconv_out_channels=(256, 256, 256), | ||
            deconv_kernel_sizes=(4, 4, 4), | ||
        ), | ||
        root_head_cfg=dict( | ||
            in_channels=2048, | ||
            heatmap_size=codec['root_heatmap_size'], | ||
            hidden_dims=(512, ), | ||
        ), | ||
        hand_type_head_cfg=dict( | ||
            in_channels=2048, | ||
            num_labels=2, | ||
            hidden_dims=(512, ), | ||
        ), | ||
        decoder=codec), | ||
    test_cfg=dict(flip_test=False)) | ||
|
||
# base dataset settings | ||
dataset_type = 'InterHand3DDataset' | ||
data_mode = 'topdown' | ||
data_root = 'data/interhand2.6m/' | ||
|
||
# pipelines | ||
train_pipeline = [ | ||
    dict(type='LoadImage'), | ||
    dict(type='GetBBoxCenterScale'), | ||
    dict(type='HandRandomFlip', prob=0.5), | ||
    dict(type='RandomBBoxTransform', rotate_factor=90.0), | ||
    dict(type='TopdownAffine', input_size=codec['image_size']), | ||
    dict(type='GenerateTarget', encoder=codec), | ||
    dict( | ||
        type='PackPoseInputs', | ||
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape', | ||
                   'focal', 'principal_pt', 'input_size', 'input_center', | ||
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip', | ||
                   'flip_indices', 'abs_depth')) | ||
] | ||
val_pipeline = [ | ||
    dict(type='LoadImage'), | ||
    dict(type='GetBBoxCenterScale'), | ||
    dict(type='TopdownAffine', input_size=codec['image_size']), | ||
    dict( | ||
        type='PackPoseInputs', | ||
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape', | ||
                   'focal', 'principal_pt', 'input_size', 'input_center', | ||
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip', | ||
                   'flip_indices', 'abs_depth')) | ||
] | ||
|
||
# data loaders | ||
train_dataloader = dict( | ||
    batch_size=16, | ||
    num_workers=1, | ||
    persistent_workers=True, | ||
    drop_last=False, | ||
    sampler=dict(type='DefaultSampler', shuffle=True), | ||
    dataset=dict( | ||
        type=dataset_type, | ||
        ann_file='annotations/all/InterHand2.6M_train_data.json', | ||
        camera_param_file='annotations/all/InterHand2.6M_train_camera.json', | ||
        joint_file='annotations/all/InterHand2.6M_train_joint_3d.json', | ||
        use_gt_root_depth=True, | ||
        rootnet_result_file=None, | ||
        data_mode=data_mode, | ||
        data_root=data_root, | ||
        data_prefix=dict(img='images/train/'), | ||
        pipeline=train_pipeline, | ||
    )) | ||
val_dataloader = dict( | ||
    batch_size=16, | ||
    num_workers=1, | ||
    persistent_workers=True, | ||
    drop_last=False, | ||
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), | ||
    dataset=dict( | ||
        type=dataset_type, | ||
        ann_file='annotations/machine_annot/InterHand2.6M_val_data.json', | ||
        camera_param_file='annotations/machine_annot/' | ||
        'InterHand2.6M_val_camera.json', | ||
        joint_file='annotations/machine_annot/InterHand2.6M_val_joint_3d.json', | ||
        use_gt_root_depth=True, | ||
        rootnet_result_file=None, | ||
        data_mode=data_mode, | ||
        data_root=data_root, | ||
        data_prefix=dict(img='images/val/'), | ||
        pipeline=val_pipeline, | ||
        test_mode=True, | ||
    )) | ||
test_dataloader = dict( | ||
    batch_size=16, | ||
    num_workers=1, | ||
    persistent_workers=True, | ||
    drop_last=False, | ||
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), | ||
    dataset=dict( | ||
        type=dataset_type, | ||
        ann_file='annotations/all/' | ||
        'InterHand2.6M_test_data.json', | ||
        camera_param_file='annotations/all/' | ||
        'InterHand2.6M_test_camera.json', | ||
        joint_file='annotations/all/' | ||
        'InterHand2.6M_test_joint_3d.json', | ||
        use_gt_root_depth=True, | ||
        rootnet_result_file=None, | ||
        data_mode=data_mode, | ||
        data_root=data_root, | ||
        data_prefix=dict(img='images/test/'), | ||
        pipeline=val_pipeline, | ||
        test_mode=True, | ||
    )) | ||
|
||
# evaluators | ||
val_evaluator = [ | ||
    dict(type='InterHandMetric', modes=['MPJPE', 'MRRPE', 'HandednessAcc']) | ||
] | ||
test_evaluator = val_evaluator |
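The optimizer and scheduler settings in this config (Adam at `lr=0.0002`, a `MultiStepLR` with milestones `[15, 17]` and `gamma=0.1` over 20 epochs) imply a three-stage learning-rate schedule. The sketch below reproduces that schedule in plain Python, assuming the standard step-decay rule where the rate is multiplied by `gamma` once per milestone reached. It also illustrates the linear scaling rule that `auto_scale_lr` would apply if enabled, assuming that `4xb16` in the config name means 4 GPUs with 16 samples each. The function names are hypothetical and no MMEngine code is called.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=2e-4, milestones=(15, 17), gamma=0.1):
    """Learning rate in effect during a zero-based epoch under step decay."""
    # Count how many milestones have been reached by this epoch.
    decays = bisect_right(milestones, epoch)
    return base_lr * gamma ** decays


def linearly_scaled_lr(base_lr, total_batch_size, base_batch_size=128):
    """Linear scaling rule: the rate grows in proportion to the batch size."""
    return base_lr * total_batch_size / base_batch_size


# Print the rate at the boundaries of each stage of the 20-epoch run.
for epoch in (0, 14, 15, 16, 17, 19):
    print(f"epoch {epoch:2d}: lr = {multistep_lr(epoch):.1e}")

# Assumed setup of 4 GPUs x 16 samples, relative to base_batch_size=128.
scaled = linearly_scaled_lr(2e-4, total_batch_size=4 * 16)
```

Under these assumptions the rate stays at its initial value for epochs 0 to 14, drops by a factor of ten for epochs 15 and 16, and drops by another factor of ten for the final three epochs.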