[Refactor] Internet for 3d hand pose estimation (#2632)
LareinaM authored Sep 19, 2023
1 parent 8c34ddc commit bf8b363
Showing 27 changed files with 2,248 additions and 26 deletions.
10 changes: 10 additions & 0 deletions configs/hand_3d_keypoint/internet/README.md
@@ -0,0 +1,10 @@
# InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

## Results and Models

### InterHand2.6M 3D Dataset

| Arch | Set | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log | Details and Download |
| :------------------------------- | :-------: | :----------: | :---------------: | :-------: | :---: | :--: | :------------------------------: | :-----------------------------: | :-----------------------------------------------: |
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | test(H+M) | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) |
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | val(M) | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) |
@@ -0,0 +1,59 @@
<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterNet (ECCV'2020)</a></summary>

```bibtex
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
```

</details>

<!-- [BACKBONE] -->

<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary>

```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
```

</details>

<!-- [DATASET] -->

<details>
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterHand2.6M (ECCV'2020)</a></summary>

```bibtex
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
```

</details>

Results on the InterHand2.6M val and test sets

| Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
| :-------- | :-------- | :----------------------------------------: | :--------: | :----------: | :---------------: | :-------: | :---: | :--: | :----------------------------------------: | :---------------------------------------: |
| All | test(H+M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 9.69 | 13.72 | 11.86 | 29.27 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) |
| All | val(M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 11.30 | 15.57 | 13.36 | 32.15 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) |
| All | test(H+M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) |
| All | val(M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) |

*Models marked with \* were trained in [MMPose 0.x](https://github.com/open-mmlab/mmpose/tree/0.x). Their checkpoints and logs are provided only for validation.*
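
For readers unfamiliar with the MPJPE columns, the following is a minimal sketch of the usual definition: the mean Euclidean distance between predicted and ground-truth 3D joints, reported here in millimetres. The function name `mpjpe` and the optional `valid` mask are illustrative assumptions, not the `InterHandMetric` implementation, which may differ in details such as root alignment and how invalid joints are handled.

```python
import math


def mpjpe(pred, gt, valid=None):
    """Mean per-joint position error over the valid joints.

    pred, gt: sequences of (x, y, z) tuples, in the same units (e.g. mm).
    valid:    optional sequence of booleans; joints marked False are skipped.
    This is an illustrative helper, not part of MMPose.
    """
    if valid is None:
        valid = [True] * len(gt)
    # Euclidean distance per joint, keeping only joints flagged as valid.
    errors = [math.dist(p, g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid joints to evaluate")
    return sum(errors) / len(errors)


pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt))  # 2.5
```

In the tables, MPJPE-single and MPJPE-interacting would correspond to applying such an average over single-hand and interacting-hand samples respectively, with MPJPE-all covering both.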
@@ -0,0 +1,34 @@
Collections:
- Name: InterNet
  Paper:
    Title: 'InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation
      from a Single RGB Image'
    URL: https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf
  README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/algorithms/internet.md
Models:
- Config: configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py
  In Collection: InterNet
  Metadata:
    Architecture: &id001
    - InterNet
    - ResNet
    Training Data: InterHand2.6M
  Name: internet_res50_4xb16-20e_interhand3d-256x256
  Results:
  - Dataset: InterHand2.6M (H+M)
    Metrics:
      APh: 0.99
      MPJPE-all: 11.86
      MPJPE-interacting: 13.72
      MPJPE-single: 9.69
      MRRPE: 29.27
    Task: Hand 3D Keypoint
  - Dataset: InterHand2.6M (M)
    Metrics:
      APh: 0.98
      MPJPE-all: 13.36
      MPJPE-interacting: 15.57
      MPJPE-single: 11.30
      MRRPE: 32.15
    Task: Hand 3D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth
@@ -0,0 +1,182 @@
_base_ = ['../../../_base_/default_runtime.py']

# visualization
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')

# runtime
train_cfg = dict(max_epochs=20, val_interval=1)

# optimizer
optim_wrapper = dict(optimizer=dict(type='Adam', lr=0.0002))

# learning policy
param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=20,
        milestones=[15, 17],
        gamma=0.1,
        by_epoch=True)
]

auto_scale_lr = dict(base_batch_size=128)

# hooks
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,
        save_best='MPJPE_all',
        rule='less',
        max_keep_ckpts=1),
    logger=dict(type='LoggerHook', interval=20),
)

# codec settings
codec = dict(
    type='Hand3DHeatmap',
    image_size=[256, 256],
    root_heatmap_size=64,
    heatmap_size=[64, 64, 64],
    sigma=2.5,
    max_bound=255,
    depth_size=64)

# model settings
model = dict(
    type='TopdownPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    head=dict(
        type='InternetHead',
        keypoint_head_cfg=dict(
            in_channels=2048,
            out_channels=21 * 64,
            depth_size=codec['depth_size'],
            deconv_out_channels=(256, 256, 256),
            deconv_kernel_sizes=(4, 4, 4),
        ),
        root_head_cfg=dict(
            in_channels=2048,
            heatmap_size=codec['root_heatmap_size'],
            hidden_dims=(512, ),
        ),
        hand_type_head_cfg=dict(
            in_channels=2048,
            num_labels=2,
            hidden_dims=(512, ),
        ),
        decoder=codec),
    test_cfg=dict(flip_test=False))

# base dataset settings
dataset_type = 'InterHand3DDataset'
data_mode = 'topdown'
data_root = 'data/interhand2.6m/'

# pipelines
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='HandRandomFlip', prob=0.5),
    dict(type='RandomBBoxTransform', rotate_factor=90.0),
    dict(type='TopdownAffine', input_size=codec['image_size']),
    dict(type='GenerateTarget', encoder=codec),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape',
                   'focal', 'principal_pt', 'input_size', 'input_center',
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip',
                   'flip_indices', 'abs_depth'))
]
val_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='TopdownAffine', input_size=codec['image_size']),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape',
                   'focal', 'principal_pt', 'input_size', 'input_center',
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip',
                   'flip_indices', 'abs_depth'))
]

# data loaders
train_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/all/InterHand2.6M_train_data.json',
        camera_param_file='annotations/all/InterHand2.6M_train_camera.json',
        joint_file='annotations/all/InterHand2.6M_train_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/train/'),
        pipeline=train_pipeline,
    ))
val_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/machine_annot/InterHand2.6M_val_data.json',
        camera_param_file='annotations/machine_annot/'
        'InterHand2.6M_val_camera.json',
        joint_file='annotations/machine_annot/InterHand2.6M_val_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/val/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))
test_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/all/'
        'InterHand2.6M_test_data.json',
        camera_param_file='annotations/all/'
        'InterHand2.6M_test_camera.json',
        joint_file='annotations/all/'
        'InterHand2.6M_test_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/test/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))

# evaluators
val_evaluator = [
    dict(type='InterHandMetric', modes=['MPJPE', 'MRRPE', 'HandednessAcc'])
]
test_evaluator = val_evaluator
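
The learning-rate settings in this config (`lr=0.0002`, `MultiStepLR` with `milestones=[15, 17]` and `gamma=0.1`, and `auto_scale_lr` with `base_batch_size=128`) can be illustrated with a small standalone sketch. It assumes the conventional multi-step semantics (the rate is multiplied by `gamma` once the epoch index reaches each milestone) and the linear scaling rule; the helper names are hypothetical, and the 4-GPU count is read from the `4xb16` part of the config filename, which is a naming convention rather than something the config enforces. Automatic scaling only takes effect when it is enabled at launch.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate in effect during a 0-based epoch (hypothetical helper).

    The base rate is multiplied by gamma once for every milestone
    that the epoch index has reached.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def linearly_scaled_lr(base_lr, base_batch_size, num_gpus, batch_per_gpu):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return base_lr * (num_gpus * batch_per_gpu) / base_batch_size


# Values taken from the config above.
for epoch in (0, 14, 15, 16, 17, 19):
    print(epoch, multistep_lr(2e-4, [15, 17], 0.1, epoch))

# Assumed setup: 4 GPUs x 16 samples each, per the "4xb16" filename convention.
print(linearly_scaled_lr(2e-4, 128, 4, 16))
```

Under these assumptions the rate stays at its base value for epochs 0 to 14, drops by a factor of 10 for epochs 15 and 16, and by a further factor of 10 for epochs 17 to 19.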
4 changes: 3 additions & 1 deletion mmpose/codecs/__init__.py
@@ -3,6 +3,7 @@
 from .associative_embedding import AssociativeEmbedding
 from .decoupled_heatmap import DecoupledHeatmap
 from .edpose_label import EDPoseLabel
+from .hand_3d_heatmap import Hand3DHeatmap
 from .image_pose_lifting import ImagePoseLifting
 from .integral_regression_label import IntegralRegressionLabel
 from .megvii_heatmap import MegviiHeatmap
@@ -18,5 +19,6 @@
     'MSRAHeatmap', 'MegviiHeatmap', 'UDPHeatmap', 'RegressionLabel',
     'SimCCLabel', 'IntegralRegressionLabel', 'AssociativeEmbedding', 'SPR',
     'DecoupledHeatmap', 'VideoPoseLifting', 'ImagePoseLifting',
-    'MotionBERTLabel', 'YOLOXPoseAnnotationProcessor', 'EDPoseLabel'
+    'MotionBERTLabel', 'YOLOXPoseAnnotationProcessor', 'EDPoseLabel',
+    'Hand3DHeatmap'
]
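
The newly exported `Hand3DHeatmap` codec is configured with `depth_size=64`, and the keypoint head in the config uses `out_channels=21 * 64`. One way to read those numbers is that each of the 21 joints owns 64 consecutive depth bins along the channel axis. The sketch below shows that index arithmetic only; the joint-major layout and the helper name are assumptions for illustration and are not taken from the codec or head source.

```python
NUM_JOINTS = 21   # joints per hand, from `out_channels=21 * 64`
DEPTH_SIZE = 64   # depth bins per joint, from `depth_size=64`


def channel_to_joint_depth(channel, depth_size=DEPTH_SIZE):
    """Map a flat channel index to (joint index, depth bin).

    Assumes a joint-major layout: channels [k * depth_size, (k + 1) * depth_size)
    belong to joint k. This layout is an assumption of the sketch.
    """
    return divmod(channel, depth_size)


print(NUM_JOINTS * DEPTH_SIZE)       # 1344
print(channel_to_joint_depth(65))    # (1, 1)
print(channel_to_joint_depth(1343))  # (20, 63)
```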