[Refactor] Internet for 3d hand pose estimation #2632

Merged 25 commits on Sep 19, 2023
10 changes: 10 additions & 0 deletions configs/hand_3d_keypoint/internet/README.md
@@ -0,0 +1,10 @@
# InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

## Results and Models

### InterHand2.6M 3D Dataset

| Arch | Set | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log | Details and Download |
| :------------------------------- | :-------: | :----------: | :---------------: | :-------: | :---: | :--: | :------------------------------: | :-----------------------------: | :-----------------------------------------------: |
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | test(H+M) | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) |
| [InterNet_resnet_50](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | val(M) | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) | [internet_interhand3d.md](./interhand3d/internet_interhand3d.md) |
@@ -0,0 +1,59 @@
<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterNet (ECCV'2020)</a></summary>

```bibtex
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
```

</details>

<!-- [BACKBONE] -->

<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary>

```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
```

</details>

<!-- [DATASET] -->

<details>
<summary align="right"><a href="https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf">InterHand2.6M (ECCV'2020)</a></summary>

```bibtex
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
```

</details>

Results on InterHand2.6M val & test sets

| Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
| :-------- | :-------- | :----------------------------------------: | :--------: | :----------: | :---------------: | :-------: | :---: | :--: | :----------------------------------------: | :---------------------------------------: |
| All | test(H+M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 9.69 | 13.72 | 11.86 | 29.27 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) |
| All | val(M) | [InterNet_resnet_50](/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py) | 256x256 | 11.30 | 15.57 | 13.36 | 32.15 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth) | [log](https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.json) |
| All | test(H+M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) |
| All | val(M) | [InterNet_resnet_50\*](/configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py) | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | [ckpt](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256-42b7f2ac_20210702.pth) | [log](https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3dv1.0_all_256x256_20210702.log.json) |

*Models with \* are trained in [MMPose 0.x](https://github.com/open-mmlab/mmpose/tree/0.x). The checkpoints and logs are only for validation.*
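The MPJPE columns above report a mean per-joint position error. As a rough illustration of what that number measures, the sketch below computes a root-relative MPJPE for one hand in plain Python. It is not the `InterHandMetric` implementation from this PR; the function names, the choice of joint 0 as the root, and the toy coordinates are all assumptions made for the example.

```python
import math


def root_relative(joints, root_index):
    """Express every joint relative to the root joint (hypothetical helper)."""
    root = joints[root_index]
    return [tuple(c - r for c, r in zip(joint, root)) for joint in joints]


def mpjpe(pred, gt, root_index=0):
    """Mean Euclidean distance between root-aligned predicted and GT joints.

    `root_index=0` is an assumption for this sketch; the dataset defines
    its own root (wrist) index.
    """
    pred_rel = root_relative(pred, root_index)
    gt_rel = root_relative(gt, root_index)
    errors = [math.dist(p, g) for p, g in zip(pred_rel, gt_rel)]
    return sum(errors) / len(errors)


# Toy data: three joints, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
# The prediction is shifted by (1, 1, 1) overall; the last joint is a further 3 mm off in z.
pred = [(1.0, 1.0, 1.0), (11.0, 1.0, 1.0), (1.0, 21.0, 4.0)]

error_mm = mpjpe(pred, gt)
```

Because both poses are aligned at the root first, the global (1, 1, 1) offset does not count towards the error; only the 3 mm deviation of the last joint does, averaged over the three joints.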
@@ -0,0 +1,34 @@
Collections:
- Name: InterNet
  Paper:
    Title: 'InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation
      from a Single RGB Image'
    URL: https://link.springer.com/content/pdf/10.1007/978-3-030-58565-5_33.pdf
  README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/algorithms/internet.md
Models:
- Config: configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py
  In Collection: InterNet
  Metadata:
    Architecture: &id001
    - InterNet
    - ResNet
    Training Data: InterHand2.6M
  Name: internet_res50_4xb16-20e_interhand3d-256x256
  Results:
  - Dataset: InterHand2.6M (H+M)
    Metrics:
      APh: 0.99
      MPJPE-all: 11.86
      MPJPE-interacting: 13.72
      MPJPE-single: 9.69
      MRRPE: 29.27
    Task: Hand 3D Keypoint
  - Dataset: InterHand2.6M (M)
    Metrics:
      APh: 0.98
      MPJPE-all: 13.36
      MPJPE-interacting: 15.57
      MPJPE-single: 11.30
      MRRPE: 32.15
    Task: Hand 3D Keypoint
  Weights: https://download.openmmlab.com/mmpose/v1/hand_3d_keypoint/internet/interhand3d/internet_res50_interhand3d-d6ff20d6_20230913.pth
@@ -0,0 +1,182 @@
_base_ = ['../../../_base_/default_runtime.py']

# visualization
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')

# runtime
train_cfg = dict(max_epochs=20, val_interval=1)

# optimizer
optim_wrapper = dict(optimizer=dict(type='Adam', lr=0.0002))

# learning policy
param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=20,
        milestones=[15, 17],
        gamma=0.1,
        by_epoch=True)
]

auto_scale_lr = dict(base_batch_size=128)

# hooks
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,
        save_best='MPJPE_all',
        rule='less',
        max_keep_ckpts=1),
    logger=dict(type='LoggerHook', interval=20),
)

# codec settings
codec = dict(
    type='Hand3DHeatmap',
    image_size=[256, 256],
    root_heatmap_size=64,
    heatmap_size=[64, 64, 64],
    sigma=2.5,
    max_bound=255,
    depth_size=64)

# model settings
model = dict(
    type='TopdownPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    head=dict(
        type='InternetHead',
        keypoint_head_cfg=dict(
            in_channels=2048,
            out_channels=21 * 64,
            depth_size=codec['depth_size'],
            deconv_out_channels=(256, 256, 256),
            deconv_kernel_sizes=(4, 4, 4),
        ),
        root_head_cfg=dict(
            in_channels=2048,
            heatmap_size=codec['root_heatmap_size'],
            hidden_dims=(512, ),
        ),
        hand_type_head_cfg=dict(
            in_channels=2048,
            num_labels=2,
            hidden_dims=(512, ),
        ),
        decoder=codec),
    test_cfg=dict(flip_test=False))

# base dataset settings
dataset_type = 'InterHand3DDataset'
data_mode = 'topdown'
data_root = 'data/interhand2.6m/'

# pipelines
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='HandRandomFlip', prob=0.5),
    dict(type='RandomBBoxTransform', rotate_factor=90.0),
    dict(type='TopdownAffine', input_size=codec['image_size']),
    dict(type='GenerateTarget', encoder=codec),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape',
                   'focal', 'principal_pt', 'input_size', 'input_center',
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip',
                   'flip_indices', 'abs_depth'))
]
val_pipeline = [
    dict(type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(type='TopdownAffine', input_size=codec['image_size']),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape',
                   'focal', 'principal_pt', 'input_size', 'input_center',
                   'input_scale', 'hand_type', 'hand_type_valid', 'flip',
                   'flip_indices', 'abs_depth'))
]

# data loaders
train_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/all/InterHand2.6M_train_data.json',
        camera_param_file='annotations/all/InterHand2.6M_train_camera.json',
        joint_file='annotations/all/InterHand2.6M_train_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/train/'),
        pipeline=train_pipeline,
    ))
val_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/machine_annot/InterHand2.6M_val_data.json',
        camera_param_file='annotations/machine_annot/'
        'InterHand2.6M_val_camera.json',
        joint_file='annotations/machine_annot/InterHand2.6M_val_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/val/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))
test_dataloader = dict(
    batch_size=16,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        ann_file='annotations/all/'
        'InterHand2.6M_test_data.json',
        camera_param_file='annotations/all/'
        'InterHand2.6M_test_camera.json',
        joint_file='annotations/all/'
        'InterHand2.6M_test_joint_3d.json',
        use_gt_root_depth=True,
        rootnet_result_file=None,
        data_mode=data_mode,
        data_root=data_root,
        data_prefix=dict(img='images/test/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))

# evaluators
val_evaluator = [
    dict(type='InterHandMetric', modes=['MPJPE', 'MRRPE', 'HandednessAcc'])
]
test_evaluator = val_evaluator
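The config above trains for 20 epochs with Adam at `lr=0.0002` and a `MultiStepLR` schedule (`milestones=[15, 17]`, `gamma=0.1`). The sketch below works out the learning rate each epoch would see under the usual step-decay rule, where the rate is multiplied by `gamma` once for every milestone already reached. It is a standalone illustration, not MMEngine code. The helper names are hypothetical, and the linear batch-size scaling shown is only what `auto_scale_lr` would do if it were enabled, which I am assuming follows the standard linear scaling rule.

```python
from bisect import bisect_right


def multistep_lr(base_lr, epoch, milestones, gamma):
    """Step decay: multiply base_lr by gamma once per milestone <= epoch."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


def linearly_scaled_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size):
    """Linear scaling rule (assumed behaviour when LR auto-scaling is enabled)."""
    return base_lr * (num_gpus * samples_per_gpu) / base_batch_size


# Values taken from the config; epochs are counted from 0.
schedule = [multistep_lr(2e-4, epoch, [15, 17], 0.1) for epoch in range(20)]

# "4xb16" in the config name suggests 4 GPUs x 16 samples = 64 images per step,
# half of base_batch_size=128, so scaling would halve the base learning rate.
scaled = linearly_scaled_lr(2e-4, num_gpus=4, samples_per_gpu=16,
                            base_batch_size=128)
```

Under these assumptions the rate stays at 2e-4 for epochs 0 to 14, drops to 2e-5 for epochs 15 and 16, and to 2e-6 for the last three epochs.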
4 changes: 3 additions & 1 deletion mmpose/codecs/__init__.py
@@ -3,6 +3,7 @@
 from .associative_embedding import AssociativeEmbedding
 from .decoupled_heatmap import DecoupledHeatmap
 from .edpose_label import EDPoseLabel
+from .hand_3d_heatmap import Hand3DHeatmap
 from .image_pose_lifting import ImagePoseLifting
 from .integral_regression_label import IntegralRegressionLabel
 from .megvii_heatmap import MegviiHeatmap
@@ -18,5 +19,6 @@
     'MSRAHeatmap', 'MegviiHeatmap', 'UDPHeatmap', 'RegressionLabel',
     'SimCCLabel', 'IntegralRegressionLabel', 'AssociativeEmbedding', 'SPR',
     'DecoupledHeatmap', 'VideoPoseLifting', 'ImagePoseLifting',
-    'MotionBERTLabel', 'YOLOXPoseAnnotationProcessor', 'EDPoseLabel'
+    'MotionBERTLabel', 'YOLOXPoseAnnotationProcessor', 'EDPoseLabel',
+    'Hand3DHeatmap'
 ]
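Exporting `Hand3DHeatmap` from `mmpose/codecs/__init__.py` matters because the config refers to the codec only by name (`type='Hand3DHeatmap'`). To my understanding, MMPose resolves such names through a registry that classes join via a decorator when their module is imported. The toy registry below sketches that pattern in plain Python. `Registry`, `CODECS` and `ToyHand3DHeatmap` are hypothetical stand-ins written for this example, not the MMEngine or MMPose classes.

```python
class Registry:
    """Minimal name -> class registry (illustrative, not MMEngine's Registry)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        """Return a decorator that records the class under its own name."""
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        kwargs = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._modules[kwargs.pop('type')]
        return cls(**kwargs)


CODECS = Registry('codec')  # hypothetical registry instance


@CODECS.register_module()
class ToyHand3DHeatmap:
    """Stand-in codec that only stores a few of the config values."""

    def __init__(self, image_size, heatmap_size, depth_size):
        self.image_size = image_size
        self.heatmap_size = heatmap_size
        self.depth_size = depth_size


# Build from a config dict, the way a `type=...` entry would be resolved.
codec_cfg = dict(
    type='ToyHand3DHeatmap',
    image_size=[256, 256],
    heatmap_size=[64, 64, 64],
    depth_size=64)
codec = CODECS.build(codec_cfg)
```

The decorator only runs when the defining module is imported, which is why a class that is never imported by the package cannot be found by name from a config.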