Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

This is the official PyTorch implementation of our ICLR 2023 paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation".

⭐ ED-Pose

We present ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose re-considers this task as two explicit box detection processes with a unified representation and regression supervision. In general, ED-Pose is conceptually simple without post-processing and dense heatmap supervision.

For the first time, ED-Pose, as a fully end-to-end framework with an L1 regression loss, surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO.
ED-Pose achieves state-of-the-art results with 76.6 AP on CrowdPose without test-time augmentation.
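
To make the regression-only supervision concrete, here is a minimal sketch of an L1 loss over normalized keypoint coordinates. It is an illustration rather than the repository's loss: the function name `l1_keypoint_loss`, the visibility masking, and the averaging convention are all assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1_keypoint_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    visible: Sequence[bool],
) -> float:
    """Mean L1 distance between predicted and ground-truth keypoints.

    Coordinates are assumed to be normalized to [0, 1] by image size.
    Keypoints that are not annotated (visible=False) are ignored.
    """
    if not (len(pred) == len(target) == len(visible)):
        raise ValueError("pred, target and visible must have the same length")

    total = 0.0
    count = 0
    for (px, py), (tx, ty), vis in zip(pred, target, visible):
        if not vis:
            continue
        total += abs(px - tx) + abs(py - ty)
        count += 1
    # Avoid division by zero when no keypoint is annotated.
    return total / count if count else 0.0


# Hypothetical example: three keypoints, the last one not annotated.
pred = [(0.50, 0.50), (0.20, 0.80), (0.90, 0.10)]
target = [(0.55, 0.45), (0.20, 0.70), (0.00, 0.00)]
visible = [True, True, False]
loss = l1_keypoint_loss(pred, target, visible)
print(f"L1 keypoint loss: {loss:.3f}")
```

No heatmap is rendered or decoded here: the loss compares coordinates directly, which is why a regression-based pipeline needs no heatmap post-processing.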

🔥 News

2023/08/08: 1. We support ED-Pose on the Human-Art dataset. 2. We uploaded the inference script for faster visualization.

🐟 Todo

This repo contains further modifications including:

Integrated into detrex.
Integrated into Huggingface Spaces 🤗 using Gradio.

🚀 Model Zoo

We have put our model checkpoints here.

Results on COCO val2017 dataset

Model	Backbone	Lr schd	mAP	AP⁵⁰	AP⁷⁵	AP^M	AP^L	Time (ms)	Download
ED-Pose	R-50	60e	71.7	89.7	78.8	66.2	79.7	51	Google Drive
ED-Pose	Swin-L	60e	74.3	91.5	81.7	68.5	82.7	88	Google Drive
ED-Pose	Swin-L-5scale	60e	75.8	92.3	82.9	70.4	83.5	142	Google Drive

Results on CrowdPose test dataset

Model	Backbone	Lr schd	mAP	AP⁵⁰	AP⁷⁵	AP^E	AP^M	AP^H	Download
ED-Pose	R-50	80e	69.9	88.6	75.8	77.7	70.6	60.9	Google Drive
ED-Pose	Swin-L	80e	73.1	90.5	79.8	80.5	73.8	63.8	Google Drive
ED-Pose	Swin-L-5scale	80e	76.6	92.4	83.3	83.0	77.3	68.3	Google Drive

Results on COCO test-dev dataset

Model	Backbone	Loss	mAP	AP⁵⁰	AP⁷⁵	AP^M	AP^L
DirectPose	R-50	Reg	62.2	86.4	68.2	56.7	69.8
DirectPose	R-101	Reg	63.3	86.7	69.4	57.8	71.2
FCPose	R-50	Reg+HM	64.3	87.3	71.0	61.6	70.5
FCPose	R-101	Reg+HM	65.6	87.9	72.6	62.1	72.3
InsPose	R-50	Reg+HM	65.4	88.9	71.7	60.2	72.7
InsPose	R-101	Reg+HM	66.3	89.2	73.0	61.2	73.9
PETR	R-50	Reg+HM	67.6	89.8	75.3	61.6	76.0
PETR	Swin-L	Reg+HM	70.5	91.5	78.7	65.2	78.0
ED-Pose	R-50	Reg	69.8	90.2	77.2	64.3	77.4
ED-Pose	Swin-L	Reg	72.7	92.3	80.9	67.6	80.0

Results when joint-training using Human-Art and COCO datasets

🥂 Note that training ED-Pose with Human-Art can lead to a performance boost on MSCOCO!

Results on Human-Art validation set

Arch	Backbone	mAP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	Download
ED-Pose	ResNet-50	0.723	0.861	0.774	0.808	0.921	Google Drive

Results on COCO val2017

Arch	Backbone	AP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	Download
ED-Pose	ResNet-50	0.724	0.898	0.794	0.799	0.946	Google Drive

Note:

No test-time augmentation is used for ED-Pose.
We use the Objects365 dataset to pretrain the human detection of ED-Pose under the Swin-L-5scale setting.

🚢 Environment Setup

Installation

We use DN-Deformable-DETR as our codebase. We test our models under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions might work as well.
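
Before installing, it can help to compare your local versions against the tested configuration. The following is a minimal sketch, not part of the repository; the helper name `describe_environment` is hypothetical, and the script only reports versions without enforcing them.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect the versions relevant to the tested configuration."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


env = describe_environment()
for name, version in env.items():
    print(f"{name}: {version or 'not found'}")
```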

Clone this repo

git clone https://github.com/IDEA-Research/ED-Pose.git
cd ED-Pose

Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/.

# an example:
conda install -c pytorch pytorch torchvision

Install other needed packages

pip install -r requirements.txt

Compiling CUDA operators

cd models/edpose/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..

Data Preparation

For COCO data, please download from COCO download. The coco_dir should look like this:

|-- ED-Pose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
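
A quick way to confirm the layout before training is to check for the entries shown in the tree above. This is a minimal sketch rather than a repository utility: the function name `missing_coco_paths` is hypothetical, and the fallback directory `coco_dir` is an assumption used when `EDPOSE_COCO_PATH` is not set.

```python
import os
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
REQUIRED_COCO_PATHS = [
    "annotations/person_keypoints_train2017.json",
    "annotations/person_keypoints_val2017.json",
    "images/train2017",
    "images/val2017",
]


def missing_coco_paths(coco_dir: str) -> List[str]:
    """Return the required entries that do not exist under coco_dir."""
    root = Path(coco_dir)
    return [rel for rel in REQUIRED_COCO_PATHS if not (root / rel).exists()]


coco_dir = os.environ.get("EDPOSE_COCO_PATH", "coco_dir")
missing = missing_coco_paths(coco_dir)
if missing:
    print(f"Missing under {coco_dir}:")
    for rel in missing:
        print(f"  {rel}")
else:
    print(f"{coco_dir} matches the expected layout.")
```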

For CrowdPose data, please download from CrowdPose download. The crowdpose_dir should look like this:

|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ...
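
The tree above lists `crowdpose_trainval.json` as generated by `util/crowdpose_concat_train_val.py`. The sketch below only illustrates what such a merge could look like and is not the repository's script. It assumes the annotation files are COCO-style dicts with `images`, `annotations`, and `categories` keys and that ids do not collide across the two splits; the function names are hypothetical.

```python
import json
from pathlib import Path


def concat_splits(train: dict, val: dict) -> dict:
    """Merge two COCO-style annotation dicts into one trainval dict."""
    merged = dict(train)  # keep "categories" and other metadata from train
    merged["images"] = train["images"] + val["images"]
    merged["annotations"] = train["annotations"] + val["annotations"]
    return merged


def write_trainval(json_dir: str) -> Path:
    """Read the train and val files in json_dir and write the merged file."""
    root = Path(json_dir)
    train = json.loads((root / "crowdpose_train.json").read_text())
    val = json.loads((root / "crowdpose_val.json").read_text())
    out_path = root / "crowdpose_trainval.json"
    out_path.write_text(json.dumps(concat_splits(train, val)))
    return out_path


# Tiny in-memory example with made-up records.
train = {
    "images": [{"id": 100000}],
    "annotations": [{"id": 1, "image_id": 100000}],
    "categories": [{"id": 1, "name": "person"}],
}
val = {
    "images": [{"id": 100001}],
    "annotations": [{"id": 2, "image_id": 100001}],
    "categories": [{"id": 1, "name": "person"}],
}
trainval = concat_splits(train, val)
print(len(trainval["images"]), len(trainval["annotations"]))
```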

🥳 Run

Training on COCO:

Single GPU

#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"

#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"

Distributed Run

#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"

#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"
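
The options `epochs=60 lr_drop=55` in the commands above suggest a step schedule in which the learning rate is reduced once near the end of training. The sketch below shows how such a schedule behaves. The decay factor of 0.1 and the base learning rate of 1e-4 are assumptions common in DETR-style codebases, not values confirmed from `config/edpose.cfg.py`.

```python
def step_lr(base_lr: float, epoch: int, lr_drop: int, gamma: float = 0.1) -> float:
    """Learning rate at a 0-indexed epoch under a step schedule."""
    return base_lr * gamma ** (epoch // lr_drop)


BASE_LR = 1e-4  # assumed value, not taken from the repository config
for epoch in (0, 54, 55, 59):
    print(f"epoch {epoch:2d}: lr = {step_lr(BASE_LR, epoch, lr_drop=55):.1e}")
```

Under these assumptions, the first 55 epochs run at the base rate and the final 5 epochs run at one tenth of it.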

Training on CrowdPose:

Single GPU

#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"

#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"

Distributed Run

#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"

#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"

We have put the Swin-L model pretrained on ImageNet-22k here.

Evaluation on COCO:

ResNet-50

export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_r50_coco.pth" \
 --eval

Swin-L

export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_coco.pth" \
 --eval

Swin-L-5scale

export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
 --eval

Evaluation on CrowdPose:

ResNet-50

export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_r50_crowdpose.pth" \
 --eval

Swin-L

export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_crowdpose.pth" \
 --eval

Swin-L-5scale

export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_5scale_crowdpose.pth" \
 --eval

Visualization via COCO Keypoints Format:

ResNet-50

export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
  python -m torch.distributed.launch --nproc_per_node=1  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_r50_coco.pth" \
 --eval

Swin-L

export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
  python -m torch.distributed.launch --nproc_per_node=1 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_coco.pth" \
 --eval

Swin-L-5scale

export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
  python -m torch.distributed.launch --nproc_per_node=1 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
 --eval
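
For reference, the COCO keypoints format stores each person's pose as a flat list of 17 `(x, y, v)` triplets, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). The sketch below unpacks such a list into named keypoints before drawing. The helper name `unpack_keypoints` and the sample values are hypothetical, and this is not the repository's inference script.

```python
from typing import Dict, List, Tuple

# Keypoint order defined by the COCO person keypoints annotation.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def unpack_keypoints(flat: List[float]) -> Dict[str, Tuple[float, float, int]]:
    """Convert a flat [x1, y1, v1, x2, y2, v2, ...] list into named triplets."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError(f"expected {3 * len(COCO_KEYPOINT_NAMES)} values, got {len(flat)}")
    triplets = zip(flat[0::3], flat[1::3], flat[2::3])
    return {
        name: (x, y, int(v))
        for name, (x, y, v) in zip(COCO_KEYPOINT_NAMES, triplets)
    }


# Made-up example: only the nose and left eye are labeled.
flat = [0.0] * 51
flat[0:3] = [120.0, 80.0, 2]
flat[3:6] = [125.0, 75.0, 1]
keypoints = unpack_keypoints(flat)
labeled = [name for name, (_, _, v) in keypoints.items() if v > 0]
print(labeled)
```

A drawing routine would typically skip keypoints with `v == 0` and connect the remaining ones along a skeleton definition.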

💃🏻 Cite ED-Pose

@inproceedings{
yang2023explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Jie Yang and Ailing Zeng and Shilong Liu and Feng Li and Ruimao Zhang and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=s4WVupnJjmX}
}
