Ever tried to run a pretrained multi-view 3D pose estimation model on your own data? We address the problem that these models perform significantly worse on novel camera arrangements, when they can be run at all. This is the source code for the CVPR 2022 paper Generalizable Human Pose Triangulation.
✅ Latest release (v0.1):
- add inference script (assuming previously extracted 2D keypoints and known camera parameters);
- run inference from main.py;
- add instructions and command-line options (see 3D pose estimation model (inference)).
🚧 Next release (v0.2):
- add a script to extract 2D keypoints (using an off-the-shelf 2D detector such as OpenPose);
- estimate camera extrinsics for your own cameras;
- short tutorial on how to estimate camera extrinsics and 3D poses for any multi-view data!
It is already possible to estimate camera extrinsics if you have previously extracted 2D keypoints (see Relative camera pose estimation (inference)).
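For orientation, the classical recipe for recovering a relative camera pose from matched 2D keypoints looks roughly as follows. This is a minimal OpenCV sketch, not the repo's src/fundamental.py; the file names and the availability of intrinsics are assumptions:

```python
import cv2
import numpy as np

# Matched 2D keypoints from two views, shape (N, 2); N >= 8 for a stable
# fundamental matrix estimate. These would come from your 2D pose detector.
# (File names are illustrative placeholders.)
pts1 = np.load("keypoints_view1.npy").reshape(-1, 2).astype(np.float64)
pts2 = np.load("keypoints_view2.npy").reshape(-1, 2).astype(np.float64)

# Known camera intrinsics (3x3), assumed available from calibration.
K1 = np.load("K1.npy")
K2 = np.load("K2.npy")

# Fundamental matrix via RANSAC, then the essential matrix.
F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
E = K2.T @ F @ K1

# Decompose E into a relative rotation R and a unit-norm translation t.
# recoverPose takes a single camera matrix, so this assumes K1 ~= K2.
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K1)
```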
If you use our model in your research, please reference our paper:
@inproceedings{Bartol:CVPR:2022,
title = {Generalizable Human Pose Triangulation},
author = {Bartol, Kristijan and Bojani\'{c}, David and Petkovi\'{c}, Tomislav and Pribani\'{c}, Tomislav},
booktitle = {Proceedings of IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
month = jun,
year = {2022}
}
We plan to completely prepare the source code, with the pretrained models, demos, and videos, by mid-May. The to-do list consists of:
- [19-04-2022] Instructions for training pose estimation model
- [19-04-2022] Fundamental matrix estimation algorithm
- [22-04-2022] Refactor the source code
- Complete the documentation
- [26-04-2022] Pretrained pose estimation learning model
- [26-04-2022] Demo to obtain camera parameters from multi-frame keypoints (src/fundamental.py)
- Demo to obtain 3D poses from an arbitrary image sequence (previously calibrated)
- Demo to obtain 3D poses from an arbitrary image sequence (uncalibrated)
- Short tutorial on how to obtain camera parameters and 3D poses on any multi-view data
- [28-04-2022] Instructions for running inference
- [21-07-2022] Training and evaluation functions
- Project page
First, download the pretrained backbone and place it in ./models/pretrained/.
To install and prepare the environment, use Docker (REPO_DIR is the path to this repository and BASE_DATA_DIR is the path to your data directory):
docker build -t <image-name> .
docker run --rm --gpus all --name <container-name> -it \
-v ${REPO_DIR}:/generalizable-triangulation \
-v ${BASE_DATA_DIR}/:/data/ <image-name>
Prior to running any training/evaluation/inference, 2D pose detections need to be extracted. Our backbone 2D pose detector is the baseline model, i.e., the version available in karfly/learnable-triangulation-pytorch, but that repository does not expose a simple inference method, so it is not straightforward to use. Instead, off-the-shelf pose detectors such as OpenPose or MMPose can be used, though with no guarantees.
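As one possible starting point, here is a minimal sketch of extracting per-frame 2D keypoints with MMPose's high-level inferencer. It assumes MMPose 1.x; the input path, the single-person assumption, and the saved output format are illustrative, not necessarily what main.py expects:

```python
import numpy as np
from mmpose.apis import MMPoseInferencer

# High-level MMPose inferencer with a default human pose model.
inferencer = MMPoseInferencer("human")

# Run on a directory of frames from one camera view (path is illustrative).
keypoints_per_frame = []
for result in inferencer("data/my_dataset/view0/", show=False):
    # With the default batch size of 1, predictions hold the detected
    # instances for a single frame.
    instances = result["predictions"][0]
    person = instances[0]  # keep the first detected person per frame
    keypoints_per_frame.append(np.asarray(person["keypoints"]))

# Stack into a (num_frames, num_joints, 2) array and save.
np.save("data/my_dataset/view0_keypoints.npy", np.stack(keypoints_per_frame))
```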
We have already prepared some training/evaluation data :) (password: data-3d-humans, directory: pretrained). Extract the folder into data/<dataset>. Note that the Human3.6M dataset already contains bounding boxes obtained as described here.
To train on the base configuration (using Human3.6M for training), run:
python main.py
A more convenient way to specify the arguments is through .vscode/launch.json, if the VS Code IDE is used. All the options are available in src/options.py.
Download the pretrained models from SharePoint (password: data-3d-humans, directory: data), then run the evaluation:
python main.py --run_mode eval
To run inference on novel views, first use a 2D keypoint detector on all views and frames to generate 2D keypoint estimates.
Once the poses are obtained, you can run:
python main.py --run_mode infer
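For intuition about what this step does, the textbook baseline for lifting per-view 2D keypoints to 3D is direct linear transform (DLT) triangulation, given the camera projection matrices. A minimal numpy sketch of plain DLT (this is only the classical baseline, not the learned triangulation from the paper):

```python
import numpy as np

def triangulate_point(points_2d, proj_matrices):
    """DLT triangulation of one joint from multiple views.

    points_2d: (V, 2) pixel coordinates of the joint in V views.
    proj_matrices: (V, 3, 4) camera projection matrices P = K [R | t].
    Returns the 3D point in world coordinates, shape (3,).
    """
    A = []
    for (x, y), P in zip(points_2d, proj_matrices):
        # Each view contributes two rows to the homogeneous system A X = 0.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    # The solution is the right singular vector with the smallest
    # singular value; dehomogenize to get the 3D coordinates.
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]
```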
To estimate relative camera poses on Human3.6M using the keypoint estimates on the test data, run:
python src/fundamental.py
The rotation and translation estimates are produced and stored in est_Rs.npy and est_ts.npy.
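These arrays can then be loaded downstream, e.g. to assemble homogeneous extrinsic matrices. A small sketch; the exact array shapes are assumptions:

```python
import numpy as np

# Estimated rotations and translations as saved by src/fundamental.py.
# Shapes assumed here: (N, 3, 3) and (N, 3).
Rs = np.load("est_Rs.npy")
ts = np.load("est_ts.npy")

# Assemble 4x4 homogeneous extrinsics [R | t; 0 0 0 1] per camera pair.
extrinsics = np.tile(np.eye(4), (len(Rs), 1, 1))
extrinsics[:, :3, :3] = Rs
extrinsics[:, :3, 3] = ts.reshape(len(ts), 3)
```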
The results for the base, intra, and inter configurations are:

| Base (H36M) | Intra (CMU) | Inter (CMU->H36M) |
|---|---|---|
| 29.1 mm | 25.6 mm | 31.0 mm |
The data used for the above commands is in the ./data/ folder. Note that, in this submission, we only include subject 1 (Human3.6M) for training, but it should be sufficient to reproduce the original results.
Parts of the source code were adapted from cvlab-dresden/DSAC and karfly/learnable-triangulation-pytorch and directly inspired by some of the following publications:
[1] DSAC - Differentiable RANSAC for Camera Localization
[2] Learnable Triangulation of Human Pose
[3] Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses
[4] Categorical Reparameterization with Gumbel-Softmax
[5] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables