Skip to content

Commit

Permalink
Adaptive HR Pose Estimation (opendr-eu#479)
Browse files Browse the repository at this point in the history
* new functions added for adaptive high-resolution pose estimation.
some offset variables that existed have been erased due to changes in the annotation file, and the sample image for evaluation

* added extra arguments for adaptive pose estimation

* Demos:
Changed the image that used for inference and evaluation
typos

Learner:
removed unnecessary variables (e.g. offset value) due to new sample image and annotation
changed the path that _download function, gets the images from ftp
typos

Tests:
added the new evaluation and inference functions that perform the "adaptive ROI selection pose estimation"

* fix bugs

* Learner fix bugs
optimization webcam_demo

* Learner fix bugs
Update docs file

* Apply suggestions from code review

Co-authored-by: Kostas Tsampazis <[email protected]>

* Update docs file

* Apply suggestions from code review

Co-authored-by: Kostas Tsampazis <[email protected]>

* Update docstrings

* fixed webcam demo issues
edit learner

* pep8 fix

* webcam demo runs hr pose estimation
if --run-comparison is enabled then it shows the comparison between lightweight openpose, hr-pose estimation primary method and adaptive method

* Readme edit for webcam demo

* Update projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/webcam_demo.py

Co-authored-by: Kostas Tsampazis <[email protected]>

---------

Co-authored-by: Nikolaos Passalis <[email protected]>
Co-authored-by: Kostas Tsampazis <[email protected]>
  • Loading branch information
3 people authored and Luca Marchionni committed Nov 23, 2023
1 parent 6fca1be commit 94dc051
Show file tree
Hide file tree
Showing 8 changed files with 1,354 additions and 179 deletions.
100 changes: 96 additions & 4 deletions docs/reference/high-resolution-pose-estimation.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,9 @@ Constructor parameters:
Specifies the height that the input image will be resized during the heatmap generation procedure.
- **second_pass_height**: *int, default=540*\
Specifies the height of the image on the second inference for pose estimation procedure.
- **percentage_arround_crop**: *float, default=0.3*\
- **method**: *str, default='adaptive'
- Determines which method (*adaptive* or *primary*) is used for ROI extraction.
- **percentage_around_crop**: *float, default=0.3*\
Specifies the percentage of an extra pad arround the cropped image
- **heatmap_threshold**: *float, default=0.1*\
Specifies the threshold value that the heatmap elements should have during the first pass in order to trigger the second pass
Expand Down Expand Up @@ -74,7 +76,61 @@ Constructor parameters:
- **half_precision**: *bool, default=False*\
Enables inference using half (fp16) precision instead of single (fp32) precision. Valid only for GPU-based inference.

#### High Resolution Pose estimation using Adaptive ROI selection method
#### `HighResolutionPoseEstimationLearner.eval_adaptive`
```python
HighResolutionPoseEstimationLearner.eval_adaptive(self, dataset, silent, verbose, use_subset, subset_size, upsample_ratio, images_folder_name, annotations_filename)
```

This method is used to evaluate a trained model on an evaluation dataset.
Returns a dictionary containing statistics regarding evaluation.

Parameters:

- **dataset**: *object*\
Object that holds the evaluation dataset.
Can be of type `ExternalDataset` or a custom dataset inheriting from `DatasetIterator`.
- **silent**: *bool, default=False*\
If set to True, disables all printing of evaluation progress reports and other information to STDOUT.
- **verbose**: *bool, default=True*\
If set to True, enables the maximum verbosity.
- **use_subset**: *bool, default=True*\
If set to True, a subset of the validation dataset is created and used in evaluation.
- **subset_size**: *int, default=250*\
Controls the size of the validation subset.
- **upsample_ratio**: *int, default=4*\
Defines the amount of upsampling to be performed on the heatmaps and PAFs when resizing,defaults to 4
- **images_folder_name**: *str, default='val2017'*\
Folder name that contains the dataset images.
This folder should be contained in the dataset path provided.
Note that this is a folder name, not a path.
- **annotations_filename**: *str, default='person_keypoints_val2017.json'*\
Filename of the annotations JSON file.
This file should be contained in the dataset path provided.

#### `HighResolutionPoseEstimation.infer_adaptive`
```python
HighResolutionPoseEstimation.infer_adaptive(self, img, upsample_ratio, stride)
```
This method is used to perform pose estimation on an image.
The predicted poses are estimated through an adaptive ROI selection method that is applied on the high-resolution images.
The difference between the `HighResolutionPoseEstimation.infer` method is that the adaptive technique tries to separate the
detected ROI's instead of using the minimum enclosing bounding box of them as the `infer` does.
Returns a list of engine.target. Pose objects, where each holds a pose
and a heatmap that contains human silhouettes of the input image.
If no detections were made it returns an empty list for poses and a black frame for heatmap.
Parameters:

- **img**: *object***\
Object of type engine.data.Image.
- **upsample_ratio**: *int, default=4*\
Defines the amount of upsampling to be performed on the heatmaps and PAFs when resizing.
- **stride**: *int, default=8*\
Defines the stride value for creating a padded image.



#### High Resolution Pose estimation using Primary ROI selection method
#### `HighResolutionPoseEstimationLearner.eval`
```python
HighResolutionPoseEstimationLearner.eval(self, dataset, silent, verbose, use_subset, subset_size, images_folder_name, annotations_filename)
Expand Down Expand Up @@ -170,7 +226,6 @@ Parameters:
- **img_scale**: *float, default=1/256*\
Specifies the scale based on which the images are normalized.


#### `HighResolutionPoseEstimation.download`
```python
HighResolutionPoseEstimation.download(self, path, mode, verbose, url)
Expand Down Expand Up @@ -258,7 +313,7 @@ The experiments are conducted on a 1080p image.
| OpenDR - Full | 0.2 | 10.8 | 1.4 | 3.1 |


#### High-Resolution Pose Estimation
#### High-Resolution Pose Estimation (Primary ROI Selection)
| Method | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) |
|------------------------|--------------------|----------------|------------------|-----------------|
| HRPoseEstim - Baseline | 2.3 | 13.6 | 1.4 | 1.8 |
Expand All @@ -283,6 +338,28 @@ was used as input to the models.
The average precision and average recall on the COCO evaluation split is also reported in the tables below:


#### High-Resolution Pose Estimation (Adaptive ROI Selection)
| Method | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) |
|------------------------|--------------------|----------------|------------------|-----------------|
| HRPoseEstim - Baseline | 2.4 | 10.5 | 2.1 | 1.5 |
| HRPoseEstim - Half | 2.5 | 11.3 | 2.6 | 1.9 |
| HRPoseEstim - Stride | 11.3 | 38.1 | 6.8 | 5.2 |
| HRPoseEstim - Stages | 2.8 | 10 | 2.3 | 1.9 |
| HRPoseEstim - H+S | 11.4 | 38 | 6.5 | 4.5 |
| HRPoseEstim - Full | 11.6 | 48.3 | 7.7 | 6.4 |

As it is shown in the previous tables, OpenDR Lightweight OpenPose achieves higher FPS when it is resizing the input image into 256 pixels.
It is easier to process that image, but as it is shown in the next tables the method falls apart when it comes to accuracy and there are no detections.

We have evaluated the effect of using different inference settings, namely:
- *HRPoseEstim - Baseline*, which refers to directly using the High Resolution Pose Estimation method, which is based on Lightweight OpenPose,
- *HRPoseEstim - Half*, which refers to enabling inference in half (FP) precision,
- *HRPoseEstim - Stride*, which refers to increasing stride by two in the input layer of the model,
- *HRPoseEstim - Stages*, which refers to removing the refinement stages,
- *HRPoseEstim - H+S*, which uses both half precision and increased stride, and
- *HRPoseEstim - Full*, which refers to combining all three available optimization and were used as input to the models.


#### Lightweight OpenPose with resizing
| Method | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
|-------------------|------------------------------|---------------------------|
Expand All @@ -300,7 +377,7 @@ The average precision and average recall on the COCO evaluation split is also re



#### High Resolution Pose Estimation
#### High Resolution Pose Estimation (Primary ROI Selection)
| Method | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
|------------------------|------------------------------|---------------------------|
| HRPoseEstim - Baseline | 0.615 | 0.637 |
Expand All @@ -323,6 +400,21 @@ The average precision and the average recall have been calculated on a 1080p ver

For measuring the precision and recall we used the standard approach proposed for COCO, using an Intersection of Union (IoU) metric at 0.5.

#### High Resolution Pose Estimation (Adaptive ROI Selection)

The average precision and the average recall have been calculated on a 1080p version of COCO2017 validation dataset and the results are reported in the table below:

| Method | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
|-------------------|------------------------------|---------------------------|
| HRPoseEstim - Baseline | 0.594 | 0.617 |
| HRPoseEstim - Half | 0.586 | 0.606 |
| HRPoseEstim - Stride | 0.251 | 0.271 |
| HRPoseEstim - Stages | 0.511 | 0.534 |
| HRPoseEstim - H+S | 0.251 | 0.263 |
| HRPoseEstim - Full | 0.229 | 0.247 |

For measuring the precision and recall we used the standard approach proposed for COCO, using an Intersection of Union (IoU) metric at 0.5.


#### Notes

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ More specifically, the applications provided are:
1. demos/inference_demo.py: A tool that demonstrates how to perform inference on a single high resolution image and then draw the detected poses.
2. demos/eval_demo.py: A tool that demonstrates how to perform evaluation using the High Resolution Pose Estimation algorithm on 720p, 1080p and 1440p datasets.
3. demos/benchmarking_demo.py: A simple benchmarking tool for measuring the performance of High Resolution Pose Estimation in various platforms.
4. demos/webcam_demo.py: A tool that performs live pose estimation both with high resolution and simple lightweight OpenPose using a webcam.
4. demos/webcam_demo.py: A tool that performs live pose estimation with high resolution pose estimation method using a webcam.
If `--run-comparison` is enabled then it shows the differences between Lightweight OpenPose, and both adaptive and primary methods in High_resolution pose estimation.

NOTE: All demos can run either with "primary ROI selection" mode or with "adaptive ROI selection".
Use `--method primary` or `--method adaptive` for each case.

Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,12 @@
action="store_true")
parser.add_argument("--height1", help="Base height of resizing in heatmap generation", default=360)
parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)

parser.add_argument("--method", help="Choose between primary or adaptive ROI selection methodology defaults to adaptive",
default="adaptive", choices=["primary", "adaptive"])
args = parser.parse_args()

device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
args.height1, args.height2
device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
args.height1, args.height2, args.method

if device == 'cpu':
import torch
Expand All @@ -51,22 +52,25 @@
pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
mobilenet_use_stride=stride, half_precision=half_precision,
first_pass_height=int(base_height1),
second_pass_height=int(base_height2))
second_pass_height=int(base_height2),
method=method)
pose_estimator.download(path=".", verbose=True)
pose_estimator.load("openpose_default")

# Download one sample image
pose_estimator.download(path=".", mode="test_data")
image_path = join("temp", "dataset", "image", "000000000785_1080.jpg")
image_path = join("temp", "dataset", "image", "000000052591_1080.jpg")
img = cv2.imread(image_path)

fps_list = []
print("Benchmarking...")
for i in tqdm(range(50)):
start_time = time.perf_counter()
# Perform inference
poses = pose_estimator.infer(img)

if method == 'primary':
poses, _, _ = pose_estimator.infer(img)
if method == 'adaptive':
poses, _, _ = pose_estimator.infer_adaptive(img)
end_time = time.perf_counter()
fps_list.append(1.0 / (end_time - start_time))
print("Average FPS: %.2f" % (np.mean(fps_list)))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,26 +26,29 @@
action="store_true")
parser.add_argument("--height1", help="Base height of resizing in first inference", default=360)
parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)

parser.add_argument("--method", help="Choose between primary or adaptive ROI selection "
"methodology defaults to adaptive",
default="adaptive", choices=["primary", "adaptive"])
args = parser.parse_args()

device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
args.height1, args.height2
device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
args.height1, args.height2, args.method

if accelerate:
stride = True
stages = 0
half_precision = True
else:
stride = True
stride = False
stages = 2
half_precision = True
half_precision = False

pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
mobilenet_use_stride=stride,
half_precision=half_precision,
first_pass_height=int(base_height1),
second_pass_height=int(base_height2))
second_pass_height=int(base_height2),
method=method)
pose_estimator.download(path=".", verbose=True)
pose_estimator.load("openpose_default")

Expand All @@ -55,8 +58,12 @@
eval_dataset = ExternalDataset(path=join("temp", "dataset"), dataset_type="COCO")

t0 = time.time()
results_dict = pose_estimator.eval(eval_dataset, use_subset=False, verbose=True, silent=True,
images_folder_name="image", annotations_filename="annotation.json")
if method == "primary":
results_dict = pose_estimator.eval(eval_dataset, use_subset=False, verbose=True, silent=False,
images_folder_name="image", annotations_filename="annotation.json")
if method == "adaptive":
results_dict = pose_estimator.eval_adaptive(eval_dataset, use_subset=False, verbose=True, silent=False,
images_folder_name="image", annotations_filename="annotation.json")
t1 = time.time()
print("\n Evaluation time: ", t1 - t0, "seconds")
print("Evaluation results = ", results_dict)
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,12 @@
action="store_true")
parser.add_argument("--height1", help="Base height of resizing in first inference", default=360)
parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)

parser.add_argument("--method", help="Choose between primary or adaptive ROI selection methodology defaults to adaptive",
default="adaptive", choices=["primary", "adaptive"])
args = parser.parse_args()

device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
args.height1, args.height2
device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
args.height1, args.height2, args.method

if accelerate:
stride = True
Expand All @@ -45,21 +46,29 @@
pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
mobilenet_use_stride=stride, half_precision=half_precision,
first_pass_height=int(base_height1),
second_pass_height=int(base_height2))
second_pass_height=int(base_height2),
method=method)
pose_estimator.download(path=".", verbose=True)
pose_estimator.load("openpose_default")

# Download one sample image
pose_estimator.download(path=".", mode="test_data")

image_path = join("temp", "dataset", "image", "000000000785_1080.jpg")

image_path = join("temp", "dataset", "image", "000000052591_1080.jpg")
img = Image.open(image_path)

poses, _ = pose_estimator.infer(img)

if method == 'primary':
poses, _, bounds = pose_estimator.infer(img)
if method == 'adaptive':
poses, _, bounds = pose_estimator.infer_adaptive(img)
img_cv = img.opencv()
for pose in poses:
draw(img_cv, pose)

for i in range(len(bounds)):
if bounds[i][0] is not None:
cv2.rectangle(img_cv, (int(bounds[i][0]), int(bounds[i][2])),
(int(bounds[i][1]), int(bounds[i][3])), (0, 0, 255), thickness=2)
img_cv = cv2.resize(img_cv, (1280, 720), interpolation=cv2.INTER_CUBIC)
cv2.imshow('Results', img_cv)
cv2.waitKey(0)
Loading

0 comments on commit 94dc051

Please sign in to comment.