Adaptive HR Pose Estimation (opendr-eu#479)

* new functions added for adaptive high-resolution pose estimation. some offset variables that existed have been erased due to changes in the annotation file, and the sample image for evaluation * added extra arguments for adaptive pose estimation * Demos: Changed the image that used for inference and evaluation typos Learner: removed unnecessary variables (e.g. offset value) due to new sample image and annotation changed the path that _download function, gets the images from ftp typos Tests: added the new evaluation and inference functions that perform the "adaptive ROI selection pose estimation" * fix bugs * Learner fix bugs optimization webcam_demo * Learner fix bugs Update docs file * Apply suggestions from code review Co-authored-by: Kostas Tsampazis <[email protected]> * Update docs file * Apply suggestions from code review Co-authored-by: Kostas Tsampazis <[email protected]> * Update docstrings * fixed webcam demo issues edit learner * pep8 fix * webcam demo runs hr pose estimation if --run-comparison is enabled then it shows the comparison between lightweight openpose, hr-pose estimation primary method and adaptive method * Readme edit for webcam demo * Update projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/webcam_demo.py Co-authored-by: Kostas Tsampazis <[email protected]> --------- Co-authored-by: Nikolaos Passalis <[email protected]> Co-authored-by: Kostas Tsampazis <[email protected]>
lucamarchionni · Nov 23, 2023 · 94dc051 · 94dc051
1 parent 6fca1be
commit 94dc051
Show file tree

Hide file tree

Showing 8 changed files with 1,354 additions and 179 deletions.
diff --git a/docs/reference/high-resolution-pose-estimation.md b/docs/reference/high-resolution-pose-estimation.md
@@ -45,7 +45,9 @@ Constructor parameters:
   Specifies the height that the input image will be resized during the heatmap generation procedure.
 - **second_pass_height**: *int, default=540*\
   Specifies the height of the image on the second inference for pose estimation procedure.
-- **percentage_arround_crop**: *float, default=0.3*\
+- **method**: *str, default='adaptive'
+- Determines which method (*adaptive* or *primary*) is used for ROI extraction. 
+- **percentage_around_crop**: *float, default=0.3*\
   Specifies the percentage of an extra pad arround the cropped image
 - **heatmap_threshold**: *float, default=0.1*\
   Specifies the threshold value that the heatmap elements should have during the first pass in order to trigger the second pass
@@ -74,7 +76,61 @@ Constructor parameters:
 - **half_precision**: *bool, default=False*\
   Enables inference using half (fp16) precision instead of single (fp32) precision. Valid only for GPU-based inference.
 
+#### High Resolution Pose estimation using Adaptive ROI selection method
+#### `HighResolutionPoseEstimationLearner.eval_adaptive`
+```python
+HighResolutionPoseEstimationLearner.eval_adaptive(self, dataset, silent, verbose, use_subset, subset_size, upsample_ratio, images_folder_name, annotations_filename)
+```
+
+This method is used to evaluate a trained model on an evaluation dataset.
+Returns a dictionary containing statistics regarding evaluation.
 
+Parameters:
+
+- **dataset**: *object*\
+  Object that holds the evaluation dataset.
+  Can be of type `ExternalDataset` or a custom dataset inheriting from `DatasetIterator`.
+- **silent**: *bool, default=False*\
+  If set to True, disables all printing of evaluation progress reports and other information to STDOUT.
+- **verbose**: *bool, default=True*\
+  If set to True, enables the maximum verbosity.
+- **use_subset**: *bool, default=True*\
+  If set to True, a subset of the validation dataset is created and used in evaluation.
+- **subset_size**: *int, default=250*\
+  Controls the size of the validation subset.
+- **upsample_ratio**: *int, default=4*\
+  Defines the amount of upsampling to be performed on the heatmaps and PAFs when resizing,defaults to 4
+- **images_folder_name**: *str, default='val2017'*\
+  Folder name that contains the dataset images.
+  This folder should be contained in the dataset path provided.
+  Note that this is a folder name, not a path.
+- **annotations_filename**: *str, default='person_keypoints_val2017.json'*\
+  Filename of the annotations JSON file.
+  This file should be contained in the dataset path provided.
+
+#### `HighResolutionPoseEstimation.infer_adaptive`
+```python
+HighResolutionPoseEstimation.infer_adaptive(self, img, upsample_ratio, stride)
+```
+This method is used to perform pose estimation on an image.
+The predicted poses are estimated through an adaptive ROI selection method that is applied on the high-resolution images.
+The difference between the `HighResolutionPoseEstimation.infer` method is that the adaptive technique tries to separate the
+detected ROI's instead of using the minimum enclosing bounding box of them as the `infer` does. 
+Returns a list of engine.target. Pose objects, where each holds a pose 
+and a heatmap that contains human silhouettes of the input image. 
+If no detections were made it returns an empty list for poses and a black frame for heatmap.
+Parameters:
+
+- **img**: *object***\
+  Object of type engine.data.Image.
+- **upsample_ratio**: *int, default=4*\
+  Defines the amount of upsampling to be performed on the heatmaps and PAFs when resizing.
+- **stride**: *int, default=8*\
+  Defines the stride value for creating a padded image.
+
+
+
+#### High Resolution Pose estimation using Primary ROI selection method
 #### `HighResolutionPoseEstimationLearner.eval`
 ```python
 HighResolutionPoseEstimationLearner.eval(self, dataset, silent, verbose, use_subset, subset_size, images_folder_name, annotations_filename)
@@ -170,7 +226,6 @@ Parameters:
 - **img_scale**: *float, default=1/256*\
   Specifies the scale based on which the images are normalized.
 
-
 #### `HighResolutionPoseEstimation.download`
 ```python
 HighResolutionPoseEstimation.download(self, path, mode, verbose, url)
@@ -258,7 +313,7 @@ The experiments are conducted on a 1080p image.
 | OpenDR - Full     | 0.2                | 10.8            | 1.4              | 3.1             |
 
 
-#### High-Resolution Pose Estimation
+#### High-Resolution Pose Estimation (Primary ROI Selection)
 | Method                 | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) |
 |------------------------|--------------------|----------------|------------------|-----------------|
 | HRPoseEstim - Baseline | 2.3                | 13.6           | 1.4              | 1.8             |
@@ -283,6 +338,28 @@ was used as input to the models.
 The average precision and average recall on the COCO evaluation split is also reported in the tables below:
 
 
+#### High-Resolution Pose Estimation (Adaptive ROI Selection)
+| Method                 | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) |
+|------------------------|--------------------|----------------|------------------|-----------------|
+| HRPoseEstim - Baseline | 2.4                | 10.5           | 2.1              | 1.5             |
+| HRPoseEstim - Half     | 2.5                | 11.3           | 2.6              | 1.9             |
+| HRPoseEstim - Stride   | 11.3               | 38.1           | 6.8              | 5.2             |
+| HRPoseEstim - Stages   | 2.8                | 10             | 2.3              | 1.9             |
+| HRPoseEstim - H+S      | 11.4               | 38             | 6.5              | 4.5             |
+| HRPoseEstim - Full     | 11.6               | 48.3           | 7.7              | 6.4             |
+
+As it is shown in the previous tables, OpenDR Lightweight OpenPose achieves higher FPS when it is resizing the input image into 256 pixels.
+It is easier to process that image, but as it is shown in the next tables the method falls apart when it comes to accuracy and there are no detections.
+
+We have evaluated the effect of using different inference settings, namely:
+- *HRPoseEstim - Baseline*, which refers to directly using the High Resolution Pose Estimation method, which is based on Lightweight OpenPose,
+- *HRPoseEstim - Half*, which refers to enabling inference in half (FP) precision,
+- *HRPoseEstim - Stride*, which refers to increasing stride by two in the input layer of the model,
+- *HRPoseEstim - Stages*, which refers to removing the refinement stages,
+- *HRPoseEstim - H+S*, which uses both half precision and increased stride, and
+- *HRPoseEstim - Full*, which refers to combining all three available optimization and were used as input to the models.
+
+
 #### Lightweight OpenPose with resizing
 | Method            | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
 |-------------------|------------------------------|---------------------------|
@@ -300,7 +377,7 @@ The average precision and average recall on the COCO evaluation split is also re
 
 
 
-#### High Resolution Pose Estimation
+#### High Resolution Pose Estimation (Primary ROI Selection)
 | Method                 | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
 |------------------------|------------------------------|---------------------------|
 | HRPoseEstim - Baseline | 0.615                        | 0.637                     |
@@ -323,6 +400,21 @@ The average precision and the average recall have been calculated on a 1080p ver
 
 For measuring the precision and recall we used the standard approach proposed for COCO, using an Intersection of Union (IoU) metric at 0.5.
 
+#### High Resolution Pose Estimation (Adaptive ROI Selection)
+
+The average precision and the average recall have been calculated on a 1080p version of COCO2017 validation dataset and the results are reported in the table below:
+
+| Method | Average Precision (IoU=0.50) | Average Recall (IoU=0.50) |
+|-------------------|------------------------------|---------------------------|
+| HRPoseEstim - Baseline | 0.594                        | 0.617                     |
+| HRPoseEstim - Half     | 0.586                        | 0.606                     |
+| HRPoseEstim - Stride   | 0.251                        | 0.271                     |
+| HRPoseEstim - Stages   | 0.511                        | 0.534                     |
+| HRPoseEstim - H+S      | 0.251                        | 0.263                     |
+| HRPoseEstim - Full     | 0.229                        | 0.247                     |
+
+For measuring the precision and recall we used the standard approach proposed for COCO, using an Intersection of Union (IoU) metric at 0.5.
+
 
 #### Notes
 

diff --git a/...cts/python/perception/pose_estimation/high_resolution_pose_estimation/README.md b/...cts/python/perception/pose_estimation/high_resolution_pose_estimation/README.md
@@ -7,6 +7,9 @@ More specifically, the applications provided are:
 1. demos/inference_demo.py: A tool that demonstrates how to perform inference on a single high resolution image and then draw the detected poses. 
 2. demos/eval_demo.py: A tool that demonstrates how to perform evaluation using the High Resolution Pose Estimation algorithm on 720p, 1080p and 1440p datasets. 
 3. demos/benchmarking_demo.py: A simple benchmarking tool for measuring the performance of High Resolution Pose Estimation in various platforms.
-4. demos/webcam_demo.py: A tool that performs live pose estimation both with high resolution and simple lightweight OpenPose using a webcam.
+4. demos/webcam_demo.py: A tool that performs live pose estimation with high resolution pose estimation method using a webcam.
+    If `--run-comparison` is enabled then it shows the differences between Lightweight OpenPose, and both adaptive and primary methods in High_resolution pose estimation. 
 
+NOTE: All demos can run either with "primary ROI selection" mode or with "adaptive ROI selection".
+Use `--method primary` or `--method adaptive` for each case.
 
diff --git a/...hon/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py b/...hon/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py
@@ -28,11 +28,12 @@
                         action="store_true")
     parser.add_argument("--height1", help="Base height of resizing in heatmap generation", default=360)
     parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)
-
+    parser.add_argument("--method", help="Choose between primary or adaptive ROI selection methodology defaults to adaptive",
+                        default="adaptive", choices=["primary", "adaptive"])
     args = parser.parse_args()
 
-    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
-        args.height1, args.height2
+    device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
+        args.height1, args.height2, args.method
 
     if device == 'cpu':
         import torch
@@ -51,22 +52,25 @@
     pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
                                                          mobilenet_use_stride=stride, half_precision=half_precision,
                                                          first_pass_height=int(base_height1),
-                                                         second_pass_height=int(base_height2))
+                                                         second_pass_height=int(base_height2),
+                                                         method=method)
     pose_estimator.download(path=".", verbose=True)
     pose_estimator.load("openpose_default")
 
     # Download one sample image
     pose_estimator.download(path=".", mode="test_data")
-    image_path = join("temp", "dataset", "image", "000000000785_1080.jpg")
+    image_path = join("temp", "dataset", "image", "000000052591_1080.jpg")
     img = cv2.imread(image_path)
 
     fps_list = []
     print("Benchmarking...")
     for i in tqdm(range(50)):
         start_time = time.perf_counter()
         # Perform inference
-        poses = pose_estimator.infer(img)
-
+        if method == 'primary':
+            poses, _, _ = pose_estimator.infer(img)
+        if method == 'adaptive':
+            poses, _, _ = pose_estimator.infer_adaptive(img)
         end_time = time.perf_counter()
         fps_list.append(1.0 / (end_time - start_time))
     print("Average FPS: %.2f" % (np.mean(fps_list)))

diff --git a/...ects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py b/...ects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py
@@ -26,26 +26,29 @@
                         action="store_true")
     parser.add_argument("--height1", help="Base height of resizing in first inference", default=360)
     parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)
-
+    parser.add_argument("--method", help="Choose between primary or adaptive ROI selection "
+                                         "methodology defaults to adaptive",
+                        default="adaptive", choices=["primary", "adaptive"])
     args = parser.parse_args()
 
-    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
-        args.height1, args.height2
+    device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
+        args.height1, args.height2, args.method
 
     if accelerate:
         stride = True
         stages = 0
         half_precision = True
     else:
-        stride = True
+        stride = False
         stages = 2
-        half_precision = True
+        half_precision = False
 
     pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
                                                          mobilenet_use_stride=stride,
                                                          half_precision=half_precision,
                                                          first_pass_height=int(base_height1),
-                                                         second_pass_height=int(base_height2))
+                                                         second_pass_height=int(base_height2),
+                                                         method=method)
     pose_estimator.download(path=".", verbose=True)
     pose_estimator.load("openpose_default")
 
@@ -55,8 +58,12 @@
     eval_dataset = ExternalDataset(path=join("temp", "dataset"), dataset_type="COCO")
 
     t0 = time.time()
-    results_dict = pose_estimator.eval(eval_dataset, use_subset=False, verbose=True, silent=True,
-                                       images_folder_name="image", annotations_filename="annotation.json")
+    if method == "primary":
+        results_dict = pose_estimator.eval(eval_dataset, use_subset=False, verbose=True, silent=False,
+                                           images_folder_name="image", annotations_filename="annotation.json")
+    if method == "adaptive":
+        results_dict = pose_estimator.eval_adaptive(eval_dataset, use_subset=False, verbose=True, silent=False,
+                                                    images_folder_name="image", annotations_filename="annotation.json")
     t1 = time.time()
     print("\n Evaluation time:  ", t1 - t0, "seconds")
     print("Evaluation results = ", results_dict)
diff --git a/...python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py b/...python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py
@@ -27,11 +27,12 @@
                         action="store_true")
     parser.add_argument("--height1", help="Base height of resizing in first inference", default=360)
     parser.add_argument("--height2", help="Base height of resizing in second inference", default=540)
-
+    parser.add_argument("--method", help="Choose between primary or adaptive ROI selection methodology defaults to adaptive",
+                        default="adaptive", choices=["primary", "adaptive"])
     args = parser.parse_args()
 
-    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
-        args.height1, args.height2
+    device, accelerate, base_height1, base_height2, method = args.device, args.accelerate, \
+        args.height1, args.height2, args.method
 
     if accelerate:
         stride = True
@@ -45,21 +46,29 @@
     pose_estimator = HighResolutionPoseEstimationLearner(device=device, num_refinement_stages=stages,
                                                          mobilenet_use_stride=stride, half_precision=half_precision,
                                                          first_pass_height=int(base_height1),
-                                                         second_pass_height=int(base_height2))
+                                                         second_pass_height=int(base_height2),
+                                                         method=method)
     pose_estimator.download(path=".", verbose=True)
     pose_estimator.load("openpose_default")
 
     # Download one sample image
     pose_estimator.download(path=".", mode="test_data")
 
-    image_path = join("temp", "dataset", "image", "000000000785_1080.jpg")
-
+    image_path = join("temp", "dataset", "image", "000000052591_1080.jpg")
     img = Image.open(image_path)
 
-    poses, _ = pose_estimator.infer(img)
-
+    if method == 'primary':
+        poses, _, bounds = pose_estimator.infer(img)
+    if method == 'adaptive':
+        poses, _, bounds = pose_estimator.infer_adaptive(img)
     img_cv = img.opencv()
     for pose in poses:
         draw(img_cv, pose)
+
+    for i in range(len(bounds)):
+        if bounds[i][0] is not None:
+            cv2.rectangle(img_cv, (int(bounds[i][0]), int(bounds[i][2])),
+                          (int(bounds[i][1]), int(bounds[i][3])), (0, 0, 255), thickness=2)
+    img_cv = cv2.resize(img_cv, (1280, 720), interpolation=cv2.INTER_CUBIC)
     cv2.imshow('Results', img_cv)
     cv2.waitKey(0)