---
layout: default
title: Pose Classification
parent: Pose
grand_parent: Solutions
nav_order: 1
---

# Pose Classification
{: .no_toc }

<details close markdown="block"> | ||
<summary> | ||
Table of contents | ||
</summary> | ||
{: .text-delta } | ||
1. TOC | ||
{:toc} | ||
</details> | ||
--- | ||
|
||
## Overview

One of the applications
[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
can enable is fitness, and more specifically pose classification and repetition
counting. In this section we'll provide basic guidance on building a custom pose
classifier with the help of the [Colabs](#colabs) and on wrapping it in a simple
[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
are used for demonstration purposes as the most common exercises.

![pose_classification_pushups_and_squats.gif](../images/mobile/pose_classification_pushups_and_squats.gif) |
:--------------------------------------------------------------------------------------------------------: |
*Fig 1. Pose classification and repetition counting with MediaPipe Pose.*                                   |

We picked the
[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
(k-NN) as the classifier. It's simple and easy to start with. The algorithm
determines the object's class based on the closest samples in the training set.

**To build it, one needs to:**

1.  Collect image samples of the target exercises and run pose prediction on
    them,
2.  Convert the obtained pose landmarks to a representation suitable for the
    k-NN classifier and form a training set using these [Colabs](#colabs),
3.  Perform the classification itself followed by repetition counting (e.g., in
    the
    [ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)).

## Training Set

To build a good classifier, appropriate samples should be collected for the
training set: a few hundred samples for each terminal state of each exercise
(e.g., "up" and "down" positions for push-ups). It's important that the
collected samples cover different camera angles, environment conditions, body
shapes, and exercise variations.

![pose_classification_pushups_un_and_down_samples.jpg](../images/mobile/pose_classification_pushups_un_and_down_samples.jpg) |
:---------------------------------------------------------------------------------------------------------------------------: |
*Fig 2. Two terminal states of push-ups.*                                                                                     |

To transform samples into a k-NN classifier training set, either the
[`Pose Classification Colab (Basic)`] or the
[`Pose Classification Colab (Extended)`] can be used. They use the
[Python Solution API](./pose.md#python-solution-api) to run the BlazePose models
on the given images and dump predicted pose landmarks to a CSV file.
Additionally, the [`Pose Classification Colab (Extended)`] provides useful tools
to find outliers (e.g., wrongly predicted poses) and underrepresented classes
(e.g., not covering all camera angles) by classifying each sample against the
entire training set. After that, you'll be able to test the classifier on an
arbitrary video right in the Colab.

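The exact folder layout, CSV schema, and helper utilities are defined in the
Colabs themselves; the snippet below is only a minimal, hypothetical sketch of
this landmark-dumping step with the Python Solution API (the directory names and
CSV columns are assumptions, not the Colabs' actual format):

```python
import csv
import os

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Assumed layout: one sub-folder per class, e.g. images_in/pushups_up/*.jpg
# and images_in/pushups_down/*.jpg.
IMAGES_ROOT = 'images_in'
CSV_OUT_PATH = 'pose_samples.csv'

with mp_pose.Pose(static_image_mode=True) as pose, \
    open(CSV_OUT_PATH, 'w', newline='') as csv_file:
  writer = csv.writer(csv_file)
  for class_name in sorted(os.listdir(IMAGES_ROOT)):
    class_dir = os.path.join(IMAGES_ROOT, class_name)
    if not os.path.isdir(class_dir):
      continue
    for image_name in sorted(os.listdir(class_dir)):
      image = cv2.imread(os.path.join(class_dir, image_name))
      if image is None:
        continue
      # The Solution API expects RGB input.
      results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
      if results.pose_landmarks is None:
        # No pose detected; such images are candidates for removal.
        continue
      # Flatten the 33 landmarks into [x0, y0, z0, x1, y1, z1, ...].
      row = [image_name, class_name]
      for lmk in results.pose_landmarks.landmark:
        row += [lmk.x, lmk.y, lmk.z]
      writer.writerow(row)
```
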
## Classification

The code of the classifier is available both in the
[`Pose Classification Colab (Extended)`] and in the
[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
Please refer to them for details of the approach described below.

The k-NN algorithm used for pose classification requires a feature vector
representation of each sample and a metric to compute the distance between two
such vectors, in order to find the nearest pose samples to a target one.

To convert pose landmarks to a feature vector, we use pairwise distances between
predefined lists of pose joints, such as the distances between wrist and
shoulder, ankle and hip, and the two wrists. Since the algorithm relies on
distances, all poses are normalized to have the same torso size and vertical
torso orientation before the conversion.

![pose_classification_pairwise_distances.png](../images/mobile/pose_classification_pairwise_distances.png) |
:--------------------------------------------------------------------------------------------------------: |
*Fig 3. Main pairwise distances used for the pose feature vector.*                                          |

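As an illustration only, such a normalize-then-measure embedding could be
sketched as below; the joint pairs and the omission of rotation alignment are
simplifications, not the exact embedding used in the Colabs:

```python
import numpy as np

# BlazePose landmark indices (33-landmark topology).
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_ANKLE, RIGHT_ANKLE = 27, 28

# Illustrative subset of joint pairs; the Colabs define their own, larger list.
DISTANCE_PAIRS = [
    (LEFT_SHOULDER, LEFT_WRIST), (RIGHT_SHOULDER, RIGHT_WRIST),
    (LEFT_HIP, LEFT_ANKLE), (RIGHT_HIP, RIGHT_ANKLE),
    (LEFT_WRIST, RIGHT_WRIST), (LEFT_ANKLE, RIGHT_ANKLE),
]


def normalize_pose(landmarks):
  """Centers the pose on the hips and scales it by torso size.

  `landmarks` is a (33, 3) array of (x, y, z) coordinates. Rotation to a
  vertical torso orientation is omitted here for brevity.
  """
  hip_center = (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP]) / 2.0
  shoulder_center = (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER]) / 2.0
  torso_size = np.linalg.norm(shoulder_center - hip_center)
  return (landmarks - hip_center) / torso_size


def pose_embedding(landmarks):
  """Converts landmarks into a feature vector of pairwise distances."""
  landmarks = normalize_pose(np.asarray(landmarks, dtype=np.float32))
  return np.array(
      [np.linalg.norm(landmarks[a] - landmarks[b]) for a, b in DISTANCE_PAIRS])
```
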
To get a better classification result, k-NN search is invoked twice with
different distance metrics (see the sketch after this list):

*   First, to filter out samples that are almost the same as the target one but
    differ strongly in just a few feature-vector values (which usually means
    differently bent joints and thus a different pose class), maximum
    per-coordinate distance is used as the distance metric.
*   Then, average per-coordinate distance is used to find the nearest pose
    cluster among those from the first search.

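A rough sketch of this two-pass search follows; the `top_n` values are
illustrative placeholders, not the numbers used in the Colabs or the demo app:

```python
import numpy as np


def classify_pose(target_embedding, sample_embeddings, sample_classes,
                  top_n_by_max_distance=30, top_n_by_mean_distance=10):
  """Two-pass k-NN over pose embeddings.

  `sample_embeddings` is an (N, D) array of training embeddings and
  `sample_classes` is a length-N list of class names.
  """
  diffs = np.abs(sample_embeddings - target_embedding)  # Shape (N, D).

  # Pass 1: rank by the worst-matching coordinate. Samples that agree on most
  # joints but have a few joints bent differently get a large max distance
  # and are filtered out.
  max_dists = diffs.max(axis=1)
  candidates = np.argsort(max_dists)[:top_n_by_max_distance]

  # Pass 2: among the remaining candidates, rank by average per-coordinate
  # distance to find the nearest pose cluster.
  mean_dists = diffs[candidates].mean(axis=1)
  nearest = candidates[np.argsort(mean_dists)[:top_n_by_mean_distance]]

  # Count votes per class; dividing by the number of neighbors gives the
  # per-class probabilities used for smoothing and repetition counting.
  votes = {}
  for idx in nearest:
    votes[sample_classes[idx]] = votes.get(sample_classes[idx], 0) + 1
  return {name: count / len(nearest) for name, count in votes.items()}
```
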
Finally, we apply
[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
(EMA) smoothing to level any noise from pose prediction or classification. To do
that, we search not only for the nearest pose cluster, but also calculate a
probability for each pose class and use it for smoothing over time.

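For instance, a minimal EMA smoother over the per-class probabilities might look
like this (the smoothing factor `alpha` is an assumed value, not the one used in
the demo app):

```python
class EMASmoother:
  """Exponential moving average over per-class pose probabilities."""

  def __init__(self, alpha=0.2):
    # Higher alpha reacts faster to new frames; lower alpha smooths more.
    self.alpha = alpha
    self.smoothed = {}

  def update(self, class_probabilities):
    """Blends the latest per-class probabilities into the running average."""
    for name in set(self.smoothed) | set(class_probabilities):
      new = class_probabilities.get(name, 0.0)
      prev = self.smoothed.get(name, new)
      self.smoothed[name] = self.alpha * new + (1.0 - self.alpha) * prev
    return dict(self.smoothed)
```
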
## Repetition Counting

To count the repetitions, the algorithm monitors the probability of a target
pose class. Let's take push-ups with their "up" and "down" terminal states:

*   When the probability of the "down" pose class passes a certain threshold for
    the first time, the algorithm marks that the "down" pose class is entered.
*   Once the probability drops below the threshold, the algorithm marks that the
    "down" pose class has been exited and increases the counter.

To avoid cases when the probability fluctuates around the threshold (e.g., when
the user pauses between the "up" and "down" states), causing phantom counts, the
threshold used to detect when the state is exited is slightly lower than the one
used to detect when the state is entered. This creates an interval in which
neither the pose class nor the counter can change.

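Putting it together, a hypothetical counter with such a two-threshold
(hysteresis) scheme could look as follows; the `enter_threshold` and
`exit_threshold` values are assumptions for illustration:

```python
class RepetitionCounter:
  """Counts repetitions of a target pose class using two thresholds."""

  def __init__(self, class_name, enter_threshold=0.8, exit_threshold=0.6):
    self.class_name = class_name
    self.enter_threshold = enter_threshold
    self.exit_threshold = exit_threshold
    self.pose_entered = False
    self.n_repeats = 0

  def __call__(self, smoothed_probabilities):
    """Updates the count from the smoothed per-class probabilities."""
    prob = smoothed_probabilities.get(self.class_name, 0.0)
    if not self.pose_entered:
      # Enter the target state only once the probability is high enough.
      self.pose_entered = prob > self.enter_threshold
      return self.n_repeats
    # Count a repetition only when the probability falls below the lower exit
    # threshold; the gap between the two thresholds prevents phantom counts
    # while the probability hovers around a single value.
    if prob < self.exit_threshold:
      self.n_repeats += 1
      self.pose_entered = False
    return self.n_repeats
```

On each video frame, the embedding, classification, smoothing, and counting
steps above would be chained in that order.
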
## Future Work

We are actively working on improving BlazePose GHUM 3D's Z prediction. It will
allow us to use joint angles in the feature vectors, which are more natural and
easier to configure (although distances can still be useful to detect touches
between body parts), and to perform rotation normalization of poses, reducing
the number of camera angles required for accurate k-NN classification.

## Colabs

*   [`Pose Classification Colab (Basic)`]
*   [`Pose Classification Colab (Extended)`]

[`Pose Classification Colab (Basic)`]: https://mediapipe.page.link/pose_classification_basic
[`Pose Classification Colab (Extended)`]: https://mediapipe.page.link/pose_classification_extended