diff --git a/README.md b/README.md index 835164cae25..9219f7e3aee 100644 --- a/README.md +++ b/README.md @@ -187,6 +187,7 @@ More amazing notebooks here! | [403-action-recognition-webcam](notebooks/403-action-recognition-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F403-action-recognition-webcam%2F403-action-recognition-webcam.ipynb) | Human action recognition with a webcam or video file | | | [404-style-transfer-webcam](notebooks/404-style-transfer-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F404-style-transfer-webcam%2F404-style-transfer.ipynb) | Style Transfer with a webcam or video file | | | [405-paddle-ocr-webcam](notebooks/405-paddle-ocr-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F405-paddle-ocr-webcam%2F405-paddle-ocr-webcam.ipynb) | OCR with a webcam or video file | | +| [406-3D-pose-estimation-webcam](notebooks/406-3D-pose-estimation-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks.git/main?labpath=notebooks%2F406-3D-pose-estimation-webcam%2F406-3D-pose-estimation.ipynb) | 3D display of human pose estimation with a webcam or video file | | diff --git a/README_cn.md b/README_cn.md index 1d85041c355..0b1b35d7785 100644 --- a/README_cn.md +++ b/README_cn.md @@ -1,5 +1,6 @@ [English](README.md) | 简体中文 +

📚 OpenVINO™ Notebooks

[![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/LICENSE) @@ -166,10 +167,10 @@ Jupyter notebooks 分为四个大类,选择一个跟你需求相关的开始 ### 📺 实时演示 在网络摄像头或视频文件上运行的实时推理演示。 -| [401-object-detection-webcam](notebooks/401-object-detection-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F401-object-detection-webcam%2F401-object-detection.ipynb) | [402-pose-estimation-webcam](notebooks/402-pose-estimation-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F402-pose-estimation-webcam%2F402-pose-estimation.ipynb) | [403-action-recognition-webcam](notebooks/403-action-recognition-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F403-action-recognition-webcam%2F403-action-recognition-webcam.ipynb) | [405-paddle-ocr-webcam](notebooks/405-paddle-ocr-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F405-paddle-ocr-webcam%2F405-paddle-ocr-webcam.ipynb) | -| -------------------------------------------------------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------- | -| 使用网络摄像头或视频文件进行目标检测 | 使用网络摄像头或视频文件进行人体姿态检测 | 使用网络摄像头或视频文件进行动作识别 | 使用网络摄像头或视频文件进行OCR | -| | | | | +| [401-object-detection-webcam](notebooks/401-object-detection-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F401-object-detection-webcam%2F401-object-detection.ipynb) | [402-pose-estimation-webcam](notebooks/402-pose-estimation-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F402-pose-estimation-webcam%2F402-pose-estimation.ipynb) | [403-action-recognition-webcam](notebooks/403-action-recognition-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F403-action-recognition-webcam%2F403-action-recognition-webcam.ipynb) | [405-paddle-ocr-webcam](notebooks/405-paddle-ocr-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F405-paddle-ocr-webcam%2F405-paddle-ocr-webcam.ipynb) | [406-3D-pose-estimation-webcam](notebooks/406-3D-pose-estimation-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks.git/main?labpath=notebooks%2F406-3D-pose-estimation-webcam%2F406-3D-pose-estimation.ipynb) | +| -------------------------------------------------------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | +| 使用网络摄像头或视频文件进行目标检测 | 使用网络摄像头或视频文件进行人体姿态检测 | 使用网络摄像头或视频文件进行动作识别 | 使用网络摄像头或视频文件进行OCR | 使用网络摄像头或视频文件进行三维人体姿态估计 | +| | | | | | diff --git a/notebooks/406-3D-pose-estimation-webcam/406-3D-pose-estimation.ipynb b/notebooks/406-3D-pose-estimation-webcam/406-3D-pose-estimation.ipynb new file mode 100644 index 00000000000..5b79827d695 --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/406-3D-pose-estimation.ipynb @@ -0,0 +1,592 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9de9a93e-9247-4799-a5bb-2ec1575ae8c2", + "metadata": {}, + "source": [ + "# Live 3D Human Pose Estimation with OpenVINO\n", + "\n", + "This notebook demonstrates live 3D Human Pose Estimation via a webcam with OpenVINO. We utilize the model [human-pose-estimation-3d-0001](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/human-pose-estimation-3d-0001) from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/). At the end of this notebook, you will see live inference results from your webcam (if available). Alternatively, you can also upload a video file to test out the algorithms.\n", + "**Make sure you have properly installed the [Jupyter extension](https://github.com/jupyter-widgets/pythreejs#jupyterlab) and using Jupyterlab to run the demo as suggested in the README.md**\n", + "\n", + "> **NOTE**: _To use the webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a remote server, the webcam will not work. However, you can still do inference on a video file in the final step. This demo utilizes the Python interface in Three.js integrated with WebGL to process data from the model inference. 
These results are processed and displayed in the notebook._\n", + "\n", + "_To ensure that the results are displayed correctly, we recommend that you run the code in a browser on one of the following operating systems:_ \n", + "_Ubuntu, Windows: chrome_ \n", + "_macOS: Safari_" + ] + }, + { + "cell_type": "markdown", + "id": "7925a51b-26ec-43c5-9660-0705c03d724d", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "**The Pythreejs extension may not display properly when using the latest Jupyter Notebook release (2.4.1), so it is recommended to use Jupyter Lab instead.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b84c1f5e-502b-4037-b871-9f84b4e8cef0", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install pythreejs" + ] + }, + { + "cell_type": "markdown", + "id": "5a9332fb-1cee-4faa-9555-731ddf0e0df7", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "316ad889-8514-430f-baf4-4f32abd43356", + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import sys\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "import cv2\n", + "import ipywidgets as widgets\n", + "import numpy as np\n", + "from IPython.display import clear_output, display\n", + "from openvino.runtime import Core\n", + "\n", + "sys.path.append(\"../utils\")\n", + "import notebook_utils as utils\n", + "\n", + "sys.path.append(\"./engine\")\n", + "import engine.engine3js as engine\n", + "from engine.parse_poses import parse_poses" + ] + }, + { + "cell_type": "markdown", + "id": "c96ad61a-59ff-4873-b2f3-3994d6826f51", + "metadata": {}, + "source": [ + "## The model\n", + "\n", + "### Download the model\n", + "\n", + "We use `omz_downloader`, which is a command line tool from the `openvino-dev` package. `omz_downloader` automatically creates a directory structure and downloads the selected model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31bd89c7-be8a-4b03-ba38-c19d328e332d", + "metadata": {}, + "outputs": [], + "source": [ + "# directory where model will be downloaded\n", + "base_model_dir = \"model\"\n", + "\n", + "# model name as named in Open Model Zoo\n", + "model_name = \"human-pose-estimation-3d-0001\"\n", + "# selected precision (FP32, FP16)\n", + "precision = \"FP32\"\n", + "\n", + "BASE_MODEL_NAME = f\"{base_model_dir}/public/{model_name}/{model_name}\"\n", + "model_path = Path(BASE_MODEL_NAME).with_suffix(\".pth\")\n", + "onnx_path = Path(BASE_MODEL_NAME).with_suffix(\".onnx\")\n", + "\n", + "ir_model_path = f\"model/public/{model_name}/{precision}/{model_name}.xml\"\n", + "model_weights_path = f\"model/public/{model_name}/{precision}/{model_name}.bin\"\n", + "\n", + "if not model_path.exists():\n", + " download_command = (\n", + " f\"omz_downloader \" f\"--name {model_name} \" f\"--output_dir {base_model_dir}\"\n", + " )\n", + " ! $download_command" + ] + }, + { + "cell_type": "markdown", + "id": "88f39f76-2f81-4c18-9fda-98ea6a944220", + "metadata": {}, + "source": [ + "### Convert Model to OpenVINO IR format\n", + "The selected model comes from the public directory, which means it must be converted into OpenVINO Intermediate Representation (OpenVINO IR). We use omz_converter to convert the ONNX format model to the OpenVINO IR format." 
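As a side reference, the same ONNX-to-IR conversion can also be driven from Python. The sketch below is illustrative only and assumes OpenVINO 2022.1 or newer, where the Model Optimizer Python API (`openvino.tools.mo`) is available and the ONNX file has already been produced; the notebook itself relies on the `omz_converter` command shown in the next cell.

```python
# Illustrative alternative (not what the notebook runs): convert the ONNX model to
# OpenVINO IR via the Model Optimizer Python API. Assumes onnx_path already exists,
# i.e. the PyTorch-to-ONNX step of omz_converter has been completed.
from openvino.tools import mo
from openvino.runtime import serialize

ov_model = mo.convert_model(str(onnx_path))
serialize(ov_model, ir_model_path, model_weights_path)  # writes the .xml/.bin pair
```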
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c9bdfdee-c2ef-4710-96c1-8a6a896a8cba", + "metadata": {}, + "outputs": [], + "source": [ + "if not onnx_path.exists():\n", + " convert_command = (\n", + " f\"omz_converter \"\n", + " f\"--name {model_name} \"\n", + " f\"--precisions {precision} \"\n", + " f\"--download_dir {base_model_dir} \"\n", + " f\"--output_dir {base_model_dir}\"\n", + " )\n", + " ! $convert_command" + ] + }, + { + "cell_type": "markdown", + "id": "986a07ac-d092-4254-848a-dd48f4934fb5", + "metadata": {}, + "source": [ + "### Load the model\n", + "\n", + "Converted models are located in a fixed structure, which indicates vendor, model name and precision.\n", + "\n", + "First, we initialize the inference engine, OpenVINO Runtime. Then read the network architecture and model weights from the .bin and .xml files to compile for the desired device. An inference request is then created to infer the compiled model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "92a04102-aebf-4976-874b-b98dca97ec48", + "metadata": {}, + "outputs": [], + "source": [ + "# initialize inference engine\n", + "ie_core = Core()\n", + "# read the network and corresponding weights from file\n", + "model = ie_core.read_model(model=ir_model_path, weights=model_weights_path)\n", + "# load the model on the CPU (you can use GPU or MYRIAD as well)\n", + "compiled_model = ie_core.compile_model(model=model, device_name=\"CPU\")\n", + "infer_request = compiled_model.create_infer_request()\n", + "input_tensor_name = model.inputs[0].get_any_name()\n", + "\n", + "# get input and output names of nodes\n", + "input_layer = compiled_model.input(0)\n", + "output_layers = list(compiled_model.outputs)" + ] + }, + { + "cell_type": "markdown", + "id": "5c0ffd17-df71-4178-8df8-db4ccf431621", + "metadata": {}, + "source": [ + "The input to the model is data from the input image and the outputs are heatmaps, PAF (part affinity fields) and features" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1b25847-fc80-41a1-930b-7c304fd1fe70", + "metadata": {}, + "outputs": [], + "source": [ + "input_layer.any_name, [o.any_name for o in output_layers]" + ] + }, + { + "cell_type": "markdown", + "id": "48eb5032-a06e-48c1-a3d6-f0fbad9924fb", + "metadata": {}, + "source": [ + "## Processing\n", + "### Model Inference\n", + "Frames captured from video files or the live webcam are used as the input to the 3D model. This is how we obtain the output heatmaps, PAF (part affinity fields) and features." 
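Before wiring inference into the video loop, it can help to confirm the tensor shapes the compiled model expects and returns. The check below is a small sketch built on the objects created above; the concrete numbers are indicative only, since they depend on the converted model (for `human-pose-estimation-3d-0001` the input is typically a `[1, 3, 256, 448]` image and the `features`/`heatmaps`/`pafs` outputs share a spatial size of 1/8 of the input).

```python
# Quick sanity check (sketch): print the input and output tensor shapes of the compiled model.
# For human-pose-estimation-3d-0001 the input is expected to be roughly [1, 3, 256, 448] and
# the three outputs (features, heatmaps, pafs) come out at 1/8 of that spatial resolution.
print("Input :", input_layer.any_name, tuple(input_layer.shape))
for output in output_layers:
    print("Output:", output.any_name, tuple(output.shape))
```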
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08f8055b-a6cf-4003-8232-6f73a86d6034", + "metadata": {}, + "outputs": [], + "source": [ + "def model_infer(scaled_img, stride):\n", + " \"\"\"\n", + " Run model inference on the input image\n", + "\n", + " Parameters:\n", + " scaled_img: resized image according to the input size of the model\n", + " stride: int, the stride of the window\n", + " \"\"\"\n", + "\n", + " # Remove excess space from the picture\n", + " img = scaled_img[\n", + " 0 : scaled_img.shape[0] - (scaled_img.shape[0] % stride),\n", + " 0 : scaled_img.shape[1] - (scaled_img.shape[1] % stride),\n", + " ]\n", + "\n", + " img = np.transpose(img, (2, 0, 1))[\n", + " None,\n", + " ]\n", + " infer_request.infer({input_tensor_name: img})\n", + " # A set of three inference results is obtained\n", + " results = {\n", + " name: infer_request.get_tensor(name).data[:]\n", + " for name in {\"features\", \"heatmaps\", \"pafs\"}\n", + " }\n", + " # Get the results\n", + " results = (results[\"features\"][0], results[\"heatmaps\"][0], results[\"pafs\"][0])\n", + "\n", + " return results" + ] + }, + { + "cell_type": "markdown", + "id": "6991403a-4f87-45be-9b3f-d30b23a46dbe", + "metadata": {}, + "source": [ + "### Draw 2D Pose Overlays\n", + "We need to define some connections between the joints in advance, so that we can draw the structure of the human body in the resulting image after obtaining the inference results.\n", + "Joints are drawn as circles and limbs are drawn as lines. The code is based on the [3D Human Pose Estimation Demo](https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/human_pose_estimation_3d_demo/python) from Open Model Zoo." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22fd3e08-ed3b-44ac-bd07-4a80130d6681", + "metadata": {}, + "outputs": [], + "source": [ + "# 3D edge index array\n", + "body_edges = np.array(\n", + " [\n", + " [0, 1], \n", + " [0, 9], [9, 10], [10, 11], # neck - r_shoulder - r_elbow - r_wrist\n", + " [0, 3], [3, 4], [4, 5], # neck - l_shoulder - l_elbow - l_wrist\n", + " [1, 15], [15, 16], # nose - l_eye - l_ear\n", + " [1, 17], [17, 18], # nose - r_eye - r_ear\n", + " [0, 6], [6, 7], [7, 8], # neck - l_hip - l_knee - l_ankle\n", + " [0, 12], [12, 13], [13, 14], # neck - r_hip - r_knee - r_ankle\n", + " ]\n", + ")\n", + "\n", + "\n", + "body_edges_2d = np.array(\n", + " [\n", + " [0, 1], # neck - nose\n", + " [1, 16], [16, 18], # nose - l_eye - l_ear\n", + " [1, 15], [15, 17], # nose - r_eye - r_ear\n", + " [0, 3], [3, 4], [4, 5], # neck - l_shoulder - l_elbow - l_wrist\n", + " [0, 9], [9, 10], [10, 11], # neck - r_shoulder - r_elbow - r_wrist\n", + " [0, 6], [6, 7], [7, 8], # neck - l_hip - l_knee - l_ankle\n", + " [0, 12], [12, 13], [13, 14], # neck - r_hip - r_knee - r_ankle\n", + " ]\n", + ") \n", + "\n", + "\n", + "def draw_poses(frame, poses_2d, scaled_img, use_popup):\n", + " \"\"\"\n", + " Draw 2D pose overlays on the image to visualize estimated poses.\n", + " Joints are drawn as circles and limbs are drawn as lines.\n", + "\n", + " :param frame: the input image\n", + " :param poses_2d: array of human joint pairs\n", + " \"\"\"\n", + " for pose in poses_2d:\n", + " pose = np.array(pose[0:-1]).reshape((-1, 3)).transpose()\n", + " was_found = pose[2] > 0\n", + "\n", + " pose[0], pose[1] = (\n", + " pose[0] * frame.shape[1] / scaled_img.shape[1],\n", + " pose[1] * frame.shape[0] / scaled_img.shape[0],\n", + " )\n", + "\n", + " # Draw joints.\n", + " for edge in body_edges_2d:\n", + 
" if was_found[edge[0]] and was_found[edge[1]]:\n", + " cv2.line(\n", + " frame,\n", + " tuple(pose[0:2, edge[0]].astype(np.int32)),\n", + " tuple(pose[0:2, edge[1]].astype(np.int32)),\n", + " (255, 255, 0),\n", + " 4,\n", + " cv2.LINE_AA,\n", + " )\n", + " # Draw limbs.\n", + " for kpt_id in range(pose.shape[1]):\n", + " if pose[2, kpt_id] != -1:\n", + " cv2.circle(\n", + " frame,\n", + " tuple(pose[0:2, kpt_id].astype(np.int32)),\n", + " 3,\n", + " (0, 255, 255),\n", + " -1,\n", + " cv2.LINE_AA,\n", + " )\n", + "\n", + " return frame" + ] + }, + { + "cell_type": "markdown", + "id": "a6894ce8-ac91-464d-a7f7-54d09f399f4f", + "metadata": {}, + "source": [ + "### Main Processing Function\n", + "\n", + "Run 3D pose estimation on the specified source. It could be either a webcam or a video file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3be526d0-75ad-4bd1-85b1-ca8185eca918", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def run_pose_estimation(source=0, flip=False, use_popup=False, skip_frames=0):\n", + " \"\"\"\n", + " 2D image as input, using OpenVINO as inference backend,\n", + " get joints 3D coordinates, and draw 3D human skeleton in the scene\n", + "\n", + " :param source: The webcam number to feed the video stream with primary webcam set to \"0\", or the video path.\n", + " :param flip: To be used by VideoPlayer function for flipping capture image.\n", + " :param use_popup: False for showing encoded frames over this notebook, True for creating a popup window.\n", + " :param skip_frames: Number of frames to skip at the beginning of the video.\n", + " \"\"\"\n", + "\n", + " focal_length = -1 # default\n", + " stride = 8\n", + " player = None\n", + " skeleton_set = None\n", + "\n", + " try:\n", + " # create video player to play with target fps video_path\n", + " # get the frame from camera\n", + " # You can skip first N frames to fast forward video. 
change 'skip_first_frames'\n", + " player = utils.VideoPlayer(source, flip=flip, fps=30, skip_first_frames=skip_frames)\n", + " # start capturing\n", + " player.start()\n", + "\n", + " input_image = player.next()\n", + " # set the window size\n", + " resize_scale = 450 / input_image.shape[1]\n", + " windows_width = int(input_image.shape[1] * resize_scale)\n", + " windows_height = int(input_image.shape[0] * resize_scale)\n", + "\n", + " # use visualization library\n", + " engine3D = engine.Engine3js(grid=True, axis=True, view_width=windows_width, view_height=windows_height)\n", + "\n", + " if use_popup:\n", + " # display the 3D human pose in this notebook, and origin frame in popup window\n", + " display(engine3D.renderer)\n", + " title = \"Press ESC to Exit\"\n", + " cv2.namedWindow(title, cv2.WINDOW_KEEPRATIO | cv2.WINDOW_AUTOSIZE)\n", + " else:\n", + " # set the 2D image box, show both human pose and image in the notebook\n", + " imgbox = widgets.Image(\n", + " format=\"jpg\", height=windows_height, width=windows_width\n", + " )\n", + " display(widgets.HBox([engine3D.renderer, imgbox]))\n", + "\n", + " skeleton = engine.Skeleton(body_edges=body_edges)\n", + "\n", + " processing_times = collections.deque()\n", + "\n", + " while True:\n", + " # grab the frame\n", + " frame = player.next()\n", + " if frame is None:\n", + " print(\"Source ended\")\n", + " break\n", + "\n", + " # resize image and change dims to fit neural network input\n", + " # (see https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/human-pose-estimation-3d-0001)\n", + " scaled_img = cv2.resize(frame, dsize=(model.inputs[0].shape[3], model.inputs[0].shape[2]))\n", + "\n", + " if focal_length < 0: # Focal length is unknown\n", + " focal_length = np.float32(0.8 * scaled_img.shape[1])\n", + "\n", + " # inference start\n", + " start_time = time.time()\n", + " # get results\n", + " inference_result = model_infer(scaled_img, stride)\n", + "\n", + " # inference stop\n", + " stop_time = time.time()\n", + " processing_times.append(stop_time - start_time)\n", + " # Process the point to point coordinates of the data\n", + " poses_3d, poses_2d = parse_poses(inference_result, 1, stride, focal_length, True)\n", + "\n", + " # use processing times from last 200 frames\n", + " if len(processing_times) > 200:\n", + " processing_times.popleft()\n", + "\n", + " processing_time = np.mean(processing_times) * 1000\n", + " fps = 1000 / processing_time\n", + "\n", + " if len(poses_3d) > 0:\n", + " # From here, you can rotate the 3D point positions using the function \"draw_poses\",\n", + " # or you can directly make the correct mapping below to properly display the object image on the screen\n", + " poses_3d_copy = poses_3d.copy()\n", + " x = poses_3d_copy[:, 0::4]\n", + " y = poses_3d_copy[:, 1::4]\n", + " z = poses_3d_copy[:, 2::4]\n", + " poses_3d[:, 0::4], poses_3d[:, 1::4], poses_3d[:, 2::4] = (\n", + " -z + np.ones(poses_3d[:, 2::4].shape) * 200,\n", + " -y + np.ones(poses_3d[:, 2::4].shape) * 100,\n", + " -x,\n", + " )\n", + "\n", + " poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]\n", + " people = skeleton(poses_3d=poses_3d)\n", + "\n", + " try:\n", + " engine3D.scene_remove(skeleton_set)\n", + " except Exception:\n", + " pass\n", + "\n", + " engine3D.scene_add(people)\n", + " skeleton_set = people\n", + "\n", + " # draw 2D\n", + " frame = draw_poses(frame, poses_2d, scaled_img, use_popup)\n", + "\n", + " else:\n", + " try:\n", + " engine3D.scene_remove(skeleton_set)\n", + " skeleton_set = None\n", 
+ " except Exception:\n", + " pass\n", + "\n", + " cv2.putText(\n", + " frame,\n", + " f\"Inference time: {processing_time:.1f}ms ({fps:.1f} FPS)\",\n", + " (10, 30),\n", + " cv2.FONT_HERSHEY_COMPLEX,\n", + " 0.7,\n", + " (0, 0, 255),\n", + " 1,\n", + " cv2.LINE_AA,\n", + " )\n", + "\n", + " if use_popup:\n", + " cv2.imshow(title, frame)\n", + " key = cv2.waitKey(1)\n", + " # escape = 27, use ESC to exit\n", + " if key == 27:\n", + " break\n", + " else:\n", + " # encode numpy array to jpg\n", + " imgbox.value = cv2.imencode(\n", + " \".jpg\",\n", + " frame,\n", + " params=[cv2.IMWRITE_JPEG_QUALITY, 90],\n", + " )[1].tobytes()\n", + "\n", + " engine3D.renderer.render(engine3D.scene, engine3D.cam)\n", + "\n", + " except KeyboardInterrupt:\n", + " print(\"Interrupted\")\n", + " except RuntimeError as e:\n", + " print(e)\n", + " finally:\n", + " clear_output()\n", + " if player is not None:\n", + " # stop capturing\n", + " player.stop()\n", + " if use_popup:\n", + " cv2.destroyAllWindows()\n", + " if skeleton_set:\n", + " engine3D.scene_remove(skeleton_set)" + ] + }, + { + "cell_type": "markdown", + "id": "344840a6-9660-4a11-8b05-729ac2969e28", + "metadata": {}, + "source": [ + "## Run\n", + "\n", + "### Run Live Pose Estimation\n", + "\n", + "Run using a webcam as the video input. By default, the primary webcam is set with `source=0`. If you have multiple webcams, each one will be assigned a consecutive number starting at 0. Set `flip=True` when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set `use_popup=True`.\n", + "\n", + "*Note:*\n", + "\n", + "*1. To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server (e.g. Binder), the webcam will not work.*\n", + " \n", + "*2. Popup mode may not work if you run this notebook on a remote computer (e.g. Binder).*" + ] + }, + { + "cell_type": "markdown", + "id": "d2d1a143-afcb-4f22-a4cc-657a080b70bf", + "metadata": {}, + "source": [ + "Using the following method, you can click and move your mouse over the picture on the left to interact." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f82e298-5912-48c7-90b5-339aea3c177d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "run_pose_estimation(source=0, flip=True, use_popup=False)" + ] + }, + { + "cell_type": "markdown", + "id": "9ee7b3c4-a143-4d54-bf5d-0aa758f99928", + "metadata": {}, + "source": [ + "### Run Pose Estimation on a Video File\n", + "\n", + "If you don't have a webcam, you can still run this demo with a video file. Any [format supported by OpenCV](https://docs.opencv.org/4.5.1/dd/d43/tutorial_py_video_display.html) will work. \n", + "\n", + "\n", + "You can click and move your mouse over the picture on the left to interact." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da463932-dee0-4f0f-8379-133d16fce3a6", + "metadata": { + "test_replace": { + "skip_frames=10": "skip_frames=675" + } + }, + "outputs": [], + "source": [ + "# video url\n", + "video_path = \"https://github.com/intel-iot-devkit/sample-videos/raw/master/face-demographics-walking.mp4\"\n", + "run_pose_estimation(source=video_path, flip=False, use_popup=False, skip_frames=10)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/406-3D-pose-estimation-webcam/README.md b/notebooks/406-3D-pose-estimation-webcam/README.md new file mode 100644 index 00000000000..8486c07caec --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/README.md @@ -0,0 +1,37 @@ +# 3D Human Pose Estimation with OpenVINO + +[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks.git/main?labpath=notebooks%2F406-3D-pose-estimation-webcam%2F406-3D-pose-estimation.ipynb) + +*Binder is a free service where the webcam will not work, and performance on the video will not be good. For best performance, we recommend installing the notebooks locally.* + +![pose estimation_webgl](https://user-images.githubusercontent.com/42672437/183292131-576cc05a-a724-472c-8dc9-f6bc092190bf.gif) + +This notebook contains a 3D multi-person pose estimation demo. The model used in this demo is based on [Lightweight OpenPose](https://arxiv.org/abs/1811.12004) and [Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB](https://arxiv.org/abs/1712.03453). It detects the 2D coordinates of up to 18 types of keypoints: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles, as well as their 3D coordinates, which can then be used to construct a 3D display of human poses. OpenVINO™ is used to accelerate inference on multiple devices, such as CPU, GPU and VPU. This 3D display method can also be extended to show the inference results of other 3D models without much effort. + +## Notebook Contents + +This notebook uses the "human-pose-estimation-3d-0001" model from the OpenVINO Open Model Zoo to estimate 3D human poses and render them on a 2D screen. Details of the model can be found [here](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/human-pose-estimation-3d-0001). The input source can be a video file or a webcam. The notebook uses the [Three.js](https://pythreejs.readthedocs.io/en/stable/installing.html) Python API to display the 3D results in a web browser. Note that to display the 3D inference results properly, Chrome is the recommended browser on Windows and Ubuntu, while Safari is recommended on macOS. + +## Installation Instructions + +If you have not done so already, please follow the [Installation Guide](../../README.md) to install all required dependencies. + +Make sure your [Jupyter extension](https://github.com/jupyter-widgets/pythreejs#jupyterlab) is working properly. +To avoid errors caused by mismatched dependency package versions, we recommend using **JupyterLab** instead of the Jupyter Notebook to display the results. 
+``` +- pip install --upgrade pip && pip install -r requirements.txt +- jupyter labextension install --no-build @jupyter-widgets/jupyterlab-manager +- jupyter labextension install --no-build jupyter-datawidgets/extension +- jupyter labextension install jupyter-threejs +- jupyter labextension list +``` + +You should see: +``` +JupyterLab v... + ... + jupyterlab-datawidgets v... enabled OK + @jupyter-widgets/jupyterlab-manager v... enabled OK + jupyter-threejs v... enabled OK +``` + diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/__init__.py b/notebooks/406-3D-pose-estimation-webcam/engine/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/engine3js.py b/notebooks/406-3D-pose-estimation-webcam/engine/engine3js.py new file mode 100644 index 00000000000..8a2dd310f4f --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/engine/engine3js.py @@ -0,0 +1,275 @@ +from IPython.display import display +from pythreejs import * + + +class Engine3js: + """ + Summary of class here. + + The implementation of these interfaces depends on Pythreejs, so make + sure you install and authorize the Jupyter Widgets plug-in correctly. + + Attributes: + view_width, view_height: Size of the view window. + position: Position of the camera. + lookAtPos: The point at which the camera looks at. + axis_size: The size of axis. + grid_length: grid size, length == width. + grid_num: grid number. + grid: If use grid. + axis: If use axis. + """ + + def __init__( + self, + view_width=455, + view_height=256, + position=[300, 100, 0], + lookAtPos=[0, 0, 0], + axis_size=300, + grid_length=600, + grid_num=20, + grid=False, + axis=False, + ): + + self.view_width = view_width + self.view_height = view_height + self.position = position + + # set the camera + self.cam = PerspectiveCamera( + position=self.position, aspect=self.view_width / self.view_height + ) + self.cam.lookAt(lookAtPos) + + # x,y,z axis + self.axis = AxesHelper(axis_size) # axes length + + # set grid size + self.gridHelper = GridHelper(grid_length, grid_num) + + # set scene + self.scene = Scene( + children=[ + self.cam, + DirectionalLight(position=[3, 5, 1], intensity=0.6), + AmbientLight(intensity=0.5), + ] + ) + + # add axis or grid + if grid: + self.scene.add(self.gridHelper) + + if axis: + self.scene.add(self.axis) + + # render the objects in scene + self.renderer = Renderer( + camera=self.cam, + scene=self.scene, + controls=[OrbitControls(controlling=self.cam)], + width=self.view_width, + height=self.view_height, + ) + # display(renderer4) + + def get_width(self): + return self.view_width + + def plot(self): + self.renderer.render(self.scene, self.cam) + + def scene_add(self, object): + self.scene.add(object) + + def scene_remove(self, object): + self.scene.remove(object) + + +class Geometry: + """ + This is the geometry base class that defines buffer and material. + """ + + def __init__(self, name="geometry"): + self.geometry = None + self.material = None + self.name = name + + def get_Name(): + return self.name + + +class Skeleton(Geometry): + """ + This is the class for drawing human body poses. 
+ """ + + def __init__(self, name="skeleton", lineWidth=3, body_edges=[]): + super(Skeleton, self).__init__(name) + self.material = LineBasicMaterial( + vertexColors="VertexColors", linewidth=lineWidth + ) + self.colorSet = BufferAttribute( + np.array( + [ + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + ], + dtype=np.float32, + ), + normalized=False, + ) + # self.geometry.attributes["color"] = self.colorSet + self.body_edges = body_edges + + def __call__(self, poses_3d): + poses = [] + for pose_position_tmp in poses_3d: + bones = [] + for edge in self.body_edges: + # put pair of points as limbs + bones.append(pose_position_tmp[edge[0]]) + bones.append(pose_position_tmp[edge[1]]) + + bones = np.asarray(bones, dtype=np.float32) + + # You can find the api in https://github.com/jupyter-widgets/pythreejs + + self.geometry = BufferGeometry( + attributes={ + "position": BufferAttribute(bones, normalized=False), + # It defines limbs' color + "color": self.colorSet, + } + ) + + pose = LineSegments(self.geometry, self.material) + poses.append(pose) + # self.geometry.close() + return poses + + def plot(self, pose_points=None): + return self.__call__(pose_points) + + +class Cloudpoint(Geometry): + """ + This is the class for drawing cloud points. + """ + + def __init__( + self, name="cloudpoint", points=[], point_size=5, line=None, points_color="blue" + ): + super(Cloudpoint, self).__init__(name) + self.material = PointsMaterial(size=point_size, color=points_color) + self.points = points + self.line = line + + def __call__(self, points_3d): + self.geometry = BufferGeometry( + attributes={ + "position": BufferAttribute(points_3d, normalized=False), + # It defines points' vertices' color + "color": BufferAttribute( + np.array( + [ + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + ], + dtype=np.float32, + ), + normalized=False, + ), + }, + ) + cloud_points = Points(self.geometry, self.material) + + if self.line is not None: + g1 = BufferGeometry( + attributes={ + "position": BufferAttribute(line, normalized=False), + # It defines limbs' color + "color": BufferAttribute( + # Here you can set vertex colors, if you set the 'color' option = vertexes + np.array( + [ + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + ], + dtype=np.float32, + ), + normalized=False, + ), + }, + ) + m1 = LineBasicMaterial(color="red", linewidth=3) + facemesh = LineSegments(g1, m1) + return [cloud_points, facemesh] + + return cloud_points + + +def Box_bounding(Geometry): + def __init__(self, name="Box", lineWidth=3): + super(Box_bounding, self).__init__(name) + self.material = LineBasicMaterial( + vertexColors="VertexColors", linewidth=lineWidth + ) + self.edge = [] + + def __call__(self, points=None): + pass \ No newline at end of file diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/legacy_pose_extractor.py b/notebooks/406-3D-pose-estimation-webcam/engine/legacy_pose_extractor.py new file mode 100644 index 00000000000..a7c5aeb86ce 
--- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/engine/legacy_pose_extractor.py @@ -0,0 +1,234 @@ +from operator import itemgetter + +import cv2 +import math +import numpy as np + +BODY_PARTS_KPT_IDS = [[1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7], [1, 8], [8, 9], [9, 10], [1, 11], + [11, 12], [12, 13], [1, 0], [0, 14], [14, 16], [0, 15], [15, 17], [2, 16], [5, 17]] +BODY_PARTS_PAF_IDS = ([12, 13], [20, 21], [14, 15], [16, 17], [22, 23], [24, 25], [0, 1], [2, 3], [4, 5], + [6, 7], [8, 9], [10, 11], [28, 29], [30, 31], [34, 35], [32, 33], [36, 37], [18, 19], [26, 27]) + + +def linspace2d(start, stop, n=10): + points = 1 / (n - 1) * (stop - start) + return points[:, None] * np.arange(n) + start[:, None] + + +def extract_keypoints(heatmap, all_keypoints, total_keypoint_num): + heatmap[heatmap < 0.1] = 0 + heatmap_with_borders = np.pad(heatmap, [(2, 2), (2, 2)], mode='constant') + heatmap_center = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 1:heatmap_with_borders.shape[1]-1] + heatmap_left = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 2:heatmap_with_borders.shape[1]] + heatmap_right = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 0:heatmap_with_borders.shape[1]-2] + heatmap_up = heatmap_with_borders[2:heatmap_with_borders.shape[0], 1:heatmap_with_borders.shape[1]-1] + heatmap_down = heatmap_with_borders[0:heatmap_with_borders.shape[0]-2, 1:heatmap_with_borders.shape[1]-1] + + heatmap_peaks = (heatmap_center > heatmap_left) &\ + (heatmap_center > heatmap_right) &\ + (heatmap_center > heatmap_up) &\ + (heatmap_center > heatmap_down) + heatmap_peaks = heatmap_peaks[1:heatmap_center.shape[0]-1, 1:heatmap_center.shape[1]-1] + keypoints = list(zip(np.nonzero(heatmap_peaks)[1], np.nonzero(heatmap_peaks)[0])) # (w, h) + keypoints = sorted(keypoints, key=itemgetter(0)) + + suppressed = np.zeros(len(keypoints), np.uint8) + keypoints_with_score_and_id = [] + keypoint_num = 0 + for i in range(len(keypoints)): + if suppressed[i]: + continue + for j in range(i+1, len(keypoints)): + if math.sqrt((keypoints[i][0] - keypoints[j][0]) ** 2 + + (keypoints[i][1] - keypoints[j][1]) ** 2) < 6: + suppressed[j] = 1 + keypoint_with_score_and_id = (keypoints[i][0], keypoints[i][1], heatmap[keypoints[i][1], keypoints[i][0]], + total_keypoint_num + keypoint_num) + keypoints_with_score_and_id.append(keypoint_with_score_and_id) + keypoint_num += 1 + all_keypoints.append(keypoints_with_score_and_id) + return keypoint_num + + +def group_keypoints(all_keypoints_by_type, pafs, pose_entry_size=20, min_paf_score=0.05): + pose_entries = [] + all_keypoints = np.array([item for sublist in all_keypoints_by_type for item in sublist]) + for part_id in range(len(BODY_PARTS_PAF_IDS)): + part_pafs = pafs[BODY_PARTS_PAF_IDS[part_id]] + kpts_a = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][0]] + kpts_b = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][1]] + num_kpts_a = len(kpts_a) + num_kpts_b = len(kpts_b) + kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0] + kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1] + + if num_kpts_a == 0 and num_kpts_b == 0: # no keypoints for such body part + continue + elif num_kpts_a == 0: # body part has just 'b' keypoints + for i in range(num_kpts_b): + num = 0 + for j in range(len(pose_entries)): # check if already in some pose, was added by another body part + if pose_entries[j][kpt_b_id] == kpts_b[i][3]: + num += 1 + continue + if num == 0: + pose_entry = np.ones(pose_entry_size) * -1 + pose_entry[kpt_b_id] = kpts_b[i][3] # keypoint idx + pose_entry[-1] = 1 # num 
keypoints in pose + pose_entry[-2] = kpts_b[i][2] # pose score + pose_entries.append(pose_entry) + continue + elif num_kpts_b == 0: # body part has just 'a' keypoints + for i in range(num_kpts_a): + num = 0 + for j in range(len(pose_entries)): + if pose_entries[j][kpt_a_id] == kpts_a[i][3]: + num += 1 + continue + if num == 0: + pose_entry = np.ones(pose_entry_size) * -1 + pose_entry[kpt_a_id] = kpts_a[i][3] + pose_entry[-1] = 1 + pose_entry[-2] = kpts_a[i][2] + pose_entries.append(pose_entry) + continue + + connections = [] + for i in range(num_kpts_a): + kpt_a = np.array(kpts_a[i][0:2]) + for j in range(num_kpts_b): + kpt_b = np.array(kpts_b[j][0:2]) + mid_point = [(), ()] + mid_point[0] = (int(round((kpt_a[0] + kpt_b[0]) * 0.5)), + int(round((kpt_a[1] + kpt_b[1]) * 0.5))) + mid_point[1] = mid_point[0] + + vec = [kpt_b[0] - kpt_a[0], kpt_b[1] - kpt_a[1]] + vec_norm = math.sqrt(vec[0] ** 2 + vec[1] ** 2) + if vec_norm == 0: + continue + vec[0] /= vec_norm + vec[1] /= vec_norm + cur_point_score = (vec[0] * part_pafs[0, mid_point[0][1], mid_point[0][0]] + + vec[1] * part_pafs[1, mid_point[1][1], mid_point[1][0]]) + + height_n = pafs.shape[1] // 2 + success_ratio = 0 + point_num = 10 # number of points to integration over paf + ratio = 0 + if cur_point_score > -100: + passed_point_score = 0 + passed_point_num = 0 + x, y = linspace2d(kpt_a, kpt_b) + for point_idx in range(point_num): + px = int(x[point_idx]) + py = int(y[point_idx]) + paf = part_pafs[:, py, px] + cur_point_score = vec[0] * paf[0] + vec[1] * paf[1] + if cur_point_score > min_paf_score: + passed_point_score += cur_point_score + passed_point_num += 1 + success_ratio = passed_point_num / point_num + if passed_point_num > 0: + ratio = passed_point_score / passed_point_num + ratio += min(height_n / vec_norm - 1, 0) + if ratio > 0 and success_ratio > 0.8: + score_all = ratio + kpts_a[i][2] + kpts_b[j][2] + connections.append([i, j, ratio, score_all]) + if len(connections) > 0: + connections = sorted(connections, key=itemgetter(2), reverse=True) + + num_connections = min(num_kpts_a, num_kpts_b) + has_kpt_a = np.zeros(num_kpts_a, dtype=np.int32) + has_kpt_b = np.zeros(num_kpts_b, dtype=np.int32) + filtered_connections = [] + for row in range(len(connections)): + if len(filtered_connections) == num_connections: + break + i, j, cur_point_score = connections[row][0:3] + if not has_kpt_a[i] and not has_kpt_b[j]: + filtered_connections.append([kpts_a[i][3], kpts_b[j][3], cur_point_score]) + has_kpt_a[i] = 1 + has_kpt_b[j] = 1 + connections = filtered_connections + if len(connections) == 0: + continue + + if part_id == 0: + pose_entries = [np.ones(pose_entry_size) * -1 for _ in range(len(connections))] + for i in range(len(connections)): + pose_entries[i][BODY_PARTS_KPT_IDS[0][0]] = connections[i][0] + pose_entries[i][BODY_PARTS_KPT_IDS[0][1]] = connections[i][1] + pose_entries[i][-1] = 2 + pose_entries[i][-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2] + elif part_id == 17 or part_id == 18: + kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0] + kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1] + for i in range(len(connections)): + for j in range(len(pose_entries)): + if pose_entries[j][kpt_a_id] == connections[i][0] and pose_entries[j][kpt_b_id] == -1: + pose_entries[j][kpt_b_id] = connections[i][1] + elif pose_entries[j][kpt_b_id] == connections[i][1] and pose_entries[j][kpt_a_id] == -1: + pose_entries[j][kpt_a_id] = connections[i][0] + continue + else: + kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0] + kpt_b_id = 
BODY_PARTS_KPT_IDS[part_id][1] + for i in range(len(connections)): + num = 0 + for j in range(len(pose_entries)): + if pose_entries[j][kpt_a_id] == connections[i][0]: + pose_entries[j][kpt_b_id] = connections[i][1] + num += 1 + pose_entries[j][-1] += 1 + pose_entries[j][-2] += all_keypoints[connections[i][1], 2] + connections[i][2] + if num == 0: + pose_entry = np.ones(pose_entry_size) * -1 + pose_entry[kpt_a_id] = connections[i][0] + pose_entry[kpt_b_id] = connections[i][1] + pose_entry[-1] = 2 + pose_entry[-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2] + pose_entries.append(pose_entry) + + filtered_entries = [] + for i in range(len(pose_entries)): + if pose_entries[i][-1] < 3 or (pose_entries[i][-2] / pose_entries[i][-1] < 0.2): + continue + filtered_entries.append(pose_entries[i]) + pose_entries = np.asarray(filtered_entries) + return pose_entries, all_keypoints + + +def extract_poses(heatmaps, pafs, upsample_ratio): + heatmaps = np.transpose(heatmaps, (1, 2, 0)) + pafs = np.transpose(pafs, (1, 2, 0)) + heatmaps = cv2.resize(heatmaps, dsize=None, fx=upsample_ratio, fy=upsample_ratio) + pafs = cv2.resize(pafs, dsize=None, fx=upsample_ratio, fy=upsample_ratio) + heatmaps = np.transpose(heatmaps, (2, 0, 1)) + pafs = np.transpose(pafs, (2, 0, 1)) + + num_keypoints = heatmaps.shape[0] + total_keypoints_num = 0 + all_keypoints_by_type = [] + for kpt_idx in range(num_keypoints): + total_keypoints_num += extract_keypoints(heatmaps[kpt_idx], all_keypoints_by_type, total_keypoints_num) + + pose_entries, all_keypoints = group_keypoints(all_keypoints_by_type, pafs) + + found_poses = [] + for pose_entry in pose_entries: + if len(pose_entry) == 0: + continue + pose_keypoints = np.ones((num_keypoints * 3 + 1), dtype=np.float32) * -1 + for kpt_id in range(num_keypoints): + if pose_entry[kpt_id] != -1.0: + pose_keypoints[kpt_id * 3 + 0] = all_keypoints[int(pose_entry[kpt_id]), 0] + pose_keypoints[kpt_id * 3 + 1] = all_keypoints[int(pose_entry[kpt_id]), 1] + pose_keypoints[kpt_id * 3 + 2] = all_keypoints[int(pose_entry[kpt_id]), 2] + pose_keypoints[-1] = pose_entry[18] + found_poses.append(pose_keypoints) + + if not found_poses: + return np.array(found_poses, dtype=np.float32).reshape((0,0)), None + + return np.array(found_poses, dtype=np.float32), None \ No newline at end of file diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/one_euro_filter.py b/notebooks/406-3D-pose-estimation-webcam/engine/one_euro_filter.py new file mode 100644 index 00000000000..06bf007c35d --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/engine/one_euro_filter.py @@ -0,0 +1,64 @@ +""" + Copyright (c) 2022 Intel Corporation + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+""" + +import math + + +def get_alpha(rate=30, cutoff=1): + tau = 1 / (2 * math.pi * cutoff) + te = 1 / rate + return 1 / (1 + tau / te) + + +class LowPassFilter: + def __init__(self): + self.x_previous = None + + def __call__(self, x, alpha=0.5): + if self.x_previous is None: + self.x_previous = x + return x + x_filtered = alpha * x + (1 - alpha) * self.x_previous + self.x_previous = x_filtered + return x_filtered + + +class OneEuroFilter: + def __init__(self, freq=15, mincutoff=1, beta=1, dcutoff=1): + self.freq = freq + self.mincutoff = mincutoff + self.beta = beta + self.dcutoff = dcutoff + self.filter_x = LowPassFilter() + self.filter_dx = LowPassFilter() + self.x_previous = None + self.dx = None + + def __call__(self, x): + if self.dx is None: + self.dx = 0 + else: + self.dx = (x - self.x_previous) * self.freq + dx_smoothed = self.filter_dx(self.dx, get_alpha(self.freq, self.dcutoff)) + cutoff = self.mincutoff + self.beta * abs(dx_smoothed) + x_filtered = self.filter_x(x, get_alpha(self.freq, cutoff)) + self.x_previous = x + return x_filtered + + +if __name__ == '__main__': + filter = OneEuroFilter(freq=15, beta=0.1) + for val in range(10): + x = val + (-1)**(val % 2) + x_filtered = filter(x) + print(x_filtered, x) diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/parse_poses.py b/notebooks/406-3D-pose-estimation-webcam/engine/parse_poses.py new file mode 100644 index 00000000000..adb7cc805bc --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/engine/parse_poses.py @@ -0,0 +1,144 @@ +import numpy as np + +from engine.pose import Pose, propagate_ids +try: + from engine.legacy_pose_extractor import extract_poses +except: + print('legacy_pose_extractor has sth wrong') + +AVG_PERSON_HEIGHT = 180 + +# pelvis (body center) is missing, id == 2 +map_id_to_panoptic = [1, 0, 9, 10, 11, 3, 4, 5, 12, 13, 14, 6, 7, 8, 15, 16, 17, 18] + +limbs = [[18, 17, 1], + [16, 15, 1], + [5, 4, 3], + [8, 7, 6], + [11, 10, 9], + [14, 13, 12]] + + +def get_root_relative_poses(inference_results): + features, heatmap, paf_map = inference_results + + upsample_ratio = 4 + found_poses = extract_poses(heatmap[0:-1], paf_map, upsample_ratio)[0] + # scale coordinates to features space + found_poses[:, 0:-1:3] /= upsample_ratio + found_poses[:, 1:-1:3] /= upsample_ratio + + poses_2d = [] + num_kpt_panoptic = 19 + num_kpt = 18 + for pose_id in range(found_poses.shape[0]): + if found_poses[pose_id, 5] == -1: # skip pose if is not found neck + continue + pose_2d = np.ones(num_kpt_panoptic * 3 + 1, dtype=np.float32) * -1 # +1 for pose confidence + for kpt_id in range(num_kpt): + if found_poses[pose_id, kpt_id * 3 + 2] != -1: + x_2d, y_2d = found_poses[pose_id, kpt_id * 3:kpt_id * 3 + 2] + conf = found_poses[pose_id, kpt_id * 3 + 2] + pose_2d[map_id_to_panoptic[kpt_id] * 3] = x_2d # just repacking + pose_2d[map_id_to_panoptic[kpt_id] * 3 + 1] = y_2d + pose_2d[map_id_to_panoptic[kpt_id] * 3 + 2] = conf + pose_2d[-1] = found_poses[pose_id, -1] + poses_2d.append(pose_2d) + + keypoint_treshold = 0.1 + poses_3d = np.ones((len(poses_2d), num_kpt_panoptic * 4), dtype=np.float32) * -1 + for pose_id in range(len(poses_3d)): + if poses_2d[pose_id][2] > keypoint_treshold: + neck_2d = poses_2d[pose_id][:2].astype(int) + # read all pose coordinates at neck location + for kpt_id in range(num_kpt_panoptic): + map_3d = features[kpt_id * 3:(kpt_id + 1) * 3] + poses_3d[pose_id][kpt_id * 4] = map_3d[0, neck_2d[1], neck_2d[0]] * AVG_PERSON_HEIGHT + poses_3d[pose_id][kpt_id * 4 + 1] = map_3d[1, neck_2d[1], neck_2d[0]] * 
AVG_PERSON_HEIGHT + poses_3d[pose_id][kpt_id * 4 + 2] = map_3d[2, neck_2d[1], neck_2d[0]] * AVG_PERSON_HEIGHT + poses_3d[pose_id][kpt_id * 4 + 3] = poses_2d[pose_id][kpt_id * 3 + 2] + + # refine keypoints coordinates at corresponding limbs locations + for limb in limbs: + for kpt_id_from in limb: + if poses_2d[pose_id][kpt_id_from * 3 + 2] > keypoint_treshold: + for kpt_id_where in limb: + kpt_from_2d = poses_2d[pose_id][kpt_id_from*3: kpt_id_from*3 + 2].astype(int) + map_3d = features[kpt_id_where * 3:(kpt_id_where + 1) * 3] + poses_3d[pose_id][kpt_id_where * 4] = map_3d[0, kpt_from_2d[1], kpt_from_2d[0]] * AVG_PERSON_HEIGHT + poses_3d[pose_id][kpt_id_where * 4 + 1] = map_3d[1, kpt_from_2d[1], kpt_from_2d[0]] * AVG_PERSON_HEIGHT + poses_3d[pose_id][kpt_id_where * 4 + 2] = map_3d[2, kpt_from_2d[1], kpt_from_2d[0]] * AVG_PERSON_HEIGHT + break + + return poses_3d, np.array(poses_2d), features.shape + + +previous_poses_2d = [] + + +def parse_poses(inference_results, input_scale, stride, fx, is_video=False): + global previous_poses_2d + poses_3d, poses_2d, features_shape = get_root_relative_poses(inference_results) + poses_2d_scaled = [] + for pose_2d in poses_2d: + num_kpt = (pose_2d.shape[0] - 1) // 3 + pose_2d_scaled = np.ones(pose_2d.shape[0], dtype=np.float32) * -1 # +1 for pose confidence + for kpt_id in range(num_kpt): + if pose_2d[kpt_id * 3 + 2] != -1: + pose_2d_scaled[kpt_id * 3] = int(pose_2d[kpt_id * 3] * stride / input_scale) + pose_2d_scaled[kpt_id * 3 + 1] = int(pose_2d[kpt_id * 3 + 1] * stride / input_scale) + pose_2d_scaled[kpt_id * 3 + 2] = pose_2d[kpt_id * 3 + 2] + pose_2d_scaled[-1] = pose_2d[-1] + poses_2d_scaled.append(pose_2d_scaled) + + if is_video: # track poses ids + current_poses_2d = [] + for pose_id in range(len(poses_2d_scaled)): + pose_keypoints = np.ones((Pose.num_kpts, 2), dtype=np.int32) * -1 + for kpt_id in range(Pose.num_kpts): + if poses_2d_scaled[pose_id][kpt_id * 3 + 2] != -1.0: # keypoint is found + pose_keypoints[kpt_id, 0] = int(poses_2d_scaled[pose_id][kpt_id * 3 + 0]) + pose_keypoints[kpt_id, 1] = int(poses_2d_scaled[pose_id][kpt_id * 3 + 1]) + pose = Pose(pose_keypoints, poses_2d_scaled[pose_id][-1]) + current_poses_2d.append(pose) + propagate_ids(previous_poses_2d, current_poses_2d) + previous_poses_2d = current_poses_2d + + translated_poses_3d = [] + # translate poses + for pose_id in range(len(poses_3d)): + pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose() + pose_2d = poses_2d[pose_id][:-1].reshape((-1, 3)).transpose() + num_valid = np.count_nonzero(pose_2d[2] != -1) + pose_3d_valid = np.zeros((3, num_valid), dtype=np.float32) + pose_2d_valid = np.zeros((2, num_valid), dtype=np.float32) + valid_id = 0 + for kpt_id in range(pose_3d.shape[1]): + if pose_2d[2, kpt_id] == -1: + continue + pose_3d_valid[:, valid_id] = pose_3d[0:3, kpt_id] + pose_2d_valid[:, valid_id] = pose_2d[0:2, kpt_id] + valid_id += 1 + + pose_2d_valid[0] = pose_2d_valid[0] - features_shape[2]/2 + pose_2d_valid[1] = pose_2d_valid[1] - features_shape[1]/2 + mean_3d = np.expand_dims(pose_3d_valid.mean(axis=1), axis=1) + mean_2d = np.expand_dims(pose_2d_valid.mean(axis=1), axis=1) + numerator = np.trace(np.dot((pose_3d_valid[:2, :] - mean_3d[:2, :]).transpose(), + pose_3d_valid[:2, :] - mean_3d[:2, :])).sum() + numerator = np.sqrt(numerator) + denominator = np.sqrt(np.trace(np.dot((pose_2d_valid[:2, :] - mean_2d[:2, :]).transpose(), + pose_2d_valid[:2, :] - mean_2d[:2, :])).sum()) + mean_2d = np.array([mean_2d[0, 0], mean_2d[1, 0], fx * input_scale / stride]) + mean_3d = 
np.array([mean_3d[0, 0], mean_3d[1, 0], 0]) + translation = numerator / denominator * mean_2d - mean_3d + + if is_video: + translation = current_poses_2d[pose_id].filter(translation) + for kpt_id in range(19): + pose_3d[0, kpt_id] = pose_3d[0, kpt_id] + translation[0] + pose_3d[1, kpt_id] = pose_3d[1, kpt_id] + translation[1] + pose_3d[2, kpt_id] = pose_3d[2, kpt_id] + translation[2] + translated_poses_3d.append(pose_3d.transpose().reshape(-1)) + + return np.array(translated_poses_3d), np.array(poses_2d_scaled) diff --git a/notebooks/406-3D-pose-estimation-webcam/engine/pose.py b/notebooks/406-3D-pose-estimation-webcam/engine/pose.py new file mode 100644 index 00000000000..554097d42ce --- /dev/null +++ b/notebooks/406-3D-pose-estimation-webcam/engine/pose.py @@ -0,0 +1,106 @@ +""" + Copyright (c) 2022 Intel Corporation + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +""" + +import cv2 +import numpy as np + +from engine.one_euro_filter import OneEuroFilter + + +class Pose: + num_kpts = 18 + kpt_names = ['neck', 'nose', + 'l_sho', 'l_elb', 'l_wri', 'l_hip', 'l_knee', 'l_ank', + 'r_sho', 'r_elb', 'r_wri', 'r_hip', 'r_knee', 'r_ank', + 'r_eye', 'l_eye', + 'r_ear', 'l_ear'] + sigmas = np.array([.79, .26, .79, .72, .62, 1.07, .87, .89, .79, .72, .62, 1.07, .87, .89, .25, .25, .35, .35], + dtype=np.float32) / 10.0 + vars = (sigmas * 2) ** 2 + last_id = -1 + color = [0, 224, 255] + + def __init__(self, keypoints, confidence): + super().__init__() + self.keypoints = keypoints + self.confidence = confidence + found_keypoints = np.zeros((np.count_nonzero(keypoints[:, 0] != -1), 2), dtype=np.int32) + found_kpt_id = 0 + for kpt_id in range(keypoints.shape[0]): + if keypoints[kpt_id, 0] == -1: + continue + found_keypoints[found_kpt_id] = keypoints[kpt_id] + found_kpt_id += 1 + self.bbox = cv2.boundingRect(found_keypoints) + self.id = None + self.translation_filter = [OneEuroFilter(freq=80, beta=0.01), + OneEuroFilter(freq=80, beta=0.01), + OneEuroFilter(freq=80, beta=0.01)] + + def update_id(self, id=None): + self.id = id + if self.id is None: + self.id = Pose.last_id + 1 + Pose.last_id += 1 + + def filter(self, translation): + filtered_translation = [] + for coordinate_id in range(3): + filtered_translation.append(self.translation_filter[coordinate_id](translation[coordinate_id])) + return filtered_translation + + +def get_similarity(a, b, threshold=0.5): + num_similar_kpt = 0 + for kpt_id in range(Pose.num_kpts): + if a.keypoints[kpt_id, 0] != -1 and b.keypoints[kpt_id, 0] != -1: + distance = np.sum((a.keypoints[kpt_id] - b.keypoints[kpt_id]) ** 2) + area = max(a.bbox[2] * a.bbox[3], b.bbox[2] * b.bbox[3]) + similarity = np.exp(-distance / (2 * (area + np.spacing(1)) * Pose.vars[kpt_id])) + if similarity > threshold: + num_similar_kpt += 1 + return num_similar_kpt + + +def propagate_ids(previous_poses, current_poses, threshold=3): + """Propagate poses ids from previous frame results. Id is propagated, + if there are at least `threshold` similar keypoints between pose from previous frame and current. 
+ + :param previous_poses: poses from previous frame with ids + :param current_poses: poses from current frame to assign ids + :param threshold: minimal number of similar keypoints between poses + :return: None + """ + current_poses_sorted_ids = list(range(len(current_poses))) + current_poses_sorted_ids = sorted( + current_poses_sorted_ids, key=lambda pose_id: current_poses[pose_id].confidence, reverse=True) # match confident poses first + mask = np.ones(len(previous_poses), dtype=np.int32) + for current_pose_id in current_poses_sorted_ids: + best_matched_id = None + best_matched_pose_id = None + best_matched_iou = 0 + for previous_pose_id in range(len(previous_poses)): + if not mask[previous_pose_id]: + continue + iou = get_similarity(current_poses[current_pose_id], previous_poses[previous_pose_id]) + if iou > best_matched_iou: + best_matched_iou = iou + best_matched_pose_id = previous_poses[previous_pose_id].id + best_matched_id = previous_pose_id + if best_matched_iou >= threshold: + mask[best_matched_id] = 0 + else: # pose not similar to any previous + best_matched_pose_id = None + current_poses[current_pose_id].update_id(best_matched_pose_id) + if best_matched_pose_id is not None: + current_poses[current_pose_id].translation_filter = previous_poses[best_matched_id].translation_filter diff --git a/notebooks/README.md b/notebooks/README.md index 388b37dca6f..7cc4c0c7902 100644 --- a/notebooks/README.md +++ b/notebooks/README.md @@ -188,7 +188,7 @@ More amazing notebooks here! | [403-action-recognition-webcam](403-action-recognition-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F403-action-recognition-webcam%2F403-action-recognition-webcam.ipynb) | Human action recognition with a webcam or video file | | | [404-style-transfer-webcam](404-style-transfer-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F404-style-transfer-webcam%2F404-style-transfer.ipynb) | Style Transfer with a webcam or video file | | | [405-paddle-ocr-webcam](405-paddle-ocr-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F405-paddle-ocr-webcam%2F405-paddle-ocr-webcam.ipynb) | OCR with a webcam or video file | | - +| [406-3D-pose-estimation-webcam](406-3D-pose-estimation-webcam/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks.git/HEAD?labpath=notebooks%2F406-3D-pose-estimation-webcam%2F406-3D-pose-estimation.ipynb) | 3D display of human pose estimation with a webcam or video file | |