---
title: 'DynaMo: Dynamic Body Shape and Motion Capture with Intel RealSense Cameras'
date: 8 May 2019
bibliography: paper.bib
---
Human body shape can be captured with a variety of methodologies, including laser lines, structured light, photogrammetry, and millimeter waves [@Dannen:2013]. However, these technologies require expensive modules and have limited ability to capture dynamic changes in body shape.
Motion capture with specific markers is commonly done through camera-based motion tracking [@Windolf:2008]. These marker-tracking systems are often cost-prohibitive and unable to capture surface morphology.
Recently, Intel released the D415 and D435 RealSense Depth Cameras, which use near-infrared structured light patterns and two infrared imagers to capture depth information at up to 90 frames per second. Purchasing a set of these cameras is more affordable than buying a dedicated motion-capture system for shape or marker tracking.
While Intel provides the `librealsense` library to interface with their cameras, it lacks tools for using multiple devices at once to capture shape and marker-tracking information.
`DynaMo` builds upon `librealsense` to provide additional capability for researchers looking to capture such data.
`DynaMo` is designed primarily to assist researchers in the biomechanics and medical fields in capturing motion or body-shape data. It is currently used in the Anderson Bioastronautics Research Group to capture dynamic changes in foot morphology.
Figure 1: Sample frames collected by DynaMo showing dynamic shape capture (green) and marker identification (gray spheres)
`DynaMo` is a Python library that provides tools to capture dynamic changes in body shape and track the locations of markers using Intel RealSense D4XX cameras.
`DynaMo` was developed from the examples provided by Intel in the Python `librealsense` library. It has been successfully tested streaming six cameras at 90 frames per second, all connected to a single computer.
`DynaMo` consists of several scripts that allow for calibration of multiple RealSense D4XX cameras to a common global coordinate system, simultaneous streaming of multiple cameras, viewing of the streamed data in pointcloud format, and identification of reflective markers in the pointclouds. The library is optimized to reduce the number of dropped frames while streaming.
`DynaMo` allows for the capture of depth, infrared, and color frames at an $(u \times v)$ resolution, where each pixel of the frame holds the following (see the sketch after this list):

- Depth frames: $s$, where $s$ is the distance to the object
- Infrared frames: $Y$, where $Y$ is a single value from 0-255 denoting the monochrome pixel value
- Color frames: $[R,G,B]$, where $R$, $G$, $B$ are red, green, and blue values, stacked to represent the color value of the pixel. This results in a $(u \times v \times 3)$-dimensional frame.
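As a concrete illustration, these per-pixel layouts map naturally onto NumPy arrays; the resolution and variable names below are assumptions for illustration, not part of `DynaMo`'s API:

```python
import numpy as np

# Hypothetical frame buffers for a 1280x720 capture (u = 1280, v = 720).
u, v = 1280, 720

# Depth frame: one 16-bit distance value s per pixel (the RealSense depth
# scale converts these raw units to meters).
depth_frame = np.zeros((v, u), dtype=np.uint16)

# Infrared frame: one 8-bit monochrome value Y per pixel.
infrared_frame = np.zeros((v, u), dtype=np.uint8)

# Color frame: stacked [R, G, B] values per pixel -> (v, u, 3) array.
color_frame = np.zeros((v, u, 3), dtype=np.uint8)
```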
The pinhole camera model [@Sturm:2014] projects 3D points from the world, $[x,y,z]$, onto the 2D image plane, $[u,v]$:

$$ \begin{bmatrix}su\\sv\\s\end{bmatrix}=\begin{bmatrix}f_{x}&0&pp_{x}\\0&f_{y}&pp_{y}\\0&0&1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} $$

where $f_{x}$ and $f_{y}$ are the focal lengths of the camera, $pp_{x}$ and $pp_{y}$ are the coordinates of the camera's principal point, and $s$ is the depth of the point. Since we are collecting 2D frames and want the 3D location of each point to reconstruct the pointcloud, we can simply invert this intrinsic matrix:

$$ \begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}\dfrac{1}{f_{x}}&0&\dfrac{-pp_{x}}{f_{x}}\\0&\dfrac{1}{f_{y}}&\dfrac{-pp_{y}}{f_{y}}\\0&0&1\end{bmatrix}\begin{bmatrix}su\\sv\\s\end{bmatrix} $$
This transformation is well known in the computer vision community and is crucial to the functions present in `DynaMo`, which uses it extensively in its calibration, streaming, and marker-tracking features.
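A minimal NumPy sketch of this inverse transformation (not `DynaMo`'s internal code; `librealsense` exposes an equivalent through `rs2_deproject_pixel_to_point`):

```python
import numpy as np

def deproject_pixel(u, v, s, fx, fy, ppx, ppy):
    """Invert the pinhole model: map pixel (u, v) with depth s to a 3D point
    [x, y, z] using the camera intrinsics (focal lengths fx, fy and principal
    point ppx, ppy)."""
    K_inv = np.array([
        [1.0 / fx, 0.0,      -ppx / fx],
        [0.0,      1.0 / fy, -ppy / fy],
        [0.0,      0.0,       1.0],
    ])
    return K_inv @ np.array([s * u, s * v, s])

# Example: pixel (640, 360) at 1.5 m depth with assumed (illustrative)
# intrinsics; returns the 3D point [0.0, 0.0, 1.5].
print(deproject_pixel(640, 360, 1.5, fx=900.0, fy=900.0, ppx=640.0, ppy=360.0))
```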
Connected cameras are set up using a `device_manager` object, which handles calls for communicating with the cameras.
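For context, this is a minimal sketch of device enumeration with the `pyrealsense2` bindings, the kind of call a `device_manager` object wraps; the loop below is illustrative, not `DynaMo`'s exact implementation:

```python
import pyrealsense2 as rs

# Enumerate connected RealSense devices so each camera can later be
# configured and addressed by its serial number.
context = rs.context()
for device in context.query_devices():
    name = device.get_info(rs.camera_info.name)
    serial = device.get_info(rs.camera_info.serial_number)
    print(f"Found {name} (serial {serial})")
```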
Cameras are first calibrated to a common global coordinate system using a chessboard of known geometry that is viewable by all cameras. The chessboard corners are detected in each camera's color image using the `findChessboardCorners` function of the OpenCV library [@opencv_library]. Once the chessboard corners are found, they are translated to 3D points from the perspective of each camera and centered. The Kabsch algorithm [@Kabsch:1976] is used to compute the optimal rotation between each camera's point set and a common reference; together with the translation between the point-set centroids, this yields each camera's transformation to the global coordinate system (see the sketch below).
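A standard NumPy sketch of the Kabsch rotation step, assuming both point sets are already centered as described above (not necessarily `DynaMo`'s exact implementation):

```python
import numpy as np

def kabsch_rotation(P, Q):
    """Optimal rotation aligning centered (N x 3) point sets P and Q via the
    Kabsch algorithm: SVD of the cross-covariance matrix, with a sign
    correction to exclude reflections."""
    H = P.T @ Q                              # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # correct for reflections
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T                    # rotation R such that R @ p ~ q

# Usage: center each camera's detected chessboard corners and the reference
# chessboard points, then solve for the rotation between the two sets.
```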
Streaming is achieved by reading frames from each camera into a dictionary object held in the computer's RAM. `DynaMo` checks frame numbers for continuity to ensure that frames are collected synchronously and are not repeated.
Once streaming is complete, `DynaMo` aligns the images collected by the sensors in each camera to a common image center and saves the images as `pickle` objects on disk.
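A hypothetical sketch of the continuity check and `pickle` save; the dictionary layout and serial numbers are illustrative assumptions, not `DynaMo`'s internal format:

```python
import pickle

# Per-camera frame numbers recorded during a stream, keyed by serial number.
frame_numbers = {"823112060874": [100, 101, 102, 104],   # frame 103 dropped
                 "823112060875": [100, 101, 102, 103]}

# Consecutive frame numbers must differ by exactly one, so that frames are
# neither dropped nor repeated.
for serial, numbers in frame_numbers.items():
    ok = all(b - a == 1 for a, b in zip(numbers, numbers[1:]))
    print(f"camera {serial}: {'continuous' if ok else 'dropped or repeated frames'}")

# Once streaming is complete, the in-RAM data can be written out to disk.
with open("frames.pickle", "wb") as f:
    pickle.dump(frame_numbers, f)
```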
The data from all cameras can then be viewed as a single pointcloud per frame by transforming each camera's points with its previously computed transformation matrix.
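For illustration, merging per-camera pointclouds into the global frame reduces to applying each camera's rigid transformation row-wise; all names and values below are hypothetical:

```python
import numpy as np

def to_global(points, R, t):
    """Map an (N x 3) camera-frame pointcloud into the global frame with
    x_global = R @ x_camera + t, applied row-wise."""
    return points @ R.T + t

# Hypothetical calibration results (rotation R, translation t) and
# pointclouds for two cameras, keyed by serial number.
transforms = {"camA": (np.eye(3), np.zeros(3)),
              "camB": (np.eye(3), np.array([0.5, 0.0, 0.0]))}
pointclouds = {"camA": np.random.rand(1000, 3),
               "camB": np.random.rand(1000, 3)}

# A single merged pointcloud for the frame.
merged = np.vstack([to_global(pts, *transforms[s])
                    for s, pts in pointclouds.items()])
```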
A script is included to extract the locations of reflective markers from the pointcloud by simply thresholding for bright pixels in the infrared frame. Contours are then drawn for each cluster of pixels in each camera's infrared frame; these contours highlight the markers detected by each camera. The center of each cluster is calculated and then translated into a 3D point using the depth frame. All points from all cameras are translated into the global coordinate system using the previously computed transformation matrices, and the clusters are scanned for duplicates seen by multiple cameras.
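A minimal OpenCV sketch of this threshold-and-contour approach; the threshold value and frame contents are assumptions, not `DynaMo` defaults:

```python
import cv2
import numpy as np

# Stand-in infrared frame with one fake bright marker cluster.
ir = np.zeros((720, 1280), dtype=np.uint8)
ir[300:310, 600:610] = 255

# Threshold for bright pixels, then draw contours around each cluster.
_, bright = cv2.threshold(ir, 250, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Take each contour's center of mass as the marker's (u, v) pixel location.
centers = []
for c in contours:
    m = cv2.moments(c)
    if m["m00"] > 0:
        centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))

# Each (u, v) center would then be deprojected to 3D with the depth frame
# and mapped to the global frame, where duplicates across cameras are merged.
print(centers)
```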
This work was supported by a National Science Foundation Graduate Research Fellowship under grant DGE 1650115. The authors would like to thank Dr. Rodger Kram and Dr. Wouter Hoogkamer for the use of their laboratory space for development and testing of the package.