Infrared encoding in P010LE UV plane
This encoding was replaced with a newer solution (HEVC Main10 depth + HEVC Main ir).
You may still try it out with previous releases:
- RNHVE hardware-accelerated-infrared-textured-point-cloud-streaming
- UNHVD hardware-accelerated-infrared-textured-point-cloud-streaming
The Y luminance plane, which is ideal for encoding infrared, is already taken by the depth map.
The P010LE chroma UV plane is used instead for encoding infrared greyscale data.
The UV plane is:
- half the size of Y (H/2 height and same stride as Y)
- with P010LE this means 16 bits for each color value of which only 10 MSBs are used
The P010LE UV plane size exactly matches D435 infrared greyscale data size with RS2_FORMAT_Y8 format.
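The size match follows from a little arithmetic. A minimal sketch (hypothetical helper names, not library code):

```c
#include <stdint.h>

// P010LE UV plane: (W/2 x H/2) interleaved U,V sample pairs, 2 bytes per sample.
// Y8 infrared: W x H samples, 1 byte each.
// The identity holds for any even W and H.
int uv_plane_bytes(int w, int h) { return (w / 2) * (h / 2) * 2 /*U+V*/ * 2 /*bytes*/; }
int y8_bytes(int w, int h)       { return w * h; }
```

For example, at 848x480 (one common D435 mode) both expressions come out to 407040 bytes.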
For Realsense the infrared stride matches its width, which makes it possible to:
- directly map infrared data to UV plane data
- at the cost of losing every second value (P010LE 16 bits of which only 10 MSBs are used)
The 2 excess LSBs should not affect the encoding too much and may be ignored after decoding.
This is somewhat like 4:2:2 chroma sampling (if you consider IR "to be" the UV chroma).
The left D435 imager is already aligned with the depth data (no CPU-expensive alignment process needed).
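Since strides match, the encode-side mapping can be sketched as a plain copy. A minimal sketch assuming a hypothetical `ir_to_uv` helper (not the actual RNHVE code) with `uv_stride` of at least 2*W bytes:

```c
#include <stdint.h>
#include <string.h>

// Pack W x H bytes of Y8 infrared into a P010LE UV plane with H/2 rows.
// Each UV row is 2*W bytes, so it holds exactly two infrared rows.
void ir_to_uv(const uint8_t *ir, uint8_t *uv, int uv_stride, int w, int h)
{
    for (int r = 0; r < h / 2; ++r)
        memcpy(uv + (size_t)r * uv_stride, ir + (size_t)r * 2 * w, (size_t)(2 * w));
}
```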
After decoding the data to P010LE format for UV plane:
- the stride will be at least 2*width bytes (16-bit P010LE data)
- the height is half the size of Y
This means that we have 2 rows of infrared data in every logical row of UV plane data.
If we didn't have every second value in the UV plane missing (the 4:2:2-like sampling resulting from 10-bit encoding), the mapping would be:
```c
//depth->colors as UV plane of HEVC Main10 P010LE
int r;      //the row in HEVC Main10 P010LE decoded data
int c;      //the column in HEVC Main10 P010LE decoded data
int stride; //stride in bytes of HEVC Main10 P010LE decoded data, at least 2*width

//two rows of infrared colors encoded in every row of data
pc->colors[points] = depth->colors[r/2 * depth->stride + (r % 2) * depth->width + c];
```
Additionally, every even value is missing. To substitute the neighboring infrared value for the missing ones:
```c
//offset the above index by +1 in case of even column
pc->colors[points] = depth->colors[r/2 * depth->stride + (r % 2) * depth->width + c + 1 - c % 2];
```
See example implementation if you are interested.
You may average the values of neighbors instead. From my quick subjective tests it is not worth the effort.
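Putting the index formula into a full loop, a decode-side sketch might look as follows (hypothetical `uv_to_ir` helper; the decoded UV plane is treated as bytes, and even columns borrow their right neighbor's value via `+ 1 - c % 2`):

```c
#include <stdint.h>

// Recover W x H Y8 infrared from a decoded P010LE UV plane (viewed as bytes).
// Two infrared rows live in each UV row; only odd byte offsets survived
// the 10-bit encoding, so even columns take the value to their right.
void uv_to_ir(const uint8_t *uv, int stride, uint8_t *ir, int w, int h)
{
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            ir[r * w + c] = uv[r / 2 * stride + (r % 2) * w + c + 1 - c % 2];
}
```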
Encoding infrared luma information in interleaved UV chroma plane has its price.
Recall that we map the infrared plane directly to the interleaved UV plane.
By exploiting the P010LE 10 bit (of 16 bits present) we already lose every second horizontal value.
The chroma U and V are interleaved in the P010LE UV plane and later encoded separately. As a consequence, each encoded image (U and V) contains every fourth horizontal pixel of the original infrared data. The U and V images will likely be identical or very close, which increases the required bitrate. Taking every fourth pixel instead of direct neighbors decreases correlation, which again increases the required bitrate.
Each row of the interleaved UV plane encodes two rows of infrared data. As a consequence, from the encoder's point of view, CTUs (the HEVC analog of H.264 macroblocks) see every second vertical pixel. This decreases correlation and again increases the required bitrate.
Finally, 10-bit encoding combines 8 bits of one infrared value with 2 bits of its neighbor, bits that were originally expected to carry increased-precision information. Those 2 bits are the MSBs of the neighbor's infrared value and, statistically speaking, come from a different distribution. In the worst case this could be seen as adding 2 LSBs of noise to every value. This again increases the required bitrate.
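The bit layout described above can be sketched as follows (assumed little-endian P010LE sample layout; `sample10` is a hypothetical illustration, not encoder code):

```c
#include <stdint.h>

// Two infrared bytes a (even column) and b (odd column) form one
// little-endian 16-bit P010LE sample; the encoder keeps its 10 MSBs.
uint16_t sample10(uint8_t a, uint8_t b)
{
    uint16_t s = (uint16_t)a | ((uint16_t)b << 8); //little-endian byte pair
    return s >> 6; //the 10 MSBs: all 8 bits of b plus the 2 MSBs of a
}
```

The 2 low bits of every 10-bit sample thus carry the MSBs of the discarded neighbor rather than extra precision.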
Summing up:
- encoding nearly identical U and V images separately increases required bitrate
- encoder sees every fourth horizontal pixel in U and V which increases required bitrate
- encoder sees every second vertical pixel in U and V which increases required bitrate
- encoding 2 MSBs of neighbor's value as our additional 2 LSBs increases required bitrate
So much for the theory. In practice the result is visually good and greatly supplements geometrical point cloud data:
- for human perception it is a matter of having or not having the required bandwidth to spare
- for machine processing one is often only interested in geometric data and texture is a waste of bandwidth
Point cloud only (8 Mb bitrate, no B frames):
Textured point cloud (8 Mb bitrate, no B frames):
See also video showing encoding in action: