Neural-Assisted Disparity Depth Estimation #173

Luxonis-Brandon · 2020-08-07T20:09:01Z

Start with the `why`:

The why of this effort (and the initial research) is that in many applications, depth cameras (and sometimes even LIDAR) are not sufficient to reliably detect objects in varied conditions. Specifically, for Luxonis' potential customers, this directly limits their business success:

Autonomous wheelchairs. The functionality described here would be HUGE for this application, as existing solutions are struggling with the output of D435 depth. It gets tricked too easily and misses objects even with aggressive host-side filtering and other detection techniques.
Autonomous lawn mowing. This use case is also struggling with object detection using the D435. The system can't reliably identify soccer-ball-sized things even with significant host-side post-processing, and it needs to be able to identify things down to baseball size.
Volumetric estimation of low-visual-interest objects. Disparity depth struggles significantly with objects (particularly large objects) of low visual interest as it lacks features to match. Neural networks can leverage latent information from training that overcomes this limitation - allowing volumetric estimation where traditional algorithmic-based disparity-depth solutions cannot adequately perform.

DepthAI was not originally designed to solve this sort of problem, but it is well suited to solving it.

Background:

As of now, the core use of DepthAI is to run 2D object detectors (e.g. MobileNetSSDv2) and fuse them with stereo depth to get the real-time 3D position of the objects that the neural network identifies. See here for it finding my son's XYZ position, for example. This solution is not applicable to the two customers above because the type of object must be known to the neural network. They need to avoid any object, not just known ones, and specifically objects that are hard to detect and are lost/missed by traditional stereo depth vision.
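The geometry behind that fusion is pinhole back-projection: take a representative depth inside the detection's bounding box and project the box centre through the camera intrinsics. DepthAI does this on-device; the sketch below only illustrates the math, and the median-over-ROI choice, the `Intrinsics` class and the `bbox_to_xyz` name are assumptions for illustration, not DepthAI's implementation.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def bbox_to_xyz(depth_m, bbox, intr):
    """Estimate the camera-frame XYZ (metres) of a detection.

    depth_m: HxW depth map in metres, where 0 marks invalid pixels.
    bbox: (xmin, ymin, xmax, ymax) in depth-map pixel coordinates.
    Returns None when the box contains no valid depth.
    """
    xmin, ymin, xmax, ymax = bbox
    roi = depth_m[ymin:ymax, xmin:xmax]
    valid = roi[roi > 0]
    if valid.size == 0:
        return None
    # median is robust to holes and to background pixels bleeding into the box
    z = float(np.median(valid))
    # back-project the box centre with the pinhole model
    u = (xmin + xmax) / 2.0
    v = (ymin + ymax) / 2.0
    x = (u - intr.cx) * z / intr.fx
    y = (v - intr.cy) * z / intr.fy
    return x, y, z
```

For example, a box centred 40 px right of the principal point at 2 m depth with fx = 400 px lands 0.2 m to the right of the optical axis. This is also why the approach fails for the customers above: without a box from a detector, there is nothing to back-project.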

New Modality of Use

So one idea we had recently was to leverage the neural compute engines (and SHAVEs) of the Myriad X to produce better depth, so that difficult objects which traditional stereo depth misses could be detected with depth improved by the neural network.

Implementing this capability (running neural inference either to produce the depth map directly or to improve the disparity-produced depth map) would be hugely enabling for the use cases mentioned above, and likely many others.

Move to the `how`:

Most of the work here will be researching what has already been done and determining which techniques are lightweight enough to run directly on DepthAI. Below is some initial research to that end:

Google Mannequin Challenge:

Blog Explaining it: https://ai.googleblog.com/2019/05/moving-camera-moving-people-deep.html
Dataset: https://google.github.io/mannequinchallenge/www/index.html
Github: https://github.com/google/mannequinchallenge
Notice that in a lot of cases this is actually quite good-looking depth from just a single camera. Imagine how amazing it could look with 2 or 3 cameras.

Could produce just insanely good depth maps.

KITTI DataSet:

http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo

So check this out: a whole bunch of ground truth data, with calibration pictures, etc. This could certainly be used to train a neural network for this sort of processing.
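One practical detail when training on KITTI: the stereo ground truth is stored as 16-bit PNGs holding disparity × 256, with 0 meaning "no ground truth" (the LIDAR-derived labels are sparse). A minimal decoding sketch, assuming the PNG has already been loaded without conversion (e.g. with `cv2.imread(path, cv2.IMREAD_UNCHANGED)`); the function name is hypothetical.

```python
import numpy as np


def decode_kitti_disparity(png_u16):
    """Convert a KITTI stereo ground-truth disparity PNG (uint16) to float disparity.

    Returns (disparity in pixels as float32, boolean mask of pixels with ground truth).
    """
    raw = np.asarray(png_u16)
    if raw.dtype != np.uint16:
        raise ValueError("expected a 16-bit image; load it with cv2.IMREAD_UNCHANGED")
    disparity = raw.astype(np.float32) / 256.0  # stored value is disparity * 256
    valid = raw > 0                              # 0 marks pixels without ground truth
    return disparity, valid
```

Because the ground truth is sparse, any training loss should be averaged only over pixels where `valid` is true.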

And there's a leaderboard further down of those who have.

PapersWithCode:

PapersWithCode is generally awesome. They even have a Slack.

https://paperswithcode.com/task/stereo-depth-estimation

Others and Random Notes:

So have a dig through there. This one from there seems pretty neat:
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo

These guys seem like they're getting decent results too:
https://arxiv.org/pdf/1803.09719v3.pdf

So for a lot of these, it's a matter of figuring out which ones are lightweight enough to be worth porting.

Notice this one uses KITTI dataset as well:
https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf

From Intel R&D directly, Deep Learning Stereo Vision at the edge: https://arxiv.org/pdf/2001.04552.pdf (apparently this was never implemented).
Google’s StereoNet looks really fast/lightweight: https://arxiv.org/pdf/1807.08865.pdf
Github summarizing depth quality enhancements using CNNs: https://github.com/mdcnn/Depth-Image-Quality-Enhancement
This one looks pretty interesting: https://arxiv.org/pdf/1910.00541.pdf

SparseNN depth completion
https://www.youtube.com/watch?v=rN6D3QmMNuU&feature=youtu.be

ROXANNE Consistent video depth estimation
https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/

https://web.stanford.edu/class/ee368/Project_Autumn_1516/Reports/Jordan_Shridhar.pdf It seems like the Myriad X's 2x NCE + SHAVEs are plenty fast enough to produce a super-great disparity depth output in real time.
https://arxiv.org/pdf/1910.13708.pdf
DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs:
- http://openaccess.thecvf.com/content_ECCV_2018/papers/Shi_Yan_DDRNet_Depth_Map_ECCV_2018_paper.pdf
- https://github.com/neycyanshi/DDRNet
AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks: https://arxiv.org/pdf/1904.09099.pdf
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches: https://github.com/jzbontar/mc-cnn/blob/master/README.md
Siamese network. Probably way too big, as it shows multi-second run-times: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472548/
The Middlebury stereo dataset seems incredibly useful.
https://github.com/kelkelcheng/GC-Net-Tensorflow/blob/master/README.md
DispNetC shows 0.06 runtime, which is encouraging.
Real-time self-adaptive deep stereo
https://zpascal.net/cvpr2019/Tonioni_Real-Time_Self-Adaptive_Deep_Stereo_CVPR_2019_paper.pdf
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo/blob/master/README.MD
PyTorch implementation of several Deep Stereo Matching Networks (DSMnet): https://github.com/wyf2017/DSMnet/blob/master/README.md
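On the Middlebury dataset mentioned above: its ground-truth disparities are distributed as PFM files (32-bit floats, rows stored bottom-to-top, unknown pixels marked as infinity). A minimal reader for the single-channel case, assuming a binary file object; this is a sketch, not an official Middlebury tool, and it skips colour ("PF") files.

```python
import numpy as np


def read_pfm(f):
    """Read a single-channel PFM image from a binary file object into a float32 array."""
    header = f.readline().decode("ascii").strip()
    if header != "Pf":
        raise ValueError("this sketch only handles single-channel 'Pf' files")
    width, height = (int(v) for v in f.readline().decode("ascii").split())
    scale = float(f.readline().decode("ascii").strip())
    dtype = "<f4" if scale < 0 else ">f4"  # a negative scale means little-endian data
    data = np.frombuffer(f.read(width * height * 4), dtype=dtype)
    # rows are stored bottom-to-top, so flip to the usual top-to-bottom order
    return np.flipud(data.reshape(height, width)).astype(np.float32)
```

Usage: `with open("disp0GT.pfm", "rb") as f: disp = read_pfm(f)` (the file name is only an example), then `np.isfinite(disp)` gives the mask of pixels with known disparity.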


2emoore4 · 2020-08-14T19:45:37Z

As a DepthAI user, I want to emphasize the importance of having clean/accurate/precise depth maps - it's clear that deep learning is the key to achieving this.

It's definitely possible to clean up depth maps with more traditional filtering, with something like the Bilateral Solver: https://drive.google.com/file/d/1zFzCaFwkGK1EGmJ_KEqb-ZsRJhfUKN2S/view

However, there has been much more work recently applying deep learning to 3D image generation, and more is coming all the time.

Stereo Magnification introduced Multi Plane Images, and used differentiable rendering to learn to generate them from stereo images: https://people.eecs.berkeley.edu/~tinghuiz/projects/mpi/

Many have extended this idea, but much of the latest work uses dozens of input images instead of just two:

DeepView: https://augmentedperception.github.io/deepview/
Immersive Light Field Video w/ Layered Meshes: https://augmentedperception.github.io/deepviewvideo/
Neural Radiance Fields: https://www.matthewtancik.com/nerf

(Not all of these output MPIs, but all are fairly similar)

There's also plenty of recent work around monocular depth estimation, like MiDaS from Intel: https://github.com/intel-isl/MiDaS

Some take existing 3D photos and try to inpaint disocclusions, so that inaccuracies are less noticeable: https://shihmengli.github.io/3D-Photo-Inpainting/

Luxonis-Brandon · 2020-08-14T20:00:56Z

Thanks @2emoore4 ! Super appreciate it. Will review all these shortly. And also sharing with the team!

saching13 · 2020-08-25T20:33:01Z

I am adding the paper by Skydio which carries out end to end learning for stereo.
https://arxiv.org/pdf/1703.04309.pdf

Luxonis-Brandon · 2020-08-25T20:36:19Z

Thanks!

Luxonis-Brandon · 2020-11-06T16:09:44Z

This looks quite interesting (Martin brought up internally):
https://geometry.cs.ucl.ac.uk/projects/2018/depthcut/

Luxonis-Brandon · 2020-11-10T19:09:16Z

Check out the datasets referenced near the end of this paper:
https://arxiv.org/pdf/1612.02401.pdf
The approach is also interesting IMO, and could be adapted for deep learning from stereo.
(they are solving a harder problem which is both motion and depth from a pair of images, but you could fix motion since it's known and just focus on the depth part).

Luxonis-Brandon · 2020-12-04T18:18:28Z

PatchmatchNet: Learned Multi-View Patchmatch Stereo
Looks like an interesting paper for resource limited devices.
https://github.com/FangjinhuaWang/PatchmatchNet
https://arxiv.org/pdf/2012.01411v1.pdf

Luxonis-Brandon · 2021-03-07T02:40:16Z

Some additional resources from Discord:

https://www.hindawi.com/journals/cin/2020/8562323/

Luxonis-Brandon · 2021-08-13T16:39:42Z

https://antabangun.github.io/projects/CoEx/#dem

Luxonis-Brandon · 2021-09-05T20:06:27Z

https://github.com/ibaiGorordo/UnrealCV-stereo-depth-generation

tersekmatija · 2021-09-05T21:43:55Z

https://arxiv.org/pdf/2007.12140.pdf
https://github.com/ibaiGorordo/HITNET-Stereo-Depth-estimation

dhruvmsheth · 2021-09-07T09:10:54Z

https://github.com/ibaiGorordo/HITNET-Stereo-Depth-estimation

This seems to be pretty accurate. Here are the results I achieved with TFLite HITNET Stereo Depth Estimation:

Compared to the original results:

Luxonis-Brandon · 2021-09-07T20:43:12Z

Looks great - thanks for sharing!

Luxonis-Brandon · 2021-09-08T22:03:01Z

https://github.com/cogsys-tuebingen/mobilestereonet - From @PINTO0309 in Discord.

Luxonis-Brandon · 2021-09-12T19:43:17Z

The first results are starting to come. Here's MIT Fast Depth (https://github.com/dwofk/fast-depth) running on OAK-D-(anything):

nickjrz · 2021-09-13T02:45:36Z

Hey @Luxonis-Brandon, this looks like a great starting point for neural-network-assisted depth estimation. I wonder how precise it could get if we added the ground-truth depth in a self-supervised training setup. Is the inference running on the host? If so, what would it take to optimize the network to run onboard the OAK-D?

Luxonis-Brandon · 2021-09-13T03:00:33Z

This is running on OAK-D directly, not on the host. Matija will be making a pull request soon so you'll be able to try it. (He may have already and I missed it - unsure... he just got it working this weekend.)

nickjrz · 2021-09-14T08:37:27Z

I was able to run real-time HITNET stereo depth estimation (Middlebury) with an OAK-D, running the inference on the host. Here are my results:

PINTO0309 · 2021-09-14T09:45:10Z

Due to a problem with OpenVINO's conversion to a Myriad blob, I submitted an issue to Intel's OpenVINO engineers. So far, they seem to suspect that the structure of the model is wrong, but we are able to run inference on it successfully with ONNX Runtime and the TFLite runtime.

[Bug] GatherND shape conversion from ONNX is inaccurate (HITNET to blob / OpenVINO): openvinotoolkit/openvino#7379

ibaiGorordo · 2021-09-14T13:10:48Z

Also, HITNET looks nice, but it is quite slow. Currently, monocular depth estimation models (fastnet, MiDaS 2.1 small, ...) seem to be faster than the stereo ones (current stereo models are too complex, with 3D convolutions and cost aggregation). But I still have hope that there is a fast stereo model out there somewhere 🧐

PINTO0309 · 2021-09-21T23:30:24Z

It looks like the issue I posted has been triaged and escalated to the development team. I expect it will run faster if inference is done with OpenVINO, so I will be patient and keep following up on the issue.

Luxonis-Brandon · 2021-09-21T23:43:43Z

Awesome - thanks!

ghost · 2021-10-15T12:45:42Z

Could somebody submit this algorithm's results to the benchmark? https://vision.middlebury.edu/stereo/eval3/

gurbain · 2021-12-10T15:30:18Z

Quoting @nickjrz: "I was able to run real-time inference on TFLite HITNET Stereo depth estimation (middlebury) using OAK-D and having the inference on the host. Here are my results:"

Hey,

Sorry for the spam, but I am trying to reproduce the same example you showed, @nickjrz (stereo depth estimation on the host with an OAK-D and HITNET), and I can't get results as good as yours. I started from the same project (https://github.com/ibaiGorordo/HITNET-Stereo-Depth-estimation), but my results look much worse than yours (maybe the pre-processing?). Could you provide a link to your code? It would be really interesting. Thank you!

PINTO0309 · 2022-05-23T15:22:40Z

rtstereonet_maxdisp192_720x1280.zip

Luxonis-Brandon · 2022-06-06T18:57:04Z

https://twitter.com/nburrus/status/1528750927037046784?s=21&t=-1nO4bfsI7ZhImVwyWTpQw

cyberbeat · 2022-08-10T20:41:11Z

Some other neat ones: https://cvlab-unibo.github.io/neural-disparity-refinement-web/ https://arxiv.org/abs/2110.15367

@Luxonis-Brandon Did you make any progress with this one? I could not find anything about its speed. And would it be possible to make it smaller (e.g. by replacing the backbone)?

tersekmatija · 2022-08-10T22:43:54Z

We did some extensive experiments on that one @cyberbeat . We replicated the code before it was released actually, but just replacing the backbone isn't enough. The 2 MLP heads at the end are pretty big, so the final model doesn't fit on device. So we've tried experimenting a bit with the two heads, but unfortunately they don't perform well enough when constrained as much as we needed them to be. We are constantly looking at different approaches and how to combine them.

justin-larking-pk · 2022-10-06T21:03:44Z

Have you guys looked at RAFT-Stereo (https://github.com/princeton-vl/RAFT-Stereo)? I was looking at implementing a lite variant of it myself (like how the original RAFT optical flow model has RAFT-S). It appears to have good results.

tersekmatija · 2022-10-13T11:38:17Z

@justin-larking-pk I don't think we tried exporting it, CREStereo mentioned above seems to get better results. But if you give it a try, do keep us updated! :)

kekeblom · 2022-10-30T21:30:56Z

I tried adapting the method from [1] using their code (https://github.com/kevinleestone/mmstereo) to run onboard the device. Out of the box, at 720p resolution, I was not able to fit it in memory, and I ran into some issues with model export. With a few changes to the model, I was able to export it and run it on the device. It took a bit of tweaking to get the compute down to a reasonable level (640x400 resolution, 8x downsampling factor), which runs on the device at ~4 FPS, although the lower resolution and heavy downsampling do come at the cost of accuracy, and considerable artifacts start to become visible. I trained it on FlyingThings, Middlebury and ETH3D.

Might give it a couple more shots to see if I can find a setting that would work a little bit better.

Here are a couple of results:

Anyone know of any tools or general resources that could help figure out the bottlenecks on the embedded hardware?

[1] Krishna Shankar, Mark Tjersland, Jeremy Ma, Kevin Stone, and Max Bajracharya. A Learned Stereo Depth System for Robotic Manipulation in Homes. ICRA 2022

tersekmatija · 2022-11-02T13:42:06Z

Nice @kekeblom! Do you mind sharing the repository/describing all the changes you've made in the repo? I think this definitely shows some promise and I believe the artifacts could be removed by further tweaking the architecture.

As for figuring out the bottlenecks on the embedded hardware - you can try and install openvino-dev. There should be a benchmark_app which you can use to benchmark the model on the device. Simply add -d MYRIAD -report_type average_counters flags when calling it. Note that the version should match the version of OpenVINO used to produce the blob.
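A small helper for scripting that run, assuming `benchmark_app` from `openvino-dev` is on the PATH and the model is available as OpenVINO IR (`model.xml` is a hypothetical name). `-report_folder` is optional and only sets where the CSV report is written; the helper name is made up for this sketch.

```python
def benchmark_app_cmd(model_xml, device="MYRIAD", report_folder=None):
    """Build a benchmark_app command line that collects per-layer average counters."""
    cmd = ["benchmark_app", "-m", model_xml, "-d", device,
           "-report_type", "average_counters"]
    if report_folder is not None:
        cmd += ["-report_folder", report_folder]  # folder for the statistics report
    return cmd
```

Usage: `subprocess.run(benchmark_app_cmd("model.xml"), check=True)`, then look for the layers with the largest average time in the generated report.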

You can also inspect the FLOPs and parameters of your architecture beforehand using fvcore or something similar. While more FLOPs in a certain module does not always mean it is a bottleneck, it's a good starting point, and those modules are usually worth investigating.
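As a rough, framework-free cross-check, multiply-accumulate (MAC) counts for convolutions can be estimated by hand. This sketch ignores bias, activations and memory traffic, and counting conventions differ between tools (some report FLOPs as 2 × MACs), so treat it as an order-of-magnitude estimate. The example sizes are hypothetical: a 640x400 input at 1/4 resolution (160x100) with 32 feature channels and 32 disparity planes.

```python
def conv2d_macs(h, w, c_in, c_out, k, stride=1, padding=0, groups=1):
    """Multiply-accumulates of a 2D convolution over an (H, W) feature map (bias ignored)."""
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    return h_out * w_out * c_out * (c_in // groups) * k * k


def conv3d_macs(d, h, w, c_in, c_out, k, stride=1, padding=0):
    """Multiply-accumulates of a 3D convolution over a (D, H, W) cost volume (bias ignored)."""
    d_out = (d + 2 * padding - k) // stride + 1
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    return d_out * h_out * w_out * c_out * c_in * k ** 3


macs_2d = conv2d_macs(100, 160, 32, 32, 3, padding=1)      # one 3x3 conv on the feature map
macs_3d = conv3d_macs(32, 100, 160, 32, 32, 3, padding=1)  # one 3x3x3 conv on the cost volume
```

With these sizes a single 3x3x3 conv over the cost volume costs 96x more than a 3x3 conv over the feature map (32 disparity planes times 3 taps along the disparity axis), which is why cost-volume stages tend to dominate.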

kekeblom · 2022-11-02T20:36:55Z

Here is the code I used https://github.com/kekeblom/mmstereo. @tersekmatija

Currently training one with 640x400 resolution, 4x downsampling and 128 disparities computed. This seems to yield quite a bit cleaner output. The settings can be found in the example config config_sceneflow.yaml.

This one hasn't yet fully converged, but using an intermediate checkpoint, I get slightly better output:

Admittedly some artifacts are still visible.

It seems most of the compute is spent in computing and processing the cost volume. Trimming the convolutional network doesn't seem to change throughput, but reducing the size of the cost volume has a considerable impact. Can't think of any easy ways around that besides using lower resolution images. Maybe there exists a more efficient way to match features on this specific hardware.
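For reference, one common way to build such a cost volume is a correlation volume: a dot product between the left features and the right features shifted by each candidate disparity. This is a NumPy sketch of that construction (not necessarily what mmstereo does), with a winner-takes-all readout and function names chosen for illustration; it assumes `max_disp` is smaller than the feature-map width. Its size grows as D × H × W, so halving the feature resolution in both directions and halving the number of disparity planes shrinks it 8x, which is consistent with resolution being the main lever.

```python
import numpy as np


def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Correlation cost volume of shape (max_disp, H, W) from (C, H, W) feature maps.

    cost[d, y, x] = mean over channels of feat_left[:, y, x] * feat_right[:, y, x - d].
    Entries where x - d < 0 have no candidate match and stay 0.
    """
    c, h, w = feat_left.shape
    cost = np.zeros((max_disp, h, w), dtype=np.float32)
    cost[0] = (feat_left * feat_right).mean(axis=0)
    for d in range(1, max_disp):
        # compare left pixel x with right pixel x - d
        cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return cost


def winner_takes_all(cost):
    """Pick the disparity with the highest correlation at every pixel."""
    return cost.argmax(axis=0)
```

A learned network would normally aggregate or refine this volume instead of taking the argmax directly; the argmax is only there to show what the volume encodes.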

I think the sweet spot for this type of network might be running at a lower resolution, but might require some tuning to get rid of the artifacts, as the network is kind of designed for very high resolutions and close range scenes with the large dilations in the convolutions.

themarpe · 2022-11-03T00:44:07Z

Nice!

@tersekmatija @kekeblom thoughts on perhaps having a network that "validates / augments" depth instead of computing it from scratch? Maybe an L + R + Depth (or Disparity) + Confidence Map -> Depth network. It would either recalculate low-confidence points or recognize known areas/patterns that should be "filled" (e.g. flat surfaces).

Mostly thinking from performance standpoint, running on existing hardware, where the Stereo HW already does a lot of compute and to see if that can be further utilized, without having the network do it. (the whole matching again)
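A minimal sketch of the data plumbing such a "validate / augment" network would need, assuming the hardware disparity and a confidence map normalized to [0, 1] with higher meaning more confident (check the convention of the confidence output you actually use, as it may be inverted). The network itself is left out: `refined` stands in for whatever it predicts, and the function names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def build_refinement_input(left, right, disparity, confidence):
    """Stack rectified L/R images, disparity and confidence into a (4, H, W) float32 tensor."""
    return np.stack([left, right, disparity, confidence]).astype(np.float32)


def merge_refined(disparity, confidence, refined, conf_threshold=0.5):
    """Keep the hardware stereo result where it is confident, use the network output elsewhere.

    Returns the merged disparity and the mask of pixels that were replaced.
    """
    low_conf = confidence < conf_threshold
    return np.where(low_conf, refined, disparity), low_conf
```

Keeping confident hardware pixels untouched means the network only has to be accurate where SGBM struggles, which is the performance argument made above.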

tersekmatija · 2022-11-08T11:15:25Z

Nice @kekeblom!

And yeah, cost volumes are expensive. Currently I am not sure whether there is a fast way to compute/process them at arbitrary scales. This is why CREStereo and similar methods usually tackle the problem with RNNs rather than cost volumes.
I think using smaller dilations could reduce the artifacts at lower resolutions. But yeah, it is challenging to do this efficiently with good accuracy.

Re depth/disparity filling: I think approaching this with CNNs might be hard, especially since the holes have dynamic shapes and conditional predictions are hard to do on edge accelerators. I think a more suitable approach would be something along the lines of NDR, but in my experience the MLPs there are the reason it performs well. Making them lighter massively decreases accuracy, but without doing so it's impossible to deploy them.

john-maidbot · 2023-05-03T13:11:02Z

I don't want to discount the awesome work going on with end-to-end disparity estimation, those results are really cool!! But I am also curious about depth completion rather than full neural network estimation of the disparity. SGBM is already there, gives valid but sparse depth estimates, and runs at a high FPS. It seems like inpainting a sparse depth map into a dense one given a reference image (e.g. the aligned grayscale camera) would be easier than going from image + image to a dense disparity map. But it could be much harder in practice 😅
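A classic non-learned baseline for that kind of densification is normalized convolution: sum the valid sparse depths in a window and divide by the number of valid samples. The sketch below does this with an integral image in NumPy and, unlike the guided approach described above, ignores the reference image entirely, so it will blur across object edges; it is only meant as a baseline to compare a learned completion network against. The function names, radius and iteration count are assumptions.

```python
import numpy as np


def box_sum(img, r):
    """Sum of img over a (2r+1)x(2r+1) window around each pixel (zeros outside the image)."""
    h, w = img.shape
    k = 2 * r + 1
    # one extra leading row/column of zeros so the integral image can be differenced directly
    p = np.pad(img, ((r + 1, r), (r + 1, r)))
    ii = p.cumsum(axis=0).cumsum(axis=1)
    return ii[k:k + h, k:k + w] - ii[0:h, k:k + w] - ii[k:k + h, 0:w] + ii[0:h, 0:w]


def fill_sparse_depth(depth, radius=3, iterations=5):
    """Fill holes (depth == 0) with the mean of valid depths in a square window.

    Pixels that already have depth are never modified; repeating the pass lets
    the fill propagate into holes wider than the window.
    """
    d = np.asarray(depth, dtype=np.float64).copy()
    for _ in range(iterations):
        valid = (d > 0).astype(np.float64)
        num = box_sum(d * valid, radius)  # sum of valid depths in each window
        den = box_sum(valid, radius)      # number of valid samples in each window
        mean = np.where(den > 0, num / np.maximum(den, 1.0), 0.0)
        d = np.where(d > 0, d, mean)
    return d
```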

Also, CREStereo makes the Myriad processor on my OAK-D Lite run so hot 🥵 Not complaining, just FYI for others who try this.

borongyuan · 2023-06-11T11:50:21Z

Here is a TinyHITNet implementation that is much less computationally intensive. But the model conversion looks a bit problematic.
https://github.com/zjjMaiMai/TinyHITNet
https://github.com/borongyuan/DepthAI-TinyHITNet

themarpe · 2023-06-19T18:52:33Z

Nice @borongyuan

Gave this a quick try - does seem to work fairly well, but has issues recognizing certain objects every now and then.

Looking forward to seeing how far you'll be able to take this!

borongyuan · 2023-06-20T04:44:41Z

Hi @themarpe
In fact, I think the performance is not good enough. Although the MACs of the model appear to be smaller, the U-Net architecture is inefficient on many hardware platforms. We need to look at other options. I also prefer methods of depth refinement/enhancement.

borongyuan · 2024-06-17T02:57:56Z

Our new stereo depth solution, running on OAK-D Lite

neural_depth.mp4

tersekmatija · 2024-06-17T09:51:31Z

Nice @borongyuan !

Curious how you are approaching this?

borongyuan · 2024-07-24T08:28:09Z

We use a more clever way to construct the cost volume, which solves the bottleneck that @kekeblom mentioned earlier. Lightweight architectures are used for feature extraction and cost matching. This seems to be the best we can get with RVC2’s computing power. This model can theoretically run on RVC4, and it should be able to achieve good real-time performance at that time.
Unlike the traditional SGBM method, this model has no limit on the disparity search range. That is, it has a very short MinZ. But currently it has a disparity noise of several pixels for any target at any distance. So it has very good short-range measurement capabilities, but is not as good as SGBM at long range. We tried modifying the model's loss function to make the model pay more attention to distant objects, but there has been no significant improvement so far. We are going to try it on OAK-D LR later.
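The trade-off described above follows from the standard stereo relation Z = f·B/d: a fixed maximum disparity sets the minimum depth, and a constant disparity error Δd turns into a depth error of roughly Z²/(f·B)·Δd, which grows quadratically with distance. A minimal sketch with illustrative numbers (focal length 400 px, baseline 7.5 cm, a 0 to 95 px search range); these are assumptions, not any specific device's calibration.

```python
def depth_from_disparity(disp_px, focal_px, baseline_m):
    """Depth (m) from disparity (px) for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disp_px


def min_depth(focal_px, baseline_m, max_disp_px):
    """Closest measurable depth when the disparity search stops at max_disp_px."""
    return focal_px * baseline_m / max_disp_px


def depth_error(depth_m, focal_px, baseline_m, disp_err_px):
    """First-order depth error from a disparity error: dZ = Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disp_err_px


f_px, b_m = 400.0, 0.075
print(min_depth(f_px, b_m, 95))           # about 0.32 m with a 0..95 px search
print(depth_error(0.5, f_px, b_m, 2.0))   # about 0.017 m at 0.5 m
print(depth_error(2.0, f_px, b_m, 2.0))   # about 0.27 m at 2 m
```

With these numbers, the same 2 px disparity noise costs about 1.7 cm at 0.5 m but about 27 cm at 2 m, which matches the observation that the model is strong at short range and weaker than SGBM far away.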
In addition, we currently need to use StereoNode because it performs stereo rectification. If we could do this part with only CameraNode, we might be able to improve performance a little. Can you provide an example of using cv::stereoRectify() to generate maps on the host and then setWarpMesh()?
Due to its current performance and characteristics, we can only keep this as an experimental feature. It may currently only be useful for a few scenarios such as robotic arm grasping.

MaticTonin · 2024-07-25T15:28:47Z

Hi.

Here is an example of how to use stereoRectify() to generate the rectification maps on the host and load them as a warp mesh on an OAK-D Pro device:

import cv2
import numpy as np
from dataclasses import dataclass

# Distance in pixels between mesh points (the mesh is a subsampled rectification map)
meshCellSize = 16

@dataclass
class RectificationMaps:
    map_x: np.ndarray
    map_y: np.ndarray

def rotate_mesh_90_ccw(map_x, map_y):
    direction = 1
    map_x_rot = np.rot90(map_x, direction)
    map_y_rot = np.rot90(map_y, direction)
    return map_x_rot, map_y_rot

def rotate_mesh_90_cw(map_x, map_y):
    direction = -1
    map_x_rot = np.rot90(map_x, direction)
    map_y_rot = np.rot90(map_y, direction)
    return map_x_rot, map_y_rot

def downSampleMesh(mapXL, mapYL, mapXR, mapYR):
    # Keep every meshCellSize-th sample of the full-resolution maps; each mesh point is stored as (y, x)
    meshLeft = []
    meshRight = []

    for y in range(mapXL.shape[0] + 1):
        if y % meshCellSize == 0:
            rowLeft = []
            rowRight = []
            for x in range(mapXL.shape[1] + 1):
                if x % meshCellSize == 0:
                    # Points that fall on the bottom/right border are clamped to the last valid pixel
                    if y == mapXL.shape[0] and x == mapXL.shape[1]:
                        rowLeft.append(mapYL[y - 1, x - 1])
                        rowLeft.append(mapXL[y - 1, x - 1])
                        rowRight.append(mapYR[y - 1, x - 1])
                        rowRight.append(mapXR[y - 1, x - 1])
                    elif y == mapXL.shape[0]:
                        rowLeft.append(mapYL[y - 1, x])
                        rowLeft.append(mapXL[y - 1, x])
                        rowRight.append(mapYR[y - 1, x])
                        rowRight.append(mapXR[y - 1, x])
                    elif x == mapXL.shape[1]:
                        rowLeft.append(mapYL[y, x - 1])
                        rowLeft.append(mapXL[y, x - 1])
                        rowRight.append(mapYR[y, x - 1])
                        rowRight.append(mapXR[y, x - 1])
                    else:
                        rowLeft.append(mapYL[y, x])
                        rowLeft.append(mapXL[y, x])
                        rowRight.append(mapYR[y, x])
                        rowRight.append(mapXR[y, x])
            # Pad the row with a dummy (0, 0) point in this case
            if (mapXL.shape[1] % meshCellSize) % 2 != 0:
                rowLeft.append(0)
                rowLeft.append(0)
                rowRight.append(0)
                rowRight.append(0)

            meshLeft.append(rowLeft)
            meshRight.append(rowRight)

    # Force float32 so tobytes() always yields 4-byte floats (the integer padding could otherwise promote the dtype)
    meshLeft = np.array(meshLeft, dtype=np.float32)
    meshRight = np.array(meshRight, dtype=np.float32)

    return meshLeft, meshRight


def create_mesh_on_host(calibData, leftSocket, rightSocket, resolution, vertical=False):
    # calibData is the device's calibration handler; resolution is (width, height)
    width = resolution[0]
    height = resolution[1]

    # Intrinsics scaled to the requested resolution, plus distortion coefficients
    M1 = np.array(calibData.getCameraIntrinsics(leftSocket, width, height))
    d1 = np.array(calibData.getDistortionCoefficients(leftSocket))
    M2 = np.array(calibData.getCameraIntrinsics(rightSocket, width, height))
    d2 = np.array(calibData.getDistortionCoefficients(rightSocket))

    # Calibrated translation and rotation between the two cameras
    T = np.array(calibData.getCameraTranslationVector(leftSocket, rightSocket, False))
    extrinsics = np.array(calibData.getCameraExtrinsics(leftSocket, rightSocket))
    extrinsics = extrinsics.flatten()
    # The upper-left 3x3 block of the 4x4 extrinsics matrix is the rotation
    R = np.array([
        [extrinsics[0], extrinsics[1], extrinsics[2]],
        [extrinsics[4], extrinsics[5], extrinsics[6]],
        [extrinsics[8], extrinsics[9], extrinsics[10]]
    ])

    # Spec (design) translation, used below for the baseline
    T2 = np.array(calibData.getCameraTranslationVector(leftSocket, rightSocket, True))

    def calc_fov_D_H_V(f, w, h):
        # Diagonal, horizontal and vertical FOV in degrees (not used below)
        return np.degrees(2*np.arctan(np.sqrt(w*w+h*h)/(2*f))), np.degrees(2*np.arctan(w/(2*f))), np.degrees(2*np.arctan(h/(2*f)))

    R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(M1, d1, M2, d2, resolution, R, T)
    # Both cameras are rectified onto the second camera's intrinsics
    TARGET_MATRIX = M2
    mapXL, mapYL = cv2.initUndistortRectifyMap(M1, d1, R1, TARGET_MATRIX, resolution, cv2.CV_32FC1)
    mapXV, mapYV = cv2.initUndistortRectifyMap(M2, d2, R2, TARGET_MATRIX, resolution, cv2.CV_32FC1)
    if vertical:
        # Vertical pair: baseline along y, and the maps are rotated by 90 degrees
        baseline = abs(T2[1])*10  # calibration translation is in cm; x10 gives mm
        focal = TARGET_MATRIX[0][0]
        mapXL_rot, mapYL_rot = rotate_mesh_90_ccw(mapXL, mapYL)
        mapXV_rot, mapYV_rot = rotate_mesh_90_ccw(mapXV, mapYV)
    else:
        baseline = abs(T2[0])*10  # calibration translation is in cm; x10 gives mm
        focal = TARGET_MATRIX[1][1]
        mapXL_rot, mapYL_rot = mapXL, mapYL
        mapXV_rot, mapYV_rot = mapXV, mapYV
    leftMeshRot, verticalMeshRot = downSampleMesh(mapXL_rot, mapYL_rot, mapXV_rot, mapYV_rot)

    # loadMeshData() takes each mesh as a list of bytes
    meshLeft = list(leftMeshRot.tobytes())
    meshVertical = list(verticalMeshRot.tobytes())
    focalScaleFactor = baseline * focal * 32
    print("Focal scale factor", focalScaleFactor)

    leftMap = RectificationMaps(map_x=mapXL, map_y=mapYL)
    verticalMap = RectificationMaps(map_x=mapXV, map_y=mapYV)

    return leftMap, verticalMap, meshLeft, meshVertical, focalScaleFactor

The required inputs are leftSocket, rightSocket, the calibration data and the camera resolution.
Then all you need to do is call stereo.loadMeshData(meshLeft, meshRight) in the pipeline, passing the two mesh lists returned by create_mesh_on_host (the second one is named meshVertical there), and it should work with the calibration mesh you created on the host. If you need anything, ping me anytime.

borongyuan · 2024-07-26T06:08:55Z

Thank you @MaticTonin. I'll try it later. Does OAK have any limits on the warp mesh? I see that the mesh used internally has a step of 16 as well. In addition, I need to start the device first to obtain the calibration params. Then I need to restart once to reconfigure the pipeline, like in the warp_mesh_interactive.py example.

MaticTonin · 2024-07-26T08:50:25Z

The minimum is 9. You don't need to start the pipeline (which boots the device) to obtain the calibration parameters; you can extract them before starting the pipeline.

borongyuan · 2024-08-05T04:11:37Z

Due to a Camera Node-related issue, I can't use it for stereo rectification yet. But I still managed to improve performance a little. Here is my outdoor test using an OAK-D W, and it looks good.

Kazam_screencast_00006.mp4

borongyuan · 2024-11-14T02:16:12Z

Factor_Perception_Lite_Neural_Depth.mp4

borongyuan · 2024-11-15T04:18:00Z

Initial test on OAK-D LR

Neural_Depth_LR_test1.mp4

Luxonis-Brandon added the enhancement New feature or request label Aug 7, 2020

Luxonis-Brandon changed the title ~~Neural-Assisted Depth Estimation~~ Neural-Assisted Disparity Depth Estimation Sep 18, 2020

Luxonis-Brandon added 2021 labels Oct 14, 2020

Luxonis-Brandon mentioned this issue Aug 8, 2022

Code for depth estimation luxonis/depthai-core#551

Open

michaelnguyen11 mentioned this issue Aug 19, 2022

RealtimeStereo - new improvement PINTO0309/PINTO_model_zoo#279

Closed

Serafadam mentioned this issue Jan 20, 2023

Delay, noise and no output on Oak D Pro Wide depth camera when running specific commands on ROS2 Galactic. luxonis/depthai-ros#200

Open

Erol444 removed 2021 labels Mar 27, 2023

borongyuan mentioned this issue Aug 3, 2024

[BUG] Pipeline crash when undistorting color image using Camera Node luxonis/depthai-core#1056

Closed
