
Aligning depth picture coordinates to real world coordinates #8333

Closed

Danilich1994 opened this issue Feb 9, 2021 · 23 comments

@Danilich1994

Required Info
Camera Model: D415
Firmware Version: 2.38.1.2223
Operating System & Version: Win 10
Platform: PC/Raspberry Pi
SDK Version: 2.0
Language: Python

Hello,

Here you can see my current camera setup (2x D415).
[image: cam_setup]

The main idea is to use two cameras (placed parallel to the scanning plane) to increase the overall FOV. The distance between the cameras and the linear FOV at a given height are calculated from the cameras' 69.4°x42.5° FOV. The height "h" is chosen so that the edges of the two frames meet without overlapping.
I'm trying to scan the top view of objects placed on a flat surface at roughly the same height below the cameras (the orange ones in the picture). "c" in the picture represents the threshold filter for the depth image, so I only capture the objects' top surfaces (within some range) and ignore their side surfaces (which become visible because of parallax). I take "photos" with both cameras simultaneously and then just stitch them together to get a larger image.
Right now I have only two pieces of the whole process: the scanning tool and a program that takes the images and searches for objects in them. But I'm missing the essential part - the connection between real-world coordinates and image coordinates - because I want to locate the objects' center coordinates (x, y) in the real world. The "origin" point shown in the picture is the origin of the scanning area. How do I bind the stitched image origin to the real-world scene origin?
After a lot of googling I found plenty of information on multi-camera setups. There is information about inward/outward configurations, but none of those options looks like mine.

Here is how I understand connection process:

  1. Get the cameras' intrinsic and extrinsic parameters (see the sketch at the end of this comment).
  2. Transform the point cloud from one camera into the other camera's frame using the extrinsic parameters.
  3. Merge the point clouds.
  4. Deproject pixels into points using the intrinsic parameters.

Am I right?

Does my camera setup even make sense? Googling translation-based panorama creation suggests it is a big pain.
What if I add two more cameras along the x axis? Should I overlap the frames to exclude the "black" strip of invalid depth data on the left side of each frame?
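
For reference on step 1: the intrinsics of each stream, and the extrinsics between a single camera's own sensors, can be read straight from the stream profiles. The extrinsics between the two separate D415s are not available from the SDK and have to come from your physical measurements or a calibration. A minimal pyrealsense2 sketch (the serial number is a placeholder):

import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_device("XXXXXXXXXXXX")  # placeholder: serial number of one of the two cameras
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
profile = pipeline.start(config)

depth_profile = profile.get_stream(rs.stream.depth).as_video_stream_profile()
color_profile = profile.get_stream(rs.stream.color).as_video_stream_profile()

depth_intrinsics = depth_profile.get_intrinsics()                 # fx, fy, ppx, ppy, distortion model
depth_to_color = depth_profile.get_extrinsics_to(color_profile)   # extrinsics between this camera's own sensors
print(depth_intrinsics)
print(depth_to_color)

pipeline.stop()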

@MartyG-RealSense
Collaborator

Hi @Danilich1994 The placement of your two cameras is fine, though you may achieve improved depth sensing results if you position the cameras closer together so that their fields of view overlap. This has the benefit of producing redundant depth data due to more than one camera observing the same area where the FOVs overlap.

Using overlap may require a third camera to achieve the same overall width of view as two cameras spaced further apart, though. The more cameras you use, the fewer blind spots there will be.

The link below provides information on methods of stitching point clouds using RealSense.

#8044 (comment)

The RealSense user in that case was also using a top-down view and trying to exclude detail representing areas outside of the top view, so that overall case may be useful to read from the start.

@Danilich1994
Author

@MartyG-RealSense, thank you for your reply. It definitely helped me understand the logic behind the process needed to solve my problem. But let's make sure I understood it correctly:

  1. Get depth frames from both cameras.
  2. Choose one camera as the main camera for the scene.
  3. Find the extrinsic parameters of the second camera (rotation and translation matrices relative to the main camera's position). This should be easy because I know the exact positions of both cameras. It may be necessary to adjust the extrinsic parameters manually.
  4. Apply an affine transform to the depth frame from the second camera using the extrinsics found above.
  5. Stitch the transformed depth frame to the depth frame of the main camera.
  6. Save this "bigger" frame.
  7. Convert it to a colored image (for example grayscale) and run the object search algorithm to find the center coordinates (in pixels) of the objects.
  8. Using the x, y pixel coordinates and the main camera's intrinsic parameters, perform a pixel-to-point transform on the saved "bigger" depth frame to obtain real-world coordinates.
  9. These x, y, z coordinates will be relative to the origin of the main camera.

Is that right?

Some clarifying questions:
Maybe it's necessary to convert the depth frames into point clouds first, and only then apply the affine transform and stitch them (followed by a point_cloud -> depth_frame conversion)?
All the provided functions for point_to_point, point_to_pixel and pixel_to_point in Python require intrinsic and extrinsic parameters to be passed in. These parameters are special Python objects that are normally obtained from the cameras themselves through dedicated functions. How can I create a custom extrinsics object to use with the point_to_point function, in order to perform the affine transform between two separate D415 cameras?

@MartyG-RealSense
Collaborator

I have not performed the stitching process myself, so it is difficult for me to verify the listed procedure, beyond knowing that the procedure should involve moving and rotating individual clouds and then appending them together.

I would say, though, that my expectation - like your own 'clarifying question' about depth frame conversion - would be that the point cloud would be generated to produce XYZ coordinates before the affine transform is applied to the 3D point cloud data with rs2_transform_point_to_point.

Some more information about rs2_transform_point_to_point can be found in the link below.

#5583

The link below may also be useful in regard to multi-camera data alignment:

#2664
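
Regarding the earlier question about creating a custom extrinsics object in Python: below is a rough, unverified sketch of the kind of call involved. It assumes that a default-constructed rs.extrinsics allows its rotation and translation fields to be assigned, which should be confirmed for your pyrealsense2 version.

import pyrealsense2 as rs

# Hand-built extrinsics describing camera 2's pose relative to camera 1.
# librealsense stores rotation as a column-major 3x3 matrix flattened to 9 floats;
# translation is in meters.
ext = rs.extrinsics()
ext.rotation = [1.0, 0.0, 0.0,
                0.0, 1.0, 0.0,
                0.0, 0.0, 1.0]        # identity: no rotation in this example
ext.translation = [0.40, 0.0, 0.0]    # placeholder: e.g. a 40 cm baseline along X

point_in_cam2 = [0.1, 0.2, 0.8]       # an XYZ point in meters, e.g. from deprojection
point_in_cam1 = rs.rs2_transform_point_to_point(ext, point_in_cam2)
print(point_in_cam1)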

@Danilich1994
Author

Great!

Only the last step is unknown to me - how to apply the affine transform to the point cloud.
The rs2_transform_point_to_point function needs the XYZ coordinates of a point to modify; how do I get these coordinates from the point cloud? Take pointcloud -> vertices and convert them to an array? And then go through each array element with the transform function?
If that's right, how do I convert the resulting vertices array back into a point cloud, so I can stitch it with the point cloud from the main camera?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Feb 12, 2021

I'm not certain whether the Python method in the link below for obtaining point cloud vertices as a Python array will be applicable to your particular project, but I will share it in case it is helpful:

#1653
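
In outline, the usual way of getting the vertices into numpy looks something like this (a sketch, assuming pyrealsense2 and numpy):

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()
frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()

pc = rs.pointcloud()
points = pc.calculate(depth_frame)

# points.get_vertices() exposes the vertices as a buffer of float32 triplets;
# view it as an (N, 3) numpy array of XYZ coordinates in meters
verts = np.asanyarray(points.get_vertices()).view(np.float32).reshape(-1, 3)

pipeline.stop()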

@Danilich1994
Author

Hello @MartyG-RealSense, thanks for the assistance! I'm slowly moving forward with my project, and that is great!
I've got another question, related to the images produced by the camera. Here is a GIF that shows what I'm talking about:
[animated gif: ezgif com-gif-maker]
Here the camera has the default preset with the laser emitter enabled (150 mW), and the "Dynamic" visual preset is chosen for visualization. Decimation and spatial filters are applied.
So, when the camera is parallel to a flat wall, some kind of pattern appears in the depth data and the overall picture becomes noisy. The difference is easy to see compared with the moments when the camera is angled to the wall, even though the rotation angle is small. I see the same noise and "pattern" when using the camera in different lighting conditions and with the laser emitter turned on or off. When I convert the colorized depth picture into a grayscale picture to feed it through further image processing (edge detection, segmentation, etc.), this noise strongly distorts the result. The only solution I have found is to adjust the threshold filter so that the camera doesn't see this big surface and only catches the object surfaces. The output grayscale picture is then mostly black with gray objects, so I can simply binarize it (0 pixels = 0, pixels > 0 = 255). The processing chain is sketched below.
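
A rough pyrealsense2 sketch of that chain (the threshold distances are placeholder values):

import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

decimation = rs.decimation_filter()    # reduces resolution, e.g. 1280x720 -> 640x360
spatial = rs.spatial_filter()          # edge-preserving smoothing
threshold = rs.threshold_filter()      # clamps depth to the range of interest
threshold.set_option(rs.option.min_distance, 0.3)   # placeholder values, in meters
threshold.set_option(rs.option.max_distance, 0.9)

frames = pipeline.wait_for_frames()
depth = frames.get_depth_frame()
depth = decimation.process(depth)
depth = spatial.process(depth)
depth = threshold.process(depth)

pipeline.stop()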
So, what is the source of this noise and can I get rid of it? Or am I doing something wrong?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Feb 16, 2021

Is this view still a top-down view from the ceiling like the diagram in your opening comment of this discussion, please?

I ran extensive tests to replicate your image. I had a similar color change on my image during camera movement and could not achieve much improvement. In lower light conditions (e.g. late afternoon in a lounge-sized indoor room), maximizing laser power to '360' instead of the default '150' may offer some improvement in image quality. Also, other projects with a top-down view sometimes take the same approach that you did of depth-clamping out (thresholding) the far distance (such as the floor in a top-down view).

If your project is able to do so (if it will not affect the greyscale image processing), using depth to color alignment can enhance an image considerably. It also makes it easier to distinguish foreground pixels from background pixels. Python has an example alignment program called align-depth2color.py:

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/align-depth2color.py
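
In outline, the alignment step from that example looks like this (a trimmed sketch):

import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

align = rs.align(rs.stream.color)      # align the depth frame onto the color stream

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
aligned_depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

pipeline.stop()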

If you do not require a fully colorized depth image, you could have a greyscale image by default by setting the White to Black color scheme instead of the Jet color scheme.

[image]

@Danilich1994
Author

No, this is not a top-down view. The camera is placed on a table and is looking at a wall with some objects attached to it.
Color to depth alignment is not an option for me, because I want to exclude color information from the resulting image. The objects I want to capture have a lot of different pictures, text, etc. on their surfaces, and may be colored in about 4 different colors, so they are a very color-noisy thing to process. After trying to make this work with a simple RGB camera, I changed my mind and chose a depth camera to exclude the color mess.
Here is a "White to Black" depth image captured with the camera parallel to the wall:
[image: 12_Depth]
The image is "flickering" - the depth noise level is around 10 mm, and the image seems to split into separate dots that flicker (any edge detection algorithm will catch all these dots, because there is a sharp change in values between each of the "dots"). If this noise were uniform across the image, it would be no big deal. But you can clearly see a middle circle where the depth is "closer" to the camera (~0.835 m), a ring of "further" depth (~0.84-0.85 m), followed by another ring closer to the camera (~0.84-0.83 m). Furthermore, the middle region is more stable (1-2 mm noise) than these "ring" regions. It reminds me of Newton's rings )

If I angle the camera with respect to the wall, I get the same noise level, but I see a nice depth gradient (representing the flat wall) across the image. If I feed such an image through an edge detection algorithm, it won't catch any part of the wall, only the objects.
[image: 13_Depth]
I'll try to add better lighting to the scene; maybe that will help.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Feb 17, 2021

Your scenario of the objects attached to the wall reminds me of a past case where a RealSense user did the same with VHS videotape boxes to try to detect them and measure the distance to them. They also had difficulties with picking the boxes out from the image. Their ultimate goal was to view objects from a top-down perspective.

https://support.intelrealsense.com/hc/en-us/community/posts/360033574174-415-depth-sense-granularity

I ran extensive further tests but the best improvement that I could get in image quality of the far wall was to use a D455 camera (which has 2x better accuracy over distance compared to the D435 models) or disable the Histogram Equalization option in the colorization settings, as shown below.

[image]

RealSense 400 Series cameras can also better perceive white walls at a far distance if a physical optical filter called a longpass filter is applied over the camera lenses on the outside of the camera, as described in Section 4.2.1 of Intel's white-paper document about optical filters.

https://dev.intelrealsense.com/docs/optical-filters-for-intel-realsense-depth-cameras-d400#section-4-the-use-of-optical-filters

As you are using a D415, may I ask which resolution you are using please, as it is not mentioned in the discussion. For the D415 model, the optimal depth accuracy resolution is 1280x720 (whereas on the D435 models it is 848x480).

@Danilich1994
Author

The depth frame resolution in my case is 1280x720 while capturing. Then the decimation filter reduces it to 640x360.
If I turn off the Histogram Equalization option in the Viewer, the picture becomes much better in terms of wall visualization, but the objects are not seen as clearly (they blend into the background). Does the option to enable/disable Histogram Equalization exist only in the Viewer? I use the Viewer only for testing purposes; the main project runs as Python code. Also, does this option change only the visualization inside the Viewer, or does it influence the depth data itself?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Feb 17, 2021

Python code for disabling Histogram Equalization can be found in the link below.

#7089 (comment)

The colorization options such as the Color Preset change how the depth data is colored, not the depth itself.
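
In outline, the colorizer calls from that linked comment come down to something like this (a sketch; the distance values are placeholders):

import pyrealsense2 as rs

colorizer = rs.colorizer()
colorizer.set_option(rs.option.histogram_equalization_enabled, 0)  # disable histogram equalization
# with equalization off, min/max distance define the fixed range over which colors are spread
colorizer.set_option(rs.option.min_distance, 0.3)   # placeholder values, in meters
colorizer.set_option(rs.option.max_distance, 1.5)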

@MartyG-RealSense
Collaborator

Hi @Danilich1994 Do you require further assistance with this case, please? Thanks!

@Danilich1994
Author

Hi @MartyG-RealSense. Thank you for your assistance, you have been very helpful! I think I have found a solution to my problem, so I'll close this issue.

@MartyG-RealSense
Collaborator

You are very welcome - great to hear that you found a solution that works for you. Thanks for the update!

@Danilich1994
Author

Hello @MartyG-RealSense.
Some questions have come up regarding this issue.
In my code I set the "Hand" (2) visual preset on the camera's depth sensor. But DragonDeux in issue #7089 sets visual preset 1 on the colorizer before disabling equalization:

colorMap = rs.colorizer()
colorMap.set_option(rs.option.visual_preset, 1)
colorMap.set_option(rs.option.histogram_equalization_enabled, 0.0)  # disable histogram equalization
colorMap.set_option(rs.option.color_scheme, float)   # 'float' stands for a numeric value in the original snippet
colorMap.set_option(rs.option.min_distance, float)
colorMap.set_option(rs.option.max_distance, float)

What is the difference between the colorizer visual presets and the sensor visual presets, and do they interact somehow?
What do the minimum and maximum distance settings do for the colorizer (the last two lines of code)?
"color_scheme" is just the "Classic"/"Jet"/"White to Black"/etc. color schemes for the colorizer, am I right?

Danilich1994 reopened this Mar 4, 2021
@MartyG-RealSense
Collaborator

The Visual Presets that are best known are the ones that affect depth data, such as Default, Hand, High Accuracy, High Density, etc. These affect which depth coordinates are rendered in the depth image.

[image]

The Depth Visualization presets do not affect which depth coordinates are rendered, but instead determine the style in which those coordinates are color-shaded according to their depth values.

[image]

[image]

I do not believe that there is any interaction between the depth presets and the depth colorization presets.

Minimum Distance and Maximum Distance are the min and max distances at which color visualization will be applied to the image.

Yes, the Color Schemes such as Jet and White To Black / Black To White are color schemes for colorization.

Pages 11 to 14 of a PDF guide to the RealSense Viewer that Intel published explain the colorization settings very well.

https://www.intel.com/content/www/us/en/support/articles/000028593/emerging-technologies/intel-realsense-technology.html

The link below also provides useful resources about programming colorization settings.

https://support.intelrealsense.com/hc/en-us/community/posts/360048767633/comments/360012325493
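
To illustrate the separation between the two kinds of preset, a sketch (the numeric preset and color-scheme indices are assumptions based on the Viewer and should be verified against your SDK version):

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()

# Depth visual preset: set on the depth sensor, affects the depth data itself
depth_sensor = profile.get_device().first_depth_sensor()
depth_sensor.set_option(rs.option.visual_preset, 2)    # assumed: 2 = Hand on the D400 series

# Colorizer options: only affect how the depth is color-shaded for display
colorizer = rs.colorizer()
colorizer.set_option(rs.option.visual_preset, 1)       # colorization preset, as in the quoted snippet
colorizer.set_option(rs.option.color_scheme, 2)        # assumed: 2 = White to Black
colorizer.set_option(rs.option.min_distance, 0.3)      # placeholder values, in meters
colorizer.set_option(rs.option.max_distance, 1.5)

pipeline.stop()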

@MartyG-RealSense
Collaborator

Hi @Danilich1994 Do you require further assistance with this case, please? Thanks!

@MartyG-RealSense
Collaborator

Case closed due to no further comments received.

@julienguegan

@Danilich1994 Sorry to re-activate this thread, but I was wondering if you succeeded in using rs2_transform_point_to_point() to apply an affine transformation to your point cloud? I want to do the same in order to then save a .ply file that is rotated by 90° ...

Thank you for your help !

@Danilich1994
Author

Danilich1994 commented Jul 26, 2021

Hello, @julienguegan. Yeah, I did it. And it's not necessary to use rs2_transform_point_to_point if you just want to rotate a finished point cloud. In RealSense this function is used to "change" the point cloud origin from the depth camera to the color camera for subsequent color data alignment, so it uses the depth-to-color transformation matrix defined by the camera's properties.
But to rotate any point cloud (represented as a numpy array of shape 3 x number of points) you have to multiply each point by a transformation matrix (see Wikipedia). The transformation matrix is a 4x4 matrix:
[rot rot rot t_x]
[rot rot rot t_y]
[rot rot rot t_z]
[ 0   0   0   1 ]
where t_* are the translation coefficients along each axis and rot are the rotation coefficients. The last row is just a placeholder.

[image]

So, if you need to rotate all points around X by 90 degrees without any translation, you have to put the proper numbers into the matrix:
[image]
The angle must be in radians!
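
For example, the 4x4 matrix for a 90° rotation about X with no translation could be built like this (a numpy sketch):

import numpy as np

angle = np.deg2rad(90)   # the angle must be in radians
transformation_matrix = np.array([
    [1.0, 0.0,            0.0,           0.0],
    [0.0, np.cos(angle), -np.sin(angle), 0.0],
    [0.0, np.sin(angle),  np.cos(angle), 0.0],
    [0.0, 0.0,            0.0,           1.0],   # homogeneous placeholder row
])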

And multiply all point cloud points by the matrix:

import numpy as np

assert points.shape[0] == 3                           # check that the point cloud has shape (3, N)
n = points.shape[1]
points_ = np.vstack((points, np.ones((1, n))))        # append a row of ones so the shapes match the 4x4 transformation matrix
points_trans_ = np.matmul(transformation_matrix, points_)                            # multiply
points_transformed = np.true_divide(points_trans_[:3, :], points_trans_[[-1], :])    # drop the additional row

The code is taken from the RealSense Python example box_dimensioner_multicam, so if you want, you can go through that example to get an understanding of basic point cloud usage.

@julienguegan

@Danilich1994 Actually, I know that I can do the affine transformation using the method you explained, but I wanted to use rs2_transform_point_to_point() because it is from the librealsense library. I was hoping to make it work on a pointcloud object or a frame object (and not a numpy array), because my final objective is to save the point cloud as a .ply file, and currently I am using the following code to save that file:

import pyrealsense2 as rs

# start the camera
pipeline = rs.pipeline()
pipeline.start()
# get frames from camera
_, frames = pipeline.try_wait_for_frames()
image_frame = frames.get_color_frame()
depth_frame = frames.get_depth_frame()
# build the point cloud, texture it with the color frame and save it
pc = rs.pointcloud()
pc.map_to(image_frame)
points = pc.calculate(depth_frame)
points.export_to_ply('point_cloud.ply', image_frame)

I am using only librealsense methods and objects to do this, and I am not sure if it is really possible to interface it with a numpy array ... I asked @MartyG-RealSense about this here but it seems that he does not have a good solution 😐

@Danilich1994
Author

@julienguegan if you want to use only RealSense library functions, then you have to pass rs2_transform_point_to_point() an extrinsics object describing the old-origin to new-origin transform, as a RealSense object. And I don't know how to do that. Maybe @MartyG-RealSense can help?

Otherwise: get the depth frame; convert it to a numpy array; use rs2_deproject_pixel_to_point(...) to convert the depth map into a point cloud; apply the affine transform; THIS link gives two options for converting a point cloud stored as a numpy array into a .ply file.
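
A sketch of that route with rs2_deproject_pixel_to_point (the per-pixel Python loop is slow, but it shows the idea):

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()
depth_frame = pipeline.wait_for_frames().get_depth_frame()
intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()

depth_image = np.asanyarray(depth_frame.get_data())
points = []
for y in range(depth_image.shape[0]):
    for x in range(depth_image.shape[1]):
        d = depth_image[y, x] * depth_scale               # depth in meters
        if d > 0:
            points.append(rs.rs2_deproject_pixel_to_point(intrin, [x, y], d))
points = np.array(points).T                               # shape (3, N), ready for the affine transform above

pipeline.stop()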

Or just save the .ply file directly. I suppose you are going to use this data later, so a conversion into a numpy array will happen anyway - just don't forget to apply the affine transform before using it.

@julienguegan

@Danilich1994 Yes, I agree. I am thinking of using open3d, as it seems more relevant, better documented and more flexible for this purpose.
