
Changed depth perception to point around nose bridge #87

Closed
wants to merge 1 commit into from

Conversation

@taylorkf (Contributor) commented Sep 11, 2023

Description

This PR addresses issue #67. Occasionally the fork is positioned in front of the user's mouth while face detection runs, so the camera can detect the depth of the fork rather than the depth of the user's mouth. This pull request changes the depth estimate to average over a 9x9 square of pixels around the bridge of the nose, which the fork does not cover.

Testing procedure

Test and run using the same procedure as #36 (comment). I have not run this code on the robot yet.

Before opening a pull request

  • Format your code using the black formatter: python3 -m black .
  • Run your code through pylint and address all warnings/errors. The only warnings acceptable to leave unaddressed are TODOs that will be handled in a future PR. From the top-level ada_feeding directory, run: pylint --recursive=y --rcfile=.pylintrc .

Before Merging

  • Squash & Merge

@amalnanavati amalnanavati changed the base branch from main to ros2-devel September 11, 2023 22:53
@amalnanavati (Contributor) left a comment


Mostly looks good, see requested changes below.

I think we should test it in-person before merging, just to verify everything makes sense.

depth_sum += closest_depth[int(point[1])][int(point[0])]
depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
@amalnanavati (Contributor) commented Sep 11, 2023


Hmm, what do you think about making this 4 configurable via a parameter, so that we can easily tune it?
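A minimal sketch of what that could look like (the function and parameter names are assumptions, not the PR's actual code; in the node itself this would likely be a ROS 2 parameter). Note the +1 in the upper bound, needed for a true 9x9 window, since range(u - 4, u + 4) only covers 8 values per axis:

```python
def depth_window(u, v, half_size=4):
    """Pixel coordinates of a (2*half_size + 1)^2 window centered on (u, v).

    half_size is the tunable parameter this review suggests; note the +1 in
    the upper bound, which range(u - 4, u + 4) in the diff leaves out.
    """
    return [
        (x, y)
        for x in range(u - half_size, u + half_size + 1)
        for y in range(v - half_size, v + half_size + 1)
    ]
```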

depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
Contributor

Suggested change
depth_sum += closest_depth[int(x)][int(y)]
depth_sum += closest_depth[int(y)][int(x)]

Y and X should be swapped, right?
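For context on why: NumPy/OpenCV images are indexed row-first, so a pixel at image coordinate (x, y) lives at index [y][x]. A tiny illustration (the array sizes here are arbitrary):

```python
import numpy as np

# A depth image is height x width: rows correspond to y, columns to x.
depth_img = np.zeros((480, 640), dtype=np.uint16)

x, y = 600, 100          # pixel coordinate: column 600, row 100
depth_img[y, x] = 1234   # row index first

# depth_img[x, y] would raise IndexError here: x=600 exceeds the 480 rows.
```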

Contributor

Another suggestion: int(x) rounds down. But if the landmark point's position is e.g., 51.99, that means the detector thought it was closer to 52 than 51. To account for this I think we should round the float before casting it to an int.
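A quick illustration of the difference (int() simply drops the fractional part for positive floats):

```python
p = 51.99                 # landmark coordinate from the detector
truncated = int(p)        # 51: truncation discards the detector's intent
rounded = int(round(p))   # 52: nearest pixel, as suggested above
```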

depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
@amalnanavati (Contributor) commented Sep 11, 2023


Soo...as I was testing bite transfer, I began running into issues where the points used here are out of bounds of the depth image. What I realized is that the LBF landmark detector can detect a face even if it is only partially in the image; if it does, it extrapolates where the points outside of the image are. So landmark points output by that detector may be out of bounds of the image.

Therefore, can you add a check here for whether y and x are in bounds? Maybe keep track of the number of (x, y) pairs that are out of bounds, and if that exceeds a threshold proportion (0.5? perhaps configurable by parameter?), don't publish a depth, since our depth estimate would be unreliable.
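One way that check could look (the function and parameter names are illustrative, not from the PR; the 0.5 default mirrors the suggestion above):

```python
import numpy as np

def window_depth(depth_img, u, v, half_size=4, max_oob_fraction=0.5):
    """Average depth over a window around (u, v), skipping out-of-bounds
    pixels. Returns None when too many pixels fall outside the image,
    since the depth estimate would then be unreliable."""
    height, width = depth_img.shape
    depths = []
    out_of_bounds = 0
    for x in range(u - half_size, u + half_size + 1):
        for y in range(v - half_size, v + half_size + 1):
            if 0 <= x < width and 0 <= y < height:
                depths.append(float(depth_img[y, x]))
            else:
                out_of_bounds += 1
    total = len(depths) + out_of_bounds
    if not depths or out_of_bounds / total > max_oob_fraction:
        return None
    return sum(depths) / len(depths)
```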

for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
depth = depth_sum / float(81)
@amalnanavati (Contributor) commented Sep 11, 2023


I don't like this magic number. I know it comes from 9**2, but I don't think it is intuitive to readers.

Given that some of the (x, y) pairs will get rejected for being out of bounds anyway, we can't assume that all 81 points contribute to depth_sum. Therefore, I'd recommend counting the points actually included in depth_sum and using that count to compute the average.
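Concretely, dividing by the length of the collected samples instead of float(81) keeps the average correct even when pixels get rejected (the names and values here are illustrative):

```python
# closest_depth stands in for the depth image; (u, v) is the window center,
# placed near the left edge so part of the window falls out of bounds.
closest_depth = [[3.0] * 20 for _ in range(20)]
u, v = 2, 10

samples = []
for x in range(u - 4, u + 5):
    for y in range(v - 4, v + 5):
        if 0 <= x < 20 and 0 <= y < 20:
            samples.append(closest_depth[y][x])

depth = sum(samples) / len(samples)  # divide by the actual count, not 81
```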

img_mouth_center = landmarks[largest_face[1]][0][66]
img_mouth_points = landmarks[largest_face[1]][0][48:68]
# Find marker between eyes. This is used to estimate stomion depth in the
# case that the stomion is hidden behind the fork.
Contributor

Nit: this is not done "in the case that the stomion is hidden behind the fork", it is done in all cases :)

Perhaps reword to "This is used as a proxy to estimate stomion depth, because the stomion is often hidden behind the fork."

@amalnanavati (Contributor)

Also, I just realized that this code returns the entire 3D point for the bridge of the nose. It shouldn't do that: the x and y should still be the stomion x and y, and only the depth should come from the bridge of the nose.

However, I also realized a problem with the assumption that the depth of the bridge is the depth of the stomion: it depends on the camera pose relative to the face (because depth is measured in the camera frame). For example, if the RealSense is below the mouth, then the bridge of the nose will inherently be farther from the camera than the stomion is. The assumption only really holds if the camera is halfway between the bridge and the stomion, facing the user's face.

Thoughts @egordon ? The above is how the old code did it, but given that we'll be playing around with staging location, maybe we want a more reliable approach?

@egordon (Collaborator) commented Oct 4, 2023

@amalnanavati Relaying our discussion on the potential solution to this:

Outlier rejection: remove face points that are too close to the camera (i.e. right around the fork).

If there are at least a few mouth points remaining, average those to get the stomion depth.

If not: fit a plane to the rest of the face points (i.e. depth = np.dot(theta, [u, v, 1]); solve for theta), then use np.dot(theta, [u_stomion, v_stomion, 1]) as the depth for the mouth.
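A sketch of that fallback plane fit on synthetic points (all names, pixel coordinates, and plane coefficients here are made up for illustration):

```python
import numpy as np

# Synthetic face pixels (u, v) lying exactly on a known depth plane.
rng = np.random.default_rng(0)
us = rng.uniform(0, 640, size=50)
vs = rng.uniform(0, 480, size=50)
theta_true = np.array([0.0005, -0.0003, 0.45])  # illustrative coefficients

A = np.column_stack([us, vs, np.ones_like(us)])  # rows are [u, v, 1]
depths = A @ theta_true

# Least-squares solve for theta, then evaluate the plane at the stomion pixel.
theta, *_ = np.linalg.lstsq(A, depths, rcond=None)
u_stomion, v_stomion = 320.0, 300.0
stomion_depth = theta @ np.array([u_stomion, v_stomion, 1.0])
```

In practice the face points would carry depth noise, so the recovered theta is a least-squares estimate rather than exact.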

@amalnanavati (Contributor)

Closed by #130
