
Changed depth perception to point around nose bridge #87

Closed
wants to merge 1 commit into from

Conversation

@taylorkf (Contributor) commented Sep 11, 2023

Description

This PR addresses issue #67. Occasionally the fork is positioned in front of the user's mouth while face detection runs, so the camera can detect the depth of the fork rather than the depth of the user's mouth. This pull request changes the depth estimate to average over a 9x9 square of pixels around the bridge of the nose, which the fork does not cover.

Testing procedure

Test and run using the same procedure as #36 (comment). I have not run this code on the robot yet.

Before opening a pull request

  • Format your code using the black formatter: python3 -m black .
  • Run your code through pylint and address all warnings/errors. The only warnings acceptable to leave unaddressed are TODOs that will be handled in a future PR. From the top-level ada_feeding directory, run: pylint --recursive=y --rcfile=.pylintrc .

Before Merging

  • Squash & Merge

@amalnanavati amalnanavati changed the base branch from main to ros2-devel September 11, 2023 22:53
@amalnanavati (Contributor) left a comment


Mostly looks good, see requested changes below.

I think we should test it in-person before merging, just to verify everything makes sense.

depth_sum += closest_depth[int(point[1])][int(point[0])]
depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
@amalnanavati (Contributor) commented Sep 11, 2023


Hmm, what do you think about making this 4 configurable via a parameter, so that we can easily tune it?
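A minimal sketch of what that could look like (the function and parameter names are assumptions, not the PR's actual code; in the node itself this would likely be a ROS 2 parameter). Note the +1 in the upper bound, needed for a true 9x9 window, since range(u - 4, u + 4) only covers 8 values per axis:

```python
def depth_window(u, v, half_size=4):
    """Pixel coordinates of a (2*half_size + 1)^2 window centered on (u, v).

    half_size is the tunable parameter this review suggests; note the +1 in
    the upper bound, which range(u - 4, u + 4) in the diff leaves out.
    """
    return [
        (x, y)
        for x in range(u - half_size, u + half_size + 1)
        for y in range(v - half_size, v + half_size + 1)
    ]
```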

depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
Contributor

Suggested change
depth_sum += closest_depth[int(x)][int(y)]
depth_sum += closest_depth[int(y)][int(x)]

Y and X should be swapped, right?
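For context on why: NumPy/OpenCV images are indexed row-first, so a pixel at image coordinate (x, y) lives at index [y][x]. A tiny illustration (the array sizes here are arbitrary):

```python
import numpy as np

# A depth image is height x width: rows correspond to y, columns to x.
depth_img = np.zeros((480, 640), dtype=np.uint16)

x, y = 600, 100          # pixel coordinate: column 600, row 100
depth_img[y, x] = 1234   # row index first

# depth_img[x, y] would raise IndexError here: x=600 exceeds the 480 rows.
```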

Contributor

Another suggestion: int(x) rounds down. But if the landmark point's position is e.g., 51.99, that means the detector thought it was closer to 52 than 51. To account for this I think we should round the float before casting it to an int.
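A quick illustration of the difference (int() simply drops the fractional part for positive floats):

```python
p = 51.99                 # landmark coordinate from the detector
truncated = int(p)        # 51: truncation discards the detector's intent
rounded = int(round(p))   # 52: nearest pixel, as suggested above
```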

depth = depth_sum / float(len(img_mouth_points))
for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
@amalnanavati (Contributor) commented Sep 11, 2023


Soo...as I was testing bite transfer, I began running into issues where the points used here are out of bounds of the depth image. What I realized is that the LBF landmark detector can detect a face even if it is only partially in the image; if it does, it extrapolates where the points outside of the image are. So landmark points output by that detector may be out of bounds of the image.

Therefore, can you add a check here for whether y and x are in bounds? Maybe keep track of the number of (x, y) pairs that are out of bounds, and if that exceeds a threshold proportion (0.5? perhaps configurable by parameter?), don't publish a depth, since our depth estimate would be unreliable.
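One way that check could look (the function and parameter names are illustrative, not from the PR; the 0.5 default mirrors the suggestion above):

```python
import numpy as np

def window_depth(depth_img, u, v, half_size=4, max_oob_fraction=0.5):
    """Average depth over a window around (u, v), skipping out-of-bounds
    pixels. Returns None when too many pixels fall outside the image,
    since the depth estimate would then be unreliable."""
    height, width = depth_img.shape
    depths = []
    out_of_bounds = 0
    for x in range(u - half_size, u + half_size + 1):
        for y in range(v - half_size, v + half_size + 1):
            if 0 <= x < width and 0 <= y < height:
                depths.append(float(depth_img[y, x]))
            else:
                out_of_bounds += 1
    total = len(depths) + out_of_bounds
    if not depths or out_of_bounds / total > max_oob_fraction:
        return None
    return sum(depths) / len(depths)
```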

for x in range(u - 4, u + 4):
for y in range(v - 4, v + 4):
depth_sum += closest_depth[int(x)][int(y)]
depth = depth_sum / float(81)
@amalnanavati (Contributor) commented Sep 11, 2023


I don't like this magic number. I know it comes from 9**2, but I don't think it is intuitive to readers.

Given that some of the (x, y) pairs will get rejected for being out of bounds anyway, we can't assume that all 81 points contribute to depth_sum. Therefore, I'd recommend counting the points actually included in depth_sum and using that count to compute the average.
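Concretely, dividing by the length of the collected samples instead of float(81) keeps the average correct even when pixels get rejected (the names and values here are illustrative):

```python
# closest_depth stands in for the depth image; (u, v) is the window center,
# placed near the left edge so part of the window falls out of bounds.
closest_depth = [[3.0] * 20 for _ in range(20)]
u, v = 2, 10

samples = []
for x in range(u - 4, u + 5):
    for y in range(v - 4, v + 5):
        if 0 <= x < 20 and 0 <= y < 20:
            samples.append(closest_depth[y][x])

depth = sum(samples) / len(samples)  # divide by the actual count, not 81
```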

img_mouth_center = landmarks[largest_face[1]][0][66]
img_mouth_points = landmarks[largest_face[1]][0][48:68]
# Find marker between eyes. This is used to estimate stomion depth in the
# case that the stomion is hidden behind the fork.
Contributor

Nit: this is not done "in the case that the stomion is hidden behind the fork", it is done in all cases :)

Perhaps reword to "This is used as a proxy to estimate stomion depth, because the stomion is often hidden behind the fork."

@amalnanavati (Contributor)

Also, I just realized that this code returns the entire 3D point for the bridge of the nose. It shouldn't do that: the x and y should still be the stomion x and y, and only the depth should come from the bridge of the nose.

However, I also realized a problem with the assumption that the depth of the bridge is the depth of the stomion: it depends on the camera pose relative to the face (because depth is measured in the camera frame). For example, if the RealSense is below the mouth, then the bridge of the nose will inherently be farther from the camera than the stomion is. The assumption only really holds if the camera is halfway between the bridge and the stomion, facing the user's face.

Thoughts @egordon ? The above is how the old code did it, but given that we'll be playing around with staging location, maybe we want a more reliable approach?

@egordon (Collaborator) commented Oct 4, 2023

@amalnanavati Relaying our discussion on the potential solution to this:

Outlier rejection: remove face points that are too close to the camera (i.e. right around the fork).

If there are at least a few mouth points remaining, average those to get the stomion depth.

If not: fit a plane to the rest of the face points (i.e. depth = np.dot(theta, [u, v, 1]); solve for theta), then use np.dot(theta, [u_stomion, v_stomion, 1]) as the depth for the mouth.
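A sketch of that fallback plane fit on synthetic points (all names, pixel coordinates, and plane coefficients here are made up for illustration):

```python
import numpy as np

# Synthetic face pixels (u, v) lying exactly on a known depth plane.
rng = np.random.default_rng(0)
us = rng.uniform(0, 640, size=50)
vs = rng.uniform(0, 480, size=50)
theta_true = np.array([0.0005, -0.0003, 0.45])  # illustrative coefficients

A = np.column_stack([us, vs, np.ones_like(us)])  # rows are [u, v, 1]
depths = A @ theta_true

# Least-squares solve for theta, then evaluate the plane at the stomion pixel.
theta, *_ = np.linalg.lstsq(A, depths, rcond=None)
u_stomion, v_stomion = 320.0, 300.0
stomion_depth = theta @ np.array([u_stomion, v_stomion, 1.0])
```

In practice the face points would carry depth noise, so the recovered theta is a least-squares estimate rather than exact.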

@amalnanavati (Contributor)

Closed by #130
