Hand keypoint extraction #64

Closed
AndrGolubkov opened this issue Aug 29, 2019 · 16 comments
Labels: type:feature (Enhancement in the New Functionality or Request for a New Solution), type:support (General questions)


@AndrGolubkov

Is it possible, using the Hand Tracking (GPU) example, to extract an array of keypoints rather than a rendered video? Perhaps I didn't read the documentation and study the example carefully enough; I apologize in advance.

@ajinkyapuar

ajinkyapuar commented Aug 29, 2019

Is it possible, using the Hand Tracking (GPU) example, to extract an array of keypoints rather than a rendered video? Perhaps I didn't read the documentation and study the example carefully enough; I apologize in advance.

@AndrGolubkov Have you tried this?
https://github.com/google/mediapipe/tree/master/mediapipe/models
https://www.tensorflow.org/lite/guide/inference

@AndrGolubkov

@ajinkyapuar Yes, I noticed that the models are available. But the question here is precisely about obtaining an array of coordinates instead of the rendered output.

@MedlarTea

@AndrGolubkov It can't output metric coordinates directly. If you look carefully at the graph here:
https://mediapipe.readthedocs.io/en/latest/hand_tracking_mobile_gpu.html
you will see that the 2D keypoint locations are output in normalized (uv) image coordinates.
And in the file /mediapipe/tflite/tflite_tensors_to_landmarks_calculator.proto you will find that the output z-coordinate is also normalized, so it has no real-world scale.
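In practical terms: x and y are normalized to the image, so they have to be scaled by the frame size to get pixel positions, and z is only a relative value. Below is a minimal Java sketch of that conversion; it assumes the landmarks are exposed as NormalizedLandmark protos (the LandmarkProto class path and getters are taken from later MediaPipe releases, so treat them as assumptions rather than a confirmed API for this version).

import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmark;

// Sketch: convert one normalized landmark (x, y in [0, 1]) to pixel coordinates.
// The z value is only a relative depth estimate and has no metric scale.
final class LandmarkScaler {
  static float[] toPixels(NormalizedLandmark landmark, int frameWidth, int frameHeight) {
    float xPx = landmark.getX() * frameWidth;   // column in pixels
    float yPx = landmark.getY() * frameHeight;  // row in pixels
    float zRel = landmark.getZ();               // unitless relative depth
    return new float[] {xPx, yPx, zRel};
  }
}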

@fanzhanggoogle

@ajinkyapuar Yes, I noticed that the models are available. But the question here is precisely about obtaining an array of coordinates instead of the rendered output.

If you look at the hand tracking graph as @MedlarTea mentioned, you can see that the landmarks are output from the HandLandmarkSubgraph on the stream "LANDMARKS:hand_landmarks".
You can find the definition of the landmark proto here.
And @MedlarTea is correct about the scale: the output landmarks are in image coordinates.

@chuoling

chuoling commented Sep 7, 2019

@AndrGolubkov
Are you asking about getting the landmark coordinates in C++ (e.g., to be used in another calculator), or getting them in Android to be consumed in the Android application?

@garam-kim1

@chuoling
How can I get "LANDMARKS:hand_landmarks" in Android?

I ran the code below, because "LANDMARKS:hand_landmarks" is a vector of protos, but it failed.

processor.getGraph().addPacketCallback("hand_landmarks", new PacketCallback() {
  @Override
  public void process(Packet packet) {
    PacketGetter.getVectorOfPackets(packet);
  }
});

And I think a function to get the type of a packet is necessary.
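
For what it's worth, here is a rough sketch of how this could look with the proto-aware getters that later MediaPipe releases expose on the Java side. PacketGetter.getProtoVector, the generated NormalizedLandmarkList class, and the exact stream name and packet type are assumptions that depend on the graph version, so treat this as an outline rather than a confirmed API.

import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmark;
import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmarkList;
import com.google.mediapipe.framework.PacketGetter;
import java.util.List;

final class HandLandmarkTap {
  // Sketch: tap the "hand_landmarks" output stream and parse the landmark protos.
  // Assumes the stream carries a vector of NormalizedLandmarkList protos; if it
  // carries a single NormalizedLandmarkList, PacketGetter.getProto(...) would be
  // the equivalent call.
  static void attach(FrameProcessor processor) {
    processor.addPacketCallback(
        "hand_landmarks",
        packet -> {
          List<NormalizedLandmarkList> hands =
              PacketGetter.getProtoVector(packet, NormalizedLandmarkList.parser());
          for (NormalizedLandmarkList hand : hands) {
            for (NormalizedLandmark landmark : hand.getLandmarkList()) {
              // x and y are normalized to [0, 1]; z is only a relative depth value.
              android.util.Log.d(
                  "HandLandmarkTap",
                  "x=" + landmark.getX() + " y=" + landmark.getY() + " z=" + landmark.getZ());
            }
          }
        });
  }
}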

@AndrGolubkov

@chuoling I am interested in getting and using the keypoints on iOS.

@oishi89

oishi89 commented Sep 11, 2019

I have the same question. But I'm wondering: if the input is a 2D image, it seems very hard to extract 3D coordinates, unless the input is a depth image containing depth data.

@metalwhale

@MedlarTea
But theoretically it is possible to precisely extract keypoints by using the hand landmark model file combined with MediaPipe, isn't it?
I mean, if this were not possible, how could the rendered video mark those landmarks so precisely?

@fanzhanggoogle

I have the same question. But I'm wondering: if the input is a 2D image, it seems very hard to extract 3D coordinates, unless the input is a depth image containing depth data.

Hi @oishi89,
The model takes in RGB only and outputs 3D coordinates. We trained our model jointly with synthetic data, which has 3D coordinates, and the model was able to generalize the z-coordinate to real images (although it's not perfect yet; we are actively working on it). You can read the Hand Landmark Model section in our blog post for more detail.

@chuoling

@AndrGolubkov @astinmiura
We'll look into how to best access such information in the iOS/Java API, and provide an example in the next release.

@AndrGolubkov

@chuoling Thank you very much, that would be great

@Hemanth715

@chuoling Thank you, we were hoping for such an API when we first read about this project. We would all appreciate this.

@mgyong mgyong added the type:feature Enhancement in the New Functionality or Request for a New Solution label Sep 18, 2019
@fanzhanggoogle fanzhanggoogle removed the legacy:hands Hand tracking/gestures/etc label Sep 27, 2019
@mgyong

mgyong commented Nov 4, 2019

@Hemanth715 @AndrGolubkov @astinmiura Before such an API is available, we have an intermediate solution in C++. See issue #200, where we have an example of NormalizedLandmark protos.

@mgyong mgyong closed this as completed Nov 4, 2019
@mgyong

mgyong commented Dec 3, 2019

@AndrGolubkov @astinmiura Fixed in v0.6.6. Please check it out and let us know.

@faizahmed618

Is there any way to extract the keypoints in Python so that I can use them in a VR project? Thank you :)
