Issues, notes and documentation while testing on the CMU dataset, using volumetric model #75

Open
Samleo8 opened this issue May 18, 2020 · 14 comments

Comments

@Samleo8

Samleo8 commented May 18, 2020

Following the instructions at issue #24 and #19, I was able to successfully test on the CMU Panoptic Dataset using the provided pretrained Human36M weights (more specifics here) on the volumetric models, with a snapshot of some of the results below:
heatmaps0
keypoints_vis0

Issues
However, despite following all four pointers in #24, some of the keypoint detections are still wrong; in particular, the lower-body predictions are completely off.
0019

Is it possible that the pretrained (H36M) model cannot handle cases where the lower body is truncated, which would explain the wrong predictions above?

Notes/Documentation
To those who would like to recreate the results and evaluate on the CMU dataset, note there are many changes that need to be made. I list the important ones below; check my forked repository for the rest.

  1. You will need to create your own custom CMUPanopticDataset class, similar to the Human36MMultiviewDataset class in mvn/datasets/human36m.py. You will also need the ground-truth bounding boxes from the link in issue #19 ("Creating new "ground truth" for several datasets"), and you will need to generate your own labels file. If you are lazy, follow my pre-processing instructions here, but note that some of the documentation may be missing.
  2. As noted in issue #24 ("testing on the CMU Panoptic dataset"), units are a big issue. CMU keypoints are in mm while Human36M are in cm. Since the model was trained on Human36M, the predicted keypoints and the ground-truth keypoints need to be "synced" by appropriate scaling factors.
  3. UPDATE: If, like me, you use the volumetric model without first running the algebraic model, you need to set use_gt_pelvis to true in the YAML config file.
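
For anyone unsure what "syncing" the units in point 2 looks like in practice, here is a minimal Python sketch. It is not code from the repository: the names `scale_factor`, `convert_keypoints` and `mpjpe` and the unit table are hypothetical, and which unit each of your label files actually uses is something you should verify yourself before picking the arguments.

```python
# Hypothetical helper, not part of learnable-triangulation-pytorch.
# Size of each supported unit expressed in millimetres.
UNIT_IN_MM = {"mm": 1, "cm": 10, "m": 1000}


def scale_factor(from_unit, to_unit):
    """Factor that converts a length given in from_unit into to_unit."""
    return UNIT_IN_MM[from_unit] / UNIT_IN_MM[to_unit]


def convert_keypoints(keypoints, from_unit, to_unit):
    """Rescale a list of (x, y, z) keypoints from one unit to another."""
    s = scale_factor(from_unit, to_unit)
    return [(x * s, y * s, z * s) for x, y, z in keypoints]


def mpjpe(pred, gt):
    """Mean per-joint position error; both inputs must share one unit."""
    dists = [
        sum((p - g) ** 2 for p, g in zip(pj, gj)) ** 0.5
        for pj, gj in zip(pred, gt)
    ]
    return sum(dists) / len(dists)
```

The idea is to convert the ground truth (or the predictions) once, right after loading, so that every later comparison such as `mpjpe` sees both sets of keypoints in the same unit.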

For those who are interested, I have updated the documentation in my repository at https://github.com/Samleo8/learnable-triangulation-pytorch.

@yurymalkov

It seems like there is something wrong with world coordinates.

The model usually learns that the legs are close to the ground if no other information is available. Judging by the pictures, that does not seem to be the case here, so there is probably a bug in the coordinate conversion between CMU and Human36.

@Samleo8
Author

Samleo8 commented May 18, 2020

Hi thanks for the reply!

Uh, what do you mean by coordinate conversion? I believe the world coordinates were properly converted when I set the scaling factor and changed the world axes, following the pointers in #24.

My current hypothesis is that the model is unable to guess joints which are "out of the picture" (missing leg joints), so the heatmaps for those particular joints are either non-existent, or the model guesses that the person is kneeling or sitting instead.

@yurymalkov

@Samleo8 I would double-check that everything is the same. As far as I remember, the z-axis has a different sign in CMU and Human3.6M; at some point we had a bug in this part and saw somewhat similar behavior.
Your hypothesis may well be right, but I would expect the model to give default coordinates for the feet (e.g. close to the ground).
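
To illustrate the kind of sign bug meant here, below is a minimal Python sketch. It is not the code in triangulation.py; the helper names are hypothetical and it assumes the only difference between the two conventions is the sign of the world z-axis. The point is that if the 3D keypoints are flipped, the camera rotation has to be flipped consistently, otherwise the points land in the wrong place in every camera.

```python
# Hypothetical helpers, assuming the conventions differ only in the z sign.

def flip_z(point):
    """Negate the z coordinate of a world-space (x, y, z) point."""
    x, y, z = point
    return (x, y, -z)


def flip_z_in_rotation(R):
    """Right-multiply a 3x3 rotation by diag(1, 1, -1): negate column 3."""
    return [[row[0], row[1], -row[2]] for row in R]


def world_to_camera(R, t, point):
    """Apply x_cam = R @ x_world + t using plain Python lists."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


# Flipping both the points and the rotation leaves camera coordinates
# unchanged; flipping only one of them does not.
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t = (1, 2, 3)
p = (4, 5, 6)
consistent = world_to_camera(flip_z_in_rotation(R), t, flip_z(p))
inconsistent = world_to_camera(R, t, flip_z(p))
```

Comparing `consistent` and `inconsistent` against `world_to_camera(R, t, p)` is a cheap sanity check that the flip is applied in both places.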

@Samleo8
Author

Samleo8 commented May 18, 2020

> @Samleo8 I would double-check that everything is the same. As far as I remember, the z-axis has a different sign in CMU and Human3.6M; at some point we had a bug in this part and saw somewhat similar behavior.

Hi, thanks again for the reply! You are right about the z-axis having a different sign. I did see this handled in triangulation.py and made sure it was triggered, unless of course I am missing something else as well?

> Your hypothesis may well be right, but I would expect the model to give default coordinates for the feet (e.g. close to the ground).

You are actually right: in most cases the model placed the feet closer to the floor (see below). The example I gave was a bad one, as it was an "anomaly" compared to the rest.
0001
0003
0012

@Samleo8
Author

Samleo8 commented May 18, 2020

To confirm the hypothesis, I will try it out on cameras which are able to capture the full body (i.e. no truncation). I'll let you know how it goes!

@Samleo8
Author

Samleo8 commented May 18, 2020

I have tried it out on camera views which capture the full body. Unfortunately, those views are more "bird's-eye" than frontal. The results are below:
image
image

It is noted that this time, the keypoints are even more off. Could it be because the model is not used to views from such an angle?

@Samleo8
Author

Samleo8 commented May 18, 2020

> It is noted that this time, the keypoints are even more off. Could it be because the model is not used to views from such an angle?

Apparently, the model is robust against different angles. It seems that the issue is due to some of the cameras being faulty.

> To confirm the hypothesis, I will try it out on cameras which are able to capture the full body (i.e. no truncation). I'll let you know how it goes!

With all cameras capturing full pose, preliminary results seem to suggest that the model works well on the CMU Dataset as well! The hypothesis about the lack of full-body pose seems to be correct.

It would be good to train the model so that it knows what to do with occluded body parts.

Some of the results are shown below:
image

image

@yurymalkov

@Samleo8 I am a bit confused. Are you using algebraic or volumetric models?

@Samleo8
Author

Samleo8 commented May 19, 2020

Oh, sorry I didn't make it clearer; I've since updated the title.

I'm using the volumetric model, but didn't use the algebraic model to first predict the pelvis positions. Because of this, the use_gt_pelvis flag must be set to true for this to work.

@Samleo8 changed the title from "Issues, notes and documentation while testing on the CMU dataset" to "Issues, notes and documentation while testing on the CMU dataset, using volumetric model" on May 19, 2020

@yurymalkov

@Samleo8 I see. I wonder, how do you get the 2D heatmap distributions?

@Samleo8
Author

Samleo8 commented May 19, 2020

Thanks for pointing this out, I didn't think much of it before!

Correct me if I am wrong, but the 2D heatmaps seem to come from the 2D backbone that is part of the volumetric model? The checkpoints for this backbone (human36m) were provided as pretrained weights, likely from the algebraic model?

~~Am I therefore right to say that, in order to properly evaluate (and train) on the CMU dataset, I first need to run the algebraic model to produce a 2D backbone with weights targeted towards the joints that CMU wants?~~

If you are wondering how I visualized the heatmaps, I used the visualize_heatmap code that already ships with the repository.

@karfly
Owner

karfly commented May 20, 2020

Hi, @Samleo8!
You’re correct about the heatmaps.
As the backbone for CMU we used a model pretrained on the COCO dataset (from https://github.com/microsoft/human-pose-estimation.pytorch/blob/master/README.md). You still need to evaluate the algebraic model to get the positions of the 3D cubes.

@karfly
Owner

karfly commented May 20, 2020

Looking at the images above, I think there can be 3 possible problems:

  1. Wrong location of the cube. Maybe the person doesn’t fully fit into the cube.
  2. Something is wrong with the coordinate system. It differs a lot from Human3.6M’s, so you’d better double-check it carefully.
  3. Something is wrong with the camera parameters (extrinsics and intrinsics).
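
A quick way to test the first point is to check which ground-truth joints fall outside the cube. The following is only a sketch under stated assumptions: the cube is taken to be axis-aligned and centred on the base point (e.g. the pelvis), and `joints_outside_cube`, `center` and `side` are hypothetical names rather than the repository's API. All values must be in the same unit.

```python
# Hypothetical check, assuming an axis-aligned cube centred on the base point.

def joints_outside_cube(keypoints, center, side):
    """Return indices of (x, y, z) joints lying outside the cube."""
    half = side / 2.0
    return [
        idx
        for idx, joint in enumerate(keypoints)
        if any(abs(c - m) > half for c, m in zip(joint, center))
    ]


# Made-up skeleton with three joints and two candidate cube centres.
skeleton = [(0, 0, 0), (0, 0, 900), (1300, 0, 0)]
with_good_center = joints_outside_cube(skeleton, (0, 0, 0), 2500)
with_wrong_center = joints_outside_cube(skeleton, (0, 0, 2000), 2500)
```

If many joints are reported outside, either the cube is too small for the scale of the data or the base point is taken from the wrong joint index.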

@Samleo8
Author

Samleo8 commented May 21, 2020

Hi @karfly, thanks for the reply. Is this in answer to the above comment #75 (comment) or to the problem with partially occluded bodies in #76?

  1. This is quite possible, especially considering that I realised the "gt" pelvis may have been referring to the wrong base point index. I'll double check on that.

  2. I've ensured that the parts of your code where you fixed the coordinate system issue are being used in triangulation.py, and also double checked, so this should be fine.

image

  3. There seems to be some issue with this particular camera's intrinsics, as you can see from the failed projection. I've since ignored this camera (camera 29).
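
For anyone who wants to find such cameras automatically, here is a rough Python sketch of a reprojection check. It assumes a plain pinhole model with no skew and no lens distortion, which is a simplification; the function names and the `max_error_px` threshold are hypothetical and not taken from the repository.

```python
import math

# Hypothetical reprojection check, assuming a distortion-free pinhole camera.

def project(K, R, t, point):
    """Project a world-space (x, y, z) point to pixel coordinates (u, v)."""
    xc, yc, zc = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        return None  # point is behind the camera
    return (K[0][0] * xc / zc + K[0][2], K[1][1] * yc / zc + K[1][2])


def mean_reprojection_error(K, R, t, points_3d, points_2d):
    """Mean pixel distance between projected 3D points and 2D annotations."""
    errors = []
    for p3, p2 in zip(points_3d, points_2d):
        uv = project(K, R, t, p3)
        if uv is None:
            return math.inf
        errors.append(math.hypot(uv[0] - p2[0], uv[1] - p2[1]))
    return sum(errors) / len(errors)


def suspicious_cameras(cameras, points_3d, observations, max_error_px=20.0):
    """cameras: {name: (K, R, t)}; observations: {name: [(u, v), ...]}."""
    return [
        name
        for name, (K, R, t) in cameras.items()
        if mean_reprojection_error(K, R, t, points_3d, observations[name])
        > max_error_px
    ]
```

Cameras returned by `suspicious_cameras` are candidates for exclusion, or for a closer look at their calibration, before blaming the model.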

For now, the model is being trained on the CMU dataset (though see possible issue #77) and seems to be doing well, if the Tensorboard images are anything to go by; we'll see how that goes!
