Training on MSCOCO keypoint dataset #39

Open
mkocabas opened this issue Sep 7, 2017 · 2 comments
mkocabas commented Sep 7, 2017

Hi @anewell ,

First of all thank you for sharing your code.

I'm preparing the COCO keypoint annotations and a dataset-specific interface file to train on COCO. I've mostly finished, except for one issue: the MPII dataset provides a head bbox for each person, which COCO doesn't have. To get around this I came up with the following workaround:

  • If the shoulders are visible, use the distance between them to define the head bbox; otherwise, use the person bbox and an ideal human body ratio.

But this approach breaks down when the image doesn't contain the whole body.

Do you have any advice on how to properly define the head size?

The code snippet below is from the src/misc/convert_annot.py file:

import math
import numpy as np

# Find the shoulder coordinates. COCO stores keypoints as a flat
# [x1, y1, v1, x2, y2, v2, ...] list; indices 5 and 6 are the left
# and right shoulders respectively.
left_shoulder = (ann['keypoints'][0::3][5], ann['keypoints'][1::3][5])
right_shoulder = (ann['keypoints'][0::3][6], ann['keypoints'][1::3][6])

# Unlabeled keypoints (v == 0) have x == y == 0, so if either shoulder is
# missing, approximate the head bbox from the person bbox ([x, y, w, h])
if left_shoulder == (0, 0) or right_shoulder == (0, 0):
    diff = np.array([ann['bbox'][3] / 7.5, ann['bbox'][2] / 7.5], float)

# If both shoulders are visible, define the head bbox from the distance
# between them
else:
    dist = math.hypot(right_shoulder[0] - left_shoulder[0],
                      right_shoulder[1] - left_shoulder[1])
    diff = np.array([dist / 2, dist / 1.5], float)

normalization = np.linalg.norm(diff) * .6
annot['normalize'] += [normalization]

anewell commented Sep 11, 2017

Hi @mkocabas

This is a tough issue, especially on the COCO data. The official COCO evaluation normalizes by the pixel area of the person's segmentation mask, but that can be a fairly inconsistent indicator of person size.
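
For concreteness, here is a minimal sketch of that area-based normalization, assuming a standard COCO annotation dict (which carries an 'area' field for the segmentation); the square root turns a pixel area into a linear scale:

import math

def area_scale(ann):
    # 'area' is the pixel area of the person's segmentation mask;
    # sqrt converts it into a rough linear measure of person size
    return math.sqrt(ann['area'])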

An alternative is to compute all possible limb lengths given a particular set of ground truth keypoints and compare these to an average baseline for each limb. The relative ratio will give an indication of the person's size, and by computing the ratio across all annotated limbs there will be some robustness if some limbs happen to be foreshortened.
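
A rough sketch of that idea, assuming COCO's 17-keypoint ordering; the per-limb baseline lengths here are placeholders that would have to be estimated from the dataset, not measured values:

import math
import numpy as np

# Skeleton edges as (index_a, index_b) pairs in COCO's 17-keypoint ordering
# (5/6 = shoulders, 7/8 = elbows, 9/10 = wrists, 11/12 = hips,
#  13/14 = knees, 15/16 = ankles)
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),         # arms
         (11, 13), (13, 15), (12, 14), (14, 16),  # legs
         (5, 6), (11, 12)]                        # shoulder / hip width

# Placeholder baseline length (in pixels) per limb for a reference-scale
# person; these values are illustrative only
BASELINE = {limb: 60.0 for limb in LIMBS}

def limb_ratio_scale(ann):
    kps = np.array(ann['keypoints']).reshape(-1, 3)  # rows of (x, y, v)
    ratios = []
    for a, b in LIMBS:
        if kps[a, 2] > 0 and kps[b, 2] > 0:  # both endpoints annotated
            length = math.hypot(kps[a, 0] - kps[b, 0], kps[a, 1] - kps[b, 1])
            ratios.append(length / BASELINE[(a, b)])
    # the median over limbs adds robustness to foreshortened limbs
    return float(np.median(ratios)) if ratios else None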

No matter what you end up doing, the ground truth "size" will be pretty noisy. The best thing you can do is play around with different ideas and visually inspect which one leads to the most reliable cropping of input figures. The network should have the capacity to learn some degree of scale invariance, and it is worth adding scale data augmentation during training anyway.
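
As an aside, that scale augmentation can be as simple as the following sketch, assuming images and (x, y) keypoints in pixel coordinates; the ±25% range is an arbitrary choice for illustration:

import random
import cv2
import numpy as np

def random_scale(img, keypoints, low=0.75, high=1.25):
    # rescale the image and its keypoints by the same random factor
    s = random.uniform(low, high)
    img = cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    return img, np.asarray(keypoints, dtype=np.float64) * s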

Hope that helps a bit, and let me know if you figure out a more reliable measure of scale.


arnitkun commented Jun 12, 2018

@anewell, @mkocabas

What tool did you use to annotate the keypoints of a human body?
