Training on MSCOCO keypoint dataset #39

Open
mkocabas opened this issue Sep 7, 2017 · 2 comments
mkocabas commented Sep 7, 2017

Hi @anewell ,

First of all thank you for sharing your code.

I'm preparing the COCO keypoint annotations and a dataset-specific interface file to train on COCO. I've mostly finished, except for one issue: the MPII dataset provides a head bbox for each person, which COCO doesn't have. To get around this I came up with the following workaround:

  • If the shoulders are visible, use the distance between them to define the head bbox; otherwise, use the person bbox and an ideal human body ratio.

But this approach breaks down when the image doesn't contain the whole body.

Do you have any advice on how to properly define the head size?

The code snippet below is from the src/misc/convert_annot.py file:

import math
import numpy as np

# Find the shoulder coordinates. COCO stores keypoints as a flat
# [x1, y1, v1, x2, y2, v2, ...] list; indices 5 and 6 are the left
# and right shoulders respectively.
left_shoulder = (ann['keypoints'][0::3][5], ann['keypoints'][1::3][5])
right_shoulder = (ann['keypoints'][0::3][6], ann['keypoints'][1::3][6])

# Unlabeled keypoints (v == 0) have x == y == 0, so if either shoulder is
# missing, approximate the head bbox from the person bbox ([x, y, w, h])
if left_shoulder == (0, 0) or right_shoulder == (0, 0):
    diff = np.array([ann['bbox'][3] / 7.5, ann['bbox'][2] / 7.5], float)

# If both shoulders are visible, define the head bbox from the distance
# between them
else:
    dist = math.hypot(right_shoulder[0] - left_shoulder[0],
                      right_shoulder[1] - left_shoulder[1])
    diff = np.array([dist / 2, dist / 1.5], float)

normalization = np.linalg.norm(diff) * .6
annot['normalize'] += [normalization]

anewell commented Sep 11, 2017

Hi @mkocabas

This is a tough issue, especially on the COCO data. The official COCO evaluation normalizes by the pixel area of the person's segmentation mask, but that can be a fairly inconsistent indicator of person size.
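
For concreteness, here is a minimal sketch of that area-based normalization, assuming a standard COCO annotation dict (which carries an 'area' field for the segmentation); the square root turns a pixel area into a linear scale:

import math

def area_scale(ann):
    # 'area' is the pixel area of the person's segmentation mask;
    # sqrt converts it into a rough linear measure of person size
    return math.sqrt(ann['area'])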

An alternative is to compute all possible limb lengths given a particular set of ground truth keypoints and compare these to an average baseline for each limb. The relative ratio will give an indication of the person's size, and by computing the ratio across all annotated limbs there will be some robustness if some limbs happen to be foreshortened.
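
A rough sketch of that idea, assuming COCO's 17-keypoint ordering; the per-limb baseline lengths here are placeholders that would have to be estimated from the dataset, not measured values:

import math
import numpy as np

# Skeleton edges as (index_a, index_b) pairs in COCO's 17-keypoint ordering
# (5/6 = shoulders, 7/8 = elbows, 9/10 = wrists, 11/12 = hips,
#  13/14 = knees, 15/16 = ankles)
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),         # arms
         (11, 13), (13, 15), (12, 14), (14, 16),  # legs
         (5, 6), (11, 12)]                        # shoulder / hip width

# Placeholder baseline length (in pixels) per limb for a reference-scale
# person; these values are illustrative only
BASELINE = {limb: 60.0 for limb in LIMBS}

def limb_ratio_scale(ann):
    kps = np.array(ann['keypoints']).reshape(-1, 3)  # rows of (x, y, v)
    ratios = []
    for a, b in LIMBS:
        if kps[a, 2] > 0 and kps[b, 2] > 0:  # both endpoints annotated
            length = math.hypot(kps[a, 0] - kps[b, 0], kps[a, 1] - kps[b, 1])
            ratios.append(length / BASELINE[(a, b)])
    # the median over limbs adds robustness to foreshortened limbs
    return float(np.median(ratios)) if ratios else None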

No matter what you end up doing, the ground truth "size" will be pretty noisy. The best thing you can do is play around with different ideas and visually inspect which one leads to the most reliable cropping of input figures. The network should have the capacity to learn some degree of scale invariance, and it is worth adding scale data augmentation during training anyway.
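
As an aside, that scale augmentation can be as simple as the following sketch, assuming images and (x, y) keypoints in pixel coordinates; the ±25% range is an arbitrary choice for illustration:

import random
import cv2
import numpy as np

def random_scale(img, keypoints, low=0.75, high=1.25):
    # rescale the image and its keypoints by the same random factor
    s = random.uniform(low, high)
    img = cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    return img, np.asarray(keypoints, dtype=np.float64) * s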

Hope that helps a bit, and let me know if you figure out a more reliable measure of scale.


arnitkun commented Jun 12, 2018

@anewell, @mkocabas

What tool did you use to annotate the keypoints of a human body?
