Hi @anewell,

First of all, thank you for sharing your code.

I'm preparing COCO keypoint annotations and a dataset-specific interface file to train on COCO. I've done most of it, except for one issue: the MPII dataset provides a head bounding box for each person, but COCO doesn't. To overcome this, I found a workaround:

If the shoulders are visible, use the distance between the shoulders to define the head bbox; otherwise, use the person bbox and an idealized human body ratio to define the head bbox.

But this approach is error-prone when the image doesn't contain the whole body.

Is there any advice on how to properly define the head size?
The code snippet below is from the src/misc/convert_annot.py file:
import math
import numpy as np

# Find the shoulder coordinates. In COCO keypoint order, index 5 is the
# left shoulder and index 6 the right shoulder.
left_shoulder = (ann['keypoints'][0::3][5], ann['keypoints'][1::3][5])
right_shoulder = (ann['keypoints'][0::3][6], ann['keypoints'][1::3][6])

# COCO stores unlabeled keypoints as (0, 0, 0), so (0, 0) means a shoulder
# is not annotated; approximate the head bbox from the person bbox instead
# (COCO bbox format is [x, y, width, height]).
if left_shoulder == (0, 0) or right_shoulder == (0, 0):
    diff = np.array([ann['bbox'][3] / 7.5, ann['bbox'][2] / 7.5], dtype=float)
    normalization = np.linalg.norm(diff) * 0.6
# Shoulders are visible: define the head bbox from the distance between them.
else:
    dist = math.hypot(right_shoulder[0] - left_shoulder[0],
                      right_shoulder[1] - left_shoulder[1])
    diff = np.array([dist / 2, dist / 1.5], dtype=float)
    normalization = np.linalg.norm(diff) * 0.6
annot['normalize'].append(normalization)
This is a tough issue, especially on the COCO data. The way that normalization is done during the official COCO evaluation is by pixel area of the person's segmentation mask, but that can be a fairly inconsistent indication of person size.
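For reference, the per-keypoint term of the official OKS metric normalizes pixel error by exactly that quantity, the square root of the annotated segmentation area. A minimal sketch (variable names here are illustrative):

import numpy as np

# One keypoint's contribution to OKS: the pixel error d is normalized by
# the object scale s = sqrt(segmentation area), with a per-keypoint
# constant k that controls the falloff for that keypoint type.
def oks_term(d, area, k):
    s = np.sqrt(area)
    return np.exp(-d**2 / (2 * s**2 * k**2))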
An alternative is to compute all possible limb lengths given a particular set of ground truth keypoints and compare these to an average baseline for each limb. The relative ratio will give an indication of the person's size, and by computing the ratio across all annotated limbs there will be some robustness if some limbs happen to be foreshortened.
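As a rough illustration of that idea, here is a minimal sketch, not code from this repo: the LIMB_PAIRS index pairs follow the COCO keypoint order, but the BASELINE_LENGTHS values are made-up placeholders for the per-limb averages one would measure from the training set.

import numpy as np

# Hypothetical limb definitions over COCO keypoint indices, and placeholder
# baseline lengths (pixels at a reference person scale) -- both are
# assumptions for illustration only.
LIMB_PAIRS = [(5, 7), (7, 9), (6, 8), (8, 10),        # arms
              (11, 13), (13, 15), (12, 14), (14, 16),  # legs
              (5, 6), (11, 12)]                        # shoulder/hip widths
BASELINE_LENGTHS = [60, 55, 60, 55, 90, 85, 90, 85, 75, 55]

def estimate_scale(keypoints):
    """Estimate person scale as the median ratio of annotated limb lengths
    to per-limb baselines; keypoints is the flat COCO [x, y, v, ...] list."""
    xs, ys, vs = keypoints[0::3], keypoints[1::3], keypoints[2::3]
    ratios = []
    for (a, b), base in zip(LIMB_PAIRS, BASELINE_LENGTHS):
        if vs[a] > 0 and vs[b] > 0:  # both endpoints annotated
            length = np.hypot(xs[a] - xs[b], ys[a] - ys[b])
            ratios.append(length / base)
    # The median over all annotated limbs is robust to a few
    # foreshortened limbs skewing the estimate.
    return float(np.median(ratios)) if ratios else None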
No matter what you end up doing, the ground truth "size" will be pretty noisy. The best thing you can do is play around with different ideas and visually inspect which one leads to the most reliable cropping of input figures. The network should have the capacity to learn some degree of scale invariance, and it is worth adding scale data augmentation during training anyway; a sketch of that follows below.
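A minimal scale-augmentation sketch (the function name, scale range, and crop convention are assumptions, not code from this repo): sample a random scale factor and resize the crop window around the person center before rescaling to the network input resolution.

import random

def augment_scale(center, base_size, min_scale=0.75, max_scale=1.25):
    """Randomly jitter the crop scale around a person during training.

    center    -- (x, y) person center in the source image
    base_size -- nominal crop side length derived from the person's size
    Returns the crop box (x0, y0, x1, y1) at the sampled scale.
    """
    s = random.uniform(min_scale, max_scale)
    half = base_size * s / 2
    return (center[0] - half, center[1] - half,
            center[0] + half, center[1] + half)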
Hope that helps a bit, and let me know if you figure out a more reliable measure of scale.