💡Idea: Mosaic cropping using segmentation labels #2151

Closed
glenn-jocher opened this issue Feb 6, 2021 · 20 comments · Fixed by #2188

@glenn-jocher
Member

I had an idea today! COCO supplies segmentation annotations for every instance, but we don't use them. I realized it might be useful though to have access to these annotations in the dataloader because they can help re-label cropped objects more accurately. The current mosaic loader will translate/augment images and adjust their labels accordingly, but depending on the shape of the object this may produce suboptimal results (see below).

Re-labelling the augmented images based on their cropped segmentation labels rather than their cropped box labels would likely produce more desirable bounding boxes. It is not possible to quantify the benefit without actually implementing the idea though, which seems to be a very complicated task, and unfortunately the benefit would only be available to datasets with accompanying segmentation labels.

Has anyone tried this, or does anyone have a segmentation-capable version of the YOLOv5 dataloader available?

Screen Shot 2021-02-06 at 1 16 28 PM
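
To make the difference concrete, here is a minimal sketch with a hypothetical L-shaped object on a 10x10 canvas (the shape, the crop window and the helper `mask_to_box` are illustrative assumptions, not code from the YOLOv5 dataloader). It compares clipping the original box to the crop window against recomputing the box from the cropped mask.

```python
import numpy as np

# Hypothetical 10x10 image containing an L-shaped object (illustrative values).
mask = np.zeros((10, 10), dtype=bool)
mask[1:9, 1:3] = True  # vertical stroke: rows 1-8, cols 1-2
mask[7:9, 1:9] = True  # horizontal stroke: rows 7-8, cols 1-8


def mask_to_box(m):
    """Tight (x1, y1, x2, y2) box around the True pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(m)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


full_box = mask_to_box(mask)

# Crop window keeping x in [4, 10) and y in [0, 8).
cx1, cy1, cx2, cy2 = 4, 0, 10, 8

# Box-based re-labelling: clip the original box to the crop window.
clipped_box = (max(full_box[0], cx1), max(full_box[1], cy1),
               min(full_box[2], cx2), min(full_box[3], cy2))

# Mask-based re-labelling: crop the mask first, then take the tight box.
cropped = np.zeros_like(mask)
cropped[cy1:cy2, cx1:cx2] = mask[cy1:cy2, cx1:cx2]
mask_box = mask_to_box(cropped)

print("clipped box:", clipped_box)
print("mask box:   ", mask_box)
```

In this toy case the clipped box still covers a tall region that contains almost no object pixels, while the mask-derived box hugs the part of the object that actually survived the crop.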

@glenn-jocher glenn-jocher added the enhancement New feature or request label Feb 6, 2021
@glenn-jocher glenn-jocher self-assigned this Feb 6, 2021
@WongKinYiu

Hello,

We have done this by using pycocotools. All you need is:

from pycocotools.coco import COCO
from pycocotools import mask as maskUtils

Next, you need to get the annotations of each image.

coco_info = COCO("your_train_or_val_json")
for img_id in coco_info.getImgIds():
    # you could choose `iscrowd=True`, `iscrowd=False`, or both for each image
    anns_ids = coco_info.getAnnIds(img_id, iscrowd=False)
    # annotation info, including `bbox`, `segmentation`, `area`..., will be here
    anns = coco_info.loadAnns(anns_ids)
    # image info, including `file_name`, `width`, `height`..., will be here
    img = coco_info.loadImgs(int(img_id))[0]

Now you can process the annotations of each image:

img_file_name = img["file_name"]
img_height = img["height"]
img_width = img["width"]
for ann in anns:
    # you could use ann["area"] to ignore small or large objects here
    # you may need `coco91_to_80` to convert category id for yolov5 style annotation
    ann_class = ann["category_id"]
    ann_bbox = ann["bbox"]
    ann_segm = ann['segmentation']
    # you could normalize x,y coordinate to ratio by using `img_height` and `img_width` here for yolov5 style annotation
    # also you may want to save your annotation file corresponding to img_file_name here
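
The normalization mentioned in the comments above can be sketched as follows. This assumes the usual COCO box layout `[x_min, y_min, width, height]` in pixels and a target of normalized `(x_center, y_center, width, height)`; the function name `coco_box_to_yolo` and the numbers are hypothetical.

```python
def coco_box_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to
    normalized (x_center, y_center, width, height)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_width,   # normalized box center x
        (y + h / 2) / img_height,  # normalized box center y
        w / img_width,             # normalized box width
        h / img_height,            # normalized box height
    )


# Illustrative values, not taken from COCO.
yolo_box = coco_box_to_yolo([100, 50, 200, 100], img_width=400, img_height=200)
print(yolo_box)
```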

There are three cases for the segmentation info:

  1. polygons: an object may be separated into several parts, which are merged into one mask
  2. uncompressed RLE: `counts` is a list
  3. compressed RLE: the segmentation is already compressed

# case 1
if type(ann_segm) is list:
    rles = maskUtils.frPyObjects(ann_segm, img_height, img_width)
    rle = maskUtils.merge(rles)
# case 2
elif type(ann_segm['counts']) is list:
    rle = maskUtils.frPyObjects(ann_segm, img_height, img_width)
# case 3
else:
    rle = ann_segm  # already a compressed RLE dict, use it directly

# again, use pycocotools to get the binary mask
ann_mask = maskUtils.decode(rle)

Now you have the annotation mask, a binary mask with the same resolution as the image.
You could:

  1. apply a perspective transformation to the annotation mask (including rotation, shear...)
  2. get the bounding box on the fly (by calculating the min and max of x and y)
  3. do instance segmentation and semantic segmentation tasks
  4. copy and paste augmentation (https://arxiv.org/abs/2012.07177)
  5. ...
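
As a rough illustration of the first two items, the sketch below rotates a hypothetical diamond-shaped object by 45 degrees and compares the box taken from the transformed outline with the box taken from the transformed corners of the original box. It works on polygon points rather than a decoded mask to stay dependency-light; `warp_points` and `points_to_box` are made-up helper names.

```python
import numpy as np


def warp_points(points, M):
    """Apply a 3x3 homogeneous transform M to an (n, 2) array of xy points."""
    xy1 = np.hstack([points, np.ones((len(points), 1))])
    out = xy1 @ M.T
    return out[:, :2] / out[:, 2:3]


def points_to_box(points):
    """Axis-aligned (x1, y1, x2, y2) box from the min and max of x and y."""
    x, y = points.T
    return np.array([x.min(), y.min(), x.max(), y.max()])


# Hypothetical diamond-shaped object and the corners of its original box.
segment = np.array([[5, 0], [10, 5], [5, 10], [0, 5]], dtype=float)
box_corners = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)

# 45 degree rotation about the origin as a 3x3 matrix.
theta = np.deg2rad(45)
c, s = np.cos(theta), np.sin(theta)
M = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

box_from_segment = points_to_box(warp_points(segment, M))
box_from_corners = points_to_box(warp_points(box_corners, M))

print("width from segment:", box_from_segment[2] - box_from_segment[0])
print("width from corners:", box_from_corners[2] - box_from_corners[0])
```

For this shape the corner-based box ends up twice as wide as the outline-based one, which is the kind of inflation that segment-aware re-labelling avoids.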

I am quite sure this case is the main reason mosaic9 gets worse results than mosaic4.
By handling this problem, YOLOv4-P5, YOLOv4-P6, and YOLOv4-P7 all improve by about 0.5% AP on COCO.
image
You could also do image collage augmentation, which automatically generates a grid layout of the sampled images without (or with the least) cropping. For reference: https://github.com/adrienverge/PhotoCollage

@glenn-jocher glenn-jocher added the TODO High priority items label Feb 7, 2021
@glenn-jocher
Member Author

@WongKinYiu good suggestions! Copy-paste augmentation looks like a good idea too. I've tried this in the past with bounding boxes but the results were poor; I'm sure segmentation will help this substantially.

@WongKinYiu

Analysis of mosaic augmentation:
Mosaic4 - about 2 of 4 images have crop issue (1/2)
image

Mosaic9 - about 6 of 9 images have crop issue (2/3)
image

@glenn-jocher
Member Author

@WongKinYiu yes this is a good point, mosaic9 will have more crops on average than mosaic4. Ok, I'm working on an implementation that can leverage the segmentation masks to handle these crops better. I'll test this on the 4 models at 640 to see if it helps.

I've tried to make this extensible to other datasets so anyone with segmentation data can also benefit.

@glenn-jocher glenn-jocher linked a pull request Feb 12, 2021 that will close this issue
@glenn-jocher glenn-jocher reopened this Feb 12, 2021
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Mar 15, 2021
@Edwardmark

@glenn-jocher I just compare the cropped part with the original ground truth, and if the IoU between the real bbox and the cropped box is less than 0.5, then discard it:

if len(labels4):
    labels4_org = labels4.copy()

    m1 = np.sum(labels4[:, 1:] < 0, axis=1)
    m2 = np.sum(labels4[:, 1:] > 2 * s, axis=1)
    np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])
    m_overlap = jaccard_numpy(labels4_org[:, 1:], labels4[:, 1:]) >= 0.5
    m_border = np.invert((m1 + m2).astype(bool))
    labels4 = labels4[m_border * m_overlap, :]

Is that right? Looking forward to your comment.
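
For readers who want to try the IoU part in isolation, here is a minimal sketch. `jaccard_numpy` is not defined in the snippet above, so `paired_iou` below is a hypothetical stand-in that computes a row-wise IoU between each original box and its clipped version; the boxes and the canvas size `s` are illustrative. Because a clipped box always lies inside its original, this IoU equals the fraction of area that survives clipping.

```python
import numpy as np


def paired_iou(a, b, eps=1e-16):
    """Row-wise IoU between two (n, 4) arrays of xyxy boxes (hypothetical helper)."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 2], b[:, 2])
    y2 = np.minimum(a[:, 3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + eps)


s = 640  # assumed mosaic tile size, so the canvas spans [0, 2 * s]
boxes = np.array([
    [100, 100, 300, 300],   # fully inside the canvas
    [-80, 100, 120, 300],   # 40% of the box falls outside on the left
    [-150, 100, 50, 300],   # 75% of the box falls outside on the left
], dtype=float)

clipped = np.clip(boxes, 0, 2 * s)
iou = paired_iou(boxes, clipped)
keep = iou >= 0.5  # discard boxes that lost more than half of their area

print(iou.round(2), keep)
```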

@glenn-jocher
Member Author

@Edwardmark is this custom code that you've written? We have box candidate criteria that are used to filter labels for use in training, including the percent of area lost during augmentation here:

yolov5/utils/datasets.py

Lines 924 to 926 in d4456e4

# filter candidates
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
targets = targets[i]

The box_candidates() function itself is here:

yolov5/utils/datasets.py

Lines 932 to 938 in d4456e4

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):  # box1(4,n), box2(4,n)
    # Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates

By default it will reject any box that has lost > 90% of its area (adjusted for scale augmentation) during the augmentation performed in random_perspective(). It's possible for boxes to lose area also in load_mosaic(), though we do not currently filter there (this has been proposed in the past by a different user by also applying box_candidates in load_mosaic()).
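
A small sketch of how the area threshold behaves, using the function body quoted above with hypothetical boxes. The second call uses `area_thr=0.5` purely as an illustration of a stricter setting; it is not a value used in the repository.

```python
import numpy as np


def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
    # box1 (4, n) before augmentation, box2 (4, n) after augmentation
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)


# Three identical 100x100 boxes before augmentation (illustrative values).
before = np.array([[0, 0, 100, 100]] * 3, dtype=float).T

# After augmentation they keep 60%, 40% and 4% of their area respectively.
after = np.array([
    [0, 0, 100, 60],
    [0, 0, 100, 40],
    [0, 0, 20, 20],
], dtype=float).T

default_keep = box_candidates(before, after)               # area_thr=0.1
strict_keep = box_candidates(before, after, area_thr=0.5)  # hypothetical stricter threshold

print(default_keep, strict_keep)
```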

@Edwardmark

Edwardmark commented Mar 26, 2021

@glenn-jocher Yes, it is my custom code. I think we should handle area loss in load_mosaic. Is 90% too much? For example, if we lose 50% of a person, the box may contain only the person's legs, which is technically not a person.

@glenn-jocher
Member Author

@Edwardmark yes maybe that's a good idea! You could try running box_candidates() in the mosaic function as well as in random_perspective(), as it's possible for objects to reduce in quality during both steps. If you'd like to submit a PR based on that modification I can try running some quick trainings (VOC YOLOv5s 50 epochs, baseline scenario from the Google Colab Notebook VOC section) to quantify the difference.

@glenn-jocher glenn-jocher removed Stale Stale and schedule for closing soon TODO High priority items labels May 21, 2021
@glenn-jocher
Member Author

Removing TODO as this has now been implemented.

@GMN23362

I had an idea today! COCO supplies segmentation annotations for every instance, but we don't use them. I realized it might be useful though to have access to these annotations in the dataloader because they can help re-label cropped objects more accurately. The current mosaic loader will translate/augment images and adjust their labels accordingly, but depending on the shape of the object this may produce suboptimal results (see below).

Re-labelling the augmented images based on their cropped segmentation labels rather than their cropped box labels would likely produce more desirable bounding boxes. It is not possible to quantify the benefit without actually implementing the idea though, which seems to be a very complicated task, and unfortunately the benefit would only be available to datasets with accompanying segmentation labels.

Has anyone tried this, or does anyone have a segmentation-capable version of the YOLOv5 dataloader available?

Screen Shot 2021-02-06 at 1 16 28 PM

Hi, may I ask where can we find the PPT in this issue? I haven't found that on the website of ultralytics.

@glenn-jocher
Member Author

@GMN23362 the slides are internal and not publicly available.

@bit-scientist

Hi, @glenn-jocher, I would like to use copy-paste data augmentation. How should I proceed with training when I have segmentation labels as below?:

class x1, y1, x2, y2, x3, y3, ... xn, yn
class x1, y1, x2, y2, x3, y3, ... xn, yn
class x1, y1, x2, y2, x3, y3, ... xn, yn

Is python path/to/train.py --data coco128.yaml --weights yolov5s.pt --img 640, where coco128.yaml points to the labels.txt file, enough? I mean, does train.py automatically infer the bbox coordinates for object detection as well?

@glenn-jocher
Member Author

@bit-scientist yes.

@bit-scientist

bit-scientist commented Sep 20, 2022

@glenn-jocher I'm finding it difficult to convert my masks to the x1, y1, x2, y2, x3, y3, ... xn, yn format. What is meant by segment in the following?

yolov5/utils/general.py

Lines 287 to 293 in ad05e37

def segment2box(segment, width=640, height=640):
    # Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
    x, y = segment.T  # segment xy
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y, = x[inside], y[inside]
    return np.array([x.min(), y.min(), x.max(), y.max()]) if any(x) else np.zeros((1, 4))  # cls, xyxy
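
As an illustration only (not a statement from the maintainers): the line `x, y = segment.T` implies that `segment` is an array of shape (n, 2) holding one xy point per row. The sketch below calls the quoted function with hypothetical pixel coordinates, one of which lies outside a 640x640 image.

```python
import numpy as np


def segment2box(segment, width=640, height=640):
    # Body as quoted above: keep only points inside the image, then take min/max.
    x, y = segment.T  # segment xy
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y, = x[inside], y[inside]
    return np.array([x.min(), y.min(), x.max(), y.max()]) if any(x) else np.zeros((1, 4))


# Hypothetical polygon as an (n, 2) array; the point (700, 400) is outside the image.
segment = np.array([[100, 200], [300, 250], [700, 400], [250, 500]], dtype=float)

box = segment2box(segment, width=640, height=640)
print(box)  # xyxy box built from the three points that are inside the image
```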

I mean what format is the segment expected to be?

The coco format for segmentation creates one mask image per image, right? How do I then convert it to generate x1, y1, x2, y2, x3, y3, ... xn, yn?

EDIT: I'm sorry, it does have segmentation points in the json file in the form:
"segmentation": [[1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75, 1600.52, 392.14, 1669.32, 349.8 ]].
So, in this case it should be:

class 1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75, 1600.52, 392.14, 1669.32, 349.8.

Does it make sense now?
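
A minimal sketch of that conversion, under two assumptions that are not confirmed in this thread: that the label file expects space-separated values, and that the coordinates must be normalized to the 0-1 range by the image width and height. The image size (1920x1080), the class index 0 and the function name are hypothetical.

```python
def coco_polygon_to_label_line(class_id, polygon, img_width, img_height):
    """Turn a flat COCO polygon [x1, y1, x2, y2, ...] in pixels into one label line
    of the form 'class x1 y1 x2 y2 ...' with normalized coordinates."""
    xs = polygon[0::2]  # every even index is an x coordinate
    ys = polygon[1::2]  # every odd index is a y coordinate
    coords = []
    for x, y in zip(xs, ys):
        coords.append(x / img_width)
        coords.append(y / img_height)
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in coords])


# Polygon from the JSON above; image size and class index are assumed for illustration.
polygon = [1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75, 1600.52, 392.14, 1669.32, 349.8]
line = coco_polygon_to_label_line(0, polygon, img_width=1920, img_height=1080)
print(line)
```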

@glenn-jocher
Member Author

@bit-scientist for a segmentation dataset just run segment/train.py usage examples:

python segment/train.py --data coco128-seg.yaml

@bit-scientist

@glenn-jocher, I don't need it for segmentation task 😃, I'd like to augment the data with --copy-paste functionality.

@glenn-jocher
Member Author

@bit-scientist this should work for segmentation, you just update the hyp here:

copy_paste: 0.0 # segment copy-paste (probability)

@tino926
Contributor

tino926 commented Jul 27, 2023

@glenn-jocher

Hi, I am not confident in my understanding of your code. Please correct me if I am wrong.

  1. With YOLOv5, one can use segmentation data to train a detection model.
  2. YOLOv5 automatically checks if the annotation of one object is a mask or bounding box (perhaps by the length?).
  3. In COCO's segmentation annotation, one object may have two separate parts. However, in YOLOv5's segmentation, one segmentation can only have one continuous part.

If my understanding is correct, can you please explain how you convert COCO annotation to YOLOv5's annotation for objects with two separate parts?

@glenn-jocher
Member Author

@tino926 hi,

In YOLOv5, you can indeed use segmentation data to train a detection model. To handle both segmentation masks and bounding boxes, YOLOv5 automatically detects the format of the annotation based on its structure.

If an object in COCO's segmentation annotation consists of two separate parts, YOLOv5's annotation expects one continuous mask for each object. To convert COCO annotation to YOLOv5's format in such cases, you would need to merge the two separate parts into one continuous mask before using it in YOLOv5.
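
If only the detection box matters, one simple option is to pool the points of all parts and take the min and max. This is a sketch under that assumption, and not necessarily how any YOLOv5 conversion tooling merges parts; it does not produce a valid single outline for mask training. The two rectangles are hypothetical.

```python
import numpy as np

# Two separate parts of one hypothetical object, each an (n, 2) array of xy points.
parts = [
    np.array([[10, 10], [40, 10], [40, 30], [10, 30]], dtype=float),
    np.array([[60, 50], [90, 50], [90, 80], [60, 80]], dtype=float),
]

# Pool every point from every part; point order does not matter for a box.
all_points = np.concatenate(parts, axis=0)
x, y = all_points.T

# The enclosing xyxy box covers both parts.
box = [float(x.min()), float(y.min()), float(x.max()), float(y.max())]
print(box)
```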

I hope this clarifies how the conversion is handled. If you have any further questions, please let me know.
