💡Idea: Mosaic cropping using segmentation labels #2151
Hello,

We have done this by using pycocotools. All you need is:

```python
from pycocotools.coco import COCO
from pycocotools import mask as maskUtils
```

Next, you need to get the annotations of each image:

```python
coco_info = COCO("your_train_or_val_json")
for img_id in coco_info.getImgIds():
    # you could choose `iscrowd=True`, `iscrowd=False`, or both for each image
    anns_ids = coco_info.getAnnIds(img_id, iscrowd=False)
    # annotation info (`bbox`, `segmentation`, `area`, ...) will be here
    anns = coco_info.loadAnns(anns_ids)
    # image info (`file_name`, `width`, `height`, ...) will be here
    img = coco_info.loadImgs(int(img_id))[0]
```

Now you can process the annotations of each image:

```python
img_file_name = img["file_name"]
img_height = img["height"]
img_width = img["width"]
for ann in anns:
    # you could use ann["area"] to ignore small or large objects here
    # you may need `coco91_to_80` to convert the category id for YOLOv5-style annotation
    ann_class = ann["category_id"]
    ann_bbox = ann["bbox"]
    ann_segm = ann["segmentation"]
    # you could normalize the x, y coordinates to ratios using `img_height` and
    # `img_width` here for YOLOv5-style annotation; you may also want to save the
    # annotation file corresponding to `img_file_name` here
```

There are three cases for segmentation info:

```python
# case 1: polygon(s)
if type(ann_segm) is list:
    rles = maskUtils.frPyObjects(ann_segm, img_height, img_width)
    rle = maskUtils.merge(rles)
# case 2: uncompressed RLE (counts stored as a list)
elif type(ann_segm["counts"]) is list:
    rle = maskUtils.frPyObjects(ann_segm, img_height, img_width)
# case 3: compressed RLE (`ann_segm` is already `ann["segmentation"]`)
else:
    rle = ann_segm
# again, use pycocotools to get the binary mask
ann_mask = maskUtils.decode(rle)
```

Now you have the annotation mask: a binary mask with the same resolution as the image.

I am quite sure this case is the main reason mosaic9 gets worse results than mosaic4.
@WongKinYiu good suggestions! Copy-paste augmentation looks like a good idea too. I've tried this in the past with bounding boxes, but the results were poor; I'm sure segmentation will help this substantially.
@WongKinYiu yes, this is a good point: mosaic9 will have more crops on average than mosaic4. OK, I'm working on an implementation that can leverage the segmentation masks to handle these crops better; I'll test this on the 4 models at 640 to see if it helps. I've tried to make this extensible to other datasets so anyone with segmentation data can also benefit.
@glenn-jocher I just check the cropped part against the original ground truth, and if the IoU between the real bbox and the cropped box is less than 0.5, I discard it.
Is that right? Looking forward to your comment.
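In code, such a filter might look like this (a sketch only; `keep_after_crop`, the xyxy convention, and the defaults are assumptions, since the code from the original comment was not preserved):

```python
def keep_after_crop(orig_box, cropped_box, iou_thr=0.5):
    """Return True if the cropped box should be kept: its IoU with the
    original (pre-crop) box must reach iou_thr. Boxes are [x1, y1, x2, y2]."""
    # intersection rectangle of the two boxes
    ix1, iy1 = max(orig_box[0], cropped_box[0]), max(orig_box[1], cropped_box[1])
    ix2, iy2 = min(orig_box[2], cropped_box[2]), min(orig_box[3], cropped_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_orig = (orig_box[2] - orig_box[0]) * (orig_box[3] - orig_box[1])
    area_crop = (cropped_box[2] - cropped_box[0]) * (cropped_box[3] - cropped_box[1])
    iou = inter / (area_orig + area_crop - inter + 1e-16)
    return iou >= iou_thr
```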
@Edwardmark is this custom code that you've written? We have box-candidate criteria that are used to filter labels for use in training, including the percentage of area lost during augmentation (Lines 924 to 926 in d4456e4).
The box_candidates() function itself is at Lines 932 to 938 in d4456e4.
By default it will reject any box that has lost more than 90% of its area (adjusted for scale augmentation) during the augmentation performed in random_perspective(). Boxes can also lose area in load_mosaic(), though we do not currently filter there (applying box_candidates() in load_mosaic() has been proposed in the past by a different user).
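For readers without the repo open, the referenced function is essentially the following (reproduced from memory as a sketch; check the linked lines at d4456e4 for the authoritative version):

```python
import numpy as np

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
    """box1: boxes before augmentation, box2: boxes after, both (4, n) xyxy.
    Keep boxes that are at least wh_thr px wide/tall, retain more than
    area_thr (10%) of their original area, and have aspect ratio < ar_thr."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)
```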
@glenn-jocher Yes, it is my custom code. I think we should handle area loss in load_mosaic too. Is 90% too much area loss to allow? For example, if we lose 50% of a person, the box may contain only the legs; technically, that is no longer a person.
@Edwardmark yes, maybe that's a good idea! You could try running box_candidates() in the mosaic function as well as in random_perspective(), as it's possible for objects to degrade in quality during both steps. If you'd like to submit a PR based on that modification, I can try running some quick trainings (VOC YOLOv5s, 50 epochs, the baseline scenario from the VOC section of the Google Colab notebook) to quantify the difference.
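A possible shape for that change (a sketch only; `filter_mosaic_labels` is a hypothetical helper, and `box_candidates` is the repo function referenced above):

```python
import numpy as np

def filter_mosaic_labels(labels, mosaic_size, box_candidates):
    """Apply the same box_candidates() screening used in random_perspective()
    to labels coming out of load_mosaic().
    labels: (n, 5) array of [cls, x1, y1, x2, y2] in mosaic pixel coords;
    mosaic_size: side length of the mosaic canvas (2 * img_size)."""
    before = labels[:, 1:5].T.copy()                      # boxes before clipping
    labels[:, 1:5] = labels[:, 1:5].clip(0, mosaic_size)  # clip to the canvas
    keep = box_candidates(box1=before, box2=labels[:, 1:5].T)
    return labels[keep]
```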
Removing TODO as this has now been implemented.
@GMN23362 slides are internal and not publicly available.
Hi @glenn-jocher, I would like to use
Is
@bit-scientist yes.
@glenn-jocher I'm finding it difficult to get my masks into the format expected at Lines 287 to 293 in ad05e37.
I mean, what format is the segment expected to be in?
The COCO format for segmentation creates one mask image per image, right? How do I then convert it to generate
EDIT: I'm sorry, it does have segmentation points in the JSON file in the form:
Does it make sense now?
@bit-scientist for a segmentation dataset, just run segment/train.py; see the usage example below.
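For example (the dataset and weights names here follow the standard YOLOv5 README defaults; substitute your own):

```bash
# train a YOLOv5 segmentation model on the COCO128-seg sample dataset
python segment/train.py --data coco128-seg.yaml --weights yolov5s-seg.pt --img 640
```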
@glenn-jocher, I don't need it for a segmentation task 😃, I'd like to augment the data with
@bit-scientist this should work for segmentation; you just update the hyp here: yolov5/data/hyps/hyp.scratch-low.yaml (Line 34 in 77dcf55).
Hi, I am not confident in my understanding of your code, so please correct me if I am wrong.
If my understanding is correct, can you please explain how you convert a COCO annotation to YOLOv5's annotation format for objects with two separate parts?
@tino926 hi, in YOLOv5 you can indeed use segmentation data to train a detection model. To handle both segmentation masks and bounding boxes, YOLOv5 automatically detects the annotation format based on its structure. If an object in a COCO segmentation annotation consists of two separate parts, YOLOv5 expects one continuous mask per object, so you would need to merge the two parts into a single mask before using it, as sketched below. I hope this clarifies how the conversion is handled. If you have any further questions, please let me know.
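A minimal sketch of that merge, reusing the pycocotools pattern from earlier in this thread (`coco_ann_to_single_mask` is a hypothetical helper name; `ann` is a COCO annotation dict with polygon segmentation):

```python
from pycocotools import mask as maskUtils

def coco_ann_to_single_mask(ann, img_height, img_width):
    """Merge a (possibly multi-part) COCO polygon segmentation into one
    binary mask of shape (img_height, img_width)."""
    rles = maskUtils.frPyObjects(ann["segmentation"], img_height, img_width)
    rle = maskUtils.merge(rles)   # union of all parts into a single RLE
    return maskUtils.decode(rle)  # H x W uint8 mask, 1 inside the object
```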
I had an idea today! COCO supplies segmentation annotations for every instance, but we don't use them. I realized it might be useful, though, to have access to these annotations in the dataloader, because they can help re-label cropped objects more accurately. The current mosaic loader will translate/augment images and adjust their labels accordingly, but depending on the shape of the object this may produce suboptimal results (see below).
Re-labelling the augmented images based on their cropped segmentation labels rather than their cropped box labels would likely produce more desirable bounding boxes. The benefit is impossible to quantify without actually implementing the idea, though, which seems to be a very complicated task, and unfortunately the benefit would only be available to datasets with accompanying segmentation labels.
Has anyone tried this, or does anyone have a segmentation-capable version of the YOLOv5 dataloader available?
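For anyone prototyping this, the core re-labelling step is small (a sketch; `relabel_from_cropped_mask` is hypothetical and assumes per-instance binary masks are already available):

```python
import numpy as np

def relabel_from_cropped_mask(mask, x1, y1, x2, y2):
    """mask: H x W binary instance mask; (x1, y1, x2, y2): crop window in pixels.
    Returns the tight xyxy box of the surviving object pixels in crop
    coordinates, or None if the object is cropped out entirely."""
    cropped = mask[y1:y2, x1:x2]
    ys, xs = np.nonzero(cropped)
    if xs.size == 0:
        return None  # discard the label instead of keeping a degenerate box
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```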