Support Binary Mask with transparent SegmentationMask interface #473
Conversation
```python
def convert_to_polygon(self):
    contours = self._findContours()
    return PolygonList(contours, self.size)
```
This is not generally true. Consider a donut-shaped mask: the contours would fail to represent the real shape.
Please refer to L139:

```python
contour, hierarchy = cv2.findContours(
    mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1
)
```
This solves the donut problem:
- binary mask
- polygon using cv2.RETR_TREE
- polygon using cv2.RETR_EXTERNAL
Have you tested with a binary mask with multiple holes? I am not sure whether it is generalizable to all cases.
Can you give a counterexample where you cannot select the outermost contour?
@botcs Can you convert the contours of a donut-shaped object, or an object with holes, found by cv2 back to the original binary mask via the cocomask API? At least in my experience this is not trivial.
Besides, the function you provided has a minor problem with my OpenCV version. It should be:

```python
_, contour, hierarchy = cv2.findContours(
    mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1
)
```
I removed that part from the test since it did not change the binary mask at all, which makes me think that the conversion is performed on both polygon entities, but there is no sign of the inner hole's mask being subtracted from the outer one.
One way to work around this is to combine the polygon entities of the PolygonInstance with an N-ary XOR, which yields 1 iff the pixel is covered by an odd number of polygons.
What do you think?
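A minimal numpy sketch of that XOR idea (the helper name and the toy masks are mine, not from the PR):

```python
import numpy as np
from functools import reduce

def merge_masks_xor(per_polygon_masks):
    # Even-odd rule: a pixel is foreground iff it is covered by an odd
    # number of polygons, so a hole ring cancels the outer ring it lies in.
    return reduce(np.logical_xor, per_polygon_masks).astype(np.uint8)

# toy donut: a 6x6 outer square with a 2x2 hole
outer = np.zeros((8, 8), dtype=bool)
outer[1:7, 1:7] = True
hole = np.zeros((8, 8), dtype=bool)
hole[3:5, 3:5] = True
donut = merge_masks_xor([outer, hole])
```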
I think this is OK, but a warning or comment should be added to tell users that if their dataset contains objects with holes, this conversion is somewhat problematic and the binary_mask backend is preferred.
I have now looked into the details of this polygon -> mask conversion carried out by the COCO API, and the following happens: each polygon entity (a part of a polygon instance) is converted to RLE, and the RLE is converted to a mask in segmentation_mask.py L303.
This whole unnecessary overhead could be eliminated by using cv2.fillPoly, which would also handle the output of the Mask -> Polygon conversion in a very neat way. This modification could allow a probably lossless Mask <-> Polygon bi-directional conversion.
@botcs I'd prefer not to have a dependency on OpenCV in the core training loop (I've seen reports in the past that it didn't mix well with PyTorch multiprocessing, even though that might not apply here).
About lossless conversion from Mask to Polygon, I don't think this is actually possible, because there are discretization artifacts introduced when we go from a Polygon to a Mask that can't be exactly recovered. That being said, it would definitely be nice to have a way to convert back from mask to polygon, but in this case, given that polygons take less space, wouldn't it be recommended to (almost) always use polygons instead?
@fmassa I agree with the dependency issue. Should I import it only when the conversion is called and raise a warning, or remove the feature?
I agree, but apart from corner cases, in the worst case the boundary pixels' coordinates themselves can serve as the polygon representation. In the reverse direction, if no shrinking or warping has been applied, I think the conversion is completely reversible.

> given that polygons take less space, wouldn't it be recommended to (almost) always use polygons instead?

In theory, yes; in practice I think it depends on how often the conversion happens and on the format of the original labels. In synthetic datasets there are ridiculously rough surfaces (e.g. grass/tree occlusion) which would be expensive to convert and store, and as long as polygon operations are not parallelized, masks can be dealt with more easily.
To sum up: in the current version, polygon conversion returns the outer boundary only, so polygons with holes cannot be represented (nor could they in the previous version); on the other side, we don't have parallelized binary mask transformations, which causes a huge overhead in cases where each element in the batch needs to be cropped / resized differently. As long as BinaryMasks are not parallelized, offline conversion to polygons is the best option.
```python
if isinstance(masks, torch.Tensor):
    pass
elif isinstance(masks, (list, tuple)):
    masks = torch.stack(masks, dim=2)
```
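The branch above amounts to a small input-normalization helper; a sketch (the function name is mine, not from the repository):

```python
import torch

def as_mask_tensor(masks):
    # Tensors pass through untouched; a list/tuple of (H, W) masks is
    # stacked along a new last dimension, giving an (H, W, N) tensor.
    if isinstance(masks, torch.Tensor):
        return masks
    if isinstance(masks, (list, tuple)):
        return torch.stack(masks, dim=2)
    raise TypeError("unsupported mask container: %s" % type(masks))
```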
I would like to add support for converting RLE masks to binary mask as well.
In PR #150 RLE masks are discussed; however, I could not find RLE support anywhere in the code.
In my opinion, the COCO dataset seems to go fine without RLE, and I don't see why it would be generally useful.
As was discussed by @fmassa and @wangg12 in the previous reviews for #150:

> In my opinion, the inner computation should be based on either polygons or binary masks. So if the mode is 'mask' or the user's input is not polygons, I'd like to convert any of it into a binary mask and let the inner computation be based on binary masks.

This justifies that RLE is not a self-contained representation, in the sense that for transformations like crop, resize and transpose you would have to change the underlying representation.
Aside from that, one can implement an RLEList to back up SegmentationMask supporting that format, without interfering with the other classes. Of course, to support full compatibility, one would have to take care of the to/from conversions for both RLEList <-> BinaryMaskList and RLEList <-> PolygonList.
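For illustration, decoding an uncompressed COCO-style RLE into a binary mask is straightforward (the compressed string format additionally needs pycocotools; this sketch and its function name are mine):

```python
import numpy as np

def rle_decode(counts, size):
    # COCO RLE stores alternating run lengths of 0s and 1s (starting
    # with 0s) over the mask flattened in column-major (Fortran) order.
    h, w = size
    flat = np.zeros(h * w, dtype=np.uint8)
    idx, val = 0, 0
    for run in counts:
        flat[idx:idx + run] = val
        idx += run
        val = 1 - val
    return flat.reshape((h, w), order="F")
```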
Again, RLE -> polygons is not always possible, considering a donut-like shape. So just converting it into a binary mask at the input of the binary mask class is enough.
Sounds reasonable, I will add it.
Thanks for the refactoring!
I have a few comments. I'd need to run a few experiments to validate that there is no change in behavior, so it might take a bit more time to merge this PR than usual.
```python
''' This crashes the training way too many times...
for p in polygons:
    assert p[::2].min() >= 0
```
Can you explain why the training crashes with those asserts? Are the elements out of bounds, or negative?
Both cases happen in the COCO training.
Ideally, this assert should not be a problem; however, it seems that when the polygons are converted to binary masks, the implementation handles such coordinates silently, without throwing warnings.
I still don't understand this correctly. This assumes that p is a polygon, right?
@fmassa please take a look at the PolygonInstance.crop method (starting at L241), which is a bit odd to me: instead of clamping (as noted in the in-line comment), it just subtracts the left/top boundary of the cropping box from the polygons, and passes the reduced cropping size as an argument to the new PolygonInstance.
This was in the original implementation, and I suppose it relies heavily on the fact that the polygon -> binary conversion handles this issue by simply ignoring over- and underflowing polygon coordinates. So this could be why it works without these asserts and crashes with them.
Oh I see, yes, it makes sense.
The cropping was implemented that way to mimic the implementation in Detectron for the fused project_on_box. Actually, to make things really correct we would need to detect that a point is negative and add new edges to the polygon. Just clamping the coordinates of the polygon is unfortunately not enough.
I could not add a comment to @fmassa's comment, so I will post it below:
Agreed. But since the
Hi @fmassa, Thank you
Hi @botcs, sorry for the delay. About your first message: I was maybe hoping to have a way of decomposing the structures a bit more, but this might not be possible / needed. About validating the consistency, my thought was to run a full training and compare end performances. This would indicate whether the representation we use brings any noticeable difference in the final model result.
Hi @fmassa,
That sounds amazing! I really hope I could then utilize the What would you think about a CityScapes training, where we could compare the
@botcs that sounds amazing.
@fmassa Why would evaluation be a problem, since we could save the testing results in COCO style?
@wangg12 yes, you are right: the predictions are masks anyway, and we convert them to RLE for COCO evaluation. But COCOeval expects polygons for the ground truth, so those polygons should be there as well. Also, the way we currently dispatch to the evaluation code is hard-coded to either the Pascal or the COCO evaluation, depending on the dataset class. Those are not blockers, just problems that we need to address in order to make the code easy to use, and it would definitely be a great contribution!
@fmassa I think if users use binary masks, they could convert their annotations losslessly to the COCO-style RLE format, so there is no need to change the evaluation code.
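That lossless direction is simple to sketch for the uncompressed counts (pycocotools can further compress them into the string format; the helper name here is mine):

```python
import numpy as np

def rle_encode(mask):
    # Inverse of COCO-style RLE decoding: flatten the mask column-major
    # and emit alternating run lengths, starting with the run of 0s
    # (which may have length 0 if the mask begins with a 1).
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    counts = []
    prev, run = 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    return counts
```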
I am working on a generic COCO-style evaluation that is compatible (as much as possible) with segmentation datasets. However, it would definitely help if this PR were merged, since the mentioned evaluation refactoring would otherwise be out of scope.
@wangg12 sounds good. One thing I'd love to do is to have a generic way of calling into COCOeval without requiring the data to be in the COCODataset format.
Hi @fmassa,
The resulting scores are the following: Does this difference occur because of the 2 GPUs / smaller batch size / smaller learning rate, or because of significantly different behavior using the new backend? A detailed summary of the precision and recall:
[COCOEval related] I have made good progress with the plans for a new dataset-agnostic COCO-style evaluation approach; however, your help would be really appreciated. Please find my results visualized at:
thanks,
Hey @botcs, sorry for the delay in replying.
This seems to indicate that the new
I had a quick look and it looked very interesting and promising. I'll have a closer look tomorrow and make some more comments (I need to run now).
@botcs I've launched two trainings yesterday with 8 GPUs each for Mask R-CNN R-50 FPN. They are still training, and the time per iteration is about 2.2915 s, which is 5x slower than the equivalent implementation using Polygons. Does that match your expectations?
@fmassa I have experienced a 2x slower computation, but that is because of the CPU-intensive operations; probably the config is different. If you use the polygon representation the speed is expected to be the same as in the original implementation; my current training on the CityScapes polygon annotations confirms this: time: 0.4781 (0.4750) data: 0.0091 (0.0093) for the original image size (1024x2048). The discussion in #527 was mainly motivated by the increased processing time... at the moment no tricks or heuristics are applied, just the original
@fmassa I've experienced a training speed of about 1 s/iter using mmdetection, feeding the annotations with RLE-based segmentations which are then converted into binary masks for training. So I expect that the speed should be faster with this repo.
@wangg12 I agree. We could speed up literally every for loop here simply by multithreading it, which would save a lot of time. Also the overall spectacular training speed of But to be fair, the original implementation of the
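As an illustrative-only sketch of the per-loop parallelization idea (note that numpy slicing is cheap and largely GIL-bound, so real gains depend on how much work each instance does; the function names are mine):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def crop_masks_parallel(masks, boxes, workers=4):
    # Apply an independent crop box to each instance mask concurrently;
    # the same pattern fits any per-instance loop in the mask pipeline.
    def crop(args):
        mask, (x0, y0, x1, y1) = args
        return mask[y0:y1, x0:x1]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crop, zip(masks, boxes)))
```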
@botcs I've finished training with masks, but evaluation failed with the same type of error as before (the 0-d tensor), so the fixes are not yet enough for this to be merged. :-/
@fmassa I am sad to hear that; I thought that the interpolation commit had fixed this issue. Can you please send me the full error stack? Is it possible that a slicing operation is called after extracting the internal data representation by
@fmassa in the COCO evaluation script found at
EDIT: PR #559 proposes a small modification to avoid the error - does that help?
LGTM, thanks!
```python
elif isinstance(masks[0], dict) and "count" in masks[0]:
    # RLE interpretation
    masks = mask_utils
```
Shouldn't this be different for the case of a list of RLEs?
cc @botcs
@IssamLaradji Nice catch!
Here is a quick guide for usage (and for sanity check on update):
https://gist.github.com/botcs/a59f3f59e22e5df93e3e5e4f86718af3
Fixed in PR #657
Perfect! The updated version looks good!
Quick question, will this get merged into master :P?
…ookresearch#473)

* support RLE and binary mask
* do not convert to numpy
* be consistent with Detectron
* delete wrong comment
* [WIP] add tests for segmentation_mask
* update tests
* minor change
* Refactored segmentation_mask.py
* Add unit test for segmentation_mask.py
* Add RLE support for BinaryMaskList
* PEP8 black formatting
* Minor patch
* Use internal that handles 0 channels
* Fix polygon slicing
After a major refactor on #150 I have implemented a generic SegmentationMask with the following in mind.

Features:
- `BinaryMaskList`s and `PolygonList`s are now interchangeable, and both can serve as the underlying representation of the `SegmentationMask`.
- Tested with `configs/quick_schedules/e2e_mask_rcnn_R_50_FPN_quick.yaml`, with the Binary Masks consistently outperforming (maskAP=5.8) the training using the Polygon representation on `coco_minival`. Also, using the Polygon backend, similar performance (maskAP=5.6) was achieved to the current backend in the `master` branch.

Downsides:
I have changed the syntax to represent the structure more coherently, which affects two lines in the lib:
- `maskrcnn_benchmark/data/datasets/coco.py` (L83)
- `maskrcnn_benchmark/modeling/roi_heads/mask_head/loss.py` (L38)