Add an option to postprocess masks during inference #124
Thanks for the PR!
I've left a few comments; I think a couple of things need to be fixed.
Also, it would be interesting to know how much slower this approach is compared to what we currently have.
I'm asking because we now perform the interpolation in float32, while before we were using uint8. Maybe it shouldn't matter, but it would be good to have a few numbers, and also to check whether this implementation introduces different interpolation artifacts.
I'll need to try out to see if accuracies change or not.
You're right about the interpolation being done in float32. I'll try to run some benchmarks (probably not before next week). But we could still keep the NumPy version and use the Torch one only if …
(By the way, it had to be said: your Mask R-CNN implementation is really great. The code logic and model structure are well designed and flexible. I have been looking for this kind of implementation for a long time, since I don't have the time to implement it myself. Congratulations!)
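To make the float32-vs-uint8 question above concrete, here is a minimal sketch (not the PR's actual code; sizes and the 0.5 threshold are made up) of resizing a soft mask with bilinear interpolation in float32 and then binarizing it back to uint8:

```python
# Hedged sketch: upscale a small soft mask in float32 with bilinear
# interpolation, then threshold back to uint8. All numbers are illustrative.
import torch
import torch.nn.functional as F

mask = torch.zeros(1, 1, 28, 28)
mask[:, :, 8:20, 8:20] = 1.0  # a square "detection" in the 28x28 mask

# float32 bilinear resize (the new path discussed in the comments above)
resized = F.interpolate(mask, size=(112, 112), mode="bilinear",
                        align_corners=False)
binary = (resized > 0.5).to(torch.uint8)  # back to a hard uint8 mask

print(binary.shape)  # torch.Size([1, 1, 112, 112])
```

Any artifact difference versus the uint8/PIL path would show up at the edges of the thresholded region, which is why a side-by-side benchmark and visual check were requested.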
This is almost good to go, only a few minor points.
Also, it would be good to know the runtime difference before merging this PR.
I'll perform some tests next week before merging the PR, just to triple check that we didn't forget anything.
Thanks again!
And about keeping the PIL version, here is what I propose: I have a CUDA and PyTorch-only implementation for the … I could share this PyTorch / CUDA implementation here, and we could look into fixing the boundary artifacts so that it gives exactly the same results as before, while still allowing the masks to be pasted on the GPU in parallel (which would be a win). Thoughts?
I've fixed the issues from your comments. Here is a notebook showing that the NumPy version looks faster by 20-40%. I didn't run an extensive benchmark on a large image, so reality could differ. Anyway, it's a good starting point if you want to test/benchmark this function (why not integrate it as a test in the repo?).
Here is the notebook: https://gist.github.com/hadim/43e929498b8f84c6a19ccd62e4ff810a
As for the CUDA version, let's integrate it in this PR. Better to do that properly than to come back to it later. As for the boundary artifacts, I am afraid that resizing the masks will always give a smaller mAP than not resizing, no matter how good the resizing function is. The question is by how much the mAP decreases; only tests will tell us. Before merging we also need to make … (I don't have the resources to train from scratch on a dataset such as COCO, so you're going to have to do it, sorry for that.)
No worries about training on COCO, we can do that here.

When I tried the CUDA implementation from last time, it gave roughly 2 mAP lower than the current implementation. When I checked, it was mostly the boundaries that were off by a few pixels. Here is a gist with the CUDA (and an equivalent CPU) implementation: https://gist.github.com/fmassa/2b6763272ec867b6e7285684095b753d

Unfortunately the CUDA version will probably not compile anymore due to a number of breaking changes in PyTorch internals, but it can be adapted without too much trouble so that it compiles again. The idea is to create a warp mask which is used to apply the …

Let me know if you have questions. I'd recommend first trying to get the Python implementation in …
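The warp / flow-field idea can be sketched with PyTorch's `affine_grid` + `grid_sample`. This is NOT the gist's code: the function name, sizes, and transform below are illustrative, and the boundary handling responsible for the ~2 mAP gap discussed above is ignored here.

```python
# Hypothetical sketch of pasting a fixed-size soft mask into image coordinates
# by sampling it through a flow field (grid), instead of resizing with PIL.
import torch
import torch.nn.functional as F

def paste_mask(mask, box, im_h, im_w):
    """mask: (M, M) float tensor in [0, 1]; box: (x0, y0, x1, y1) floats."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    # Affine map from normalized image coordinates to normalized mask
    # coordinates; affine_grid expects this inverse mapping.
    theta = torch.tensor([
        [im_w / w, 0.0, (im_w - x0 - x1) / w],
        [0.0, im_h / h, (im_h - y0 - y1) / h],
    ]).unsqueeze(0)
    grid = F.affine_grid(theta, (1, 1, im_h, im_w), align_corners=False)
    # zero padding outside the box, so the pasted mask is 0 there
    out = F.grid_sample(mask[None, None], grid, align_corners=False)
    return out[0, 0]  # (im_h, im_w)

pasted = paste_mask(torch.ones(28, 28), (8.0, 8.0, 24.0, 24.0), 32, 32)
```

Because the grid is just a tensor, many boxes can be pasted in one batched `grid_sample` call on the GPU, which is the parallelism win mentioned above.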
About working on batches, I totally agree.
Could you check this notebook: https://gist.github.com/hadim/99181210e5788ae20c2a3bde72d268d1 In my test, the difference between the flow-field CPU version and the previous one (…) … I don't know if it is still too big for you, or maybe the PyTorch … I almost didn't modify the source code (only some minor changes to update it against the PyTorch 1.0 API).
Also, the flow field version looks a bit faster than the previous one. I am waiting for your green light to integrate it into this PR. |
Hmm, interesting. I'd need to run full testing to see whether it brings the mask mAP down or not; I can do that next week. We should also note that this … I'd prefer to have this change in a separate PR, though. It will make merging this PR easier and quicker, if you are ok with that, and it's simpler for tracking the history. Could you maybe send another PR with this implementation?
Ok, so I will push some commits here to make Masker batch compatible, and open another PR with the new implementation.
The last commit makes Masker batch compatible. Backward compatibility is maintained.
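The backward-compatible dispatch could look something like the sketch below. This is a hypothetical simplification, not the PR's actual Masker: the class and method names are made up, and the per-image pasting is reduced to a placeholder threshold.

```python
# Hedged sketch: accept either one image's masks (old behavior) or a list of
# per-image mask tensors (new batch behavior), without breaking old callers.
import torch

class SimpleMasker:
    def paste_single_image(self, masks, boxes):
        # stand-in for the real per-image mask pasting
        return (masks > 0.5).to(torch.uint8)

    def __call__(self, masks, boxes):
        if isinstance(masks, (list, tuple)):
            # new batch path: one tensor of masks per image
            return [self.paste_single_image(m, b) for m, b in zip(masks, boxes)]
        # old single-image path, kept so existing callers do not break
        return self.paste_single_image(masks, boxes)

masker = SimpleMasker()
single = masker(torch.rand(3, 1, 28, 28), boxes=None)            # old behavior
batch = masker([torch.rand(3, 1, 28, 28), torch.rand(2, 1, 28, 28)],
               boxes=[None, None])                               # new behavior
```

Dispatching on the input type keeps the single-image call signature untouched, which is what "backward compatibility is maintained" requires.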
Force-pushed from 41141c5 to 1890a84.
My solution raises an error during inference because masker returns a list instead of a tensor. I am not sure the returned masks can be merged into a single tensor, since they don't have the same shape (number of objects). Or do they?
The line just after the masker is applied converts the tensor into a list of tensors, one per image:

```python
if self.masker:
    mask_prob = self.masker(mask_prob, boxes)
else:
    mask_prob = mask_prob.split(boxes_per_image, dim=0)
```

Splitting with `boxes_per_image = [len(box) for box in boxes]` might be enough. Thoughts?
I am confused because in … What does the first dimension of x correspond to?
It looks like the first dimension is the number of images in the batch multiplied by the number of detected objects. Am I right?
@hadim yes, it's the total number of mask predictions for the batch.
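A toy illustration (with made-up numbers) of the shape question above: the first dimension of the flat mask tensor is the number of detections summed over the whole batch, and `split` recovers the per-image chunks.

```python
# Two hypothetical images with 3 and 2 detections respectively.
import torch

boxes_per_image = [3, 2]
mask_prob = torch.rand(sum(boxes_per_image), 1, 28, 28)  # first dim = 5

per_image = mask_prob.split(boxes_per_image, dim=0)
print([tuple(m.shape) for m in per_image])  # [(3, 1, 28, 28), (2, 1, 28, 28)]
```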
Force-pushed from 1890a84 to c5bf98b.
I think it's ready to be reviewed.
Force-pushed from c5bf98b to f300985.
Force-pushed from f300985 to 8d00e40.
Hi @hadim, I'll try to find some time late this week or early next week to pull your changes and test a few things locally so that I can merge this. Sorry for the delay!
Training and inference work ok. The only thing is that more than one image per batch is broken during inference, because Masker does not support more than one image.