This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Add an option to postprocess masks during inference #124

Closed
wants to merge 6 commits

Conversation

hadim
Contributor

@hadim hadim commented Nov 6, 2018

Training and inference work ok.

The only caveat is that inference is broken for batches with more than one image, because Masker does not support more than one image.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Nov 6, 2018
@hadim hadim mentioned this pull request Nov 6, 2018
Contributor

@fmassa fmassa left a comment


Thanks for the PR!

I've left a few comments, I think there are a few things that might need to be fixed.

Also, it might be interesting to know how much slower this approach is compared to what we currently have.
The reason I'm asking is that we now perform the interpolation in float32, while before we were using uint8. Maybe it shouldn't matter, but it would be good to have a few numbers for now, and also to see whether different interpolation artifacts appear due to this implementation.
I'll need to try out to see if accuracies change or not.

maskrcnn_benchmark/engine/inference.py (outdated review thread, resolved)
@hadim
Contributor Author

hadim commented Nov 6, 2018

You're right about the interpolation being done in float32. I'll try to run some benchmarks (probably not before next week).

But we could still keep the Numpy version and use the Torch one only if POSTPROCESS_MASKS is set to True.

@hadim
Contributor Author

hadim commented Nov 6, 2018

(By the way, it has to be said: your maskrcnn implementation is really great. The code logic and model structure are well designed and flexible enough. I have been looking for this kind of implementation for a long time, since I don't have the time to implement it myself. Congratulations!)

Contributor

@fmassa fmassa left a comment


This is almost good to go, only a few minor points.

Also, it would be good to know the runtime difference before merging this PR.
I'll perform some tests next week before merging the PR, just to triple check that we didn't forget anything.

Thanks again!

@fmassa
Contributor

fmassa commented Nov 7, 2018

And about keeping the PIL version, here is what I propose:

I have a CUDA- and PyTorch-only implementation of paste_mask_in_image which can run on either the CPU or the GPU.
I removed it from the current version of the code because it had a few boundary artifacts that dropped the mAP by a couple of points, and I didn't have the time to fix it.

I could share this PyTorch / CUDA implementation here, and we could look into fixing the boundary artifacts so that it gives exactly the same results as before, while still allowing the masks to be pasted on the GPU in parallel (which would be a win).

Thoughts?

@hadim
Contributor Author

hadim commented Nov 7, 2018

I've fixed issues from your comments.

Here is a notebook showing that the NumPy version is 20-40% faster. I didn't do an extensive benchmark on large images, so reality could differ. Anyway, it's a good starting point if you want to test/benchmark this function (why not integrate it as a test in the repo?).

@hadim
Contributor Author

hadim commented Nov 7, 2018

As for the CUDA version, let's integrate it into this PR. Better to do it properly now rather than coming back to it later.

As for the boundary artifacts, I am afraid that resizing the masks will always give a smaller mAP than not resizing, no matter how good the resizing function is. The question is by how much the mAP decreases... only tests will tell us.

Before merging, we also need to make Masker batch-compatible, don't you think? Not being able to run inference with a batch size > 1 is a pretty big limitation.

(I don't have the resources to train on a dataset like COCO from scratch, so you're going to have to do it, sorry about that.)

@fmassa
Contributor

fmassa commented Nov 7, 2018

No worries about training on COCO, we can do that here.
Note though that this function is used only during testing.

When I tried using the CUDA implementation from last time, it gave roughly 2 mAP lower than the current implementation. When I checked, it was mostly the boundaries that were a bit off by a few pixels.

Here is a gist with the CUDA (and equivalent CPU implementation): https://gist.github.com/fmassa/2b6763272ec867b6e7285684095b753d
The CUDA version is basically doing the same thing as the compute_flow_field_cpu function.

Unfortunately, the CUDA version will probably no longer compile due to a number of breaking changes in PyTorch internals, but it can be adapted without too much trouble so that it compiles again.

The idea is to create a warp mask (or flow field) which is then passed to F.grid_sample.
The flow field is -2 everywhere, except at the locations where the masks are present. In those places, we create an interpolation grid of the right size.

Let me know if you have questions. I'd recommend first trying to get the Python implementation in compute_flow_field_cpu to match what we currently have, and then worrying about the GPU implementation.
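To make the flow-field idea above concrete, here is a rough NumPy sketch. All names are illustrative stand-ins (not the gist's code), and the nearest-neighbour sampler is a crude substitute for F.grid_sample: the grid is -2 everywhere (out of bounds, so it samples zeros), except inside the target box, where it holds normalized coordinates into the small predicted mask.

```python
import numpy as np

def make_flow_field(im_h, im_w, box):
    """box = (x0, y0, x1, y1) in pixel coords; returns an (im_h, im_w, 2) grid."""
    grid = np.full((im_h, im_w, 2), -2.0)   # -2 == out of bounds -> zero padding
    x0, y0, x1, y1 = box
    xs = np.linspace(-1.0, 1.0, x1 - x0)    # normalized coords into the mask
    ys = np.linspace(-1.0, 1.0, y1 - y0)
    grid[y0:y1, x0:x1, 0] = xs[None, :]     # x coordinate
    grid[y0:y1, x0:x1, 1] = ys[:, None]     # y coordinate
    return grid

def sample_nearest(mask, grid):
    """Crude stand-in for F.grid_sample (zero padding, nearest mode)."""
    m_h, m_w = mask.shape
    out = np.zeros(grid.shape[:2], dtype=mask.dtype)
    valid = (np.abs(grid) <= 1.0).all(axis=-1)
    gx = np.clip(((grid[..., 0] + 1) / 2 * (m_w - 1)).round().astype(int), 0, m_w - 1)
    gy = np.clip(((grid[..., 1] + 1) / 2 * (m_h - 1)).round().astype(int), 0, m_h - 1)
    out[valid] = mask[gy[valid], gx[valid]]
    return out

mask = np.ones((4, 4))                       # a 4x4 "predicted" mask
grid = make_flow_field(10, 12, (3, 2, 7, 8)) # paste it into a 10x12 image
pasted = sample_nearest(mask, grid)          # ones inside the box, zeros outside
```

The real implementation uses bilinear interpolation via F.grid_sample, which is where the boundary artifacts mentioned above can come from.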

@fmassa
Contributor

fmassa commented Nov 7, 2018

About working on batches, I totally agree.
If we have compute_flow_field_cpu working as expected, making it support batches should not be too difficult, I think, but it needs some experimentation.

@hadim
Contributor Author

hadim commented Nov 7, 2018

Could you check this notebook: https://gist.github.com/hadim/99181210e5788ae20c2a3bde72d268d1

In my test, the difference between the flow field CPU version and the previous one (paste_mask_in_image) is very small.

I don't know if that is still too big for you, or maybe the PyTorch grid_sample() function has been improved.

I barely modified the source code (only some minor changes to update it against the PyTorch 1.0 API).

@hadim
Contributor Author

hadim commented Nov 7, 2018

Also, the flow field version looks a bit faster than the previous one. I am waiting for your green light to integrate it into this PR.

@fmassa
Contributor

fmassa commented Nov 7, 2018

Hum, interesting.

I'd need to run a full testing to see if it brings the mAP for masks down or not. I can do this next week.

We should also note that this Masker is applied on the probability maps, not on the thresholded masks. There might be some additional difference coming from that.

I'd prefer to have this change in a separate PR, though. It would make merging this PR easier and quicker, if you are ok with that.

And it's going to be simpler for tracking the history. Could you maybe send another PR with this implementation?
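The point above about the Masker running on probability maps matters because thresholding and resizing do not commute. A tiny 1-D NumPy sketch (made-up numbers, linear interpolation as a stand-in for the bilinear resize) shows the mask boundary landing in a different place depending on the order of operations:

```python
import numpy as np

probs = np.array([0.2, 0.2, 0.9, 0.9])   # 1-D slice of a mask probability map
xs = np.linspace(0, 3, 13)               # upsample 4 samples -> 13 samples

# Resize the probability map, then threshold (what the Masker does).
resized_then_thresh = np.interp(xs, np.arange(4), probs) > 0.5

# Threshold first, then resize the binary mask.
thresh_then_resized = np.interp(xs, np.arange(4), (probs > 0.5).astype(float)) > 0.5
```

The two binary outputs differ at the transition sample, which is exactly the kind of few-pixel boundary shift that can move the mAP.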

@hadim
Contributor Author

hadim commented Nov 7, 2018

Ok, so I will push some commits here to make Masker batch-compatible, and open another PR with the new implementation.

@hadim
Contributor Author

hadim commented Nov 7, 2018

The last commit makes Masker batch-compatible. Backward compatibility is maintained.

@hadim
Contributor Author

hadim commented Nov 7, 2018

My solution raises an error during inference because Masker returns a list instead of a tensor. I am not sure the returned masks can be merged into a single tensor, since they don't have the same shape (number of objects). Or do they?

@hadim hadim mentioned this pull request Nov 7, 2018
@fmassa
Contributor

fmassa commented Nov 7, 2018

The line just after the masker is applied converts the tensor into a list of tensors, one per image.
So we would need some special-casing to split mask_prob if the masker is not set, or else use the result of Masker directly.
Something like

        boxes_per_image = [len(box) for box in boxes]

        if self.masker:
            mask_prob = self.masker(mask_prob, boxes)
        else:
            mask_prob = mask_prob.split(boxes_per_image, dim=0)

might be enough.
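For illustration, here is what that split does, with made-up shapes and NumPy's np.split standing in for tensor.split along dim 0:

```python
import numpy as np

# mask_prob is a flat stack of per-detection masks for the whole batch;
# boxes_per_image says how many detections belong to each image.
# (The 5 detections and 28x28 mask size below are made up.)
mask_prob = np.zeros((5, 1, 28, 28))   # 5 detections total in the batch
boxes_per_image = [2, 3]               # image 0 has 2 boxes, image 1 has 3

# NumPy equivalent of mask_prob.split(boxes_per_image, dim=0):
# np.split takes cut indices rather than chunk sizes.
splits = np.split(mask_prob, np.cumsum(boxes_per_image)[:-1], axis=0)
```

Each element of splits then has one mask per detection of the corresponding image, matching what the rest of MaskPostProcessor expects.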

Thoughts?

@hadim
Contributor Author

hadim commented Nov 7, 2018

I am confused because in MaskPostProcessor, the inputs of forward() are a list of BoxList (one element per image) and x for the masks.

What does the first dimension of x correspond to?

@hadim
Contributor Author

hadim commented Nov 7, 2018

It looks like the first dimension is the number of images in the batch multiplied by the number of detected objects. Am I right?

@fmassa
Contributor

fmassa commented Nov 7, 2018

@hadim yes, it's the total number of mask predictions for the batch.

@hadim
Contributor Author

hadim commented Nov 7, 2018

I think it's ready for review.

@fmassa
Contributor

fmassa commented Nov 13, 2018

Hi @hadim

I'll try finding some time late this week or early next week to pull your changes and test some things locally so that I can merge this.

Sorry for the delay!

@fmassa
Contributor

fmassa commented Nov 19, 2018

Thanks a lot @hadim , I've merged the changes in #180 (I couldn't push the changes to this PR)

@fmassa fmassa closed this Nov 19, 2018
@hadim hadim deleted the postprocess-mask branch November 19, 2018 18:18