Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] ROI Pooling layers #477

Closed
2 of 3 tasks
varunagrawal opened this issue Apr 24, 2018 · 46 comments
Closed
2 of 3 tasks

[feature request] ROI Pooling layers #477

varunagrawal opened this issue Apr 24, 2018 · 46 comments

Comments

@varunagrawal
Copy link
Contributor

varunagrawal commented Apr 24, 2018

It would be great to have support for various ROI Pooling operations as easy to add layers to facilitate research in object detection and semantic/instance segmentation.

Here is a live checklist:

General PRs: #626

@varunagrawal varunagrawal changed the title [feature request] ROI Pooling as nn layers [feature request] ROI Pooling layers Apr 24, 2018
@fmassa
Copy link
Member

fmassa commented Apr 24, 2018

I agree. I've started sketching the structure of it in https://github.com/pytorch/vision/tree/layers?files=1 .
I'll look into opening a PR tomorrow with a few layers

@wadimkehl
Copy link

Any movement on this?

@fmassa
Copy link
Member

fmassa commented May 15, 2018

Hey Wadim,
ROIPool and ROIAlign are implemented in the layers branch. I'm holding on merging them as is because I might want to change a few things, but feel free to use them as is (they are working)

@wadimkehl
Copy link

Great, will have a look! Thanks :)

@varunagrawal
Copy link
Contributor Author

@fmassa any updates on this? I'm sure a lot of people would benefit from having a master branch version of this available soon.

@botcs
Copy link

botcs commented Jun 19, 2018

It would be super convenient to have this installed automatically with torch/torchvision

@fmassa
Copy link
Member

fmassa commented Jun 19, 2018

Having the master branch have cpu/cuda layers officially requires a few additional changes, like providing wheels with the compiled binaries for each supported architecture, and I'm not looking at this at the moment.

@rawmarshmellows
Copy link

Just wondering if the ROI pooling/align could theoretically be done in pure Pytorch (even if it will be slow?)

@botcs
Copy link

botcs commented Jun 19, 2018

Was thinking about the same...
ROI pooling: Adaptive MaxPools exist, if you can efficiently crop out all the tensors you need from each image in a batch and concat them in the batch dimension, maybe it could work, however I have a bad feeling about the efficiency of this naive approach

@fmassa
Copy link
Member

fmassa commented Jun 19, 2018

@kevinlu1211 it is possible to implement it using pure PyTorch, and performance is OK.
An (old, badly tested) implementation can be found in https://github.com/pytorch/examples/pull/21/files#diff-7573d025c4128229f8efa3ff042e09d1R38

@rawmarshmellows
Copy link

rawmarshmellows commented Jun 19, 2018 via email

@varunagrawal
Copy link
Contributor Author

@fmassa I find it surprising this is not higher priority. The fact that every other major deep learning framework supports ROI Pooling and there is no easy way to write a Pytorch version of Detectron for research purposes despite the deep integration between Pytorch and Caffe2 is bewildering.

Is there some other way we can push this forward if you're too busy? I'm sure we can find volunteers to push this out the door as soon as possible.

@wadimkehl
Copy link

Come on @fmassa, make us all happy. If you don't have time, I'd gladly help!

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jul 12, 2018

Yeah, captain @fmassa you have almost an army of volunteers that wait for your orders :)

@varunagrawal
Copy link
Contributor Author

@fmassa I guess I've figured out how to get the CppExtension module to work for me and I should be able to finish this feature.

I see you have TODOs to pull some common CUDA utilities out into a common file. Any other things you'd like to do before I make a PR?

@fmassa
Copy link
Member

fmassa commented Jul 13, 2018

Hey guys, sorry for the delay here.

So, there are a number of things that should be done in order to be able to put this in torchvision:

  • package wheels with CPU / CUDA compiled code
  • add proper unit tests
  • documentation
  • code clean-up
  • CppExtensions and ATen rapidly changing and breaking the code :-)

I've been doing some great progress on Detectron, and I've currently moved all those layers to the detectron repo for the moment. I'm currently hesitating if I should put those layers in torchvision because of the aforementioned difficulties.

What do you guys think?

@wadimkehl
Copy link

wadimkehl commented Jul 13, 2018

Is the last issue a constantly persisting one? All the others I do not perceive to be big problems for a WIP branch, really. But it would enable everyone to have a working, if temporary, in-house pytorch solution.

@varunagrawal
Copy link
Contributor Author

@fmassa I believe I can take care of everything else except Wheel generation since I'm not familiar with the python packaging pipeline at FB.

As @wadimkehl mentioned, is there a checkpoint we can use for CppExtension and ATen? I used the latest master of PyTorch as of yesterday and your branch compiles fine as is.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jul 13, 2018

@varunagrawal concerning wheels and packaging you can take a look at pytorch/builder

@fmassa I think torchvision can have a scope to provide models/datasets/transforms for tasks like

  • classification (as it is today)
  • segmentation (need new models and transforms that takes care of masks)
  • detection (at least transformations and maybe a stuff to encode/decode ground truth)
    • bboxes
    • keypoints

IMHO, ROI Pooling is very specific to an architecture and if torchvision is not intended to merge inside itself the research on faster-rcnn-like nets, this can be avoided.

Any thoughts?

@varunagrawal
Copy link
Contributor Author

@vfdev-5 your suggestion would turn this into a Chicken & Egg problem since we need ROI Pooling to implement even a basic RCNN model.

Given that Detectron supports Faster RCNN, Caffe2 is now intrinsically linked to Pytorch, and 2 stage detectors are still highly looked into in research (e.g. Light Head RCNN) and industry, having a ROI pooling/align layer would be beneficial for torchvision overall.

While I agree with your categorization for different tasks such as classification, segmentation and detection, doing so would require significant effort which the Pytorch team isn't able to provide given the priority of v1. Indeed, I have already spoken to @soumith about a separate repo for detection and segmentation related tasks and he's shown considerable interest. Until then, and looking at the large amount of interest on this issue, having the layers here for now would be sufficient.

@fmassa
Copy link
Member

fmassa commented Jul 30, 2018

Sorry for the delay in replying.

@wadimkehl @varunagrawal as of today, my branch doesn't compile anymore on latest PyTorch because of pytorch/pytorch#9435, and patches such as ngimel/pytorch@ae176af should be applied to ROIAlign (and maybe other functions as well)

This has been the case at least 3-4 times for me already, which means that supporting those extension layers officially in torchvision at the moment would be hard to maintain -- if the user updates PyTorch, torchvision breaks, if the user update torchvision but not pytorch, it also breaks, he needs to update both at the same time. This was a recurring issue with Lua-Torch, and I'd rather avoid it at the moment.

About where to put the aforementioned layers, I'm not yet convinced on what is the right solution.
In the one hand, those backward-compatibility issues makes me hesitant to put them in torchvision (as it's widely used and up to now has been a python-only lib), so putting them in Detectron would make sense.
On the other hand, having a unified place where the basic building blocks can be found is a nice thing to have.
I think once we release Detectron, we might converge into gradually migrating a few layers and abstractions to torchvision, as the BC breakages on the C++ level would be less recurrent I'd hope.

@varunagrawal
Copy link
Contributor Author

@fmassa the good news is I have forked your branch and already made all the fixes. As of 07/27/2018, the ROI Pooling layer compiles successfully on my branch and I have also added a whole bunch of tests to check for correctness.

I can submit the PR and continue to maintain ROI Pooling (and ROI Align hopefully soon) until we get more stability from ATen and checkpoint at either PyTorch v0.5 or v1.

@fmassa
Copy link
Member

fmassa commented Jul 30, 2018

Sure, if you send a PR to the layers branch, I will merge it ,thanks!
But I'm not going to be merging the layers branch into master before things stabilize, which might be before v1.0

@varunagrawal
Copy link
Contributor Author

That works! For now let's point people towards the layers branch until we get the desired stability.

This was referenced Aug 23, 2018
@varunagrawal
Copy link
Contributor Author

Added support for ROIAlign with #630.

@fmassa
Copy link
Member

fmassa commented Oct 25, 2018

FYI, we have released our implementation of {Faster, Mask} R-CNN in https://github.com/facebookresearch/maskrcnn-benchmark , which contains the implementations for ROI Pooling and ROI Align. It currently doesn't have all the nice improvements that @varunagrawal has pushed to the layers branch here (like backwards for a few layers).

I suggest we move this discussion there for now.

@fmassa fmassa closed this as completed Oct 25, 2018
@seanremy
Copy link

It would be wonderful if the (working) ROI Pooling code in the layers branch could be updated and merged into torchvision. I think I speak for myself and many other vision researchers in that this is an essential functionality, and having it supported in the current torchvision is far less of a hassle than continuing to build this repo from source using an outdated branch.

@varunagrawal
Copy link
Contributor Author

Check out #708
@fmassa plans to merge layers for v3. Let's hope the next release is pushed out soon.

@varunagrawal
Copy link
Contributor Author

@fmassa do you want to reopen this issue until we can get all the related PRs merged? I'll update the original Issue comments with the PR numbers to help keep track.

@fmassa
Copy link
Member

fmassa commented Mar 31, 2019

@varunagrawal I'm going to be merging the layers branch this weekend. Thanks a lot for the awesome help improving it!

@dungmn
Copy link

dungmn commented May 29, 2019

@fmassa When is the model Roi pooling available on the master branch?

@wadimkehl
Copy link

It already is with 0.3

@LukasBommes
Copy link
Contributor

Is anyone working on position sensitive ROI pooling similar to this one: https://github.com/tensorflow/models/blob/f9fe0fe97aee7964ac344ce38bafb20e977586dc/research/object_detection/utils/ops.py#L652?

@fmassa
Copy link
Member

fmassa commented Aug 30, 2019

@LukasBommes there is an open PR adding it to torchvision, see #1259

@MitraTj
Copy link

MitraTj commented Sep 17, 2019

Hi all,
I need to extract different scales of ROIs to have 7x7, 5x6, 1x1
Any help please?

@fmassa
Copy link
Member

fmassa commented Sep 17, 2019

@MitraTj just add different RoIPool layers with different output sizes.

@XuYunqiu
Copy link

XuYunqiu commented Oct 3, 2019

Hi all, I would ask is there any implementation of an average version of ROI Pooling?
I find existing ROI Pool is only implemented with max pool.

@fmassa
Copy link
Member

fmassa commented Oct 3, 2019

@XuYunqiu there is RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?

@XuYunqiu
Copy link

XuYunqiu commented Oct 4, 2019

@XuYunqiu there is RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?

@fmassa Thanks for your quick reply. Actually, I just want to get the mean values of each ROIs.
So can I use ROI Align like this roi_align(conv_feat, rois, 1, spatial_scale=1.0/stride, sampling_ratio=1) ?

@varunagrawal
Copy link
Contributor Author

I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.

@XuYunqiu
Copy link

XuYunqiu commented Oct 4, 2019

I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.

Exactly, it will be helpful.

@fmassa
Copy link
Member

fmassa commented Oct 4, 2019

@XuYunqiu yes, that is going to be doing roughly what you are looking for

@XuYunqiu
Copy link

XuYunqiu commented Oct 4, 2019

@XuYunqiu yes, that is going to be doing roughly what you are looking for

But it might not work well with ROIs with a large area. I think the output using bilinear interpolation only relevant to a quite local context of the sample location (i.e., the center of ROIs in my case).

@XuYunqiu
Copy link

XuYunqiu commented Oct 6, 2019

@fmassa Hi, sorry to bother you again. Would you mind to tell me which mode (average or max pool) is selected in RoIAlign calculating the output based on the value of several sampled pointes?
I find there are RoIAlignAvg and RoIAlignMax in former implementation. But I don't find any information about this in the documents of torchvision version RoIAlign.

@XuYunqiu
Copy link

XuYunqiu commented Oct 6, 2019

@fmassa Hi, sorry to bother you again. Would you mind to tell me which mode (average or max pool) is selected in RoIAlign calculating the output based on the value of several sampled pointes?
I find there are RoIAlignAvg and RoIAlignMax in former implementation. But I don't find any information about this in the documents of torchvision version RoIAlign.

I‘ve gotten my answer from the source code. It seems that only average mode is set for RoIAlign.

// We do average (integral) pooling inside a bin

I really hope RoIPool and RoIAlign in torchvision could keep both average and max mode for more convenient usage.

@florinshen
Copy link

Up to now, the torchvision still only implement the maxpool version of RoI-Pooling and avgpool version of RoI-align. For convinience, I fount that the mmcv have implement both mode for this two ops.
https://mmcv.readthedocs.io/en/latest/_modules/mmcv/ops/roi_align.html.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests