[feature request] ROI Pooling layers #477

varunagrawal · 2018-04-24T22:08:09Z

It would be great to have support for various ROI Pooling operations as easy-to-add layers, to facilitate research in object detection and semantic/instance segmentation.

Here is a live checklist:

- ROI Pooling (Support for ROI Pooling #592; ROIPool: Support for all datatypes #632)
- Position Specific ROI Pooling
- ROI Align (Support for ROIAlign Layer #630)

General PRs: #626

fmassa · 2018-04-24T22:10:17Z

I agree. I've started sketching its structure in https://github.com/pytorch/vision/tree/layers?files=1
I'll look into opening a PR with a few layers tomorrow.

wadimkehl · 2018-05-15T21:37:32Z

Any movement on this?

fmassa · 2018-05-15T21:39:50Z

Hey Wadim,
ROIPool and ROIAlign are implemented in the layers branch. I'm holding off on merging them as-is because I might want to change a few things, but feel free to use them (they work).

wadimkehl · 2018-05-15T21:48:10Z

Great, will have a look! Thanks :)

varunagrawal · 2018-06-18T18:49:11Z

@fmassa any updates on this? I'm sure a lot of people would benefit from having a master branch version of this available soon.

botcs · 2018-06-19T10:53:06Z

It would be super convenient to have this installed automatically with torch/torchvision

fmassa · 2018-06-19T11:25:05Z

Officially shipping CPU/CUDA layers in the master branch requires a few additional changes, such as providing wheels with the compiled binaries for each supported architecture, and I'm not looking into this at the moment.

rawmarshmellows · 2018-06-19T12:20:30Z

Just wondering: could ROI pooling/align theoretically be done in pure PyTorch (even if it would be slow)?

botcs · 2018-06-19T12:51:40Z

Was thinking about the same...
ROI pooling: adaptive max pools exist. If you can efficiently crop out all the regions you need from each image in a batch and concatenate them along the batch dimension, it might work. However, I have a bad feeling about the efficiency of this naive approach.
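A minimal sketch of that naive approach, assuming ROIs arrive as a (K, 5) tensor of [batch_index, x1, y1, x2, y2] in input-image coordinates (this layout and the helper name `naive_roi_pool` are assumptions for illustration, not an existing API). Each box is scaled to feature-map cells, cropped, reduced with `F.adaptive_max_pool2d`, and all ROIs are concatenated along the batch dimension. The Python loop over ROIs is where the efficiency concern comes from, and the rounding and bin boundaries are not guaranteed to match a compiled ROI Pooling kernel bin-for-bin.

```python
import torch
import torch.nn.functional as F


def naive_roi_pool(features, rois, output_size, spatial_scale=1.0):
    """Hypothetical pure-PyTorch ROI max pooling (crop + adaptive max pool).

    features: (N, C, H, W) feature maps.
    rois: (K, 5) float tensor of [batch_index, x1, y1, x2, y2].
    Returns a (K, C, out_h, out_w) tensor.
    """
    _, _, height, width = features.shape
    pooled = []
    for roi in rois:
        batch_idx = int(roi[0])
        # Scale the box to feature-map cells and round to integer indices.
        x1, y1, x2, y2 = (roi[1:] * spatial_scale).round().long().tolist()
        # Clamp to the map and guarantee each crop is at least 1x1.
        x1 = min(max(x1, 0), width - 1)
        y1 = min(max(y1, 0), height - 1)
        x2 = min(max(x2, x1), width - 1)
        y2 = min(max(y2, y1), height - 1)
        # Inclusive box -> slice end is x2 + 1 / y2 + 1.
        crop = features[batch_idx:batch_idx + 1, :, y1:y2 + 1, x1:x2 + 1]
        pooled.append(F.adaptive_max_pool2d(crop, output_size))
    # Stack every ROI along the batch dimension.
    return torch.cat(pooled, dim=0)


# Usage: a 4x4 map numbered 0..15, one box over the whole map, one over the center.
features = torch.arange(16.0).view(1, 1, 4, 4)
rois = torch.tensor([[0.0, 0.0, 0.0, 3.0, 3.0], [0.0, 1.0, 1.0, 2.0, 2.0]])
out = naive_roi_pool(features, rois, output_size=(2, 2))
print(out.shape)  # torch.Size([2, 1, 2, 2])
```

For the full-map box, each output cell is the max of one 2x2 quadrant of the map.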

fmassa · 2018-06-19T13:43:58Z

@kevinlu1211 it is possible to implement it using pure PyTorch, and performance is OK.
An (old, badly tested) implementation can be found in https://github.com/pytorch/examples/pull/21/files#diff-7573d025c4128229f8efa3ff042e09d1R38

rawmarshmellows · 2018-06-19T13:45:41Z

You're a lifesaver! I'm writing a tutorial explaining Mask R-CNN. Thanks a lot!


varunagrawal · 2018-06-19T18:53:36Z

@fmassa I'm surprised this isn't a higher priority. Every other major deep learning framework supports ROI Pooling, yet there is no easy way to write a PyTorch version of Detectron for research purposes, despite the deep integration between PyTorch and Caffe2. That is bewildering.

Is there some other way we can push this forward if you're too busy? I'm sure we can find volunteers to push this out the door as soon as possible.

wadimkehl · 2018-07-12T22:28:28Z

Come on @fmassa, make us all happy. If you don't have time, I'd gladly help!

vfdev-5 · 2018-07-12T22:37:54Z

Yeah, captain @fmassa, you have almost an army of volunteers waiting for your orders :)

varunagrawal · 2018-07-13T00:36:25Z

@fmassa I guess I've figured out how to get the CppExtension module to work for me and I should be able to finish this feature.

I see you have TODOs to pull some common CUDA utilities out into a common file. Any other things you'd like to do before I make a PR?

fmassa · 2018-07-13T17:31:24Z

Hey guys, sorry for the delay here.

So, there are a number of things that should be done in order to be able to put this in torchvision:

- package wheels with CPU/CUDA compiled code
- add proper unit tests
- documentation
- code clean-up
- CppExtensions and ATen rapidly changing and breaking the code :-)

I've been making good progress on Detectron, and I've moved all those layers to the Detectron repo for the moment. I'm currently hesitating over whether to put those layers in torchvision because of the difficulties above.

What do you guys think?

wadimkehl · 2018-07-13T19:15:08Z

Is the last issue a persistent one? I don't really see the others as big problems for a WIP branch, and it would give everyone a working, if temporary, in-house PyTorch solution.

varunagrawal · 2018-07-13T19:19:00Z

@fmassa I believe I can take care of everything else except wheel generation, since I'm not familiar with the Python packaging pipeline at FB.

As @wadimkehl mentioned, is there a checkpoint we can use for CppExtension and ATen? I used the latest master of PyTorch as of yesterday and your branch compiles fine as is.

vfdev-5 · 2018-07-13T20:10:59Z

@varunagrawal concerning wheels and packaging you can take a look at pytorch/builder

@fmassa I think torchvision's scope could be to provide models/datasets/transforms for tasks like:

- classification (as it is today)
- segmentation (needs new models and transforms that handle masks)
- detection (at least transformations, and maybe something to encode/decode ground truth)
  - bboxes
  - keypoints

IMHO, ROI Pooling is very architecture-specific, and if torchvision is not intended to absorb research on Faster R-CNN-like nets, it could be left out.

Any thoughts?

varunagrawal · 2018-07-23T19:14:44Z

@vfdev-5 your suggestion would turn this into a Chicken & Egg problem since we need ROI Pooling to implement even a basic RCNN model.

Given that Detectron supports Faster R-CNN, Caffe2 is now intrinsically linked to PyTorch, and two-stage detectors are still heavily studied in research (e.g. Light-Head R-CNN) and used in industry, having an ROI pooling/align layer would benefit torchvision overall.

While I agree with your categorization into tasks such as classification, segmentation and detection, doing so would require significant effort that the PyTorch team isn't able to provide given the priority of v1. Indeed, I have already spoken to @soumith about a separate repo for detection- and segmentation-related tasks, and he's shown considerable interest. Until then, given the large amount of interest in this issue, having the layers here would be sufficient.

fmassa · 2018-07-30T12:22:30Z

Sorry for the delay in replying.

@wadimkehl @varunagrawal as of today, my branch no longer compiles on the latest PyTorch because of pytorch/pytorch#9435, and patches such as ngimel/pytorch@ae176af need to be applied to ROIAlign (and maybe other functions as well).

This has happened to me at least 3-4 times already, which means that officially supporting those extension layers in torchvision would currently be hard to maintain: if the user updates PyTorch, torchvision breaks; if the user updates torchvision but not PyTorch, it also breaks; both need to be updated at the same time. This was a recurring issue with Lua-Torch, and I'd rather avoid it for now.

About where to put the aforementioned layers, I'm not yet convinced of the right solution.
On the one hand, those backward-compatibility issues make me hesitant to put them in torchvision (as it's widely used and has been a Python-only library so far), so putting them in Detectron would make sense.
On the other hand, having a unified place where the basic building blocks can be found is a nice thing to have.
I think once we release Detectron, we might converge on gradually migrating a few layers and abstractions to torchvision, as BC breakages at the C++ level will hopefully be less frequent by then.

varunagrawal · 2018-07-30T15:46:56Z

@fmassa the good news is I have forked your branch and already made all the fixes. As of 07/27/2018, the ROI Pooling layer compiles successfully on my branch and I have also added a whole bunch of tests to check for correctness.

I can submit the PR and continue to maintain ROI Pooling (and ROI Align hopefully soon) until we get more stability from ATen and checkpoint at either PyTorch v0.5 or v1.

fmassa · 2018-07-30T15:49:46Z

Sure, if you send a PR to the layers branch, I'll merge it, thanks!
But I won't be merging the layers branch into master before things stabilize, which might happen before v1.0.

varunagrawal · 2018-07-30T17:03:18Z

That works! For now let's point people towards the layers branch until we get the desired stability.

varunagrawal · 2018-10-16T22:03:36Z

Added support for ROIAlign with #630.

fmassa · 2018-10-25T12:35:06Z

FYI, we have released our implementation of {Faster, Mask} R-CNN in https://github.com/facebookresearch/maskrcnn-benchmark , which contains implementations of ROI Pooling and ROI Align. It currently doesn't have all the nice improvements that @varunagrawal has pushed to the layers branch here (like backward passes for a few layers).

I suggest we move this discussion there for now.

seanremy · 2019-01-18T19:22:56Z

It would be wonderful if the (working) ROI Pooling code in the layers branch could be updated and merged into torchvision. I think I speak for many vision researchers in saying this is essential functionality, and having it supported in the current torchvision is far less of a hassle than continuing to build this repo from source using an outdated branch.

varunagrawal · 2019-01-18T19:37:38Z

Check out #708.
@fmassa plans to merge the layers for v0.3. Let's hope the next release is pushed out soon.

varunagrawal · 2019-03-11T17:24:58Z

@fmassa do you want to reopen this issue until we can get all the related PRs merged? I'll update the original Issue comments with the PR numbers to help keep track.

fmassa · 2019-03-31T14:19:54Z

@varunagrawal I'm going to be merging the layers branch this weekend. Thanks a lot for the awesome help improving it!

dungmn · 2019-05-29T04:14:24Z

@fmassa When will RoI pooling be available on the master branch?

wadimkehl · 2019-05-29T04:18:30Z

It already is with 0.3

LukasBommes · 2019-08-30T06:02:25Z

Is anyone working on position sensitive ROI pooling similar to this one: https://github.com/tensorflow/models/blob/f9fe0fe97aee7964ac344ce38bafb20e977586dc/research/object_detection/utils/ops.py#L652?

fmassa · 2019-08-30T07:46:26Z

@LukasBommes there is an open PR adding it to torchvision, see #1259

MitraTj · 2019-09-17T05:15:39Z

Hi all,
I need to extract ROIs at different output sizes: 7x7, 5x6, and 1x1.
Any help, please?

fmassa · 2019-09-17T11:19:25Z

@MitraTj just add different RoIPool layers with different output sizes.

XuYunqiu · 2019-10-03T09:46:22Z

Hi all, is there any implementation of an average version of ROI Pooling?
It seems the existing ROI Pool is only implemented with max pooling.

fmassa · 2019-10-03T09:48:07Z

@XuYunqiu there is RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?

XuYunqiu · 2019-10-04T06:22:03Z


@fmassa Thanks for your quick reply. Actually, I just want the mean value of each ROI.
So can I use ROI Align like this: roi_align(conv_feat, rois, 1, spatial_scale=1.0/stride, sampling_ratio=1)?

varunagrawal · 2019-10-04T06:43:56Z

I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.

XuYunqiu · 2019-10-04T07:14:27Z


Exactly, it will be helpful.

fmassa · 2019-10-04T11:45:12Z

@XuYunqiu yes, that is going to be doing roughly what you are looking for

XuYunqiu · 2019-10-04T12:22:33Z


But it might not work well for ROIs with a large area. I think the bilinearly interpolated output only depends on a very local neighborhood of the sample location (i.e., the center of each ROI in my case).

XuYunqiu · 2019-10-06T03:12:54Z

@fmassa Hi, sorry to bother you again. Would you mind telling me which mode (average or max pooling) RoIAlign uses when computing the output from the values of several sampled points?
I see there were RoIAlignAvg and RoIAlignMax in a former implementation, but I can't find any information about this in the documentation for the torchvision version of RoIAlign.

XuYunqiu · 2019-10-06T04:03:30Z


I've found my answer in the source code: it seems only average mode is implemented for RoIAlign.

From vision/torchvision/csrc/cuda/ROIAlign_cuda.cu, line 108 at commit 76702a0:

// We do average (integral) pooling inside a bin

I really hope RoIPool and RoIAlign in torchvision could support both average and max modes for more convenient usage.

florinshen · 2023-04-10T13:05:14Z

As of now, torchvision still only implements the max-pool version of RoI Pooling and the average-pool version of RoI Align. For convenience, I found that mmcv implements both modes for these two ops:
https://mmcv.readthedocs.io/en/latest/_modules/mmcv/ops/roi_align.html.
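For readers who need a max-mode RoIAlign without an extra dependency, here is a hypothetical pure-PyTorch sketch (the name `roi_align_with_mode` and its `mode` argument are made up here; this is not torchvision's or mmcv's API). It places `sampling_ratio x sampling_ratio` bilinear samples at the centers of a regular grid inside each bin using `F.grid_sample`, then reduces them with either the mean (the behavior noted in the torchvision source) or the max. Out-of-bounds samples are clamped to the border, which differs from compiled kernels, so treat it as a reference sketch rather than a drop-in replacement.

```python
import torch
import torch.nn.functional as F


def roi_align_with_mode(features, rois, output_size, spatial_scale=1.0,
                        sampling_ratio=2, mode="avg"):
    """Hypothetical RoIAlign variant with "avg" or "max" reduction per bin.

    features: (N, C, H, W) with H, W > 1.
    rois: (K, 5) rows of [batch_index, x1, y1, x2, y2]; no half-pixel offset.
    """
    _, channels, height, width = features.shape
    out_h, out_w = output_size
    s = sampling_ratio
    outputs = []
    for roi in rois:
        batch_idx = int(roi[0])
        x1, y1, x2, y2 = (roi[1:] * spatial_scale).tolist()
        bin_h = max(y2 - y1, 1.0) / out_h
        bin_w = max(x2 - x1, 1.0) / out_w
        # s x s sample centers per bin, laid out bin-by-bin along each axis.
        ys = y1 + (torch.arange(out_h * s, dtype=features.dtype) + 0.5) * bin_h / s
        xs = x1 + (torch.arange(out_w * s, dtype=features.dtype) + 0.5) * bin_w / s
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        # grid_sample wants (x, y) in [-1, 1]; with align_corners=True,
        # -1 and 1 are the centers of the first and last pixels.
        grid = torch.stack(
            [2.0 * grid_x / (width - 1) - 1.0, 2.0 * grid_y / (height - 1) - 1.0],
            dim=-1,
        ).unsqueeze(0)
        samples = F.grid_sample(features[batch_idx:batch_idx + 1], grid,
                                mode="bilinear", padding_mode="border",
                                align_corners=True)
        # (1, C, out_h * s, out_w * s) -> (1, C, out_h, s, out_w, s)
        samples = samples.reshape(1, channels, out_h, s, out_w, s)
        if mode == "avg":
            outputs.append(samples.mean(dim=(3, 5)))
        elif mode == "max":
            outputs.append(samples.amax(dim=(3, 5)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return torch.cat(outputs, dim=0)


# Usage: a map whose value equals the x coordinate, one full-map ROI.
features = torch.arange(4.0).repeat(4, 1).view(1, 1, 4, 4)
rois = torch.tensor([[0.0, 0.0, 0.0, 3.0, 3.0]])
avg = roi_align_with_mode(features, rois, (1, 1), sampling_ratio=2, mode="avg")
mx = roi_align_with_mode(features, rois, (1, 1), sampling_ratio=2, mode="max")
```

With `sampling_ratio=2`, the single bin is sampled at x = 0.75 and x = 2.25, so "avg" gives 1.5 and "max" gives 2.25 on this linear map.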

varunagrawal changed the title from "[feature request] ROI Pooling as nn layers" to "[feature request] ROI Pooling layers" on Apr 24, 2018.

This was referenced Aug 23, 2018:

- ROI Pooling on GPU and CPU #584 (closed)
- ROI Pooling Layer #585 (closed)

fmassa closed this as completed Oct 25, 2018
