Add C++ ops to torchvision #826
Conversation
@fmassa are you sure about removing those?
@varunagrawal it depends on what we consider the coordinates of the bounding box to represent, and on whether cropping is inclusive or exclusive on the boundary. If indexing is inclusive (like in MATLAB), then yes, the +1 is necessary. But if it's exclusive (like in Python), then it isn't. This is up for discussion, but I believe we should be consistent everywhere. Tagging @rbgirshick in case he has any comments on whether we should include the +1.
When I looked through the implementation of IoU in maskrcnn-benchmark, the +1 is always included, so there's precedent that requires it. More importantly, I don't think it is an indexing issue. For example, in COCO the boxes are XYWH, e.g.
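For reference, a minimal pure-Python sketch of IoU under that inclusive (+1) convention (illustrative only, not the torchvision or maskrcnn-benchmark implementation):

```python
def area_inclusive(box):
    # (x1, y1, x2, y2) with both endpoint pixels counted, hence the +1.
    x1, y1, x2, y2 = box
    return (x2 - x1 + 1) * (y2 - y1 + 1)

def iou_inclusive(a, b):
    # Intersection corners; the +1 appears again in the overlap width/height.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw = max(ix2 - ix1 + 1, 0)
    ih = max(iy2 - iy1 + 1, 0)
    inter = iw * ih
    union = area_inclusive(a) + area_inclusive(b) - inter
    return inter / union
```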
torchvision/csrc/cuda/nms_cuda.cu (outdated)

```cpp
dim3 blocks(
    at::cuda::ATenCeilDiv(boxes_num, threadsPerBlock),
    at::cuda::ATenCeilDiv(boxes_num, threadsPerBlock));
```
We can replace the above two lines directly with `col_blocks`.
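A small sketch of that suggestion, with the ceil-division factored out (pure Python; `ceil_div`, `threads_per_block`, and the numbers are illustrative, not the actual kernel-launch code):

```python
def ceil_div(n, d):
    # Integer ceiling division, which is what at::cuda::ATenCeilDiv computes.
    return (n + d - 1) // d

threads_per_block = 64   # hypothetical block size
boxes_num = 1000         # hypothetical number of candidate boxes

# Compute the value once and reuse it for both grid dimensions,
# instead of evaluating the ceil-division twice.
col_blocks = ceil_div(boxes_num, threads_per_block)
grid = (col_blocks, col_blocks)  # stands in for dim3 blocks(col_blocks, col_blocks)
```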
```cpp
  at::cuda::CUDAGuard device_guard(dets.device());
  return at::empty({0}, dets.options().dtype(at::kLong));
}
auto b = at::cat({dets, scores.unsqueeze(1)}, 1);
```
I've been wondering about this. It seems unnecessary to waste memory creating a new tensor here when we can easily just update the function signature to accept 3 arguments. This would also lead to a more consistent signature between `nms_cuda` and `nms_cpu`.
Thoughts?
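To make the memory point concrete, here is a pure-Python sketch of what the `at::cat` call materializes (nested lists stand in for tensors; this is not the torchvision API):

```python
def with_scores_column(dets, scores):
    # Mirrors at::cat({dets, scores.unsqueeze(1)}, 1): a brand-new
    # (N, 5) buffer is built just to carry scores alongside the boxes.
    return [box + [s] for box, s in zip(dets, scores)]

dets = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]]
scores = [0.9, 0.8]
b = with_scores_column(dets, scores)
# A three-argument signature such as nms(dets, scores, threshold)
# would avoid allocating `b` entirely.
```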
I totally agree, I just wanted to minimize the amount of work for now, but this is definitely something I'd like to have done
I'll add a PR on top of this branch to add that later today.
@varunagrawal thanks for the comments. To address your points:
I believe this is just a matter of convention. For example, in the TF object detection library, area and width/height do not include the +1. This is particularly important because they use normalized coordinates (so that boxes are within 0-1). Plus, when converting the COCO dataset to their format, no +1 is added either. There are plenty of inconsistencies here. Let me know if you disagree.
I see. I guess having Ross' comments on this would be helpful then.
Okay, this is what I was looking for. If
The "+1" history is described here: https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/boxes.py#L28-L40. I think this is baggage that should be shed. I believe the correct thing to do is to consider a box as a geometric entity defined in terms of continuous coordinates. In this case width is simply
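A quick comparison of the two width conventions being debated (illustrative sketch, not library code):

```python
def width_continuous(x1, x2):
    # Box edges as real-valued coordinates: width is just the difference.
    return x2 - x1

def width_inclusive(x1, x2):
    # Legacy pixel-indexing convention: both endpoint pixels are counted.
    return x2 - x1 + 1

# A degenerate box with x1 == x2 has zero width in the continuous view
# but width 1 in the legacy view, which is one source of the
# inconsistencies discussed above.
```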
Thanks @fmassa , it would be great to have officially supported layers for object detection.

@lopuhin this might be the same issue as facebookresearch/maskrcnn-benchmark#331. Also, I'm putting together an MWE of object detection and semantic segmentation, to be included in the

Thanks @fmassa , in my case the difference looks larger, e.g. 1.4492 vs 2.5916 for some values (and equal for others). It could be a bug on my side, will try to investigate soon.
@fmassa here is an example of roi pooling forward difference between GPU and CPU:
which gives for me (blank lines removed for clarity):
while I expected

@lopuhin this does indeed look like a bug, I'll have a look before merging this.
Codecov Report

```diff
@@            Coverage Diff            @@
##           master     #826     +/-  ##
=========================================
+ Coverage   56.61%   57.34%   +0.72%
=========================================
  Files          38       43       +5
  Lines        3432     3603     +171
  Branches      540      553      +13
=========================================
+ Hits         1943     2066     +123
- Misses       1372     1418      +46
- Partials      117      119       +2
```

Continue to review the full report at Codecov.
fyi, the CI is now good to go on this.

@lopuhin I believe I've fixed the bug you pointed out. Thanks a lot for the repro!

Thank you very much for the fix and for the feature @fmassa , I confirm that it works as expected for me 👍

@fmassa can those ops be used from torchscript?

@TheCodez probably not as of now, but I'll want to make them work in torchscript at some point in time.

@fmassa

```cpp
static auto registry = torch::jit::RegisterOperators("vision::nms", &nms);
```

The other ops seem to be more involved.

@TheCodez yes, and I'll follow the modifications from facebookresearch/maskrcnn-benchmark#138 to make the operators here work with the JIT, but that might require a few modifications elsewhere in PyTorch so that both can coexist, so I'm postponing this change.
Also fixes a bug in `clip_boxes_to_image` -- this function needs a test!
FWIW I'm getting

Yes, a repro on a new issue would be great!

Thanks @fmassa , so far it looks a bit more challenging to reproduce :)

Don't we add a `contiguous` call to `grad_output` already?

If not, we should.
Oh interesting, I didn't know it was required. This is the place where it helps me: vision/torchvision/ops/roi_pool.py, line 32 in 37bd11d

There is a call to `contiguous` in the CUDA file, just before the kernel launch, so the problem should be elsewhere.
Right, found it here: vision/torchvision/csrc/cuda/ROIPool_cuda.cu, line 224 in dc3ac29

Just for my own sanity: I see that we first get the strides (to be passed into the kernel), and only then call `contiguous` (vision/torchvision/csrc/cuda/ROIPool_cuda.cu, lines 216 to 224 in dc3ac29).
Yes, you are right, this is a bug! Removing the `contiguous` in this case should fix it. Can you also check if the same happens in ROIAlign, and send a PR? Thanks for the catch!
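The ordering problem can be illustrated with NumPy, whose arrays expose strides the same way: strides read before the buffer is made contiguous describe a different memory layout than the one the contiguous kernel input actually has (illustrative sketch, not the CUDA code):

```python
import numpy as np

# A non-contiguous gradient, e.g. the transpose of a contiguous array.
grad = np.arange(6, dtype=np.int64).reshape(2, 3).T   # shape (3, 2)

strides_before = grad.strides                # read first, as in the buggy order
grad_contig = np.ascontiguousarray(grad)     # analogous to .contiguous()
strides_after = grad_contig.strides          # strides of the buffer actually used

# Passing strides_before to a kernel that reads grad_contig's memory
# would index the wrong elements -- the bug identified above.
```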
Thanks! Sure, will do. |
And please also add a test for this case, if possible |
Hi, thanks a lot for adding the PR for these ops. |
@fmassa I guess we should rerun Sphinx and generate the docs afresh? |
@Naman-ntc I'm adding the doc entries for those functions now. |
This PR adds C++ / CUDA extensions to torchvision for `ROIAlign`, `ROIPool` and `nms`. I'll still be doing some cleanups / linting, as well as moving the files from `layers` to the `ops` folder, so this is not yet ready to be merged.

Thanks a lot to @varunagrawal who added a number of improvements on top of the `layers` branch, such as CPU support for ROIAlign and ROIPool backwards, as well as unit tests for those operations!