Extensions for Mask R-CNN for improved performance #379
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thanks for the PR!
I have an initial question: does this PR work with fp16 on the GPU? I've seen that the kernels use `float`, but there are no checks that the inputs are fp32 when dispatching to the kernels (only `is_cuda`).
We keep coordinate and IoU values in fp32 even in mixed precision training, so these kernels work with our mixed precision training. If you cast the coordinates (either bounding box coordinates or mask polygon coordinates as fetched from the dataset) or the IoU matrix (which is an input to the proposal matcher function) to fp16, then some of these kernels won't work.
@seryilmaz thanks for the explanation! I think it's OK to leave these kernels fp32-only for now, but it could be a bit confusing if users move the IoU values to fp16. I'll do a pass on the implementation later today and let you know.
I was thinking about a way of being less intrusive to the Python implementation. What about the following: all the parts of the Python implementation that you have currently wrapped inside a conditional would become functions. Here is an example implementation:

```python
def criterion(b1, b2):
    global enable_optim
    if not enable_optim:
        return False
    return b1.bbox.is_cuda and b2.bbox.is_cuda

@optimized_codepath(
    _C.box_iou,
    lambda boxlist1, boxlist2: (boxlist1.bbox, boxlist2.bbox),
    criterion)
def boxlist_iou(boxlist1, boxlist2):
    # all the python stays here
    return iou
```

The basic idea is that, if `criterion` is satisfied for the inputs (and a global flag that enables/disables the optimizations), then the optimized implementation will be used. This has the benefit of reducing the number of places where the user needs to change something if they want to disable the optimizations, and it gives a clean separation between the two. Thoughts?
That could work. I'd be a little concerned about the verbosity of it as the number of arguments increases, though.
Hi @slayton58, This PR seems really valuable, since the CPU cropping / resizing is causing some serious overhead. Thank you :) |
@botcs Well, this is @seryilmaz's PR, so... FWIW I feel that a concrete implementation framework to handle this kind of optimisation would be a big help |
@slayton58 oh sorry, I forgot to mention @seryilmaz as well. BTW, what do you mean by:
We have a concept from Francisco above for a framework to handle these optimizations but no actual implementation to work from. |
hi @seryilmaz |
@fmassa -- I got a ping on this the other day from NVIDIA. Can you look at this again? It looks like @slayton58 didn't think he had enough detail from your comments above to carry this forward. |
@gottbrath @slayton58 maskrcnn-benchmark is now deprecated in favor of detectron2. I'd recommend opening an issue to discuss those optimizations in detectron2.
This PR adds C++/CUDA extensions for some functions used by Mask R-CNN (and by some other models as well). The extensions cover generating mask targets for loss computation in the mask head, box IoU computation, the box encode function, and the proposal matcher function.
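As context for one of these, the box encode step computes the standard R-CNN regression targets that map a proposal onto its matched ground-truth box. A minimal pure-Python sketch of that parameterization (the function name and the unweighted form are illustrative, not this PR's actual kernel, which applies per-coordinate weights and runs on the GPU):

```python
import math

def encode_box(gt, proposal):
    """Regression targets (dx, dy, dw, dh) mapping `proposal` onto
    `gt`; both boxes are (x1, y1, x2, y2) tuples."""
    pw = proposal[2] - proposal[0]
    ph = proposal[3] - proposal[1]
    pcx = proposal[0] + 0.5 * pw
    pcy = proposal[1] + 0.5 * ph
    gw = gt[2] - gt[0]
    gh = gt[3] - gt[1]
    gcx = gt[0] + 0.5 * gw
    gcy = gt[1] + 0.5 * gh
    # center offsets are normalized by proposal size; scale deltas are logs
    return ((gcx - pcx) / pw, (gcy - pcy) / ph,
            math.log(gw / pw), math.log(gh / ph))

# A proposal that already matches its ground truth encodes to all zeros:
encode_box((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0))
```

The fused extension computes these targets for all matched proposal/ground-truth pairs in one kernel launch instead of looping in Python.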