This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Will maskrcnn-benchmark support torch.jit.trace or torch.jit.script mode in the near future? #27

Open
xxradon opened this issue Oct 26, 2018 · 19 comments
Labels
enhancement New feature or request

Comments

@xxradon

xxradon commented Oct 26, 2018

❓ Questions and Help

As we know, in PyTorch 1.0, Torch Script is a way to create serializable and optimizable models from PyTorch code. Any code written in Torch Script can be saved from a Python process and loaded in a process that has no Python dependency.
So will maskrcnn-benchmark support torch.jit.trace or torch.jit.script mode in the near future?
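
For readers unfamiliar with tracing, here is a minimal standalone sketch of the save/load round trip (a toy module, nothing from maskrcnn-benchmark):

    import torch

    class Scale(torch.nn.Module):
        def forward(self, x):
            return 2.0 * x + 1.0

    # trace records the operations executed on an example input
    traced = torch.jit.trace(Scale(), torch.ones(3))
    traced.save("scale.pt")              # serialized TorchScript archive
    loaded = torch.jit.load("scale.pt")  # loadable from C++ via torch::jit::load, too
    print(loaded(torch.zeros(3)))        # tensor([1., 1., 1.])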

@fmassa
Contributor

fmassa commented Oct 26, 2018

There is currently some Python functionality in this codebase that is not supported by torch.jit.script, but it will be supported in the future.

Currently, you can trace almost all the model, except the custom C++ layers. Once we add support for those missing C++ layers by registering them as torch ops, I believe tracing should work without issues for same-sized images.

I'll look into registering the C++ layers as torch ops.

@fmassa fmassa added the enhancement New feature or request label Oct 26, 2018
@soumith
Member

soumith commented Oct 26, 2018

@fmassa the C++ layers can be registered with the JIT, see @goldsborough's slides from DevCon

@fmassa
Contributor

fmassa commented Oct 26, 2018

Yes, but I believe that it currently requires some extra code that follows a different codepath.
I'll check with Peter about that.

@xxradon
Author

xxradon commented Oct 27, 2018

@fmassa Thanks for looking into this. If there is a way to trace Mask R-CNN, or any useful information, please let us know.

@fmassa
Contributor

fmassa commented Oct 27, 2018

I'll look into adding tracing support for the custom ops early this week, it should not be hard. I'll update on the issue once it's done

@t-vi

t-vi commented Oct 27, 2018

Awesome!

@hadim
Contributor

hadim commented Nov 1, 2018

I am also interested in this feature!

@Eric-Zhang1990

@fmassa Have you added tracing support for the custom ops? Thanks.

@t-vi

t-vi commented Nov 5, 2018

So I did look at this in some depth and continue to do so; here is a bit of a progress report for discussion. I'm also happy to share a branch with my code, but the code is even more "stream of consciousness" than this write-up.

Goal and plan

My goal is to be able to run detection on single images of a fixed size (known during tracing) in C++, staying as close as possible to the "load traced model in C++" example.

My first step is to get something that

  • has scripted/traced paths for every "processing step", and
  • manages to reproduce the output on the image it was traced on (so I postpone variations that occur during detection for different scores, but I try not to screw things up too much).
  • I took the "do whatever works and clean up later" approach, so it's really, really messy right now.
  • I expect there will be JIT bits to think about on the PyTorch/JIT side, too.

My findings so far

C++ bits / Custom Ops

  • The mere addition of custom ops support (for inference) for the C++ ops seems easy:
    • Change int -> int64_t, float -> double (I'm not 100% certain it's needed),
    • link to libtorch.so and libcaffe2.so (it's probably silly to use the extension mechanism, but that is "clean up later").
    • Add registry in vision.cpp
    #include <torch/script.h>
    ...
    static auto registry =
      torch::jit::RegisterOperators()
        .op("maskrcnn_benchmark::nms", &nms)
        .op("maskrcnn_benchmark::roi_align_forward(Tensor input, Tensor rois, float spatial_scale, int pooled_height, int pooled_width, int sampling_ratio) -> Tensor",
            &ROIAlign_forward);
  • Using them: In layers/nms.py:
    import torch

    nms = torch.ops.maskrcnn_benchmark.nms

Easy. However, I could not trace the resulting nms.
(torch.jit.trace(lambda x, y: maskrcnn_benchmark.layers.nms(x, y, 2), (torch.randn(5, 5), torch.randn(5, 5))) gives an error; it should not.)
This can be worked around (for a fixed threshold, but that's OK, I think) by a double torchscript wrapper:

    @torch.jit.script
    def nms_fixed_thresh1(dets, scores, th: float=coco_demo.model.rpn.box_selector_test.nms_thresh):
        return maskrcnn_benchmark.layers.nms(dets, scores, th)

    @torch.jit.script
    def nms_fixed_thresh(dets, scores):
        return nms_fixed_thresh1(dets, scores)

Now we can trace nms_fixed_thresh in place of the lambda above. @goldsborough will want to know. :)

A similar wrapping trick was needed for roi align forward, I put that in the layer (where all the constants are parameters, so it's natural).

I did change some lists to tuples to make the jit happier.

Custom bookkeeping types (boxlist oh oh)

The JIT isn't very fond of the BoxList things. Where it works, a minimal fix is to "unpack" the parameters of functions, assuming that all Tensors are arguments and all others are constants. That works reasonably well when operating on the same input again in traced mode. It remains to be seen whether we run into generalization problems. To facilitate that, I added two methods to bounding_box:

    # note: _get_tensors/_set_tensors only work if the keys don't change in between!
    def _get_tensors(self):
        return (self.bbox,)+tuple(f for f in (self.get_field(field) for field in sorted(self.fields())) if isinstance(f, torch.Tensor))

    def _set_tensors(self, ts):
        self.bbox = ts[0]
        # only Tensor fields consumed a slot in _get_tensors, so keep a
        # separate running index instead of enumerating over all fields
        i = 1
        for f in sorted(self.fields()):
            if isinstance(self.extra_fields[f], torch.Tensor):
                self.extra_fields[f] = ts[i]
                i += 1

and there is some wrapper code.
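
The pack/unpack round trip can be illustrated with a pure-Python toy (ToyBoxList is a hypothetical stand-in for BoxList, with NumPy arrays in place of Tensors):

    import numpy as np

    class ToyBoxList:
        """Toy stand-in for BoxList: a bbox array plus named extra fields."""
        def __init__(self, bbox, **fields):
            self.bbox = bbox
            self.extra_fields = dict(fields)

        def fields(self):
            return list(self.extra_fields)

        # pack: bbox first, then array-valued fields in sorted key order
        def _get_tensors(self):
            return (self.bbox,) + tuple(
                v for v in (self.extra_fields[k] for k in sorted(self.fields()))
                if isinstance(v, np.ndarray))

        # unpack: only array fields consume a slot, matching _get_tensors
        def _set_tensors(self, ts):
            self.bbox = ts[0]
            i = 1
            for k in sorted(self.fields()):
                if isinstance(self.extra_fields[k], np.ndarray):
                    self.extra_fields[k] = ts[i]
                    i += 1

    b = ToyBoxList(np.zeros((2, 4)), scores=np.ones(2), label="person")
    packed = b._get_tensors()          # "label" is not an array, so it is skipped
    b._set_tensors(tuple(t + 1 for t in packed))
    assert np.allclose(b.bbox, 1) and np.allclose(b.extra_fields["scores"], 2)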

Some things that don't work well with tracing/scripting

The box_coder uses

            pred_boxes = torch.zeros_like(rel_codes)
            # x1
            pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
            # y1
            pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
            # x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
            pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
            # y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
            pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1

The jit doesn't love that, so I used:

            pred_boxes = torch.stack([pred_ctr_x - 0.5 * pred_w,
                                      pred_ctr_y - 0.5 * pred_h,
                                      pred_ctr_x + 0.5 * pred_w - 1,
                                      pred_ctr_y + 0.5 * pred_h - 1], 2).view(*rel_codes.shape)
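
The two formulations interleave the same values; here is a quick NumPy check of the equivalence (the sizes N and C are made up for the demo):

    import numpy as np

    N, C = 3, 2  # boxes and classes per box
    rng = np.random.default_rng(0)
    ctr_x, ctr_y = rng.random((N, C)), rng.random((N, C))
    w, h = rng.random((N, C)), rng.random((N, C))

    # original: strided indexed assignment (what the JIT dislikes)
    boxes_a = np.zeros((N, 4 * C))
    boxes_a[:, 0::4] = ctr_x - 0.5 * w
    boxes_a[:, 1::4] = ctr_y - 0.5 * h
    boxes_a[:, 2::4] = ctr_x + 0.5 * w - 1
    boxes_a[:, 3::4] = ctr_y + 0.5 * h - 1

    # rewrite: stack along a new trailing axis, then flatten back
    boxes_b = np.stack([ctr_x - 0.5 * w,
                        ctr_y - 0.5 * h,
                        ctr_x + 0.5 * w - 1,
                        ctr_y + 0.5 * h - 1], axis=2).reshape(N, 4 * C)

    assert np.allclose(boxes_a, boxes_b)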

Similarly, the pooling over several levels in the roi_heads.box.feature_extractor.pooler forward seems problematic for tracing. As indexed assignment isn't supported in Torch Script, I wrote a new custom op to replace

    for level, (per_level_feature, pooler) in enumerate(zip(x, self.poolers)):
        idx_in_level = torch.nonzero(levels == level).squeeze(1)
        rois_per_level = rois[idx_in_level]
        result[idx_in_level] = pooler(per_level_feature, rois_per_level)
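
For illustration, the same routing pattern can be expressed without indexed assignment by concatenating per-level outputs and undoing the permutation with argsort (a NumPy sketch of the idea, not the custom op actually written; fake_pooler is a hypothetical stand-in):

    import numpy as np

    # hypothetical per-level pooling: here just a tagged copy of the rois
    def fake_pooler(level, rois):
        return rois + level * 100.0

    levels = np.array([0, 1, 0, 2, 1])
    rois = np.arange(5, dtype=np.float64)

    # original pattern: indexed assignment into a preallocated result
    result = np.empty_like(rois)
    for level in range(3):
        idx = np.nonzero(levels == level)[0]
        result[idx] = fake_pooler(level, rois[idx])

    # index-free alternative: concatenate per-level outputs, then undo
    # the level-sorted permutation -- no in-place indexed writes
    order = np.argsort(levels, kind="stable")
    pooled = np.concatenate([fake_pooler(level, rois[levels == level])
                             for level in range(3)])
    result2 = pooled[np.argsort(order)]

    assert np.allclose(result, result2)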

There is still a problem (a riddle?) around script wanting to receive a tensor list as a list of tensors rather than a tuple, while tracing doesn't accept lists; I will have to sort that out. Maybe one could convince the JIT people to allow passing tuples of tensors where the JIT wants lists of tensors.

Mask composition

This uses PIL at the moment; it'll be replaced. @fmassa has a GPU version, but I will do a CPU version and a custom op for it.

Things that work at the moment

  • The backbone,
  • the rpn_head,
  • from what I understand, the generated anchors depend only on the image size and are fixed for our purposes, so I didn't worry about them,
  • the post_processing forward for a single feature map; I think it should not be too hard to make the entire postprocessing work, too.

So I'm now at the roi heads (as you can see above), the box head first.

@fmassa
Contributor

fmassa commented Nov 5, 2018

That's awesome progress @t-vi !

We were aware of the problems that BoxList would bring to the JIT; that's something we discussed with @zdevito and team, and we want to support it in the future (but I think that an approach similar to namedtuples, which we were considering, might not be enough as is).
But can't torch.jit.trace work with the BoxList objects? I thought it would work...

Indexing with the JIT doesn't work very well yet (but we are improving support for it), so the approach you followed for the box_coder seems good to me. I'd hope that we could avoid a custom op for the pooler, but for that we need better support for mutability and indexing in script, which is planned and being worked on, I believe.

I didn't quite understand the problem with tracing the constant parameters, but I suppose this is a bug in upstream PyTorch?

Thanks a lot for all your help!

@t-vi

t-vi commented Nov 5, 2018

So I filed the two JIT observations as issues with PyTorch (see above).

@Zehaos

Zehaos commented Dec 15, 2018

It seems that pytorch/pytorch#13564 has been fixed.

@t-vi

t-vi commented Dec 15, 2018

Yes, and we managed to do tracing in #138. There is a "regression" in 1.0 that invalidates the merge_levels script, so you'd currently need to replace it with a (very straightforward) custom op.

@jordanxlj

jordanxlj commented Dec 25, 2018

@t-vi, I hit a core-dump bug when I executed trace_model.py from your patch. The backtrace is below:
    (gdb) bt
    #0 0x00007f4e28193eeb in mkldnn::impl::scales_t::set(int, int, float const*) ()
       from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libmkldnn.so.0
    #1 0x00007f4e281989e2 in mkldnn_primitive_desc_create_v2 () from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libmkldnn.so.0
    #2 0x00007f4e2ea1a9bc in mkldnn::convolution_forward::primitive_desc::primitive_desc(mkldnn::convolution_forward::desc const&, mkldnn::engine const&) () from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #3 0x00007f4e2ea16516 in at::native::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) () from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #4 0x00007f4e2ebb718c in at::TypeDefault::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) const () from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #5 0x00007f4e2d8c7c55 in torch::autograd::VariableType::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) const ()

@nicolasCruzW21

Hello, any progress on this? I am also very interested.
Thank you!

@tuboxin

tuboxin commented May 3, 2019

Hi, any progress on this? Anyone who managed to do this?

@imranparuk

I'm also interested in knowing about the progress on this.

@ulyssesxxxi

Hi @t-vi @fmassa, I'm interested in using TVM to run one of the maskrcnn-benchmark models (specifically e2e_mask_rcnn_X-152-32x8d-FPN-IN5k_1.44x_caffe2), but during the conversion, it fails on torch.jit.trace() (because of BoxList). Any updates on this? Thx in advance.

@dashesy

dashesy commented Sep 17, 2019

I get a warning that torch::jit::RegisterOperators is deprecated; what is the new way to register a C++ extension for tracing?
