[Frontend][Pytorch] Improve Pytorch frontend for object detection models #6449
Conversation
@kevinthesun Thanks for working on this. Can you split this into multiple PRs? In particular, besides the new op conversions, you made many non-trivial changes to existing ops. Without tests for the latter changes, it is hard to tell what they are for. We can merge the new op conversions first (as they came with tests).
@masahi These changes are mainly for the torchvision RCNN models and enhance existing features. There is another PR adding some backend pieces; after that lands I can add e2e torchvision RCNN tests to this PR, which should cover all the changes.
Thank you for working on this, @kevinthesun!
I must admit I'm a bit concerned about the typing changes moving us back to doing things in a less systematic way. Most importantly:
- Have a view of how we handle types (where in the conversion we switch from PyTorch types to Python/TVM ones) and try to avoid per-op rules where possible. (Kudos for unifying the `full` shape arguments!)
- One of the patterns I would like to avoid is `try: ... except:` as a regular way of processing inputs of different types. It would seem that it blurs our understanding of what is going on.

There seems to be a lot going on at the same time; if @masahi's suggestion to split is feasible, it would make it easier to see what is what.
python/tvm/relay/frontend/pytorch.py
Outdated
```python
dtype0 = _infer_type(inputs[0]).checked_type.dtype
if isinstance(inputs[1], _expr.Expr):
    dtype1 = _infer_type(inputs[1]).checked_type.dtype
```
I must admit that I'd appreciate more commentary on the typing changes here.
- In my opinion (and I could be wrong), it would be helpful to have a view of what kinds of types `input_types` and `inputs` can have, and a single place where we do implicit type promotion. I had hoped `_pytorch_promote_types` could be that.
- If `_pytorch_promote_types` doesn't do the job, maybe we can comment on why it doesn't. Also, why patch these particular elementwise ops as opposed to amending `_pytorch_promote_types`?

I know this looks like I'm asking for busywork when you're mostly interested in getting a particular model to work, but I have the impression that we want to avoid ad hoc type workarounds as much as possible if we want to avoid subtle bugs whenever someone uses something outside what our unit tests catch.
This comes from weird behavior of `prim::NumToTensor`: it converts int32 to int64 silently:

```
%11 : int = aten::size(%img.1, %10), scope: __module.model # /usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py:62:0
%im_h : Long() = prim::NumToTensor(%11), scope: __module.model
```

Right now the PyTorch frontend just reuses the same dtype for this op's output. For an elementwise op, the input dtypes PyTorch reports are ["int64", "int64"], which looks fine; however, the actual input dtypes are ["int64", "int32"]. What I can do is enhance `_pytorch_promote_types` so that we run `_infer_type` on every input and get the actual input dtype, rather than relying solely on the dtypes PyTorch reports. Sounds like a plan?
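For concreteness, a minimal sketch of that enhancement, with a hypothetical helper name and a simplified promotion order (the real `_pytorch_promote_types` in the frontend differs):

```python
from tvm.relay import expr as _expr
from tvm.relay import op as _op
from tvm.relay.frontend.common import infer_type as _infer_type


def promote_types(inputs, reported_dtypes):
    """Hypothetical sketch: trust Relay's inferred dtypes over TorchScript's."""
    actual = []
    for inp, reported in zip(inputs, reported_dtypes):
        if isinstance(inp, _expr.Expr):
            # prim::NumToTensor can make the reported dtype (int64) disagree
            # with what the expression actually produces (int32).
            actual.append(_infer_type(inp).checked_type.dtype)
        else:
            actual.append(reported)

    # Simplified promotion order, just for the sketch.
    order = ["bool", "int32", "int64", "float32", "float64"]
    target = max(actual, key=lambda d: order.index(d) if d in order else 0)

    # Cast only the Relay inputs whose actual dtype differs from the target.
    return [
        _op.cast(inp, target) if isinstance(inp, _expr.Expr) and dt != target else inp
        for inp, dt in zip(inputs, actual)
    ]
```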
I think we would eventually want to lean more on type propagation.
However, the issue here is that PyTorch's default dtype for integral tensors is int64. I don't think we should be hacking around that, really, because we're bound to end up with cases where int64 is the right thing to have. If I understood the discussions on the forum correctly, the idea was to downcast 64-bit indexing to 32-bit where it is considered safe.
`%11 : int = aten::size(%img.1, %10)` generates int32, but `%im_h : Long() = prim::NumToTensor(%11)` automatically converts it to int64, without any hint. When we convert `prim::NumToTensor`, we can only follow the input type, which is int32 here, since there is no other information. So this is about the weird behavior of `prim::NumToTensor` rather than about indexing. I'm not sure how many other ops in PyTorch have such behavior, but it looks like inferring the actual input type in `_pytorch_promote_types` would fix these kinds of issues.
```python
try:
    k = int(_infer_value(inputs[1], {}).asnumpy().tolist())
    k = _expr.const(k)
except Exception:
    k = inputs[1]
```
The `int` is not needed here?
Also, it might be worth trying to avoid `try: ... except Exception:` during non-error processing in favour of `if isinstance(...): ... else:`.
The try/except block is mainly for `_infer_value`. Currently there is no reliable way to call `_infer_value` and catch explicit error types; that's why a general Exception is caught here.
Still, I would prefer looking at what the type of `inputs[1]` is and having an `if`. We should at least know which types are fine to leave as-is (the current except block).
Sure. I can do what I did for arange: check whether the input is an `_expr.Expr`.
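For reference, a sketch of that pattern for topk's `k` (simplified, not the PR's exact code): the type check keeps static inputs away from `_infer_value`, and the expression is kept only when evaluation fails.

```python
from tvm.relay import expr as _expr
from tvm.relay.frontend.common import infer_value as _infer_value


def convert_topk_k(k):
    """Sketch: fold a possibly-dynamic k to a constant where we can."""
    if isinstance(k, _expr.Expr):
        try:
            # Dynamic k: evaluating it keeps downstream shapes static.
            return _expr.const(int(_infer_value(k, {}).asnumpy()))
        except Exception:
            # Genuinely dynamic: keep the expression.
            return k
    # Static k: TorchScript already gave us a plain Python number.
    return _expr.const(int(k))
```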
python/tvm/relay/frontend/pytorch.py
Outdated
```python
begin = _op.concatenate(tmp, axis=0)
btype = _infer_type(begin).checked_type.dtype
if str(btype) != "int32":
    begin = _op.cast(begin, "int32")
```
Casting to int32 here while the index_size limit is 2**63 - 1 feels strange.
Using int64 now.
```python
if isinstance(inputs[3], _expr.Expr):
    try:
        target_end = np.asscalar(_infer_value(inputs[3], {}).asnumpy().astype(np.int))
    except Exception:
```
For which types do we want to do this (or alternatively which can go straight through)?
```python
if isinstance(inputs[3], _expr.Expr):
```
I'd have a strong preference for that, yeah.
```python
    new_shape.append(dim)
else:
    try:
        dim = int(_infer_value(dim, {}).asnumpy())
```
Here, too, maybe avoid `try: ... except:` (there are more places; I didn't flag them all, but I think they should all be changed to use a plain `if`).
Same. These try/except blocks are necessary to handle dynamic operators.
@kevinthesun sounds like you already have Mask R-CNN working :) can't wait
@masahi Coming soon. :D
@masahi The problem with creating a try_infer_value API is that it doesn't simplify the code, since we need different handling in the except block for different ops. We still need to check the output of try_infer_value and branch to decide what actions to take, and in some cases we also need to do more processing in the try block. There is no uniform logic for such dynamic attribute inference.
But we should still know when it is appropriate to run the inference, should we not?
We do this for some ops which have dynamic attributes. When those dynamic attributes are Relay Exprs, we need to try to infer their values to make the generated Relay program as static as possible. There is no good way to further tell which Relay Exprs need to be inferred (and it isn't necessary, since _infer_value does a general evaluation for any Relay Expr).
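For illustration only, here is roughly what such a wrapper could look like (hypothetical, not part of this PR); the op-specific callbacks show why the branching does not disappear:

```python
from tvm.relay.frontend.common import infer_value as _infer_value


def try_infer_value(val, on_success, on_failure):
    """Hypothetical wrapper: evaluate a Relay expr to a constant if possible."""
    try:
        return on_success(_infer_value(val, {}).asnumpy())
    except Exception:
        return on_failure(val)


# Each op still supplies its own handling, e.g.:
#   topk wants a Relay constant:  try_infer_value(k, lambda a: _expr.const(int(a)), lambda v: v)
#   reshape wants a Python int:   try_infer_value(dim, lambda a: int(a), lambda v: v)
```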
E2E tests added. Now waiting for #6464.
Thanks for the efforts and discussions. @kevinthesun could you please summarize the solutions/decisions to align with @masahi and @t-vi so that we can move forward?
@zhiics @masahi @t-vi Sure. One major thing in this PR is the handling of dynamic operators such as slice, arange and topk. These ops have dynamic attributes which affect relay type inference. The methodology here is to try to infer these values to make them as static as possible (similar to the tf parser). @masahi suggested we could have an API wrapped around `_infer_value`; @t-vi suggested we could check the input type to see whether we need to do such inference. Any comments or suggestions?
Would moving to a fully dynamic frontend, like what is being done for ONNX in #6351, help remove the infer_value usage?
While we try to make relay expressions as static as possible, a lot of work can still only be done in the frontend, and for those cases we still need _infer_value. This happens a lot for tf/pt object detection models. For some simple ops such as topk, we can directly use the dyn namespace op and eliminate _infer_value. Later, as dynamic ops gradually improve, it should be possible to eliminate more _infer_value calls.
@kevinthesun Thanks, I'm trying to run e2e on my end. I have the following questions:
@zhiics @kevinthesun @masahi I think using `try: ... except:` this way remains problematic.

Neither am I entirely sure whether 1. is contentious or not, and to me it would seem that a PR is an odd place to form an opinion on 2. At the same time, I see the construct as problematic enough to have a really hard time liking the current state of the PR. It would bring great joy if you could be convinced to move it into a helper such as the `try_infer_value` discussed above. I should emphasize that I'm entirely for having the new functions, and I appreciate your work on this, @kevinthesun. Thank you!
@t-vi Thanks for your thoughts. To handle dynamic ops correctly, we have to use `_infer_value`.
@masahi It looks like a certain recent change causes this error. I'm investigating.
I don't actually get why we need `try: ... except:`. What case does it not handle that is handled properly by the except part?
The original pt frontend just handled limited cases, mostly static shapes/attributes, and for static models it is fine to keep the input as it is. For more dynamic models, we need to do some extra work to reduce the dynamism during type inference. For example, there can be a chance to reduce an output shape of (?, ?, ?) to (1, ?, ?) for a dynamic op; this is necessary, since otherwise it's hard to ensure we are doing the right thing in the backend. The error pointed out by @masahi is exactly such a case: the input shape of
I think I'm slowly starting to understand. But couldn't one have something like `try_infer_value`?
As I have discussed with @masahi, the problem with having a try interface is that there is no common logic between different dynamic ops when dealing with dynamic attributes. We need to take different actions in the try/except block depending on the actual op.
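A small sketch of the dynamism reduction described above, mirroring the reshape handling quoted earlier (hypothetical helper name):

```python
from tvm.relay import expr as _expr
from tvm.relay.frontend.common import infer_value as _infer_value


def fold_shape(dims):
    """Replace dynamic dims with Python ints where _infer_value succeeds.

    dims mixes Python ints and Relay exprs, e.g. [batch_expr, 224, 224].
    If batch_expr evaluates to 1, the result describes (1, 224, 224)
    instead of the fully dynamic (?, ?, ?).
    """
    new_shape = []
    for dim in dims:
        if isinstance(dim, _expr.Expr):
            try:
                dim = int(_infer_value(dim, {}).asnumpy())
            except Exception:
                pass  # genuinely dynamic: keep the expression
        new_shape.append(dim)
    return new_shape
```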
So, the bits I had in mind seem OK-ish, even if this creates a cleanup opportunity.
@kevinthesun The maskrcnn test worked for me, great! But unfortunately, under torch 1.6 the conversion fails with the following error. Most likely these come from PyTorch functions that are scripted, see https://github.com/pytorch/vision/blob/1a04d3c265679e1a508e7cd627006aaa9ef1ccfb/torchvision/models/detection/roi_heads.py#L454. Most of them don't make sense for tracing (like raising an exception). We need to come back to this problem later when we upgrade our CI.
@masahi Yeah. Those ops look like they come from the scripted model. I believe for pt 1.6, if we trace the model, there are 2 or 3 ops missing.
To be clear, that list op is coming from tracing the mask rcnn model: since mask rcnn is partly scripted, the scripted parts remain even if we trace it. For faster rcnn, which is not partly scripted and thus can be traced completely, I get the following missing ops with PyTorch 1.6:
```python
pt_scores = pt_res[1].detach().numpy().tolist()
tvm_scores = tvm_res[1].asnumpy().tolist()
num_pt_valid_scores = num_tvm_valid_scores = 0
```
I'm comparing the two outputs (box coordinates etc.) by eyeballing the raw numerical values, and it looks good!
I hope we can have a better way to test the outputs, for example extracting valid box indices based on score, sorting the indices by score, and sorting the boxes by the sorted indices, like I did below.
```
In [59]: boxes_pt[ind_pt]
Out[59]:
array([[2.04335907e+02, 1.14787331e+02, 2.59456146e+02, 2.23669510e+02],
       [1.44117985e+01, 1.24377182e+02, 6.13694534e+01, 2.14236847e+02],
       [1.74448120e+02, 1.58607117e+02, 2.78158417e+02, 2.36064560e+02],
       [1.17156494e+02, 1.18118942e+02, 1.53017059e+02, 1.92442230e+02],
       [1.00772736e+02, 1.22123978e+02, 1.23872040e+02, 1.93398422e+02],
       [1.49618347e+02, 1.32603149e+02, 2.18598679e+02, 1.74433960e+02],
       [2.13966250e-01, 1.39350525e+02, 1.12648888e+01, 1.53912018e+02],
       [1.33723541e+02, 1.24649574e+02, 1.64407623e+02, 1.61921951e+02],
       [8.67264709e+01, 1.28565033e+02, 9.51557159e+01, 1.56289093e+02]],
      dtype=float32)

In [60]: boxes_tvm[ind_tvm]
Out[60]:
array([[204.3359    , 114.78732   , 259.45615   , 223.66951   ],
       [ 14.411795  , 124.37717   ,  61.369446  , 214.23685   ],
       [174.44815   , 158.60712   , 278.1584    , 236.06454   ],
       [117.156494  , 118.118935  , 153.01706   , 192.44223   ],
       [100.772736  , 122.12396   , 123.87204   , 193.39842   ],
       [149.61836   , 132.60315   , 218.5987    , 174.43396   ],
       [  0.39432764, 139.76776   ,  11.332638  , 153.84328   ],
       [133.72354   , 124.64958   , 164.40762   , 161.92194   ],
       [ 86.72647   , 128.56502   ,  95.155716  , 156.28911   ]],
      dtype=float32)
```
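A sketch of turning that eyeball check into an automated comparison (the `score_threshold` and tolerances are illustrative assumptions, not values from the PR):

```python
import numpy as np


def sorted_valid_boxes(boxes, scores, score_threshold=0.9):
    """Keep boxes whose score passes the threshold, ordered by descending score."""
    valid = scores >= score_threshold
    order = np.argsort(-scores[valid])
    return boxes[valid][order]


# With boxes_pt/scores_pt from torchvision and boxes_tvm/scores_tvm from TVM:
# np.testing.assert_allclose(
#     sorted_valid_boxes(boxes_pt, scores_pt),
#     sorted_valid_boxes(boxes_tvm, scores_tvm),
#     rtol=1e-3, atol=1e-2)
```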
Yeah. Ideally in this case we should test against a validation data set and calculate mAP. We have tested against the COCO data set and the accuracy is fine.
lgtm
I gave up on supporting mask rcnn from torchvision last January, concluding that it was not possible with TVM at that time. Really great to see this happening!!
Thanks @kevinthesun for the great work!!
…els (apache#6449)

* Improve Pytorch Frontend
* Add tests
* Fix pylint
* Improve data cast
* Use int64 for slice axis
* Fix lint
* fix roi_align(..., aligned=True)
* Minor fix
* Add e2e test
* Add asf header
* Minor change
* Use dynamic topk
* Improve test
* Rollback topk
* py format
* remove print
* More improve
* Fix test
* Improve addmm
* Fix test
* Fix format
* Fix format
* Fix test scatter

Co-authored-by: q.yao <[email protected]>
Some necessary improvements for PyTorch object detection models.
@zhiics @yongwww @masahi