-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(wip)(do not merge) Run all tests on relay #338
Conversation
I'll have to take a look at the failure. I'll do that after I review the live infer PR. |
My current impression is that there is a problem in the implementation of conv_transpose2d for relay (maybe only in certain cases). I've made a patch to remove the bias argument since it's bad design. As for the root cause, I'm not 100% sure yet, and I may not have more time to look at it for a while since we are shifting to other projects for the time being. |
ce57c0c
to
1cfde49
Compare
Hi! I am still trying to fix I tested conv_transpose2d with same parameters but changing only output_padding with
So, I deduced that trailing rows and/or columns of computed input grad are filled with zeros when output padding is non-null. After checking TVM code, it seems that this zero-filling occurs here: So, I guess that output padding is still not well handled in TVM backend, and now I am working on fixing it directly in TVM code. To do that, I am checking Theano implementation of conv grad input, which is here: From what I understand now, output padding should be used to guess real input shape, because conv2d may produce same output shape for different input shapes when stride is not (1, 1), but TVM currently uses output padding only after internal conv2d has run. Theano, instead, receives input_shape instead of output padding as parameter, and seems to handle it before running internal conv2d. I still need further reading to fully understand everything, but I hope I could ultimately fix it this week, at most. |
8dc356f
to
5f2eefd
Compare
See: apache/tvm#4318 for why the fix was reverted apparently. |
Generalize pytorch backend tests to all backend.
Pass kwargs to eqtest through run. Update eqtest for torch.Tensor, taking atol and rtol. Replace bias with None and back to run_no_relay in test_torch_conv_transpose2d. Add testing in both backends for: - test_conv2d - test_conv2d__non_tuple_args - test_conv2d__group3 - test_conv_transpose2d
Apply flake8, pydocstyle myia, isort and black.
5f2eefd
to
d8a04d0
Compare
Hi! Still on conv2d_transpose. I did not understand how to fix the bug addressed on old @abergeron TVM pull request related to conv2d_transpose ( here: apache/tvm#4318 (comment) ), so I decided to simply implement conv2d_transpose using Theano implementation, so that:
As conv2d_transpose is ultimately a wrapper around conv2d, I started by implementing it in pure relay graph first. If necessary, I could later translate it directly into TVM code. My current implementation is here: https://github.com/notoraptor/myia/blob/full-test-relay/tmp/shape_conv.py#L387 It's based on Theano implementation, which is fully extracted and cleaned here (works with numpy arrays, no theano symbols): https://github.com/notoraptor/myia/blob/full-test-relay/tmp/conv2d_transpose.py I needed to add a relay operation Script However, I currently face some hard bugs with, either segmentation fault, or script indefinitely running (I even need to manually kill process, as Ctrl+C won't work). Sometimes all tests in Theano implementation itself (in Python) seems to work as good as Pytorch, with various dilation, groups and strides. So, if I could fix these bugs, I guess we would have a good implementation for conv2d_transpose. What do you think? |
Is this PR still needed? If so, please clean it up so it is mergable, otherwise close it. |
@abergeron I will close it, as now I already know which operations are missing in relay backend. |
Missing operations in relay backend to make all tests pass:
I am currently working on
conv_transpose2d
. Issues:groups=1
anddilation=(1, 1)
: https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/op/strategy/x86.py#L169pytest -xvvs tests/frontends/test_pytorch_ops.py::test_torch_conv2d
on this branch:The main error is in this line:
let %seq.14 = nn.conv2d_transpose(%seq.3, %v_parameter26, channels=3, kernel_size=[3, 3], strides=[2, 3], output_padding=[1, 0], padding=[3, 2, 3, 2]) in particular dimension 1 conflicts 3 does not match 6; unable to unify:
Tensor[(3, 3, 3, 3), float32]and
Tensor[(3, 6, 3, 3), float32]; in particular dimension 1 conflicts 3 does not match 6; unable to unify:
Tensor[(2, 3, 4, 5), float32]and
Tensor[(2, 6, 4, 5), float32]; ;
. I still don't know what causes this.