Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ONNX CI workflow is broken #5971

Closed
pmeier opened this issue May 9, 2022 · 17 comments
Closed

ONNX CI workflow is broken #5971

pmeier opened this issue May 9, 2022 · 17 comments

Comments

@pmeier
Copy link
Collaborator

pmeier commented May 9, 2022

Since the 5th of May our CI workflow for ONNX is broken (commit 970ba35). Looking at the warnings emitted by the failing tests

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[...]
AssertionError: The values for attribute 'shape' do not match: torch.Size([1, 4]) != torch.Size([0, 4]).

Two models are affected faster_rcnn and mask_rcnn. To reproduce run:

pytest test/test_onnx.py -k "test_faster_rcnn" 
pytest test/test_onnx.py -k "test_mask_rcnn" 

I believe a recent patch to primtorch might be the offender here. cc @neginraoof @seemethere @mruberry

@mruberry
Copy link
Contributor

mruberry commented May 9, 2022

It does seem like primTorch would be to blame because we also use the "prim" or "prims" prefix, but we don't have a prim::Constant or any C++ code

@pmeier
Copy link
Collaborator Author

pmeier commented May 9, 2022

Looking through the commit history again, pytorch/pytorch#76622 might be the offender. See #5971 (comment).

@sanchitintel

This comment was marked as resolved.

@pmeier
Copy link
Collaborator Author

pmeier commented May 11, 2022

My bad, I linked the wrong PR 🤦 Sorry for the noise. It should have been pytorch/pytorch#76875 See #5971 (comment)

@pmeier
Copy link
Collaborator Author

pmeier commented May 11, 2022

After some painful bisection, I finally found the real offender: pytorch/pytorch#73284. After seeing that the PR title contains the phrase ONNX and our ONNX tests are failing, I have no idea how I missed that when looking at the PRs 🤦

@datumbox
Copy link
Contributor

@BowenBao Philip confirmed above that one of the PRs submitted last week broke TorchVision and @atalman helped us revert it at pytorch/pytorch#77349. Please review and let us know what you think.

Since the PR is 1 week old already, other PRs might depend on it. So if you believe that a separate forward fix is needed, please feel free to close the PR and issue a new fix. We just need to resolve the issues on TorchVision CI as soon as possible. Thanks in advance ashen one.

@garymm
Copy link
Contributor

garymm commented May 12, 2022

I'll look at this today. It took us 3 months to get that PR merged so even though it's generally bad practice, I'd really like to fix forward if possible.

@datumbox
Copy link
Contributor

Thanks. I replied at pytorch/pytorch#73284 (comment)

@garymm
Copy link
Contributor

garymm commented May 12, 2022

I've got this failure locally with torch and torchvision built from master / main.

@BowenBao
Copy link
Contributor

@datumbox @garymm We have identified the root cause. Will put out a fix PR in PyTorch shortly.

@BowenBao
Copy link
Contributor

Fix has been merged in pytorch master. @datumbox please let us know if this fixes torchvision CI.

@pmeier
Copy link
Collaborator Author

pmeier commented May 13, 2022

We'll know when the fresh torch nightly drops, which is around UTC+0 10:00. I'll report back.

@pmeier
Copy link
Collaborator Author

pmeier commented May 13, 2022

@pmeier pmeier reopened this May 13, 2022
@BowenBao
Copy link
Contributor

Hi @pmeier, if my understanding is correct, it appears my fix has not been included in yesterday's nightly yet.

5-13 nightly pytorch/pytorch@44bf440, head is pytorch/pytorch@65f71c0, the fix commit is pytorch/pytorch@a812c4c after it.

@datumbox
Copy link
Contributor

datumbox commented May 13, 2022

@atalman I wonder if you could confirm the cutpoint of yesterday's nightly?

@pmeier
Copy link
Collaborator Author

pmeier commented May 13, 2022

Can confirm @BowenBao's assessment. You can verify yourself, by looking at the nightly branch. The cutoff for today (2022-05-13) was pytorch/pytorch@65f71c0. Taking that knowledge to the master branch, we can verify that pytorch/pytorch@a812c4c was three commits late.

It will make its way into tomorrows nightly. I'll retest and close this if the fix worked.

facebook-github-bot pushed a commit to pytorch/pytorch that referenced this issue May 14, 2022
Summary:
None as input is legal per ONNX spec for representing
optional inputs. For [example](https://github.com/onnx/onnx/blob/main/docs/Operators.md#inputs-2---3-7) `constant_value` for `ONNX::Pad`.
This PR removes such constraint check that was set prior
to calling onnx shape inference. For the issue below, such
constraint prevents the onnx shape inference of `ONNX::Pad`,
which leads to falling back on an incorrect constant traced
shape.
For the unit test in this PR, prior to this PR, the ONNX shape inference
for `ONNX::Pad` would be skipped, and would return `None` instead.

Fixes pytorch/vision#5971

Pull Request resolved: #77379
Approved by: https://github.com/garymm

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/a812c4cd96d94d51627d2af290ae87de34169ec0

Reviewed By: atalman

Differential Revision: D36378914

fbshipit-source-id: a11be0f9666dd637490db80725f7021b328c9f27
@datumbox
Copy link
Contributor

Fixed on latest nightly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants