Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable ONNX test in circle CI #3144

Merged
merged 13 commits into from
Dec 10, 2020
Merged

Enable ONNX test in circle CI #3144

merged 13 commits into from
Dec 10, 2020

Conversation

jiafatom
Copy link
Contributor

@jiafatom jiafatom commented Dec 8, 2020

The PR disable onnx test since it is failing since 11/14/2020. I did a test locally and find that onnx need update to 1.8.0. So This PR aims to enable onnx test by pulling latest onnx==1.8.0.
The root cause is that onnx_shape_inference is enabled on 11/13/2020, but the shape inference PR (committed 9/30/2020) does not handle invalid dynamic axes input well. In torch vision CI, there are some invalid axes, so we fixed it in this PR, and enable onnx test. In a separate pytorch PR, we add exception handling for invalid axes input, then it will give meaningful failure msg rather than seg fault here.

@jiafatom
Copy link
Contributor Author

jiafatom commented Dec 8, 2020

@fmassa I try to enable onnx test, it works in my local machine, but fails in the Ci build here, it shows
free(): invalid next size (fast)
which is a memory issue. Do you have any ideas on why this happen? thanks!

@fmassa
Copy link
Member

fmassa commented Dec 9, 2020

Hi @jiafatom

Thanks for the PR!

Do you have any ideas on why this happen?

I don't know what could be the issue. This looks like something inside ORT might be problematic?

@codecov
Copy link

codecov bot commented Dec 10, 2020

Codecov Report

Merging #3144 (8a1b8b9) into master (25e2085) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3144   +/-   ##
=======================================
  Coverage   72.75%   72.75%           
=======================================
  Files          99       99           
  Lines        8979     8979           
  Branches     1431     1431           
=======================================
  Hits         6533     6533           
  Misses       1999     1999           
  Partials      447      447           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25e2085...8a1b8b9. Read the comment docs.

@jiafatom
Copy link
Contributor Author

jiafatom commented Dec 10, 2020

@fmassa Just fixed the issue, and test_onnx passes in CircleCI, so we are enabling test_onnx now. Could you please sign off? Thank you!

@datumbox datumbox self-requested a review December 10, 2020 23:36
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work and investigation @jiafatom. Thanks for the PR.

@datumbox datumbox merged commit e337103 into pytorch:master Dec 10, 2020
@fmassa fmassa mentioned this pull request Dec 11, 2020
@datumbox
Copy link
Contributor

@jiafatom I think the test broke again, possibly due to upstream changes. Could you have a look on your side and confirm?

@jiafatom
Copy link
Contributor Author

jiafatom commented Dec 11, 2020

@datumbox
Could you please show me the link of test broken? I am checking https://travis-ci.org/github/pytorch/vision/builds
The last time the master branch runs is 12/8/2020, that is 3 days ago, before this commits checked in.

The latest job failure is https://travis-ci.org/github/pytorch/vision/jobs/748808277
but this is a v0.8.2 branch, not related to this PR.

v0.8.2 did pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.4.0.dev202008122
I cannot find this ort-nightly in pytorch/vision master code.
This ort-nightly cannot be fetched, the possible explanation is that onnxruntime 1.6.0 released on 12/10/2020 (yesterday),
so some old ort-nightly is removed.

Try to use the released version of onnxruntime.
If you want to use nightly build, try pip install --user -i https://test.pypi.org/simple/ ort-nightly==1.5.2.dev202012031 instead.
Please let me know if you have other questions. Thanks.

@fmassa
Copy link
Member

fmassa commented Dec 11, 2020

@jiafatom check #3123 for example, where the only failing job is an ONNX test job.

@jiafatom
Copy link
Contributor Author

jiafatom commented Dec 11, 2020

@datumbox @fmassa
There is a PR merged in pytorch at 12:30 pm 12/10/2020 (yesterday). That PR changes pytorch-onnx exporter API for a special case when the last input is a dictionary. So we need fix our unit test here.
The fix is here, please sign off, thank you!

cc @spandantiwari

@spandantiwari
Copy link

spandantiwari commented Dec 11, 2020

Hi @fmassa - thank you for your help with this. @jiafatom is working with our team. This latest PR #3160 should fix the tests. It was broken because of a commit that went into PyTorch recently (just yesterday). It seems like there was a bit of difference in test running in PyTorch CI which is why this wasn't immediately detected for the PyTorch commit. We are going to update the version of tests running in PyTorch CI so that it matches vision tests with priority.
Thanks again for your help.

@datumbox
Copy link
Contributor

@spandantiwari @jiafatom I confirm that the PR #3160 fixed the issue. Thanks a lot for your speedy response and support. We just merged the fix into master. Have great weekend.

@jiafatom jiafatom deleted the enable_onnx branch December 12, 2020 00:54
@fmassa
Copy link
Member

fmassa commented Dec 14, 2020

Thanks @jiafatom and @spandantiwari for the help!

facebook-github-bot pushed a commit that referenced this pull request Dec 15, 2020
Summary: * Enable ONNX test in circle CI

Reviewed By: datumbox

Differential Revision: D25531035

fbshipit-source-id: 022b8613ac5bd8cb7165e9cbaa9daf99a9e1694e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants