updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8 #18445
Hey @ma-hei, thanks for submitting the PR.
CI supported jobs: [edge, unix-cpu, sanity, clang, miscellaneous, centos-gpu, website, windows-gpu, centos-cpu, unix-gpu, windows-cpu]
When updating the base image, the Dockerfile will require some changes. You can test locally that the containers still build via:
@leezu When using Ubuntu 20.04 as the base image of Dockerfile.build.ubuntu, I ran into issues with the apt-get installation of packages such as clang-10 and doxygen. To solve the problem at hand, I went back to Ubuntu 18.04, but instead of installing python3 (in Dockerfile.build.ubuntu), I'm installing python3.8. The image builds successfully locally. I have two questions:
Thanks @ma-hei. I wasn't aware that 3.8 is also available on 18.04 via the bionic-updates repository. Thus it's best to first solve the Python 3.8 bugs without updating the image. We can update to 20.04 in a separate PR. Updating to 20.04 will allow us to simplify parts of the Dockerfile that currently install dependencies from third-party sources and to install them from the official 20.04 repositories instead. This will also improve the stability of the CI, as third-party sources sometimes become unavailable. To update the GPU containers, we may want to wait for https://gitlab.com/nvidia/container-images/cuda/-/issues/67
Undefined action detected.
@leezu I was hoping to observe the Python 3.8 test failures in one of the ci/jenkins/mxnet-validation build jobs. I assume those jobs did not run because the ci/jenkins/mxnet-validation/sanity build failed. Does the sanity build failure look related to the Python 3.8 update I made in Dockerfile.build.ubuntu? To me it looks like the build stalled at the end and was automatically killed.
There were some issues with the CI this morning, so let's just retry: @mxnet-bot run ci [all]
@mxnet-bot run ci [all]
@ChaiBapchya the bot doesn't work
In the build job ci/jenkins/mxnet-validation/unix-cpu, the following command was previously failing:
I was then able to build the image and to successfully run the ci/build.py command mentioned above.
ONNX 1.6 appears to have some major changes (cf. discussion in #18054), so updating ONNX may lead to test failures. The Ubuntu container you're modifying here is used for the ubuntu-cpu and ubuntu-gpu jobs, so you'd start seeing Python 3.8 related failures there. However, the requirements file you modified is shared among all containers. So you may want to use the environment markers feature (https://www.python.org/dev/peps/pep-0508/#environment-markers) to update ONNX only when running under Python 3.8, and then disable or fix the ONNX tests in the unix-cpu and unix-gpu pipelines. Alternatively, you can try building ONNX 1.5 for Python 3.8 from source: https://github.com/onnx/onnx#linux-and-macos @RuRo what do you think?
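A minimal sketch of how such environment markers select different pins from one shared requirements file, assuming only the form `name==version;python_version<op>"X.Y"` is used. The sample lines are the pylint/astroid pins from ci/docker/install/requirements; `select_pins` is a hypothetical helper for illustration, not how pip evaluates markers internally (pip relies on the `packaging` library). Versions are compared as integer tuples because PEP 508 compares `python_version` as a version rather than as a plain string.

```python
import operator
import re

# Comparison operators supported by this simplified python_version check.
_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

# name==version, optionally followed by ;python_version<op>"X.Y"
_LINE = re.compile(
    r'^(?P<name>[A-Za-z0-9_.\-]+)==(?P<version>[^;\s]+)\s*'
    r'(?:;\s*python_version\s*(?P<op><=|>=|==|!=|<|>)\s*"(?P<marker>[0-9.]+)")?$'
)

REQUIREMENTS = [
    'cpplint==1.3.0',
    'pylint==2.3.1;python_version<"3.8"',
    'pylint==2.4.4;python_version=="3.8" # pylint and astroid need to be aligned',
    'astroid==2.3.3 # pylint and astroid need to be aligned',
]


def _as_tuple(version):
    """'3.10' -> (3, 10), so comparisons are numeric rather than lexicographic."""
    return tuple(int(part) for part in version.split("."))


def select_pins(lines, python_version):
    """Return {name: version} for the lines whose marker matches python_version."""
    target = _as_tuple(python_version)
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = _LINE.match(line)
        if match is None:
            continue  # blank or unsupported line
        op, marker = match.group("op"), match.group("marker")
        if op is None or _OPS[op](target, _as_tuple(marker)):
            pins[match.group("name")] = match.group("version")
    return pins


if __name__ == "__main__":
    for py in ("3.6", "3.8", "3.10"):
        print(py, select_pins(REQUIREMENTS, py))
```

Note that with `python_version=="3.8"`, an interpreter such as 3.9 or 3.10 matches neither pylint line, so no pylint pin is selected for it at all.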
I think your current steps are fine.
ci/docker/install/requirements
 # Development dependencies
 cpplint==1.3.0
-pylint==2.3.1 # pylint and astroid need to be aligned
+pylint==2.3.1;python_version<"3.8"
+pylint==2.4.4;python_version=="3.8" # pylint and astroid need to be aligned
 astroid==2.3.3 # pylint and astroid need to be aligned
I suspect your lint errors are due to the pylint and astroid versions not being aligned.
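To check whether the installed pylint and astroid versions are aligned, a small sketch using the standard-library `importlib.metadata` module (Python 3.8+) can print both installed versions together with the astroid requirement that the installed pylint itself declares. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, requires, version


def installed_versions(packages=("pylint", "astroid")):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def declared_astroid_requirement():
    """Return the astroid requirement strings declared by the installed pylint, or None."""
    try:
        reqs = requires("pylint") or []
    except PackageNotFoundError:
        return None  # pylint is not installed
    return [r for r in reqs if r.lower().startswith("astroid")]


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    print("pylint declares:", declared_astroid_requirement())
```

If the installed astroid version falls outside the range pylint declares, the pins in the requirements file are likely out of step.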
@mxnet-bot run ci [all] |
Jenkins CI successfully triggered : [unix-gpu, clang, centos-cpu, edge, sanity, windows-gpu, unix-cpu, miscellaneous, windows-cpu, website, centos-gpu] |
ci/docker/runtime_functions.sh
@@ -433,6 +433,7 @@ build_ubuntu_cpu_mkl() {
     -DUSE_TVM_OP=ON \
     -DUSE_MKL_IF_AVAILABLE=ON \
     -DUSE_BLAS=MKL \
+    -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3.8 \
/usr/bin/python3?
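Before switching the flag to `/usr/bin/python3`, it may be worth confirming which interpreter that path actually runs: on Ubuntu 18.04 the distribution's default `python3` is 3.6, so installing the python3.8 package alone does not change it. A minimal sketch that asks each candidate interpreter for its version; `interpreter_version` is a hypothetical helper and the candidate paths are just the two discussed here.

```python
import subprocess
import sys


def interpreter_version(path):
    """Run the interpreter at `path` and return its (major, minor) version."""
    completed = subprocess.run(
        [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        check=True,
        capture_output=True,
        text=True,
    )
    major, minor = completed.stdout.strip().split(".")
    return int(major), int(minor)


if __name__ == "__main__":
    for candidate in ("/usr/bin/python3", "/usr/bin/python3.8", sys.executable):
        try:
            print(candidate, "->", interpreter_version(candidate))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(candidate, "-> unavailable:", exc)
```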
@leezu I think I've reached a state where I can run the unit tests with Python 3.8 and reproduce what is described in issue #18380. As described there, we're seeing an issue related to the use of time.clock(). Besides that, I found the following issue: the test tests/python/unittest/onnx/test_node.py::TestNode::test_import_export seems to fail. I don't see the error in the Jenkins job, but when running the test locally with Python 3.8 and ONNX 1.7, I'm getting:
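Regarding the time.clock() issue: time.clock() was deprecated in Python 3.3 and removed in 3.8, so any remaining call raises AttributeError there. A sketch of the usual replacements, with hypothetical helper names: time.perf_counter() for elapsed wall-clock intervals, and time.process_time() where CPU time was intended (on Unix, time.clock() returned processor time).

```python
import time


def timed_wall(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # high-resolution clock intended for measuring intervals
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def timed_cpu(fn, *args, **kwargs):
    """Call fn and return (result, CPU seconds consumed by this process)."""
    start = time.process_time()  # user + system CPU time; excludes time spent sleeping
    result = fn(*args, **kwargs)
    return result, time.process_time() - start


if __name__ == "__main__":
    _, wall = timed_wall(time.sleep, 0.1)
    _, cpu = timed_cpu(time.sleep, 0.1)
    print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")  # sleeping uses little CPU time
```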
I believe this is an issue in ONNX 1.7, as it looks exactly like onnx/onnx#2548. I also found that the test job Python3: MKL-CPU does not run to completion, which seems to be due to a timeout. I believe this happens in tests/python/conftest.py, but the log output doesn't tell me what exactly goes wrong there. Do you have any idea how to reproduce this locally, or how to get better insight into that failure? I will now look into the following:
By the way, I believe you and @marcoabreu get pinged every time I push a commit to this PR. I don't think this PR is actually in a reviewable state yet; its purpose is rather to see what breaks when upgrading to Python 3.8.
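On getting more insight into the Python3: MKL-CPU timeout: one option is the standard-library faulthandler module, which can dump the Python tracebacks of all threads if the process is still running after a given number of seconds, showing where a hang sits. A minimal sketch with hypothetical helper names; where exactly to call it (for example at the start of a test session in tests/python/conftest.py) is an assumption. pytest also provides a `faulthandler_timeout` ini option that does the same per test.

```python
import faulthandler
import sys
import time


def arm_hang_dump(timeout_s, stream=sys.stderr):
    """After timeout_s seconds, dump every thread's traceback to stream.

    The process keeps running (exit=False) and the dump fires once (repeat=False).
    stream must be a real file with a file descriptor, because faulthandler
    writes to it at the OS level.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=stream, exit=False)


def disarm_hang_dump():
    """Cancel a pending dump, e.g. once the guarded phase finished in time."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    arm_hang_dump(1.0)
    time.sleep(2.0)  # simulated hang: the tracebacks are written to stderr after 1 s
    disarm_hang_dump()
```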
Thank you @ma-hei. For pylint and astroid: you have now updated astroid to the latest version, but not pylint. Would updating pylint to
I'm not sure about the ONNX test failure. If you can figure it out, that would be great. Also feel free to ping other people who show up in the commit history of the related MXNet ONNX files. To avoid pinging people on commits, you can mark the PR as a draft PR. I did that for you for now. Thanks for your help in fixing Python 3.8 support!
Here's what's going on with ONNX 1.7: onnx/onnx#2865
@ma-hei I also noticed this error in an unrelated CD job that was building MXNet for Python 3.8. Just FYI, and not necessarily something you need to take into account here.
Thanks @leezu, I think I found the underlying cause of the test failure in unittest/onnx/test_node.py::TestNode::test_import_export. In ONNX 1.7, the inputs of the Pad operator have changed. We can see this by comparing https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad to https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Pad-1. I believe I can fix this test and I'm working on that now. However, the same test will then no longer pass with ONNX 1.5 (but at least we know how to fix it, I guess). I assume the stack trace you posted above from the unrelated CD job has a similar root cause. Besides that, I'm trying to make pylint happy, but I think that's the smaller issue.
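To make the Pad change concrete, a sketch under stated assumptions: up to opset 10 (Pad-2), `pads` and `value` are node attributes, while from opset 11 (Pad-11) `pads` and `constant_value` are node inputs, typically supplied as initializers. ONNX also orders `pads` as all begin values followed by all end values, whereas MXNet's `pad_width` interleaves (before_1, after_1, ..., before_N, after_N). The dictionaries below are a hypothetical stand-in for real `onnx.helper.make_node` calls, and the helper names are illustrative only.

```python
def mxnet_to_onnx_pads(pad_width):
    """Reorder MXNet (before_1, after_1, ..., before_N, after_N) into ONNX
    [begin_1, ..., begin_N, end_1, ..., end_N]."""
    if len(pad_width) % 2:
        raise ValueError("pad_width must contain two entries per axis")
    begins = list(pad_width[0::2])
    ends = list(pad_width[1::2])
    return begins + ends


def pad_node_spec(pad_width, value=0.0, opset=11):
    """Describe a constant-mode Pad node for the given opset (illustrative dict only)."""
    pads = mxnet_to_onnx_pads(pad_width)
    if opset >= 11:
        # Pad-11: pads and constant_value are inputs, usually provided as initializers.
        return {
            "op_type": "Pad",
            "inputs": ["data", "pads", "constant_value"],
            "attributes": {"mode": "constant"},
            "initializers": {"pads": pads, "constant_value": value},
        }
    if opset >= 2:
        # Pad-2: pads and value are attributes; the only input is the data tensor.
        return {
            "op_type": "Pad",
            "inputs": ["data"],
            "attributes": {"mode": "constant", "pads": pads, "value": value},
        }
    raise ValueError("Pad-1 (opset 1) uses a 'paddings' attribute; not covered here")
```

Branching on the target opset in this way is one option for export code that must work with both older and newer ONNX releases.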
I see that the discussion above regarding the failing test unittest/onnx/test_node.py::TestNode::test_import_export is now obsolete, since this test was removed in commit fb73de7.
@ma-hei there is more background at #18525 (comment) |
It seems the unit tests in the unix-cpu job are failing at this point.
Trying to reproduce it locally.
@mxnet-bot run ci [centos-cpu] |
Jenkins CI successfully triggered : [centos-cpu] |
There are a couple of Python lint issues that block the CI:
@leezu I think I have to give up on this issue or find a different approach. What I tried was the following:
Then I did the following:
Description
This PR is used to observe failing tests when updating the ubuntu_cpu base image to 20.04. With Ubuntu 20.04, Python 3.8 is used. As described in #18380, we should observe various test failures.
Checklist
Essentials
Changes
Comments