Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to ONNX 1.15.0 #2649

Closed
wants to merge 9 commits into from
Closed

Upgrade to ONNX 1.15.0 #2649

wants to merge 9 commits into from

Conversation

hamptonm1
Copy link
Collaborator

@hamptonm1 hamptonm1 commented Nov 30, 2023

I finally was able to get us to upgrade to ONNX 1.15.0!! YES!! After hours of testing and debugging I found out the problem. Please read below:

@hamptonm1 hamptonm1 self-assigned this Nov 30, 2023
@hamptonm1
Copy link
Collaborator Author

I was receiving this error only for RHEL builds of ONNX-MLIR:

/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/api_implementation.py:109: UserWarning: Selected implementation cpp is not available.
  warnings.warn(
Traceback (most recent call last):
  File "/workdir/onnx-mlir/build/docs/doc_example/gen_add_onnx.py", line 1, in <module>
    import onnx
  File "/usr/local/lib64/python3.9/site-packages/onnx/__init__.py", line 75, in <module>
    from onnx import serialization
  File "/usr/local/lib64/python3.9/site-packages/onnx/serialization.py", line 16, in <module>
    import google.protobuf.json_format
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/json_format.py", line 54, in <module>
    from google.protobuf.internal import type_checkers
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/type_checkers.py", line 51, in <module>
    from google.protobuf.internal import decoder
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/decoder.py", line 87, in <module>
    from google.protobuf.internal import encoder
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/encoder.py", line 71, in <module>
    from google.protobuf.internal import wire_format
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/wire_format.py", line 36, in <module>
    from google.protobuf import descriptor
  File "/usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/descriptor.py", line 51, in <module>
    from google.protobuf.pyext import _message
ImportError: libprotobuf.so.32: cannot open shared object file: No such file or directory
make[2]: *** [docs/doc_example/CMakeFiles/OMRuntimeTestModel.dir/build.make:73: docs/doc_example/libadd.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:11974: docs/doc_example/CMakeFiles/OMRuntimeTestModel.dir/all] Error 2

The main error is found here: /usr/lib64/python3.9/site-packages/protobuf-4.21.12-py3.9-linux-s390x.egg/google/protobuf/internal/api_implementation.py:109: UserWarning: Selected implementation cpp is not available.

After searching this warning which is located in google/protobuf/internal/api_implementation.py. There is a block of code for cpp implementation that we are hitting due to the flag but it is failing every time.

if _implementation_type == 'cpp':
  try:
    # pylint: disable=g-import-not-at-top
    from google.protobuf.pyext import _message
    sys.modules['google3.net.proto2.python.internal.cpp._message'] = _message
    _c_module = _message
    del _message
  except ImportError:
    # TODO: fail back to python
    warnings.warn(
        'Selected implementation cpp is not available.')
    pass

It seems like building Protobuf from source has known issues of confusing pure python implementation with cpp implementation hence the error. So when removing the cpp flag we were able to get a successful build. This flag is NOT needed for us to build Protobuf, it is supposed to provide performance enhancement for Protobuf messages but it does not negatively impact us if we remove it.

Here are further links which conclude the issue with using --cpp_implementation:
protocolbuffers/protobuf#9014
protocolbuffers/protobuf#539
https://chromium.googlesource.com/external/github.com/google/protobuf/+/v3.0.0-beta-2/python/

@hamptonm1 hamptonm1 marked this pull request as ready for review November 30, 2023 22:49
@hamptonm1
Copy link
Collaborator Author

@gongsu832 @AlexandreEichenberger If you have any questions please feel free to let me know but I spent hours debugging for the build to finally go through successfully. Thanks!

@@ -34,6 +34,9 @@ RUN distro=$(cat /etc/os-release|grep -Po '(?<=^ID=").*(?=")|(?<=^ID=)[^"].*[^"]
python3 python3-dev python3-distutils python3-numpy \
python3-pip python3-pytest-xdist python3-setuptools \
python3-typing-extensions zlib1g-dev && \
# Use newer version of setuptools
Copy link
Collaborator Author

@hamptonm1 hamptonm1 Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this upgrade of pip and setuptools is required to bypass another error of UNKNOWN Wheel.

@gongsu832
Copy link
Collaborator

No we cannot remove --cpp_implementation. It's there for a reason. It performs much better than the pure python implementation (see https://daobook.github.io/protobuf/docs/performance.html). Onnx uses protobuf extensively (@AlexandreEichenberger @chentong319 @tungld probably know better) for model data so its performance very much impacts what onnx and therefore onnx-mlir does.

It's extremely unlikely that a minor version upgrade for one package (onnx) would require changing the implementation of another third party dependent package (protobuf). The error you get is almost certainly due to factor(s) not related to upgrading onnx itself. If you have trouble making it work, I can try to fix it with another PR.

@hamptonm1
Copy link
Collaborator Author

hamptonm1 commented Dec 1, 2023

No we cannot remove --cpp_implementation. It's there for a reason. It performs much better than the pure python implementation (see https://daobook.github.io/protobuf/docs/performance.html). Onnx uses protobuf extensively (@AlexandreEichenberger @chentong319 @tungld probably know better) for model data so its performance very much impacts what onnx and therefore onnx-mlir does.

It's extremely unlikely that a minor version upgrade for one package (onnx) would require changing the implementation of another third party dependent package (protobuf). The error you get is almost certainly due to factor(s) not related to upgrading onnx itself. If you have trouble making it work, I can try to fix it with another PR.

@gongsu832 According to our git history we did not always have -cpp_implementation so no Gong Su it is not required. Second, I have spent hours working on this. So if you can come up with a better solution, please do. I am all in favor but my PR that I closed out before demonstrates that simply upgrading ONNX to 1.15.0 does not work like you so happen to imply. The four weeks I have been working on this and the 163 commits is testimony to that fact.

@hamptonm1
Copy link
Collaborator Author

hamptonm1 commented Dec 1, 2023

You added -cpp-implementation on October 1, 2022 which is last year #1752. So then how did things work beforehand? Was the performance of ONNX-MLIR significantly improved from last year when the flag was added? What are the stats/numbers for improvement when incorporating said flag? What is the delta? I understand the -cpp-implementation flag is supposed to improve performance but if there is not a huge difference in terms of performance, I am not seeing the argument here.

@hamptonm1
Copy link
Collaborator Author

hamptonm1 commented Dec 1, 2023

@tungld @AlexandreEichenberger @chentong319 If you have strong feelings for keeping this flag, please let me know what works best for the community. Thanks! :)

@gongsu832
Copy link
Collaborator

I didn't say --cpp_implementation is required for protobuf to work. I said we cannot remove it because the performance of the pure python implementation after removing --cpp_implementation is abysmal and I have given the link to the performance comparison.

We added the --cpp_implementation flag last year after we found out it's available.

And yes, I will have a better solution that will upgrade onnx without requiring protobuf implementation change.

@hamptonm1
Copy link
Collaborator Author

I didn't say --cpp_implementation is required for protobuf to work. I said we cannot remove it because the performance of the pure python implementation after removing --cpp_implementation is abysmal and I have given the link to the performance comparison.

We added the --cpp_implementation flag last year after we found out it's available.

And yes, I will have a better solution that will upgrade onnx without requiring protobuf implementation change.

Of course you will!

@chentong319
Copy link
Collaborator

No idea about the impact of --cpp-implementation. Just looked into the setup.py. Could this comment in the code give us any hint?

  if GetOptionFromArgv('--cpp_implementation'):
    # Link libprotobuf.a and libprotobuf-lite.a statically with the
    # extension. Note that those libraries have to be compiled with
    # -fPIC for this to work.

@hamptonm1
Copy link
Collaborator Author

No idea about the impact of --cpp-implementation. Just looked into the setup.py. Could this comment in the code give us any hint?

  if GetOptionFromArgv('--cpp_implementation'):
    # Link libprotobuf.a and libprotobuf-lite.a statically with the
    # extension. Note that those libraries have to be compiled with
    # -fPIC for this to work.

Yeah libtools should take care of that for us but the above error also mentions:

ImportError: libprotobuf.so.32: cannot open shared object file: No such file or directory

When I do a find / -name "libprotobuf.so.32" -print 2> /dev/null \ of libprotobuf.so.32 the file does exists and I also did a check to see if there were conflicting protobuf versions being pulled in by ONNX and now that we are on the same version as them... there are no conflicting versions installed. Also ONNX only installs protobuf if they cannot detect that a version is already installed.

/usr/local/lib/libprotobuf.so.32
/workdir/protobuf/src/.libs/libprotobuf.so.32
-rwxr-xr-x 1 root root     1012 Nov 17 11:23 /usr/local/lib/libprotobuf-lite.la
lrwxrwxrwx 1 root root       27 Nov 17 11:23 /usr/local/lib/libprotobuf-lite.so -> libprotobuf-lite.so.32.0.12
lrwxrwxrwx 1 root root       27 Nov 17 11:23 /usr/local/lib/libprotobuf-lite.so.32 -> libprotobuf-lite.so.32.0.12
-rwxr-xr-x 1 root root  6331648 Nov 17 11:23 /usr/local/lib/libprotobuf-lite.so.32.0.12
-rwxr-xr-x 1 root root      982 Nov 17 11:23 /usr/local/lib/libprotobuf.la
lrwxrwxrwx 1 root root       22 Nov 17 11:23 /usr/local/lib/libprotobuf.so -> libprotobuf.so.32.0.12
lrwxrwxrwx 1 root root       22 Nov 17 11:23 /usr/local/lib/libprotobuf.so.32 -> libprotobuf.so.32.0.12
-rwxr-xr-x 1 root root 40894872 Nov 17 11:23 /usr/local/lib/libprotobuf.so.32.0.12
-rwxr-xr-x 1 root root     1000 Nov 17 11:23 /usr/local/lib/libprotoc.la
lrwxrwxrwx 1 root root       20 Nov 17 11:23 /usr/local/lib/libprotoc.so -> libprotoc.so.32.0.12
lrwxrwxrwx 1 root root       20 Nov 17 11:23 /usr/local/lib/libprotoc.so.32 -> libprotoc.so.32.0.12
-rwxr-xr-x 1 root root 56825384 Nov 17 11:23 /usr/local/lib/libprotoc.so.32.0.12

@hamptonm1 hamptonm1 closed this Dec 1, 2023
@hamptonm1 hamptonm1 deleted the hamptonm/feature/onnx-1.15.0-upgrade branch December 1, 2023 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants