Latest protobuf package causes multiple problems #879

Closed
Michaelvll opened this issue Jun 2, 2022 · 2 comments · Fixed by #885

@Michaelvll
Collaborator

protobuf was recently upgraded to 4.21.1, which causes multiple problems:

  1. The resnet_app example fails with the error below. This happens because we create a new conda environment for the user code, and that environment installs the latest protobuf, which breaks TensorFlow.
    (resnet-app pid=25538) 2022-06-02 01:09:15.880170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    (resnet-app pid=25538) Traceback (most recent call last):
    (resnet-app pid=25538)   File "models/official/resnet/resnet_main.py", line 71, in <module>
    (resnet-app pid=25538)     import tensorflow.compat.v1 as tf
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/__init__.py", line 41, in <module>
    (resnet-app pid=25538)     from tensorflow.python.tools import module_util as _module_util
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 41, in <module>
    (resnet-app pid=25538)     from tensorflow.python.eager import context
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 32, in <module>
    (resnet-app pid=25538)     from tensorflow.core.framework import function_pb2
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    (resnet-app pid=25538)     from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    (resnet-app pid=25538)     from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    (resnet-app pid=25538)     from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    (resnet-app pid=25538)     from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 42, in <module>
    (resnet-app pid=25538)     serialized_options=None, file=DESCRIPTOR),
    (resnet-app pid=25538)   File "/home/ubuntu/.conda/envs/resnet/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 560, in __new__
    (resnet-app pid=25538)     _message.Message._CheckCalledFromGeneratedFile()
    (resnet-app pid=25538) TypeError: Descriptors cannot not be created directly.
    (resnet-app pid=25538) If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    (resnet-app pid=25538) If you cannot immediately regenerate your protos, some other possible workarounds are:
    (resnet-app pid=25538)  1. Downgrade the protobuf package to 3.20.x or lower.
    (resnet-app pid=25538)  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    (resnet-app pid=25538) 
    (resnet-app pid=25538) More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
    SKY ERROR: Job 1 failed with return code list: [1]
    
  2. Installing sky fails when the latest protobuf is installed, as we observed while setting up the environment for Fabio (IBM). We need to pin the protobuf version in our setup file.
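The error message itself lists two workarounds: downgrade protobuf to 3.20.x or lower, or set `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`. The block below is a minimal sketch, not what #885 does, of a version guard that applies the second workaround only when the installed protobuf is newer than 3.20.x. The helper names (`parse_major_minor`, `is_affected`, `apply_python_impl_workaround`) are hypothetical, and treating every release above 3.20 as affected is an assumption based on the message's "3.20.x or lower" advice.

```python
import os
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '4.21.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """Return True if this protobuf version is assumed to reject old _pb2 files.

    Assumption: anything newer than 3.20.x (e.g. 4.21.1) is affected,
    mirroring the "Downgrade ... to 3.20.x or lower" hint in the error.
    """
    return parse_major_minor(version) > (3, 20)


def apply_python_impl_workaround(version: str) -> bool:
    """Select the pure-Python protobuf backend if `version` is affected.

    Protobuf reads this variable when it is first imported, so this must
    run before TensorFlow (or anything else importing google.protobuf).
    Returns True if the workaround applies.
    """
    if not is_affected(version):
        return False
    # setdefault keeps a value the user may already have chosen.
    os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
    return True
```

For example, an entry script could call `apply_python_impl_workaround(importlib.metadata.version("protobuf"))` (Python 3.8+) before `import tensorflow.compat.v1 as tf`. This trades speed for compatibility, as the error message warns; pinning the version in the setup file, e.g. with a requirement such as `protobuf<3.21` (an assumed bound), avoids the slower pure-Python parser altogether.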
@concretevitamin
Member

This breaks the three test_cancel_<cloud> tests and test_distributed_tf.

@Michaelvll
Collaborator Author

Related issues found by @infwinston:
protocolbuffers/protobuf#10048
ray-project/ray#25211
