Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddtrace fails to serialize Python objects by value with cloudpickle #5889

Closed
lukesturgis opened this issue May 17, 2023 · 4 comments
Closed

Comments

@lukesturgis
Copy link

Summary of problem

ddtrace does not work with registering serialized Python objects by value. This worked with a previous version of ddtrace (0.55.1) but broke when we upgraded to 1.5.0. To sanity check, I tried importing 1.12.6 as well with the latest version of cloudpickle (2.2.1) and it did not resolve the issue.

It seems to take the following route when broken down: cloudpickle/cloudpickle_fast.py:632 attempts to dump the object, which goes to cpython/blob/3.11/Lib/pickle.py:476, which dumps the object then attempts to save it in cpython/blob/3.11/Lib/pickle.py#L535, which calls save_reduce in cpython/blob/3.11/Lib/pickle.py#L603, and the object is somehow malformed.

I've confirmed this only happens when ddtrace is imported before cloudpickle, and somewhere in ddtrace's imports it seems to modify the behavior of the import structure of class objects, but did not happen before. Please let me know if I can get any more information over. Thanks!

Exception has occurred: PicklingError
args[0] from __newobj__ args has the wrong class
  File "/home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
  File "/home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/lukes/cloudpickle_test/python/modelingtests.py", line 24, in <module>
    x_after = pickle.dumps(to_pickle)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class
cloudpickle/cloudpickle_fast.py:632
            return Pickler.dump(self, obj)
https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L476
    def dump(self, obj):
        """Write a pickled representation of obj to the open file."""
        # Check whether Pickler was initialized correctly. This is
        # only needed to mimic the behavior of _pickle.Pickler.dump().
        if not hasattr(self, "_file_write"):
            raise PicklingError("Pickler.__init__() was not called by "
                                "%s.__init__()" % (self.__class__.__name__,))
        if self.proto >= 2:
            self.write(PROTO + pack("<B", self.proto))
        if self.proto >= 4:
            self.framer.start_framing()
        self.save(obj)
        self.write(STOP)
        self.framer.end_framing()
https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L535
    def save(self, obj, save_persistent_id=True):


https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L603
        self.save_reduce(obj=obj, *rv)


https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L621
    def save_reduce(self, func, args, state=None, listitems=None,
                    dictitems=None, state_setter=None, *, obj=None):
        # This API is called by some subclasses

            cls, args, kwargs = args

Which version of dd-trace-py are you using?

1.12.6

Which version of pip are you using?

22.0.4

Which libraries and their versions are you using?

attrs==20.3.0 bytecode==0.13.0 cattrs==1.3.0 click==8.1.3 cloudpickle==2.2.1 coverage==6.3.1 ddsketch==2.0.4 ddtrace==1.12.6 Deprecated==1.2.13 envier==0.4.0 Flask==2.1.3 gunicorn==20.1.0 importlib-metadata==5.0.0 isort==5.10.1 itsdangerous==2.1.2 Jinja2==3.1.2 jsonschema==3.2.0 MarkupSafe==2.1.1 more-itertools==8.8.0 opentelemetry-api==1.17.0 packaging==20.9 protobuf==3.19.3 pyparsing==2.4.7 pyrsistent==0.15.5 six==1.14.0 tenacity==8.0.1 types-setuptools==57.4.17 typing_extensions==4.4.0 uWSGI==2.0.20 uwsgidecorators==1.1.0 Werkzeug==2.1.2 wrapt==1.11.2 xmltodict==0.13.0 zipp==0.6.0

How can we reproduce your problem?

main.py

import cloudpickle as pickle
import ddtrace

import test

def to_pickle():
    return test.test()


x_before = pickle.dumps(to_pickle)
print(x_before)

pickle.register_pickle_by_value(test)
x_after = pickle.dumps(to_pickle)
print(x_after)

test.py

def test():
    return 1

What is the result that you get?

b'\x80\x05\x95\x19\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cM/home/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\tsubimport\x94\x93\x94h\n\x85\x94R\x94su\x86\x94\x86R0.'

Traceback (most recent call last):
  File "home/lukes/cloudpickle_test/python/modelingtests.py", line 24, in <module>
    x_after = pickle.dumps(to_pickle)
  File "home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

What is the result that you expected?

b'\x80\x05\x95\x19\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeT
ype\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cM/home/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c
\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\
x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\tsubimport\x94\x93\x94h\n\x85\x94R\x94su\x86\x94\x86R0.'
b'\x80\x05\x95\xad\x04\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cMhome/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\x11dynamic_subimport\x94\x93\x94h\n}\x94(h\x13h\nh#Nh\x12\x8c\x00\x94\x8c\n__loader__\x94\x8c\x1a_frozen_importlib_external\x94\x8c\x10SourceFileLoader\x94\x93\x94)\x81\x94}\x94(\x8c\x04name\x94h\n\x8c\x04path\x94\x8cD/home/lukes/repos/DDG-311/ts/user/lukes/modelingtests/python/test.py\x94ub\x8c\x08__spec__\x94\x8c\x11_frozen_importlib\x94\x8c\nModuleSpec\x94\x93\x94)\x81\x94}\x94(h3h\n\x8c\x06loader\x94h1\x8c\x06origin\x94h5\x8c\x0cloader_state\x94N\x8c\x1asubmodule_search_locations\x94N\x8c\r_set_fileattr\x94\x88\x8c\x07_cached\x94\x8c\\home/lukes/cloudpickle_test/python/__pycache__/test.cpython-39.pyc\x94\x8c\r_initializing\x94\x89ubh\x15h5\x8c\n__cached__\x94hBh\nh\x02(h\x07(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94NK\x01\x86\x94))\x8cD/home/lukes/cloudpickle_test/python/test.py\x94h\nK\nC\x02\x00\x01\x94))t\x94R\x94}\x94(h\x12h,h\x13h\nh\x15h5uNNNt\x94R\x94h\x1ahM}\x94}\x94(h\x13h\nh\x1dh\nh\x1e}\x94h Nh!Nh"h\nh#Nh$Nh%]\x94h\'}\x94u\x86\x94\x86R0u\x86\x94R\x94su\x86\x94\x86R0.'
@mabdinur
Copy link
Contributor

Hey @lukesturgis,

Thanks for opening this issue. ddtrace==1.5.0 introduced support for dynamic instrumentation. This product introduced "module watchdog" (commit). This feature replaces the standard sys.modules dictionary to detect when modules are loaded/unloaded. This mechanism of monitoring imported modules is not compatible with cloudpickle.

I was able to work around this issue by deleting the test module from the global scope and then reimporting this module in the to_pickle() function. Ex:

.....

import test

def to_pickle():
    del sys.modules['test']
    import test
    return test.test()

.....

Would this solution work for your application?

@lukesturgis
Copy link
Author

Hey @mabdinur, thanks for looking into this. I don't believe the workaround would work good enough in practice, as we might have many nested modules as this is happening in a larger library with lots of imports (including ones that users might not know is necessarily being imported).

Is there any potential that this could be something that would be supported to be compatible with cloudpickle? I read a good amount of the module watchdog code and it doesn't seem like this would be easily supported, but would be great if possible. I will likely move to open a feature request with cloudpickle to see if they are able to do anything. Thanks again!

@P403n1x87
Copy link
Contributor

P403n1x87 commented Aug 29, 2023

This seems to be caused by these lines

def __getattribute__(self, name):
if name == "__class__":
# Make isinstance believe that self is also an instance of
# type(self.loader). This is required, e.g. by some tools, like
# slotscheck, that can handle known loaders only.
return self.loader.__class__
return super(_ImportHookChainedLoader, self).__getattribute__(name)

which we might be able to remove, or at least put behind a private env var if they are really needed. However, even by doing so, it seems that we would still get an issue, this time from six:

Traceback (most recent call last):
  File "/Users/gabriele.tornetta/p403n1x87/cloudpickle/main.py", line 15, in <module>
    x_after = pickle.dumps(to_pickle)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriele.tornetta/p403n1x87/cloudpickle/.venv/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/gabriele.tornetta/p403n1x87/cloudpickle/.venv/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle '_MovedItems' object

We are planning to drop Python 2 support with the 2.0 release of ddtrace, and that's where we have good chances of resolving this issue.

@P403n1x87 P403n1x87 self-assigned this Aug 29, 2023
@github-actions github-actions bot added the stale label Dec 7, 2023
Copy link
Contributor

github-actions bot commented May 8, 2024

This issue has been automatically closed after a period of inactivity. If it's a
feature request, it has been added to the maintainers' internal backlog and will be
included in an upcoming round of feature prioritization. Please comment or reopen
if you think this issue was closed in error.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants