
Serializing attrs classes does not work? #4226

Closed
bitbucketboss opened this issue Nov 7, 2020 · 4 comments
Comments

bitbucketboss commented Nov 7, 2020

Working minimal example using Dask's default scheduler:

import time
import attr
import dask

class MyClass:
    def __init__(self, i):
        self.i = i

@attr.s
class MyClassAttrs:
    i = attr.ib()

my_class = MyClass(1)
my_class_attrs = MyClassAttrs(1)

@dask.delayed
def Increment(c):
    time.sleep(1)
    c.i = c.i + 1
    return c

delayed = Increment(my_class)
result = delayed.compute()
result.i

delayed_attrs = Increment(my_class_attrs)
result_attrs = delayed_attrs.compute()
result_attrs.i

This code correctly returns 2 for result.i and result_attrs.i using Dask's default scheduler.

Failing minimal example using Dask's distributed scheduler:

import time
import attr
import dask

# Start new code
from dask.distributed import Client
client = Client()
# End new code

class MyClass:
    def __init__(self, i):
        self.i = i

@attr.s
class MyClassAttrs:
    i = attr.ib()

my_class = MyClass(1)
my_class_attrs = MyClassAttrs(1)

@dask.delayed
def Increment(c):
    time.sleep(1)
    c.i = c.i + 1
    return c

delayed = Increment(my_class)
result = delayed.compute()
result.i

delayed_attrs = Increment(my_class_attrs)
result_attrs = delayed_attrs.compute()
result_attrs.i

client.shutdown()

This code correctly returns 2 for result.i, but when using the Dask distributed scheduler the line

result_attrs = delayed_attrs.compute()

raises

TypeError: cannot pickle '_thread._local' object

attrs is a widely used package, so I'm not sure why using it causes a failure here.

Attrs classes do pickle properly by themselves. For example, this does work:

from pickle import dumps, loads
loads(dumps(my_class_attrs))

I have also successfully used the standard multiprocessing library to do multiprocessing with attrs classes.
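For reference, a minimal sketch of that multiprocessing approach (assuming attrs is installed; Counter and increment are illustrative names, not from the issue above). The standard multiprocessing library serializes with plain pickle rather than cloudpickle, which is why attrs classes pass through it without trouble:

```python
import multiprocessing as mp

import attr


@attr.s
class Counter:  # hypothetical example class
    i = attr.ib()


def increment(c):
    # Each instance is pickled to a worker process and the mutated
    # copy is pickled back -- the same plain-pickle path shown above.
    c.i += 1
    return c


def run():
    with mp.Pool(2) as pool:
        return [c.i for c in pool.map(increment, [Counter(1), Counter(2)])]


if __name__ == "__main__":
    print(run())  # [2, 3]
```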

Environment

  • Dask version: 2.30.0
  • Python version: 3.8.5
  • Operating System: Windows 10
  • Install method (conda, pip, source): conda

Any advice on how to deal with this issue would be welcome.

@bitbucketboss (Author)

Upon further investigation I note that cloudpickle does not work on my_class_attrs:

import cloudpickle
cloudpickle.loads(cloudpickle.dumps(my_class_attrs))

This raises the same error encountered previously:

TypeError: cannot pickle '_thread._local' object
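The error itself is not specific to Dask, cloudpickle, or attrs: a _thread._local object is unpicklable by design, which can be reproduced with the standard library alone:

```python
import pickle
import threading

# threading.local holds per-thread state that has no meaningful
# serialized form, so pickle refuses it outright.
local_state = threading.local()

try:
    pickle.dumps(local_state)
except TypeError as exc:
    print(exc)  # cannot pickle '_thread._local' object
```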

@bitbucketboss (Author)

This appears to be a cloudpickle bug:

Cloudpickle issue 320
Attrs issue 458

As such I will close this issue.

@kylebarron

Note that as mentioned in the linked issues, this will work if you set repr=False on your attrs classes.
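A minimal sketch of that workaround, assuming attrs is installed. With repr=False, attrs generates no __repr__ for the class, and per the linked issues it was that generated method which carried the thread-local recursion guard cloudpickle could not serialize. (The original failure only appears under cloudpickle; plain pickle is used here just to show the class still round-trips normally.)

```python
import pickle

import attr


@attr.s(repr=False)
class MyClassAttrs:
    i = attr.ib()


obj = MyClassAttrs(1)

# repr=False: no attrs-generated __repr__, so affected attrs versions
# attach no thread-local state to the class.
restored = pickle.loads(pickle.dumps(obj))
print(restored.i)  # 1
```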

@thetorpedodog

With attrs release 21.4.0, that workaround is no longer needed; attrs classes can now be pickled with cloudpickle (e.g. via Dask).
