Efficient serialization of memoryviews #3640

Closed
jakirkham opened this issue Mar 26, 2020 · 2 comments · Fixed by #3743

@jakirkham
Member

It would be useful to have efficient serialization of memoryviews, as these show up in lots of places and are easy for other code to generate (given they are a builtin component of Python). Ideally this would be very similar to NumPy's ndarray serialization with some tweaks. It also has the potential to simplify a bunch of serialization code, as that code could then hand things off to, or take things from, memoryview serialization.

Currently this is handled by cloudpickle, which copies the data to a bytes object before pickling. An efficient serialization method for memoryview objects should be able to avoid both this copy and the pickling.

In [1]: from distributed.protocol.serialize import serialize, deserialize       

In [2]: b = b"abc"                                                              

In [3]: serialize(b)                                                            
Out[3]: 
({'type': 'builtins.bytes',
  'type-serialized': b'\x80\x04\x95\x16\x00\x00\x00\x00\x00\x00\x00\x8c\x08builtins\x94\x8c\x05bytes\x94\x93\x94.',
  'serializer': 'dask'},
 [b'abc'])

In [4]: m = memoryview(b)                                                       

In [5]: serialize(m)                                                            
Out[5]: 
({'serializer': 'pickle'},
 [b'\x80\x04\x95\x07\x00\x00\x00\x00\x00\x00\x00C\x03abc\x94.'])

In [6]: deserialize(serialize(m)) 
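
To make the idea concrete, here is a minimal sketch of what copy-free memoryview serialization could look like, using only the standard library. It is not distributed's implementation: the function names `serialize_memoryview_sketch` and `deserialize_memoryview_sketch` and the header keys `format` and `shape` are hypothetical, it does not hook into distributed's serializer registration, and it assumes non-empty views with a native single-character format.

```python
def serialize_memoryview_sketch(mv):
    """Split a memoryview into a small header and a list of frames.

    Hypothetical helper for illustration only.
    """
    # Metadata needed to rebuild the view on the other side.
    header = {"format": mv.format, "shape": mv.shape}
    if mv.c_contiguous:
        # Flat byte view over the same memory: no copy is made.
        frame = mv.cast("B")
    else:
        # Assumption: non-contiguous views are copied once into C order.
        frame = memoryview(mv.tobytes())
    return header, [frame]


def deserialize_memoryview_sketch(header, frames):
    """Rebuild a memoryview from the header and its single frame."""
    (frame,) = frames
    # Wrap the frame (bytes, bytearray, or memoryview) without copying,
    # then restore the original item format and shape.
    flat = memoryview(frame).cast("B")
    return flat.cast(header["format"], header["shape"])


data = b"abc"
header, frames = serialize_memoryview_sketch(memoryview(data))
restored = deserialize_memoryview_sketch(header, frames)
print(header, restored.tobytes(), restored.obj is data)
```

In this sketch the frame is a view on the original buffer rather than a bytes copy, so for contiguous input the only data that would need pickling or encoding is the small header.
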
@mrocklin
Member

Yeah, passing through memoryviews would be ideal. Thanks for bringing this up, @jakirkham. I assumed, incorrectly, that this was already happening.

@jakirkham
Member Author

Me too.

I started a bit of work on this locally. Will try to get this in a WIP PR at some point so we can discuss 🙂
