Proto Plus messages are not pickleable #162

Closed
busunkim96 opened this issue Nov 17, 2020 · 3 comments · Fixed by #260 or #280
Labels
type: docs Improvement to the documentation for an API.

Comments

@busunkim96
Contributor

From googleapis/python-iot#42

import pickle

import proto

class Composer(proto.Message):
    given_name = proto.Field(proto.STRING, number=1)
    family_name = proto.Field(proto.STRING, number=2)

composer = Composer(given_name="Johannes", family_name="Bach")
pickled_composer = pickle.dumps(composer)  # fails with the traceback below

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'Composer'>: it's not the same object as __main__.Composer

The current recommendation is to serialize the message to bytes instead of pickling it. See https://proto-plus-python.readthedocs.io/en/latest/reference/message.html

@busunkim96 busunkim96 added the type: question Request for information or clarification. Not an issue. label Nov 17, 2020
@software-dov
Contributor

software-dov commented Nov 19, 2020

The preferred serialization mechanism is the serialize class method.
Given the example message definition above, serialize and deserialize like so:

composer = Composer(given_name="Richard", family_name="Strauss")
composer_bytes = Composer.serialize(composer)
composer2 = Composer.deserialize(composer_bytes)
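
When a framework insists on pickle, one workaround consistent with the advice above is to pickle the serialized bytes rather than the message object. The sketch below is only an illustration: the Composer class in it is a hypothetical stdlib-only stand-in (with a JSON wire format) so the example runs without proto-plus installed. It is not the real proto.Message, which emits protobuf bytes.

```python
import json
import pickle
from dataclasses import asdict, dataclass


@dataclass
class Composer:
    """Hypothetical stand-in for a proto-plus message (NOT proto.Message)."""

    given_name: str = ""
    family_name: str = ""

    @classmethod
    def serialize(cls, instance):
        # Stand-in wire format: JSON bytes. The real library emits protobuf bytes.
        return json.dumps(asdict(instance)).encode("utf-8")

    @classmethod
    def deserialize(cls, payload):
        return cls(**json.loads(payload.decode("utf-8")))


composer = Composer(given_name="Richard", family_name="Strauss")

# Pickle a plain container of bytes; bytes objects are always picklable.
pickled = pickle.dumps({"composer": Composer.serialize(composer)})

# On the receiving side, unpickle and then rebuild the message from the bytes.
composer2 = Composer.deserialize(pickle.loads(pickled)["composer"])
```

The point of the pattern is that only built-in types cross the pickle boundary, so the message class never has to be picklable itself.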

If using something like multiprocessing.Pool.map, the suggested pattern is something like this:

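from multiprocessing import Pool

# Define the worker at module level, before the pool is created,
# so that the worker processes can look it up by name.
def process_bach(bach_bytes):
    bach = Composer.deserialize(bach_bytes)
    # get_parent stands in for your own logic; it is not defined here.
    return Composer.serialize(get_parent(bach))

if __name__ == "__main__":
    names = ["Johann Sebastian", "Carl Philipp", "Johann Christian", "Wilhelm Friedemann", "Johann Friedrich"]
    initial_bachs = [Composer(given_name=name, family_name="Bach") for name in names]
    with Pool(5) as p:
        # Only bytes cross the process boundary, in both directions.
        parent_bytes = p.map(process_bach, [Composer.serialize(bach) for bach in initial_bachs])
    parent_bachs = [Composer.deserialize(b) for b in parent_bytes]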

@busunkim96 busunkim96 added type: docs Improvement to the documentation for an API. and removed type: question Request for information or clarification. Not an issue. labels Nov 19, 2020
@tseaver
Contributor

tseaver commented Dec 8, 2021

@software-dov This issue is a major blocker for integrating current Bigtable (2.4.x) with Apache Beam, which relies on pickling messages across hosts / processes. See e.g. this CI failure. We could solve the issue by writing __getstate__ and __setstate__ for proto.Message which just used the same format as serialize.
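
To make the proposal concrete, here is a rough sketch of what __getstate__ and __setstate__ hooks backed by the serialize format could look like. It is an assumption-laden illustration, not the change that was eventually merged: the Composer class is a hypothetical stdlib-only stand-in with a JSON wire format, and the "serialized" state key is an invented name.

```python
import json
import pickle


class Composer:
    """Hypothetical stand-in for a proto-plus message (NOT proto.Message)."""

    def __init__(self, given_name="", family_name=""):
        self.given_name = given_name
        self.family_name = family_name

    @classmethod
    def serialize(cls, instance):
        # Stand-in wire format: JSON bytes. The real library emits protobuf bytes.
        fields = {
            "given_name": instance.given_name,
            "family_name": instance.family_name,
        }
        return json.dumps(fields).encode("utf-8")

    @classmethod
    def deserialize(cls, payload):
        return cls(**json.loads(payload.decode("utf-8")))

    def __getstate__(self):
        # pickle stores only the serialized bytes, not the instance dict.
        return {"serialized": type(self).serialize(self)}

    def __setstate__(self, state):
        # pickle creates a blank instance and passes the state back here;
        # rebuild from the bytes and copy the fields onto this instance.
        restored = type(self).deserialize(state["serialized"])
        self.__dict__.update(restored.__dict__)


composer = Composer(given_name="Johannes", family_name="Bach")
restored = pickle.loads(pickle.dumps(composer))
```

With hooks like these, the pickled payload has the same content as the output of serialize, so both paths share one wire format.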

@tseaver tseaver reopened this Dec 8, 2021
This was linked to pull requests Jan 4, 2022
@software-dov
Contributor

Pickling support added by #280

3 participants