-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add pickling support to proto messages #280
Conversation
FBO of frameworks such as Apache Beam, which use them for sharing state between trusted hosts. Closes #260.
Codecov Report
@@ Coverage Diff @@
## main #280 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 22
Lines ? 968
Branches ? 214
========================================
Hits ? 968
Misses ? 0
Partials ? 0 Continue to review full report at Codecov.
|
I don't know enough about the details of pickling or what the "serious consequences" were when objects were pickled before, but this would greatly simplify implementations of connectors in tools like Dask and Beam. Connectors such as https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.io.gcp.bigtableio.html#apache_beam.io.gcp.bigtableio.WriteToBigTable are already pickling objects that contain proto-plus properties, so whatever "serious consequences" we are worried about are already happening over there. |
@tswast There is a lot of fear in the Python community around pickles, because unpickling an object can import code: if an attacker can get hostile (or compromised) code onto the system, and cause a process to unpickle an object which loads a class from the "evil" code, Bad Things(TM) happen. So, loading pickles from untrusted sources can widen an attack surface. Dask and Beam connectors don't really fall into that scenario, IMNSHO. |
I'm not very familiar with Beam, and it could be that this falls outside the scope of Beam's use case, but one additional complication is compatibility between different versions of the same message, e.g. # Old software version
from mollusc_v1 import Squid as SquidV1
s_v1 = SquidV1(mass_kg=10, length_cm=20)
s_pickle = pickle.dumps(s_v1)
with open("/tmp/squid.txt", "w") as f:
f.write(s_pickle)
# Updated software version
from mollusc_v2 import Squid as SquidV2
with open("/tmp/squid.txt", "w") as f:
s = f.read()
s_v2 = pickle.loads(s)
assert isinstance(s_v2, SquidV2) What happens in the above scenario? Does the unpickling fail because the exact class isn't available? Is pickle smart/flexible enough to detect that SquidV2 is the class we have in mind? Are there subtle errors? Is the message class description serialized and deserialized as part of the pickle protocol? |
A pickle of a class instance contains the dotted-importable name of the instance's type: it ignores any Another consequence is that instances of "local" classes (defined inside another function / method) aren't picklable, because their class can't be imported: that is why I had to define the |
@tseaver Okay, that makes sense, and that answers my questions about old/new message interchange. The importable path will be the same assuming no breaking changes. It is then protobuf's responsibility to handle unknown/missing fields, which is well tested. This does imply that it is not possible to pickle messages across a major version update. |
🤖 I have created a release *beep* *boop* --- ### [1.19.9](v1.19.8...v1.19.9) (2022-01-25) ### Bug Fixes * add pickling support to proto messages ([#280](#280)) ([2b7be35](2b7be35)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
FBO of frameworks such as Apache Beam, which use them for sharing
state between trusted hosts.
Closes #260.