-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(bindings/python): support pickle [de]serialization for Operator #5324
feat(bindings/python): support pickle [de]serialization for Operator #5324
Conversation
It seems like we should add a new capability |
""" | ||
serialized = pickle.dumps(operator) | ||
deserialized = pickle.loads(serialized) | ||
assert repr(operator) == repr(deserialized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, I think the ideal behavior for pickle [de]serialization
is that the object has the same behavior/operation result.
So I suggest the test can be
data="abc"
operator.write("/abc","abc")
serialized = pickle.dumps(operator)
del operator
gc.collect()
deserialized = pickle.loads(serialized)
assert "abc"==deserialized.read("/abc")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
core: self.core.clone().into(), | ||
__scheme: self.__scheme.clone(), | ||
__map: self.__map.clone(), | ||
}) | ||
} | ||
|
||
fn __repr__(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need add extra __hash__
method to represent two object are equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to implement, we can't get internal information of Operator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about hash the scheme and the map?
repr is used for show what the object is looking at. Maybe not a good method for identify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer don’t implement hash and eq for Operator, and remove the equal assertion based on repr.
(schema, map) can’t identify an Operator, leave eq and empty empty is better. Only the ID of PyObject can identify the Operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer don’t implement hash and eq for Operator, and remove the equal assertion based on repr.
Agreed. ocore::Operator
doesn't implement Eq
ahd Hash
too.
SGTM, maybe |
IMO all operators can be serialized as pickle. In most Python libraries, all objects can be serialized as pickle, but it may not be identical when deserialized. The behavior is acceptable. We need the capability only for tests. We can ignore backends which doesn't support the capability in the test. |
I'm considering adding a Based on this definition:
I believe this capability is better than the following:
I don't like the idea that:
I view it as a design failure for OpenDAL to include something in the public API solely for testing purposes. |
SGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no suggestions.
Comments skipped due to low confidence (2)
bindings/python/src/operator.rs:230
- The method
__getnewargs_ex__
might not correctly handle the conversion ofkwargs
to a Python dictionary. Ensure thatkwargs
are correctly converted to a Python dictionary.
fn __getnewargs_ex__(&self, py: Python) -> PyResult<PyObject> {
bindings/python/src/operator.rs:51
- [nitpick] The naming of the private attributes
__scheme
and__map
should be consistent with the convention used in the rest of the codebase.
__scheme: ocore::Scheme,
Please note that What we only can do is disable the tests for unique operator. |
That's correct. As you mentioned, we just need to avoid testing it in a shared manner (which implies that we can access the same storage service after unpickling) if this storage service doesn't I prefer to add the |
In fact, there are two types of capabilities in all backends.
Of course, we don't need to be so strict and complicated, I support temporarily adding only shared capability. |
Would you like to add this capability? I believe we can include them in the next 0.51 release. |
I can add it. Please note that However, I guess we will not support that in a while. |
Currently, we don't offer any services that aren't |
Will test another PR use the CI flow, just ignore it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM about the Python binding part
I will merge this after all ci green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @TennyZhuang for working on this and thanks @Zheaoli's review.
Please merge when you feel this PR is ok, @Zheaoli
Which issue does this PR close?
Partially finished #5316
Rationale for this change
Now an
Operator
can be serialized and deserialized via pickle, as a result, theOperator
class can be used in multiprocessing context.What changes are included in this PR?
In the PR, we store the arguments for
Operator
constructor, so that we can recover theOperator
from pickle deserialization.Are there any user-facing changes?
Operator
in python bindingOperator
to be used in multiprocessing context@Zheaoli I'm not very familiar with Python binding, PTAL for the PR, thanks!