Refactor builtin method pickling #262
Codecov Report
@@            Coverage Diff            @@
##           master    #262      +/-   ##
==========================================
+ Coverage    88.9%   89.35%   +0.44%
==========================================
  Files           1        1
  Lines         631      620      -11
  Branches      131      131
==========================================
- Hits          561      554       -7
+ Misses         46       43       -3
+ Partials       24       23       -1
     write = self.write

     if name is None:
-        name = obj.__name__
+        name = getattr(obj, '__qualname__', None)
This change solves a related bug, but it deserves a different PR.
Can you be more explicit? Maybe you could include a non-regression test directly in this PR and document it in the changelog?
Precisely: Python 3 makes no distinction between unbound methods and functions.
Consider `pickle._Pickler.dump`: this is a non-builtin unbound method.
The bug I was mentioning is that cloudpickle master currently fails to detect it as a global, and instead pickles it dynamically:
import cloudpickle
import pickle
cloudpickle.dumps(pickle._Pickler.dump)
gives:
Out[4]: b'\x80\x04\x95c\x03\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_fill_function\x94\x93\x94(h\x00\x8c\x0f_
make_skel_func\x94\x93\x94h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x02K\x00K\x02K\x06KCCrt\x00|\x00d\x01
\x83\x02s\x1ct\x01d\x02|\x00j\x02j\x03f\x01\x16\x00\x83\x01\x82\x01|\x00j\x04d\x03k\x05r<|\x00\xa0\x05t\x06t\x07d\x04|\x00j\x04\x83\x02
\x17\x00\xa1\x01\x01\x00|\x00j\x04d\x05k\x05rP|\x00j\x08\xa0\t\xa1\x00\x01\x00|\x00\xa0\n|\x01\xa1\x01\x01\x00|\x00\xa0\x05t\x0b\xa1\x0
1\x01\x00|\x00j\x08\xa0\x0c\xa1\x00\x01\x00d\x06S\x00\x94(\x8c7Write a pickled representation of obj to the open file.\x94\x8c\x0b_file
_write\x94\x8c2Pickler.__init__() was not called by %s.__init__()\x94K\x02\x8c\x02<B\x94K\x04Nt\x94(\x8c\x07hasattr\x94\x8c\rPicklingEr
ror\x94\x8c\t__class__\x94\x8c\x08__name__\x94\x8c\x05proto\x94\x8c\x05write\x94\x8c\x05PROTO\x94\x8c\x04pack\x94\x8c\x06framer\x94\x8c
\rstart_framing\x94\x8c\x04save\x94\x8c\x04STOP\x94\x8c\x0bend_framing\x94t\x94\x8c\x04self\x94\x8c\x03obj\x94\x86\x94\x8c\x1c/usr/lib/
python3.7/pickle.py\x94\x8c\x04dump\x94M\xaa\x01C\x14\x00\x04\n\x01\x04\x01\x0e\x01\n\x01\x16\x01\n\x01\n\x01\n\x01\n\x01\x94))t\x94R\x
94J\xff\xff\xff\xff}\x94(\x8c\x0b__package__\x94\x8c\x00\x94h\x13\x8c\x06pickle\x94\x8c\x08__file__\x94\x8c\x1c/usr/lib/python3.7/pickl
e.py\x94u\x87\x94R\x94}\x94(\x8c\x07globals\x94}\x94(h\x17\x8c\x07_struct\x94\x8c\x04pack\x94\x93\x94h\x1bC\x01.\x94h\x16C\x01\x80\x94h
\x11\x8c\x07_pickle\x94\x8c\rPicklingError\x94\x93\x94u\x8c\x08defaults\x94N\x8c\x04dict\x94}\x94\x8c\x0eclosure_values\x94N\x8c\x06mod
ule\x94h)\x8c\x04name\x94h"\x8c\x03doc\x94h\x0b\x8c\x0bannotations\x94}\x94\x8c\x08qualname\x94\x8c\r_Pickler.dump\x94utR.'
The reason is that `pickle._Pickler.dump.__name__` is simply `dump`, whereas to retrieve this function we need its qualified name, namely `_Pickler.dump`.
If we use `__qualname__` instead of `__name__`, we cannot use `getattr`, though, because `getattr` does not accept dotted paths. Instead, we need to use `pickle._getattribute`.
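A quick way to see both halves of the problem with the standard library alone (a sketch; it only inspects names and does not involve cloudpickle):

```python
import pickle

func = pickle._Pickler.dump

print(func.__name__)      # dump
print(func.__qualname__)  # _Pickler.dump

# Looking up the short name on the module finds an unrelated object
# (the module-level pickle.dump function), so an identity check fails.
print(getattr(pickle, func.__name__) is func)  # False

# getattr treats the dotted qualified name as a single attribute name,
# so it finds nothing at all.
print(getattr(pickle, func.__qualname__, None))  # None
```

The first lookup is the failure mode described above: the name-based check sees a different object and concludes the function is not importable.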
> If we use `__qualname__` instead of `__name__`, we cannot use `getattr` though, because `getattr` does not accept dotted path. Instead, we need to use `pickle._getattribute`.

But this (using `getattr`) is what you are doing here, no?
Sorry, I misread what you said. That sounds good.
So to summarize, this can only be (easily) fixed in Python 3?
I think we still need a specific test for this fix.
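One possible shape for such a non-regression test, as a sketch only: it uses stdlib `pickle` so it runs without extra dependencies, and swapping in `cloudpickle.dumps` / `cloudpickle.loads` is an assumption about how it would be wired into cloudpickle's own test suite. The test and class names are hypothetical.

```python
import pickle
import unittest

# Assumption: a real non-regression test would bind these to
# cloudpickle.dumps / cloudpickle.loads. The stdlib versions already
# pickle this method by reference, so the sketch passes as-is.
dumps = pickle.dumps
loads = pickle.loads


class UnboundMethodPicklingTest(unittest.TestCase):
    def test_non_builtin_unbound_method_is_pickled_by_reference(self):
        func = pickle._Pickler.dump
        for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
            with self.subTest(protocol=proto):
                restored = loads(dumps(func, protocol=proto))
                # A by-value copy would be a brand-new function object,
                # so identity is what tells the two strategies apart.
                self.assertIs(restored, func)
```

Run it with `python -m unittest`. Checking identity rather than the payload contents matters here: the by-value pickle shown earlier also contains the string `_Pickler.dump`, so only `assertIs` distinguishes the two behaviours.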
In Python 2, unbound methods are methods, and get serialized using `save_instancemethod`:
In [57]: isinstance(pickle.Pickler.dump, types.MethodType)
Out[57]: True
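For comparison, the same question on Python 3. This sketch uses the pure-Python `pickle._Pickler`, because on CPython `pickle.Pickler` is normally the C-accelerated class, whose methods are not plain functions:

```python
import io
import pickle
import types

# On Python 3, a method looked up on the class is just a plain function...
unbound = pickle._Pickler.dump
print(isinstance(unbound, types.FunctionType))  # True
print(isinstance(unbound, types.MethodType))    # False

# ...and a method object only appears once it is bound to an instance.
bound = pickle._Pickler(io.BytesIO()).dump
print(isinstance(bound, types.MethodType))      # True
print(bound.__func__ is unbound)                # True
```

This is why, on Python 3, `pickle._Pickler.dump` reaches the function-saving code path rather than a method-specific one.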
@@ -361,31 +327,14 @@ def save_function(self, obj, name=None):
             themodule = None

         try:
-            lookedup_by_name = getattr(themodule, name, None)
+            lookedup_by_name, _ = _getattribute(themodule, name)
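Conceptually, this hunk decides whether a function can be pickled as a global: resolve its qualified name inside its module and check that the very same object comes back. A minimal sketch of that check follows. `lookup_by_qualname` and `can_pickle_by_reference` are hypothetical names, and `functools.reduce` over `getattr` stands in for `pickle._getattribute`, which is a private helper this sketch avoids depending on.

```python
import functools
import pickle
import sys


def lookup_by_qualname(module, qualname):
    """Hypothetical stand-in for pickle._getattribute: follow each dotted
    component of the qualified name with a plain getattr call."""
    return functools.reduce(getattr, qualname.split("."), module)


def can_pickle_by_reference(func):
    """True if func can be re-imported from its module under its qualified
    name, i.e. it is safe to pickle it as a global."""
    module = sys.modules.get(getattr(func, "__module__", None))
    qualname = getattr(func, "__qualname__", None)
    if module is None or qualname is None:
        return False
    try:
        looked_up = lookup_by_qualname(module, qualname)
    except AttributeError:
        # e.g. '<lambda>' or 'outer.<locals>.inner' cannot be resolved.
        return False
    return looked_up is func


print(can_pickle_by_reference(pickle._Pickler.dump))  # True
print(can_pickle_by_reference(lambda: None))          # False
```

Nested functions and lambdas fail the lookup with `AttributeError`, which is exactly the signal that they must be pickled by value instead.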
ditto
This is indeed much cleaner. Here are some minor comments:
OK so the failing test happens only on Technically though,
I just changed the test to make it weaker. So it's actually a bug in PyPy 3.5. Maybe we can re-add this assertion and skip only this line on PyPy 3.5.
My weaker variant of the test is also failing on PyPy:
So +1 for using the old assertion but skipping it on PyPy 3.5.
Yeah, that was what I wanted to suggest. Let's wait a bit before merging this one, though. I'll make one more pass through the
This is yet another bug, though: apparently we forget the default value of the
In [28]: tuple.__new__
Out[28]: <function __new__(tupletype, sequence=<no value>)>
In [29]: cloudpickle.loads(cloudpickle.dumps(tuple.__new__))
Out[29]: <function __new__(tupletype, sequence)>
Some good news at least: the downstream tests pass.
Do you think it's specific to dynamic builtin functions / methods, or is it a bug affecting dynamic functions in general?
No, it seems to be related only to some builtin functions/methods. In PyPy 3.5:
In [30]: def f(a=1):
    ...:     pass
In [31]: cloudpickle.loads(cloudpickle.dumps(f))
Out[31]: <function __main__.f(a=1)>
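A sketch of a regression check for this kind of bug: compare the signature, defaults included, before and after a round trip. `signature_survives_roundtrip` is a hypothetical helper; passing `cloudpickle.dumps` / `cloudpickle.loads` as the two callables is an assumption about how it would be reused, and `inspect.signature` can raise `ValueError` for builtins that expose no signature metadata.

```python
import inspect
import json
import pickle


def signature_survives_roundtrip(obj, dumps=pickle.dumps, loads=pickle.loads):
    """Return True if obj's signature, including default values, is unchanged
    after a dumps/loads round trip."""
    restored = loads(dumps(obj))
    # Signature equality compares parameter names, kinds and defaults.
    return inspect.signature(restored) == inspect.signature(obj)


# json.dumps has many keyword defaults and is pickled by reference here.
print(signature_survives_roundtrip(json.dumps))
```

A function whose restored copy lost `sequence=<no value>`, as in the `tuple.__new__` output above, would make this comparison return `False`.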
OK. BTW, here is something surprising that we probably cannot fix easily (on CPython):
>>> import cloudpickle
>>> default_value = object()
>>> def f(a=default_value):
...     return id(a)
...
>>> cloudpickle.loads(cloudpickle.dumps(f))()
140550658935136
>>> f()
140550658935104
Yes: default values get shipped along with the function. In your case, the default value is an instance, so it gets re-created at unpickling time. Sadly, we cannot pickle any global variable as global (per attribute) because most of them do not have a
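The mechanism can be reproduced with the stdlib alone. This sketch assumes that, on the by-value path, the defaults tuple is serialized as ordinary data (the `defaults` entry is visible in the pickled state dumped earlier in this thread); it pickles that tuple directly rather than going through cloudpickle:

```python
import pickle

default_value = object()


def f(a=default_value):
    return id(a)


# The default lives on the function object itself, in f.__defaults__.
print(f.__defaults__[0] is default_value)  # True

# Serializing that tuple as data recreates a bare object() instance,
# so its identity cannot survive the round trip.
defaults_copy = pickle.loads(pickle.dumps(f.__defaults__))
print(defaults_copy[0] is default_value)   # False
```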
I agree. We could do something similar to the provenance tracking I did for the type definitions themselves in #246, but I am afraid of the overhead and the unanticipated side effects.
Anyway, this is unrelated to this PR, so let's leave it aside for now.
I made a new pass on this PR. I found PEP 579, which turned out to be a very useful resource covering all the flavours in which builtin functions and methods can appear. Building on this, I first wrote new comprehensive tests for all possible cases (b6953f4), which showed that some exotic method flavours could not be pickled under Python 2:
cloudpickle.dumps(1.5.__repr__)
cloudpickle.dumps(float.__repr__)
cloudpickle.dumps(float.__dict__['fromhex'])
Starting from these tests, I adjusted the implementation of cloudpickle's builtin method saving logic. All builtin functions and methods are now redirected into one single, simple function. Also, given this consideration, I think we can safely remove the
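On Python 3 the standard pickle already round-trips these flavours. The sketch below checks that with stdlib `pickle` across all protocols; the flavour labels loosely follow PEP 579's categories, the raw `float.__dict__['fromhex']` descriptor is deliberately left out, and reusing it against cloudpickle by passing its `dumps` / `loads` is an assumption. `failing_flavours` is a hypothetical name.

```python
import pickle

# Each entry: (callable, args used to compare original and restored results).
BUILTIN_FLAVOURS = {
    "builtin function": (len, ([1, 2, 3],)),
    "bound builtin method": ("abcd".index, ("c",)),
    "unbound builtin method": (str.index, ("abcd", "c")),
    "bound slot method (method-wrapper)": ((1.5).__repr__, ()),
    "unbound slot method (wrapper_descriptor)": (float.__repr__, (1.5,)),
    "builtin class method": (float.fromhex, ("0x1p1",)),
}


def failing_flavours(dumps=pickle.dumps, loads=pickle.loads):
    """Return (flavour, protocol) pairs whose round trip raises or changes
    the callable's result."""
    failures = []
    for name, (func, args) in BUILTIN_FLAVOURS.items():
        for proto in range(pickle.HIGHEST_PROTOCOL + 1):
            try:
                restored = loads(dumps(func, protocol=proto))
                ok = restored(*args) == func(*args)
            except Exception:
                ok = False
            if not ok:
                failures.append((name, proto))
    return failures


print(failing_flavours())
```

An empty list means every flavour survives every protocol; comparing call results rather than object identity is needed because bound methods are recreated on loading.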
Downstream tests pass with the new version as well.
Here are a few more comments, but otherwise this PR looks ready to merge.
Please add an entry to the changelog targeting version 1.2.0.
Force-pushed from 5aea555 to 43e358b.
Just rebased. Also, I mentioned the PR number and not the issue number in
Merged, thank you very much @pierreglaser!
2.0.0
=====

- Python 3.5 is no longer supported.
- Support for registering modules to be serialised by value. This allows code defined in local modules to be serialised and executed remotely without those local modules installed on the remote machine. ([PR #417](cloudpipe/cloudpickle#417))
- Fix a side effect altering dynamic modules at pickling time. ([PR #426](cloudpipe/cloudpickle#426))
- Support for pickling type annotations on Python 3.10 as per [PEP 563](https://www.python.org/dev/peps/pep-0563/) ([PR #400](cloudpipe/cloudpickle#400))
- Stricter parametrized type detection heuristics in _is_parametrized_type_hint to limit false positives. ([PR #409](cloudpipe/cloudpickle#409))
- Support pickling / depickling of OrderedDict KeysView, ValuesView, and ItemsView, following similar strategy for vanilla Python dictionaries. ([PR #423](cloudpipe/cloudpickle#423))
- Suppressed a source of non-determinism when pickling dynamically defined functions and handled the deprecation of co_lnotab in Python 3.10+. ([PR #428](cloudpipe/cloudpickle#428))

1.6.0
=====

- `cloudpickle`'s pickle.Pickler subclass (currently defined as `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed as `cloudpickle.Pickler`. This is the only officially supported way of accessing it. ([issue #366](cloudpipe/cloudpickle#366))
- `cloudpickle` now supports pickling `dict_keys`, `dict_items` and `dict_values`. ([PR #384](cloudpipe/cloudpickle#384))

1.5.0
=====

- Fix a bug causing cloudpickle to crash when pickling dynamically created, importable modules. ([issue #360](cloudpipe/cloudpickle#354))
- Add optional dependency on `pickle5` to get improved performance on Python 3.6 and 3.7. ([PR #370](cloudpipe/cloudpickle#370))
- Internal refactoring to ease the use of `pickle5` in cloudpickle for Python 3.6 and 3.7. ([PR #368](cloudpipe/cloudpickle#368))

1.4.1
=====

- Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2 introduced by the new support of cloudpickle for pickling typing constructs. ([issue #360](cloudpipe/cloudpickle#360))
- Restore compat with loading dynamic classes pickled with cloudpickle version 1.2.1 that would reference the `types.ClassType` attribute. ([PR #359](cloudpipe/cloudpickle#359))

1.4.0
=====

**This version requires Python 3.5 or later**

- cloudpickle can now pickle all constructs from the ``typing`` module and the ``typing_extensions`` library in Python 3.5+ ([PR #318](cloudpipe/cloudpickle#318))
- Stop pickling the annotations of a dynamic class for Python < 3.6 (follow up on #276) ([issue #347](cloudpipe/cloudpickle#347))
- Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+, and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic) to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350))
- Add support for pickling dynamic classes subclassing `typing.Generic` instances on Python 3.7+ ([PR #351](cloudpipe/cloudpickle#351))

1.3.0
=====

- Fix a bug affecting dynamic modules occurring with modified builtins ([issue #316](cloudpipe/cloudpickle#316))
- Fix a bug affecting cloudpickle when non-modules objects are added into sys.modules ([PR #326](cloudpipe/cloudpickle#326)).
- Fix a regression in cloudpickle and python3.8 causing an error when trying to pickle property objects. ([PR #329](cloudpipe/cloudpickle#329)).
- Fix a bug when a thread imports a module while cloudpickle iterates over the module list ([PR #322](cloudpipe/cloudpickle#322)).
- Add support for out-of-band pickling (Python 3.8 and later). https://docs.python.org/3/library/pickle.html#example ([issue #308](cloudpipe/cloudpickle#308))
- Fix a side effect that would redefine `types.ClassTypes` as `type` when importing cloudpickle. ([issue #337](cloudpipe/cloudpickle#337))
- Fix a bug affecting subclasses of slotted classes. ([issue #311](cloudpipe/cloudpickle#311))
- Don't pickle the abc cache of dynamically defined classes for Python 3.6- (This was already the case for python3.7+) ([issue #302](cloudpipe/cloudpickle#302))

1.2.2
=====

- Revert the change introduced in ([issue #276](cloudpipe/cloudpickle#276)) attempting to pickle functions annotations for Python 3.4 to 3.6. It is not possible to pickle complex typing constructs for those versions (see [issue #193](cloudpipe/cloudpickle#193))
- Fix a bug affecting bound classmethod saving on Python 2. ([issue #288](cloudpipe/cloudpickle#288))
- Add support for pickling "getset" descriptors ([issue #290](cloudpipe/cloudpickle#290))

1.2.1
=====

- Restore (partial) support for Python 3.4 for downstream projects that have LTS versions that would benefit from cloudpickle bug fixes.

1.2.0
=====

- Leverage the C-accelerated Pickler new subclassing API (available in Python 3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to 30 times faster. ([issue #253](cloudpipe/cloudpickle#253))
- Support pickling of classmethod and staticmethod objects in python2. ([issue #262](cloudpipe/cloudpickle#262))
- Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type annotations was already supported for Python 3.7, Python 3.4 might also work but is no longer officially supported by cloudpickle) ([issue #276](cloudpipe/cloudpickle#276))
- Internal refactoring to proactively detect dynamic functions and classes when pickling them. This refactoring also yields small performance improvements when pickling dynamic classes (~10%) ([issue #273](cloudpipe/cloudpickle#273))

1.1.1
=====

- Minor release to fix a packaging issue (Markdown formatting of the long description rendered on pypi.org). The code itself is the same as 1.1.0.

1.1.0
=====

- Support the pickling of interactively-defined functions with positional-only arguments. ([issue #266](cloudpipe/cloudpickle#266))
- Track the provenance of dynamic classes and enums so as to preserve the usual `isinstance` relationship between pickled objects and their original class definitions. ([issue #246](cloudpipe/cloudpickle#246))

1.0.0
=====

- Fix a bug making functions with keyword-only arguments forget the default values of these arguments after being pickled. ([issue #264](cloudpipe/cloudpickle#264))

0.8.1
=====

- Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0) affecting relative import instructions inside depickled functions ([issue #254](cloudpipe/cloudpickle#254))

0.8.0
=====

- Add support for pickling interactively defined dataclasses. ([issue #245](cloudpipe/cloudpickle#245))
- Global variables referenced by functions pickled by cloudpickle are now unpickled in a new and isolated namespace scoped by the CloudPickler instance. This restores the (previously untested) behavior of cloudpickle prior to changes done in 0.5.4 for functions defined in the `__main__` module, and 0.6.0/1 for other dynamic functions.

0.7.0
=====

- Correctly serialize dynamically defined classes that have a `__slots__` attribute. ([issue #225](cloudpipe/cloudpickle#225))

0.6.1
=====

- Fix regression in 0.6.0 which breaks the pickling of local function defined in a module, making it impossible to access builtins. ([issue #211](cloudpipe/cloudpickle#211))

0.6.0
=====

- Ensure that unpickling a function defined in a dynamic module several times sequentially does not reset the values of global variables. ([issue #187](cloudpipe/cloudpickle#205))
- Restrict the ability to pickle annotations to python3.7+ ([issue #193](cloudpipe/cloudpickle#193) and [issue #196](cloudpipe/cloudpickle#196))
- Stop using the deprecated `imp` module under Python 3. ([issue #207](cloudpipe/cloudpickle#207))
- Fixed pickling issue with singleton types `NoneType`, `type(...)` and `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209))

0.5.6
=====

- Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized. ([issue #187](cloudpipe/cloudpickle#187))

0.5.5
=====

- Fixed inconsistent version in `cloudpickle.__version__`.

0.5.4
=====

- Fixed a pickling issue for ABC in python3.7+ ([issue #180](cloudpipe/cloudpickle#180)).
- Fixed a bug when pickling functions in `__main__` that access global variables ([issue #187](cloudpipe/cloudpickle#187)).

0.5.3
=====

- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types ([issue #144](cloudpipe/cloudpickle#144)).
- itertools objects can also be pickled ([PR #156](cloudpipe/cloudpickle#156)).
- `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)).

0.5.2
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131](cloudpipe/cloudpickle#131)).
- Make it possible to pickle classes and functions defined in faulty modules that raise an exception when trying to look-up their attributes by name.

0.5.1
=====

- Fixed `cloudpickle.__version__`.

0.5.0
=====

- Use `pickle.HIGHEST_PROTOCOL` by default.

0.4.4
=====

- `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)).

0.4.3
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131](cloudpipe/cloudpickle#131)).
- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types. ([issue #144](cloudpipe/cloudpickle#144))

0.4.2
=====

- Restored compatibility with pickles from 0.4.0.
- Handle the `func.__qualname__` attribute.

0.4.1
=====

- Fixed a crash when pickling dynamic classes whose `__dict__` attribute was defined as a [`property`](https://docs.python.org/3/library/functions.html#property). Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields) in Python 2. (cloudpipe/cloudpickle#113)
- Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118).
- Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116).

0.4.0
=====

* Fix functions with empty cells
* Allow pickling Logger objects
* Fix crash when pickling dynamic class cycles
* Ignore "None" modules added to sys.modules
* Support WeakSets and ABCMeta instances
* Remove non-standard `__transient__` support
* Catch exception from `pickle.whichmodule()`

0.3.1
=====

* Fix version information and ship a changelog

0.3.0
=====

* Import submodules accessed by pickled functions
* Support recursive functions inside closures
* Fix `ResourceWarnings` and `DeprecationWarnings`
* Assume modules with `__file__` attribute are not dynamic

0.2.2
=====

* Support Python 3.6
* Support Tornado Coroutines
* Support builtin methods
The way cloudpickle handles `builtin_function_or_method` is very confusing for many reasons:
- … `builtin_function_or_method` handling with more classic function handling

In reality, things are not that complex: `builtin_function_or_method` objects are correctly handled by the standard pickle in Python 3, so we do not need to special-case them AT ALL in Python 3.

This PR addresses all the previously stated issues: it clearly separates `builtin_function_or_method` from classic functions, adds a special case only for builtin methods in Python 2, and, as a consequence, removes a lot of obscure special cases.
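To illustrate the split described above, here is a hypothetical dispatch sketch (not cloudpickle's actual code) in which `builtin_function_or_method` objects and pure-Python functions take separate paths instead of sharing one:

```python
import types


def pickling_strategy(obj):
    """Hypothetical dispatch: route builtins and pure-Python functions to
    separate code paths."""
    if isinstance(obj, types.BuiltinFunctionType):
        # builtin_function_or_method: defer to the standard pickle machinery.
        return "builtin"
    if isinstance(obj, types.FunctionType):
        # Pure-Python function: by reference if importable, else by value.
        return "python function"
    # Descriptors such as str.upper and everything else go elsewhere.
    return "other"


print(pickling_strategy(len))          # builtin
print(pickling_strategy([].append))    # builtin
print(pickling_strategy(lambda: 0))    # python function
print(pickling_strategy(str.upper))    # other
```

`types.BuiltinFunctionType` and `types.BuiltinMethodType` are the same type, `builtin_function_or_method`, which is why a bound builtin method such as `[].append` lands in the first branch alongside `len`.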