WIP: Implement __reduce__ with cache_hash #615
Conversation
Force-pushed from 0285d3d to c5b1dfd.
I am not sure I understand what's going on with the PyPy target; I cannot reproduce the failure locally.
Force-pushed from 4849c42 to 17e209b.
This fixes GH issues python-attrs#613 and python-attrs#494. It turns out that the hash cache-clearing implementation for non-slots classes was flawed and never quite worked properly. This switches away from using `__setstate__` and instead adds a custom `__reduce__` that removes the cached hash value from the default serialized output.
This is a rough attempt at building a compatibility function that allows us to detect whether an object defines its own `__reduce__` or `__reduce_ex__` on any interpreter. Some versions of PyPy 2.7 don't seem to return True with `is` comparisons, and instead require `==`. I have arbitrarily chosen `int` as a builtin that derives from `object` and doesn't define its own `__reduce__`. I think this is fairly safe to do in 2.7-only code, since it is stable now.
I couldn't think of any way to make a useful and meaningful class that has no state and also has no custom __reduce__ method, so I went minimalist with it.
Previously, there was some minuscule risk of hash collision, and it also relied on the implementation details of `pickle` (the assumption that `hash()` is never called as part of `pickle.loads`).
Force-pushed from 17e209b to 4e8acb8.
Once the general approach is approved, I will go through and modify the documentation to clarify some things about this.
One problem here is that if you write your own `__reduce__`, there's no public attribute or function that allows you to clear out the hash cache. In practice I don't think this will be a major problem, because my impression is that when you write a custom `__reduce__`, you tend to do it for the purposes of white-listing attributes. Spot-checking a few custom-defined `__reduce__`s on GitHub seems to bear this out.
So I think the options are:

1. Use this method of having `cache_hash` generate a default `__reduce__` with no warning and just a mention in the documentation and the changelog.
2. Use this method and call any existing `__reduce__`, then do the modification if and only if:
   a. we see that the third argument is `self.__dict__`. This will probably catch most edge cases, but it will also make the bug even harder to detect (since changing from `return x, y, self.__dict__` to `return x, y, copy.copy(self.__dict__)` will cause pickling to have a subtle bug).
   b. we see that the third argument is a dictionary and contains `_hash_cache_field`. This is probably a very good heuristic, but we don't actually know what this dictionary is used for in any random custom `__reduce__`, since it's defined in relationship with the first two arguments, so I'm somewhat wary of messing with it.
3. Throw an exception if someone has a custom `__reduce__`, telling them they can't use `cache_hash`. This would be a breaking change and seems unnecessary anyway.
4. Warn if someone has a custom `__reduce__` with `cache_hash=True`. This is less intrusive, but the warning will mostly be noise; most `__reduce__` implementations won't have this problem.
5. Basically do option 1, but also either expose `_hash_cache_attr` or add a public function like `remove_hash_cache_from_dict` as a public attribute, so people can use it in their custom `__reduce__` implementations.
I think the best options are probably 1 or 2b, with a fallback to 5 if it turns out that this is causing real problems for end users.
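Option 2b above could be prototyped roughly as follows. The wrapper name `_wrap_reduce` and the cache-field name are hypothetical, sketched from the heuristic described, not attrs code:

```python
def _wrap_reduce(user_reduce, cache_field):
    # Wrap a user-defined __reduce__ and strip the cached hash, but only
    # when the third element looks like a state dict that contains the
    # cache field (the option-2b heuristic).
    def __reduce__(self):
        rv = user_reduce(self)
        if (
            isinstance(rv, tuple)
            and len(rv) >= 3
            and isinstance(rv[2], dict)
            and cache_field in rv[2]
        ):
            state = dict(rv[2])
            del state[cache_field]
            rv = rv[:2] + (state,) + rv[3:]
        return rv

    return __reduce__


class C(object):
    def __init__(self, x):
        self.x = x

    def __reduce__(self):
        # Returns a *copy* of __dict__, so an identity check against
        # self.__dict__ (option 2a) would miss it; the content-based
        # heuristic of option 2b still catches it.
        return (C, (self.x,), dict(self.__dict__))


C.__reduce__ = _wrap_reduce(C.__reduce__, "_cache")

c = C(1)
c.__dict__["_cache"] = 12345          # simulate a populated hash cache
_, _, state = c.__reduce__()
assert "_cache" not in state          # stripped by the wrapper
assert state["x"] == 1
```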
@@ -483,6 +513,13 @@ def __init__(
        self._cls_dict["__setattr__"] = _frozen_setattrs
        self._cls_dict["__delattr__"] = _frozen_delattrs

        if (
            cache_hash
            and _method_eq(cls.__reduce__, object.__reduce__)
Note to reviewers: This is probably the most controversial part. The `__reduce__` is generated only if you haven't defined your own `__reduce__` or `__reduce_ex__`. I have done this because it seems mildly inappropriate to squash someone's existing `__reduce__`, and I am not confident that we can count on the third return value from a custom `__reduce__` being the object's `__dict__`.
Closing in favor of #620.
It turns out that the hash cache-clearing implementation for non-slots classes was flawed and never quite worked properly. This switches away from using `__setstate__` and instead adds a custom `__reduce__` that removes the cached hash value from the default serialized output. This commit also refactors some of the tests a bit, to try and more cleanly organize the tests related to this issue.
Fixes #613.
Fixes #494.
Pull Request Check List
- Documentation in `.rst` files is written using semantic newlines.
- Changelog entry added in `changelog.d`.