WIP: Implement __reduce__ with cache_hash #615

Closed
wants to merge 5 commits

Conversation

pganssle
Member

@pganssle pganssle commented Jan 13, 2020

It turns out that the hash cache-clearing implementation for non-slots classes was flawed and never quite worked properly. This switches away from using __setstate__ and instead adds a custom __reduce__ that removes the cached hash value from the default serialized output.

This change also refactors some of the tests to organize the ones related to this issue more cleanly.

Fixes #613.
Fixes #494.
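
For reference, here is a minimal sketch of that idea, assuming a dict-based (non-slots) class; the attribute name "_cached_hash" and the helper are illustrative, not the actual attrs code:

```python
import copy

# A minimal sketch of the idea, NOT the actual attrs implementation.
# The attribute name "_cached_hash" is illustrative.
_HASH_CACHE_FIELD = "_cached_hash"


def _reduce_without_cached_hash(self):
    # Start from the default reduction for a plain dict-based class...
    rv = object.__reduce_ex__(self, 2)
    state = rv[2] if len(rv) > 2 else None
    # ...and strip the cached hash from the state so the unpickled instance
    # recomputes its hash lazily instead of trusting a stale value.
    if isinstance(state, dict) and _HASH_CACHE_FIELD in state:
        state = copy.copy(state)
        del state[_HASH_CACHE_FIELD]
        rv = rv[:2] + (state,) + rv[3:]
    return rv
```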

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.
  • Documentation in .rst files is written using semantic newlines.
  • Changes (and possible deprecations) have news fragments in changelog.d.

@pganssle
Member Author

I am not sure I understand what's going on with the PyPy target; I cannot reproduce the failure locally.

@pganssle pganssle changed the title from "Implement __reduce__ with cache_hash" to "WIP: Implement __reduce__ with cache_hash" on Jan 14, 2020
Review thread on src/attr/_make.py (outdated, resolved)
Commit messages:

  • This fixes GH issues python-attrs#613 and python-attrs#494. It turns out that the hash cache-clearing implementation for non-slots classes was flawed and never quite worked properly. This switches away from using __setstate__ and instead adds a custom __reduce__ that removes the cached hash value from the default serialized output.
  • This is a rough attempt at building a compatibility function that allows us to detect whether an object defines its own __reduce__ or __reduce_ex__ on any interpreter. Some versions of PyPy 2.7 don't seem to return True with "is" comparisons and instead require ==.
  • I have arbitrarily chosen int as a builtin that derives from object and doesn't define its own __reduce__. I think this is fairly safe to do in 2.7-only code, since it is stable now.
  • I couldn't think of any way to make a useful and meaningful class that has no state and also has no custom __reduce__ method, so I went minimalist with it.
  • Previously, there was some minuscule risk of hash collision, and it also relied on an implementation detail of `pickle` (the assumption that `hash()` is never called as part of `pickle.loads`).
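
As a rough illustration of the compatibility helper those commit messages describe (the name _method_eq matches the diff further down, but the body here is only a guess at its shape, not the real code):

```python
def _method_eq(meth_a, meth_b):
    # On most interpreters the default __reduce__/__reduce_ex__ looked up on a
    # class is the very same object as the one on `object`, so identity works;
    # some PyPy 2.7 builds only compare equal with ==, hence the fallback.
    return meth_a is meth_b or meth_a == meth_b
```

The intent is that _method_eq(cls.__reduce__, object.__reduce__) is true only when cls still inherits the default __reduce__.
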
Member Author

@pganssle pganssle left a comment


Once the general approach is approved, I will go through and modify the documentation to clarify some things about this.

One problem here is that if you write your own __reduce__, there's no public attribute or function that allows you to clear out the hash cache. In practice I don't think this will be a major problem, because my impression is that when you write a custom __reduce__, you tend to do it for the purpose of whitelisting attributes. Spot-checking a few custom-defined __reduce__ implementations on GitHub seems to bear this out.

So I think the options are:

  1. Use this method of having cache_hash generate a default __reduce__ with no warning and just a mention in the documentation and the changelog.
  2. Use this method and call any existing __reduce__, then do the modification if and only if:
    a.) we see that the third element of the returned tuple is self.__dict__ - this will probably catch most cases, but it will also make the bug even harder to detect (since changing from return x, y, self.__dict__ to return x, y, copy.copy(self.__dict__) would introduce a subtle pickling bug).
    b.) we see that the third element is a dictionary that contains _hash_cache_field (a sketch follows below). This is probably a very good heuristic, but we don't actually know what this dictionary is used for in any given custom __reduce__, since its meaning is defined in relation to the first two elements, so I'm somewhat wary of messing with it.
  3. Throw an exception if someone has a custom __reduce__, telling them they can't use cache_hash - this would be a breaking change and seems unnecessary anyway.
  4. Warn if someone has a custom __reduce__ with cache_hash=True. This is less intrusive, but the warning will mostly be noise - most __reduce__ implementations won't have this problem.
  5. Basically do option 1, but also either expose _hash_cache_attr or add a public function like remove_hash_cache_from_dict, so people can use it in their custom __reduce__ implementations.

I think the best options are probably 1 or 2b, with a fallback to 5 if it turns out that this is causing real problems for end users.
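
For concreteness, option 2b could look roughly like the following wrapper; this is an illustrative sketch only, and _HASH_CACHE_FIELD stands in for whatever attribute name attrs actually uses for the cache:

```python
import copy

_HASH_CACHE_FIELD = "_cached_hash"  # placeholder, not the real attrs name


def _wrap_reduce_for_cache_hash(original_reduce):
    # Call the user's existing __reduce__ and only touch the result when the
    # third element is clearly a state dict containing the cache field.
    def __reduce__(self):
        rv = original_reduce(self)
        if (
            isinstance(rv, tuple)
            and len(rv) >= 3
            and isinstance(rv[2], dict)
            and _HASH_CACHE_FIELD in rv[2]
        ):
            state = copy.copy(rv[2])
            del state[_HASH_CACHE_FIELD]
            rv = rv[:2] + (state,) + rv[3:]
        return rv

    return __reduce__
```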

@@ -483,6 +513,13 @@ def __init__(
            self._cls_dict["__setattr__"] = _frozen_setattrs
            self._cls_dict["__delattr__"] = _frozen_delattrs

        if (
            cache_hash
            and _method_eq(cls.__reduce__, object.__reduce__)
Member Author


Note to reviewers: This is probably the most controversial part - the __reduce__ is generated only if you haven't defined your own __reduce__ or __reduce_ex__.

I have done this because it seems mildly inappropriate to squash someone's existing __reduce__, and I am not confident that we can count on the third element of a custom __reduce__'s return value being the object's __dict__.
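
In other words, the check is presumably along these lines (a hypothetical sketch reusing the _method_eq idea from the earlier sketch; _should_generate_reduce is not a real attrs function):

```python
def _should_generate_reduce(cls, cache_hash):
    # Only generate a __reduce__ when cache_hash is enabled AND the class
    # still inherits both defaults, so a user-supplied __reduce__ or
    # __reduce_ex__ is never squashed.
    return bool(
        cache_hash
        and _method_eq(cls.__reduce__, object.__reduce__)
        and _method_eq(cls.__reduce_ex__, object.__reduce_ex__)
    )
```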

@hynek
Member

hynek commented Feb 1, 2020

Closing in favor of #620.

@hynek hynek closed this Feb 1, 2020