WIP: Switch over to a global hash cache #616
Closed
This takes a different approach to hash caching from the current master or #615. Rather than trying to store the hash cache on the object itself, we create a global hash cache object that maps the `id()` of each object to its hash.
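A minimal sketch of the general idea follows. It assumes the cache is keyed by `id()` and that each entry is evicted via `weakref.finalize` when its object dies. The `GlobalHashCache` class, its `get_hash` method and the `compute` callback are hypothetical names for illustration, not the code in this PR. The objects must support weak references, so slotted classes need a `__weakref__` slot.

```python
import threading
import weakref


class GlobalHashCache:
    """Hypothetical sketch: maps id(obj) -> hash, evicting entries when obj dies."""

    def __init__(self):
        self._hashes = {}
        # RLock, not Lock: a GC pass triggered while the lock is held can run
        # another object's finalizer (and therefore _evict) on this same thread.
        self._lock = threading.RLock()

    def get_hash(self, obj, compute):
        key = id(obj)
        with self._lock:
            cached = self._hashes.get(key)
        if cached is not None:
            return cached
        h = compute(obj)  # computed outside the lock; it may run arbitrary code
        with self._lock:
            if key not in self._hashes:
                self._hashes[key] = h
                # Tie the entry's lifetime to obj's, so a later object that
                # reuses the same id() never sees this stale hash.
                weakref.finalize(obj, self._evict, key)
        return h

    def _evict(self, key):
        with self._lock:
            self._hashes.pop(key, None)

    def __len__(self):
        with self._lock:
            return len(self._hashes)
```

While an object stays alive, repeated `get_hash` calls return the stored value without calling `compute` again. Once the object is collected, its entry disappears. Eviction is what prevents the `id()`-reuse problem described below.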
This simplifies the code pretty dramatically. The two biggest downsides are:
First, this introduces weak references and mutable global state into the code. I think I've got a thread-safe implementation, but that is very hard to test. Even if we weren't worried about memory leaks, the cache's lifecycle must be tied to the object's lifecycle. Otherwise, the interpreter might assign another `cache_hash=True` object the same `id()` value.
Second, this method adds what I would consider a lot of overhead, at least the way I've implemented it (and I don't know whether there's a much faster way to do it). Compare the speed of this version of the code (both using Python 3.7.4):
to the speed of the equivalent code using #615:
For a more complicated class (where `cache_hash=True` might be more valuable), the "create-and-hash" numbers narrow a bit, but there's still a pretty significant difference for the "lookup in hash" case:

#615 version:
I have spent less time working on this version, so it could probably be optimized somewhat, but I doubt we'll get the first-hash time down much, and I think this way will always be slower.
Fixes #613.
Fixes #494.
Pull Request Check List

- … `.rst` files is written using semantic newlines.
- … `changelog.d`.