fix: improved caching of parameterized fixtures #12600
Conversation
Thanks a lot @0xDEC0DE for tracking this down!
Could you please also write a simple integration test for this specifically? It is important to have that in order to avoid future regressions, as was the case in #6541.
I thought it might already be under test, but the coverage map disabused me of that notion. I'll get something together shortly.
Done, and done.
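A rough sketch of what such an integration test could look like, using pytest's pytester fixture (hypothetical, not necessarily the test that was added in this PR; outside pytest's own test suite the fixture has to be enabled via pytest_plugins = ["pytester"]). It parametrizes a session-scoped fixture from two modules with equal-but-not-identical string params and checks that the fixture is set up only once per param:

    def test_parametrized_fixture_cache_reuse(pytester):
        pytester.makeconftest(
            """
            import pytest

            @pytest.fixture(scope="session")
            def fix(request):
                print("FIXTURE-SETUP", request.param)
                return request.param
            """
        )
        # Each module builds its own param strings, so the values are equal but not identical.
        test_body = """
            import pytest

            @pytest.mark.parametrize(
                "fix", ["param-%d" % i for i in range(2)], indirect=True
            )
            def test_it(fix):
                pass
            """
        pytester.makepyfile(test_a=test_body, test_b=test_body)
        result = pytester.runpytest("-s")
        result.assert_outcomes(passed=4)
        setups = [line for line in result.outlines if "FIXTURE-SETUP" in line]
        # With working caching, each param is set up once, not once per module.
        assert len(setups) == 2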
Thanks @0xDEC0DE, looks great!
I will merge this tomorrow, unless someone wants more time to review it.
The reason for using `is` is written in a comment, namely that `==` can be expensive for large numpy arrays. How is this addressed?
Ouch, missed that completely, thanks. I guess we just need to invert the order: first attempt `is`, then fall back to `==`. But this makes me wonder, why is the `is` check failing for the parametrized values in the first place?
Pithy version: "expensive and correct" is always an improvement over "cheap but wrong". Looking over the original bug report, and reproducing the regression locally, I believe that comment makes an assertion wholly without evidence. The problem wasn't that cache validation was taking a long time; it was that cache validation was throwing stack traces due to wonky inputs. As part of the fix, I updated the comment to state the things that are known to be true, and got rid of the conjecture.
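For reference, a minimal standalone illustration of that failure mode (assumes numpy is installed; this is not pytest code):

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([1, 2, 3])

    result = a == b   # numpy's __eq__ returns an array, not a bool
    print(result)     # [ True  True  True]

    try:
        bool(result)  # coercing it to a bool raises "truth value ... is ambiguous"
    except ValueError as exc:
        print("ValueError:", exc)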
src/_pytest/fixtures.py (outdated diff)

Removed:

    # Coerce the comparison into a bool (#12600), and if that fails, fall back to an identity check:
    # `__eq__` is not required to return a bool, and sometimes doesn't, e.g., numpy arrays (#6497).
    try:
        cache_hit = bool(my_cache_key == cache_key)
    except (ValueError, RuntimeError):
        cache_hit = my_cache_key is cache_key

Added:

    # First attempt to use 'is' for performance reasons (for example numpy arrays (#6497)).
    cache_hit = my_cache_key is cache_key
    if not cache_hit:
        # If they are not the same, fallback to a bool comparison (#12600).
        try:
            cache_hit = bool(my_cache_key == cache_key)
        except (ValueError, RuntimeError):
            cache_hit = False
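Pulled out of context, the revised comparison behaves roughly like this (a standalone sketch mirroring the diff above, not pytest's actual API; numpy is used only for illustration):

    import numpy as np

    def cache_hit(my_cache_key, cache_key):
        # Identity first (cheap), then equality; comparisons that cannot be
        # coerced to a bool are treated as a cache miss.
        hit = my_cache_key is cache_key
        if not hit:
            try:
                hit = bool(my_cache_key == cache_key)
            except (ValueError, RuntimeError):
                hit = False
        return hit

    key = "param-1"
    i = 1
    print(cache_hit(key, key))                    # True: same object, fast path
    print(cache_hit(f"param-{i}", f"param-{i}"))  # True: equal but distinct strings, equality fallback
    print(cache_hit(np.arange(3), np.arange(3)))  # False: bool(array == array) raises, so it counts as a miss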
This is suitable for speeding up cache hits, but all cache misses will go through the "expensive" code path. Which I suppose is an improvement, but I don't know if it's worth the extra complexity introduced here.
But, your house, your rules.
I revisited the issue and indeed the problem was not performance, but the fact that it was raising an error when comparing values such as numpy arrays. But I think my latest change takes care of both concerns in the end? One thing I still did not get is why the parametrized values were failing the identity check... the cache key should always be the same object, unless we are making a copy somewhere?
The test case in #6962 roughly simulates how we hit it: using string interpolation to generate parameter sets, such that separate tests in the session should, by all appearances, use the same fixture. And since string operations LOVE making copies:
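A minimal sketch of that point (nothing pytest-specific is assumed; any string built at runtime behaves this way):

    def make_param(i):
        return "fixture-param-%d" % i   # built at runtime, so a new str object on every call

    a = make_param(1)
    b = make_param(1)
    assert a == b       # equal by value...
    assert a is not b   # ...but distinct objects, so an identity-only cache check misses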
...it wasn't using the cached fixture.
Yeah, that I get. What I don't get is why at some point a copy of that string seems to be made... after passing the object (whatever the object is; it can be a string, a list, a custom object) over to pytest.
Ahh, of course...
Is this insight worth adding to the code comments?
Don't think so; brainfart on my part, I guess.
@bluetech would you like to do another review, or are we OK to merge?
Makes me wonder if we shouldn't just intern hashable objects
I'm pretty sure there are people using some large values as parameters, and I think they will complain if we make this change.

There have been previous proposals for solving the "identity instead of equality" problem; one was, for example, to use equality but provide a workaround to use the param id as the cache key (see #9420). But we didn't go forward with them in the end. Generally I get the sense that the number of people complaining about...

I'd be fine with merging this change as long as we consider what to say to people who will start complaining about slowness. One way will be to tell them not to do it, for example to use some small value in the parameter to stand in for the actual big value and only grab the actual value in the test. Another will be to provide some builtin solution like in #9420.
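A sketch of the "stand-in value" workaround described above (the fixture and data names are illustrative, not an existing pytest feature):

    import pytest

    # Imagine values that are large or expensive to compare.
    DATASETS = {
        "small": list(range(10)),
        "large": list(range(1_000_000)),
    }

    @pytest.fixture(params=sorted(DATASETS))  # parametrize on a cheap, comparable key...
    def dataset(request):
        return DATASETS[request.param]        # ...and only look up the big value inside the fixture

    def test_not_empty(dataset):
        assert len(dataset) > 0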
OK, let's try it and see what happens...
(I'm going to remove the backport label -- though we're probably not going to have a patch release, I wouldn't want to backport this anyway).
The fix for Issue pytest-dev#6541 caused a regression where cache hits unexpectedly became cache misses. Attempt to restore the previous behavior, while also retaining the fix for the bug.

Fixes: Issue pytest-dev#6962
Co-authored-by: Ran Benita <[email protected]>
The fix for Issue pytest-dev#6541 caused a regression where cache hits unexpectedly became cache misses.

Fixes pytest-dev#6962

---------

Co-authored-by: Nicolas Simonds <[email protected]>
Co-authored-by: Bruno Oliveira <[email protected]>
Co-authored-by: Ran Benita <[email protected]>
The fix for Issue #6541 caused a regression where cache hits unexpectedly became cache misses. Attempt to restore the previous behavior, while also retaining the fix for the bug.
Fixes: Issue #6962