
fix: improved caching of parameterized fixtures #12600

Merged
6 commits merged into pytest-dev:main on Jul 17, 2024

Conversation

@0xDEC0DE (Contributor) commented Jul 11, 2024

The fix for Issue #6541 caused a regression where cache hits unexpectedly became cache misses. Attempt to restore the previous behavior, while also retaining the fix for the bug.

Fixes: Issue #6962

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Jul 11, 2024
@0xDEC0DE 0xDEC0DE changed the title fix: allow caching of parameterized fixtures fix: improved caching of parameterized fixtures Jul 11, 2024
@nicoddemus (Member) left a comment

Thanks a lot @0xDEC0DE for tracking this down!

Could you please also write a simple integration test for this specifically? It is important to have that in order to avoid future regressions, as was the case in 6541.

changelog/6962.bugfix.rst (review thread, outdated and resolved)
@0xDEC0DE (Contributor, Author)

Thanks a lot @0xDEC0DE for tracking this down!

Could you please also write a simple integration test for this specifically? It is important to have that in order to avoid future regressions, as was the case in 6541.

I thought it might already be under test, but the coverage map disabused me of that notion. I'll get something together shortly.

testing/python/fixtures.py (review thread, outdated and resolved)
@0xDEC0DE (Contributor, Author)

Thanks a lot @0xDEC0DE for tracking this down!

Could you please also write a simple integration test for this specifically? It is important to have that in order to avoid future regressions, as was the case in 6541.

Done, and done.
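For illustration, here is a minimal sketch of the kind of integration test being discussed (not the exact test added to testing/python/fixtures.py; the fixture name and counter are invented): a session-scoped fixture parametrized with equal-but-not-identical values should still be set up only once.

import pytest

SETUP_COUNT = 0

@pytest.fixture(scope="session")
def backend(request):
    # Count how many times the fixture is actually set up.
    global SETUP_COUNT
    SETUP_COUNT += 1
    return request.param

# Each decorator call formats its own string, so the two parameter values
# are equal ("srv-1" == "srv-1") but not the same object.
@pytest.mark.parametrize("backend", ["srv-{}".format(1)], indirect=True)
def test_one(backend):
    assert backend == "srv-1"

@pytest.mark.parametrize("backend", ["srv-{}".format(1)], indirect=True)
def test_two(backend):
    # With the equality fallback, the cached fixture is reused: one setup total.
    assert SETUP_COUNT == 1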

@nicoddemus (Member) left a comment

Thanks @0xDEC0DE, looks great!

@nicoddemus (Member)

I will merge this tomorrow, unless someone wants more time to review it.

@bluetech (Member) left a comment

The reason for using `is` is written in a comment, namely that `==` can be expensive for large numpy arrays. How is this addressed?

@nicoddemus (Member)

The reason for using `is` is written in a comment, namely that `==` can be expensive for large numpy arrays. How is this addressed?

Ouch, missed that completely, thanks. I guess we just need to invert the order: first attempt `is`, then fall back to `==`.

But this makes me wonder: why is the `==` comparison needed in the first place? I mean, the list is passed to pytest in pytest_generate_tests, so at which point do we get a different instance of the parametrized value? Do we make a copy somewhere? 🤔

@nicoddemus nicoddemus requested a review from bluetech July 13, 2024 13:56
@0xDEC0DE (Contributor, Author)

The reason for using `is` is written in a comment, namely that `==` can be expensive for large numpy arrays. How is this addressed?

Pithy version: "expensive and correct" is always an improvement over "cheap but wrong".

Looking over the original bug report, and reproducing the regression locally, I believe that comment makes an assertion wholly without evidence.

The problem wasn't that cache validation was taking a long time, it was that cache validation was throwing stack traces due to wonky inputs. As part of the fix, I updated the comment to state the things that are known to be true, and got rid of the conjecture.
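For context, a small illustration (not code from the PR) of the kind of stack trace in question: with numpy arrays, `==` is element-wise, and coercing the result to bool raises ValueError.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

print(a == b)     # element-wise comparison: [ True  True  True]
try:
    bool(a == b)  # ambiguous truth value
except ValueError as exc:
    print(exc)    # "The truth value of an array with more than one element is ambiguous..."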

Comment on lines 1060 to 1067
-    # Coerce the comparison into a bool (#12600), and if that fails, fall back to an identity check:
-    # `__eq__` is not required to return a bool, and sometimes doesn't, e.g., numpy arrays (#6497).
-    try:
-        cache_hit = bool(my_cache_key == cache_key)
-    except (ValueError, RuntimeError):
-        cache_hit = my_cache_key is cache_key
+    # First attempt to use 'is' for performance reasons (for example numpy arrays (#6497)).
+    cache_hit = my_cache_key is cache_key
+    if not cache_hit:
+        # If they are not the same, fallback to a bool comparison (#12600).
+        try:
+            cache_hit = bool(my_cache_key == cache_key)
+        except (ValueError, RuntimeError):
+            cache_hit = False
@0xDEC0DE (Contributor, Author) left a comment

This is suitable for speeding up cache hits, but all cache misses will go through the "expensive" code path. Which I suppose is an improvement, but I don't know if it's worth the extra complexity introduced here.

But, your house, your rules.

@nicoddemus (Member)

The problem wasn't that cache validation was taking a long time, it was that cache validation was throwing stack traces due to wonky inputs.

I revisited the issue and indeed the problem was not performance, but the fact that it was raising an error when comparing with `==`.

But I think my latest change takes care of both concerns in the end? One thing I still did not get is why the parametrized values were failing the identity check... should always be the same object, unless we are making a copy somewhere?

@0xDEC0DE (Contributor, Author)

The problem wasn't that cache validation was taking a long time, it was that cache validation was throwing stack traces due to wonky inputs.

I revisited the issue and indeed the problem was not performance, but the fact that it was raising an error when comparing with `==`.

But I think my latest change takes care of both concerns in the end? One thing I still did not get is why the parametrized values were failing the identity check... should always be the same object, unless we are making a copy somewhere?

The test case in #6962 roughly simulates how we hit it: using string interpolation to generate parameter sets, such that separate tests in the session should, by all appearances, use the same fixture. And since string operations LOVE making copies:

>>> "lol{}".format("wut") == "lol{}".format("wut")
True
>>> "lol{}".format("wut") is "lol{}".format("wut")
False

...it wasn't using the cached fixture.

@nicoddemus (Member)

Yeah, that part I get. What I don't get is why at some point a copy of that string seems to be made... after passing the object (whatever it is: a string, a list, a custom object) over to parametrize, that object should always be the same instance inside the pytest machinery, but it seems at some point something makes a copy of it.

@nicoddemus (Member)

Ahh of course, pytest_generate_tests is called once for each metafunc... of course we will have multiple instances then. 🤦‍♂️
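A small illustration of that point (hypothetical conftest.py, not code from this PR): the hook runs once per collected test function, so any parameter value built inside it is a fresh object for each metafunc.

# conftest.py (illustrative only)
def pytest_generate_tests(metafunc):
    # Called once per test function; this format() call therefore produces a
    # new, equal-but-not-identical string for every metafunc it parametrizes.
    if "server" in metafunc.fixturenames:
        metafunc.parametrize("server", ["srv-{}".format(1)], indirect=True)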

@0xDEC0DE (Contributor, Author)

Ahh of course, pytest_generate_tests is called once for each metafunc... of course we will have multiple instances then. 🤦‍♂️

Is this insight worth adding to the code comments?

@nicoddemus (Member)

Don't think so, brainfart on my part I guess.

@nicoddemus (Member)

@bluetech would you like to do another review, or are we OK to merge?

@RonnyPfannschmidt (Member) left a comment

Makes me wonder if we shouldn't just intern hashable objects

@bluetech (Member)

I'm pretty sure there are people using some large values as parameters, and I think they will complain if we make this change. There have been previous proposals for solving the "identity instead of equality" problem; one, for example, was to use equality but provide a workaround to use the param id as the cache key (see #9420). But we didn't go forward with them in the end.

Generally I get the sense that the number of people complaining about `is` is greater than the number who will complain about `==`/hashing slowness. But it's hard to be sure.

I'd be fine with merging this change as long as we consider what to say to people who will start complaining about slowness. One way would be to tell them not to do it, for example to use some small value in the parameter to stand in for the actual big value and only grab the actual value in the test. Another would be to provide some builtin solution like in #9420.
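A sketch of the first workaround bluetech mentions (illustrative names; load_dataset is a hypothetical helper, not a pytest API): parametrize with a small, cheap-to-compare key and build the expensive value inside the fixture or test.

import pytest

def load_dataset(name):
    # Hypothetical stand-in for loading a large array or dataframe from disk.
    return list(range(1_000)) if name == "small.csv" else list(range(100_000))

@pytest.fixture(params=["small.csv", "huge.csv"])
def dataset(request):
    # Only the short string participates in the fixture cache-key comparison;
    # the expensive object is created here, inside the fixture.
    return load_dataset(request.param)

def test_not_empty(dataset):
    assert len(dataset) > 0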

src/_pytest/fixtures.py (review thread, outdated and resolved)
@nicoddemus nicoddemus requested a review from bluetech July 17, 2024 00:19
@bluetech (Member) left a comment

OK, let's try it and see what happens...

(I'm going to remove the backport label -- though we're probably not going to have a patch release, I wouldn't want to backport this anyway.)

changelog/6962.bugfix.rst (review thread, outdated and resolved)
nisimond and others added 6 commits July 17, 2024 09:35
The fix for Issue pytest-dev#6541 caused a regression where cache hits unexpectedly became cache misses.
Attempt to restore the previous behavior, while also retaining the fix for the bug.

Fixes: Issue pytest-dev#6962
@nicoddemus nicoddemus merged commit d489247 into pytest-dev:main Jul 17, 2024
29 checks passed
@0xDEC0DE 0xDEC0DE deleted the issue/6962 branch July 17, 2024 16:05
0xDEC0DE added a commit to 0xDEC0DE/pytest that referenced this pull request Jul 17, 2024
The fix for Issue pytest-dev#6541 caused a regression where cache hits unexpectedly became cache misses.

Fixes pytest-dev#6962

---------

Co-authored-by: Nicolas Simonds <[email protected]>
Co-authored-by: Bruno Oliveira <[email protected]>
Co-authored-by: Ran Benita <[email protected]>