Do not fetch result cache twice in case of cache miss #4189

k0ka · 2020-07-24T11:50:53Z

Q	A
Type	bug
BC Break	yes
Fixed issues	#4178

Summary

It is #4169 for 3.0.x version. BC Break doesn't really matters, as it's already broken by renaming of CachingResult class

morozov · 2020-07-24T15:34:04Z

src/Cache/CachingResult.php

     */
-    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
+    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)


See #4169 (comment).

$fetchedData - is data, fetched from the result cache by $cacheKey key. It is a hash of $realKey to serialized $result.
One result cache data may store more than 1 result caches, that are distinguished by realKey.

CachingResult is updating one of these keys with the underlying Result.

In order to keep all other keys, it should have the full data, stored in cache. It is fetching it now

dbal/src/Cache/CachingResult.php

Line 173 in 824d2eb

$data = $this->cache->fetch($this->cacheKey);

and than appends its key to the fetched array

dbal/src/Cache/CachingResult.php

Line 179 in 824d2eb

$data[$this->realKey] = $this->data;

However, this array was fetched from cache before

dbal/src/Connection.php

Line 1006 in 824d2eb

$data = $resultCache->fetch($cacheKey);

and might be just passed to the constructor of CachingResult

This PR does exactly this - removes 2nd fetching from CachingResult and passes fetched data from Connection.

@k0ka thank you for the details. I'm still trying to get to the bottom of this. The last time I touched this layer, I described the responsibility of this class as following (#4048 (comment)):

This class is responsible only for reading from the underlying statement and storing the result in the cache if there was no hit.

Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?

In any event, as I asked earlier in #4169 (review), this change will require a unit test that will also help find the most optimal solution for this problem. Right now, it looks dirty that besides the Result that contains the fetched data, the object needs to receive the data fetched from that result.

Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?

Because one cache key may contain more than one result.

it looks dirty that besides the Result that contains the fetched data, the object needs to receive the data fetched from that result.

I have added unit and functional test to show you how it works now.

I have added unit and functional test to show you how it works now.

Thanks for the test. I understand the logic better now.

Because one cache key may contain more than one result.

Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?

Do you know why we need to store more than one result in a cache entry?

I don't know exactly, but I think it's done to be able to easily clear cache entries. You don't have to know exact sql and all parameters to purge its result.

I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.

Suggested change

public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)

public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData)

morozov · 2020-07-29T00:59:52Z

tests/Functional/ResultCacheTest.php

+    public function testCacheQueriedOnlyOnceForCacheMiss(): void
+    {
+        $layerCache = $this->createMock(ArrayCache::class);
+        $layerCache->expects(self::once())


I'd put this under the unit category since it meant to cover the logic of the unit that calls the fetch and it doesn't depend on the actual implementation of the cache.

It can't be a unit test as it covers the full use case. Right now, there are two invocations of fetch - in different places of the library. This test checks, that fetch from real cache is executed only once.

In my PR it is executed in the Connection::executeCacheQuery method. However, it might be executed anywhere and moved somewhere else in the future.

IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.

I may change it to ArrayCache, but:

it will depend on the ArrayCache implementation, which is a part of another project - Doctrine/Common.

essentially ArrayCache is a mock by itself. I can hardly imagine why this class might be used in the production environment.

dbal must accept any object, that implements Doctrine\Common\Cache\Cache interface and work with it. This class is external relative to dbal.

Functional testing performs tests of the whole library by sending input to its end-user endpoint and examining output (without checking internal structure). In our case, cache-object is just another type of the input. And we're checking whether dbal uses it correctly.

What's your take on "Don't mock what you don't own?"
To me, as you pointed out in 2. ArrayCache is already a mock, but with a few differences:
a) it's easier to use, you don't have to write any expectations, only new ArrayCache()
b) it's closer to reality although it probably won't be used in production (or won't it? you don't know that, could be useful in a long-running process)
c) it is not controlled by you, and a change in it may break your tests; and if that's the case you want to know you don't want to wait until production to notice it breaks.

I don't get this question and how it is related to this topic.

Let me explain one more time:

dbal has the "contract": it will accept any object that implements Doctrine\Common\Cache\Cache interface and will use it independenty of it's internal structure to store and retrieve cache entries. In most unit and functional tests Doctrine\Common\Cache\ArrayCache is used. But it's done just for simplicity. dbal must be able to work with any cache provider.

I'm mocking my own object, that implements Doctrine\Common\Cache\Cache interface, because in this particular case it's easier.

Why Doctrine\Common\Cache\ArrayCache is harder to use? The only way to get the number of fetches from it is to retrieve statistics by the getStats() method. But consider running this code:

$cache = new ArrayCache(); $cache->fetch('test_key'); self::assertEquals(1, $cache->getStats()[Cache::STATS_MISSES]);

It will fail. The right value for Cache::STATS_MISSES after one fetch() is 2. It's because of their internal structure (the first cache miss is due to fetching namespace version - but it doesn't matter here). So instead of my current test code

$layerCache->expects(self::once())->method('fetch')

I will have to write the following code

self::assertEquals(2, $layerCache->getStats()[Cache::STATS_MISSES]);

It's completely unclear why I'm using 2 here, while the whole idea of this patch is to remove the double fetch. And if they remove this "namespace version" feature in the ArrayCache (as it useless in this case), this test will fail.

TL;DR:

This test is functional as it treats dbal as a whole and doesn't depend on its internal structure

Mock is used because it's easier to calculate number of real fetches. While ArrayCache is quite simple, its statistics is distorted.

Oh I see, you really need a mock since you are adding expectations to it.

essentially ArrayCache is a mock by itself.

It's more like a stub in fact, that's what got me confused. It's easier to use as a stub, but can't be used as a mock since you can't know which methods were called, and instead just get some distorted stats.

Why does interaction between the units of code (the cache and the underlying statement) need to be tested in a functional way? It's obviously a unit-level requirement, not a functional requirement.

The contract you're talking about is a unit-level contract. A cache is a non-functional unit by definition since it's not meant to change the behavior of the system. It only changes the internal implementation details.

Why does interaction between the units of code

Any interaction is an "interaction between units of code". The question is where these units of code reside. Cache is not part of DBAL.

A cache is a non-functional unit by definition since it's not meant to change the behavior of the system.

Cache DOES change the behavior of the system. The system is working differently depending on what the cache object returns.

I see you don't understand (or don't want to understand) why this test is functional. I can delete it if you want. Or I can copy it to the "tests/Cache" directory.

I see you don't understand (or don't want to understand)

Please treat others more kindly, especially when they take that much time to answer to you. Assuming either ignorance or malice like this will not help.

As for your definition of behavior, it's debatable. To me, it would be more about direct inputs and outputs / the public interface of the system. Whether the system calls systems external systems to create the output should not be relevant in a functional test IMO.

morozov · 2020-07-29T01:00:09Z

tests/Functional/ResultCacheTest.php

+        $result->fetchAllAssociative();
+    }
+
+    public function testDifferentQueriesMayBeSavedToOneCacheKey(): void


This test passes even without the fix. Is it really needed?

It shows how many queries may be stored in one cache key. This logic wasn't explicitly shown in other tests and you weren't aware of it, so I think it's better to leave this test. However, it might be taken out to another PR, as it doesn't directly relates to this one.

morozov · 2020-07-29T01:05:59Z

src/Cache/CachingResult.php

     */
-    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
+    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)


I have added unit and functional test to show you how it works now.

Thanks for the test. I understand the logic better now.

Because one cache key may contain more than one result.

Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?

morozov

@doctrine/team-dbal anyone wants to take over this? I don't really understand how the caching layer is supposed to be used nor I have time/desire to learn.

morozov · 2020-08-04T00:57:04Z

src/Cache/CachingResult.php

     */
-    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
+    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)


I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.

morozov · 2020-08-04T00:59:02Z

tests/Functional/ResultCacheTest.php

+    public function testCacheQueriedOnlyOnceForCacheMiss(): void
+    {
+        $layerCache = $this->createMock(ArrayCache::class);
+        $layerCache->expects(self::once())


IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.

k0ka · 2020-09-06T09:33:31Z

This PR is staying without review and comments for more than a month. What can I do to push it?

SenseException

I don't really understand how the caching layer is supposed to be used

Neither do I, but maybe I can review this based on how the cache code is working.

SenseException · 2020-09-06T21:24:53Z

src/Cache/CachingResult.php

     */
-    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
+    public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)


Suggested change

public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)

public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData)

tests/Cache/CachingResultTest.php

tests/Functional/ResultCacheTest.php

SenseException

@doctrine/team-dbal

greg0ire · 2020-09-07T18:50:52Z

You can drop the "retry test" commit, and should squash the 2 remaining commits together. Be aware that retrying a build can be done by closing and reopening the PR

morozov

This patch requires additional work on tests. I understand that the author has put some effort in writing them and they pass and they cover the code but the way they mix the unit and functional concerns may become a problem during their maintenance.

Please have the unit and functional concerns separated.

tests/Functional/ResultCacheTest.php

morozov · 2020-09-07T18:56:37Z

tests/Cache/CachingResultTest.php

+
+        do {
+            $row = $cachingResult->fetchAssociative();
+        } while ($row !== false);


Why does a test need flow control logic? Do we not know how many rows are expected to be fetched here?

In this particular test, we know that the 2nd fetchAssociative should be false. But that is not what is tested here. This block is a standard way to fetch all rows.

I may use Doctrine\DBAL\Driver\FetchUtils::fetchAllAssociative instead of this block if you like.

tests/Cache/CachingResultTest.php

Co-authored-by: Claudio Zizza <[email protected]> Co-authored-by: Grégoire Paris <[email protected]>

greg0ire · 2020-09-27T10:56:46Z

tests/Functional/ResultCacheTest.php

+        self::assertCount(
+            4,
+            $this->sqlLogger->queries,
+            'Deleting one cache key leads to deleting cache for both queris'


Suggested change

'Deleting one cache key leads to deleting cache for both queris'

'Deleting one cache key leads to deleting cache for both queries'

The base branch was changed.

k0ka mentioned this pull request Jul 24, 2020

Do not query result cache twice in case of cache miss #4169

Closed

morozov reviewed Jul 24, 2020

View reviewed changes

k0ka force-pushed the 3.0.x-double-cache-fetch branch 2 times, most recently from 3e93697 to e2b927f Compare July 26, 2020 11:52

morozov reviewed Jul 29, 2020

View reviewed changes

morozov reviewed Aug 4, 2020

View reviewed changes

SenseException requested changes Sep 6, 2020

View reviewed changes

SenseException previously approved these changes Sep 7, 2020

View reviewed changes

morozov reviewed Sep 7, 2020

View reviewed changes

greg0ire reviewed Sep 7, 2020

View reviewed changes

tests/Cache/CachingResultTest.php Outdated Show resolved Hide resolved

k0ka dismissed SenseException’s stale review via b67dbe2 September 7, 2020 19:18

Do not fetch result cache twice in case of cache miss

3365cbd

Co-authored-by: Claudio Zizza <[email protected]> Co-authored-by: Grégoire Paris <[email protected]>

k0ka force-pushed the 3.0.x-double-cache-fetch branch from b67dbe2 to 3365cbd Compare September 7, 2020 19:31

k0ka requested review from SenseException and greg0ire September 11, 2020 12:06

SenseException previously approved these changes Sep 26, 2020

View reviewed changes

greg0ire requested review from morozov and removed request for morozov September 27, 2020 10:52

greg0ire reviewed Sep 27, 2020

View reviewed changes

morozov closed this Apr 21, 2021

morozov deleted the branch doctrine:3.1.x April 21, 2021 16:26

morozov reopened this Apr 23, 2021

morozov changed the base branch from 3.0.x to 3.1.x April 23, 2021 16:13

This was referenced May 4, 2021

Merge 2.13.x/3.1.x into 3.2.x #4633

Closed

executeCacheQuery + fetchOne is a footgun #4634

Closed

morozov mentioned this pull request Nov 20, 2021

Always cache the full result #5003

Merged

morozov closed this Nov 20, 2021

github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022

	public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
	public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData)

	'Deleting one cache key leads to deleting cache for both queris'
	'Deleting one cache key leads to deleting cache for both queries'

Do not fetch result cache twice in case of cache miss #4189

Do not fetch result cache twice in case of cache miss #4189

Conversation

k0ka commented Jul 24, 2020

Summary

Choose a reason for hiding this comment

k0ka Jul 24, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

morozov Sep 29, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

morozov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k0ka commented Sep 6, 2020

SenseException left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SenseException left a comment

Choose a reason for hiding this comment

greg0ire commented Sep 7, 2020

morozov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k0ka Jul 24, 2020 •

edited

Loading

morozov Sep 29, 2020 •

edited

Loading