Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fetch result cache twice in case of cache miss #4189

Closed
wants to merge 1 commit into from

Conversation

k0ka
Copy link

@k0ka k0ka commented Jul 24, 2020

Q A
Type bug
BC Break yes
Fixed issues #4178

Summary

It is #4169 for 3.0.x version. BC Break doesn't really matters, as it's already broken by renaming of CachingResult class

*/
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@k0ka k0ka Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$fetchedData - is data, fetched from the result cache by $cacheKey key. It is a hash of $realKey to serialized $result.
One result cache data may store more than 1 result caches, that are distinguished by realKey.

CachingResult is updating one of these keys with the underlying Result.

In order to keep all other keys, it should have the full data, stored in cache. It is fetching it now

$data = $this->cache->fetch($this->cacheKey);

and than appends its key to the fetched array
$data[$this->realKey] = $this->data;

However, this array was fetched from cache before

$data = $resultCache->fetch($cacheKey);

and might be just passed to the constructor of CachingResult

This PR does exactly this - removes 2nd fetching from CachingResult and passes fetched data from Connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k0ka thank you for the details. I'm still trying to get to the bottom of this. The last time I touched this layer, I described the responsibility of this class as following (#4048 (comment)):

This class is responsible only for reading from the underlying statement and storing the result in the cache if there was no hit.

Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any event, as I asked earlier in #4169 (review), this change will require a unit test that will also help find the most optimal solution for this problem. Right now, it looks dirty that besides the Result that contains the fetched data, the object needs to receive the data fetched from that result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?

Because one cache key may contain more than one result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks dirty that besides the Result that contains the fetched data, the object needs to receive the data fetched from that result.

I have added unit and functional test to show you how it works now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added unit and functional test to show you how it works now.

Thanks for the test. I understand the logic better now.

Because one cache key may contain more than one result.

Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we need to store more than one result in a cache entry?

I don't know exactly, but I think it's done to be able to easily clear cache entries. You don't have to know exact sql and all parameters to purge its result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData)

@k0ka k0ka force-pushed the 3.0.x-double-cache-fetch branch 2 times, most recently from 3e93697 to e2b927f Compare July 26, 2020 11:52
public function testCacheQueriedOnlyOnceForCacheMiss(): void
{
$layerCache = $this->createMock(ArrayCache::class);
$layerCache->expects(self::once())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put this under the unit category since it meant to cover the logic of the unit that calls the fetch and it doesn't depend on the actual implementation of the cache.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be a unit test as it covers the full use case. Right now, there are two invocations of fetch - in different places of the library. This test checks, that fetch from real cache is executed only once.

In my PR it is executed in the Connection::executeCacheQuery method. However, it might be executed anywhere and moved somewhere else in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may change it to ArrayCache, but:

  1. it will depend on the ArrayCache implementation, which is a part of another project - Doctrine/Common.
  2. essentially ArrayCache is a mock by itself. I can hardly imagine why this class might be used in the production environment.
  3. dbal must accept any object, that implements Doctrine\Common\Cache\Cache interface and work with it. This class is external relative to dbal.

Functional testing performs tests of the whole library by sending input to its end-user endpoint and examining output (without checking internal structure). In our case, cache-object is just another type of the input. And we're checking whether dbal uses it correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your take on "Don't mock what you don't own?"
To me, as you pointed out in 2. ArrayCache is already a mock, but with a few differences:
a) it's easier to use, you don't have to write any expectations, only new ArrayCache()
b) it's closer to reality although it probably won't be used in production (or won't it? you don't know that, could be useful in a long-running process)
c) it is not controlled by you, and a change in it may break your tests; and if that's the case you want to know you don't want to wait until production to notice it breaks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this question and how it is related to this topic.

Let me explain one more time:

dbal has the "contract": it will accept any object that implements Doctrine\Common\Cache\Cache interface and will use it independenty of it's internal structure to store and retrieve cache entries. In most unit and functional tests Doctrine\Common\Cache\ArrayCache is used. But it's done just for simplicity. dbal must be able to work with any cache provider.

I'm mocking my own object, that implements Doctrine\Common\Cache\Cache interface, because in this particular case it's easier.

Why Doctrine\Common\Cache\ArrayCache is harder to use? The only way to get the number of fetches from it is to retrieve statistics by the getStats() method. But consider running this code:

$cache = new ArrayCache();
$cache->fetch('test_key');
self::assertEquals(1, $cache->getStats()[Cache::STATS_MISSES]);

It will fail. The right value for Cache::STATS_MISSES after one fetch() is 2. It's because of their internal structure (the first cache miss is due to fetching namespace version - but it doesn't matter here). So instead of my current test code

$layerCache->expects(self::once())->method('fetch')

I will have to write the following code

self::assertEquals(2, $layerCache->getStats()[Cache::STATS_MISSES]);

It's completely unclear why I'm using 2 here, while the whole idea of this patch is to remove the double fetch. And if they remove this "namespace version" feature in the ArrayCache (as it useless in this case), this test will fail.

TL;DR:

  1. This test is functional as it treats dbal as a whole and doesn't depend on its internal structure
  2. Mock is used because it's easier to calculate number of real fetches. While ArrayCache is quite simple, its statistics is distorted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you really need a mock since you are adding expectations to it.

essentially ArrayCache is a mock by itself.

It's more like a stub in fact, that's what got me confused. It's easier to use as a stub, but can't be used as a mock since you can't know which methods were called, and instead just get some distorted stats.

Copy link
Member

@morozov morozov Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does interaction between the units of code (the cache and the underlying statement) need to be tested in a functional way? It's obviously a unit-level requirement, not a functional requirement.

The contract you're talking about is a unit-level contract. A cache is a non-functional unit by definition since it's not meant to change the behavior of the system. It only changes the internal implementation details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does interaction between the units of code

Any interaction is an "interaction between units of code". The question is where these units of code reside. Cache is not part of DBAL.

A cache is a non-functional unit by definition since it's not meant to change the behavior of the system.

Cache DOES change the behavior of the system. The system is working differently depending on what the cache object returns.

I see you don't understand (or don't want to understand) why this test is functional. I can delete it if you want. Or I can copy it to the "tests/Cache" directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you don't understand (or don't want to understand)

Please treat others more kindly, especially when they take that much time to answer to you. Assuming either ignorance or malice like this will not help.

As for your definition of behavior, it's debatable. To me, it would be more about direct inputs and outputs / the public interface of the system. Whether the system calls systems external systems to create the output should not be relevant in a functional test IMO.

$result->fetchAllAssociative();
}

public function testDifferentQueriesMayBeSavedToOneCacheKey(): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes even without the fix. Is it really needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shows how many queries may be stored in one cache key. This logic wasn't explicitly shown in other tests and you weren't aware of it, so I think it's better to leave this test. However, it might be taken out to another PR, as it doesn't directly relates to this one.

*/
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added unit and functional test to show you how it works now.

Thanks for the test. I understand the logic better now.

Because one cache key may contain more than one result.

Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?

Copy link
Member

@morozov morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doctrine/team-dbal anyone wants to take over this? I don't really understand how the caching layer is supposed to be used nor I have time/desire to learn.

*/
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.

public function testCacheQueriedOnlyOnceForCacheMiss(): void
{
$layerCache = $this->createMock(ArrayCache::class);
$layerCache->expects(self::once())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.

@k0ka
Copy link
Author

k0ka commented Sep 6, 2020

This PR is staying without review and comments for more than a month. What can I do to push it?

Copy link
Member

@SenseException SenseException left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how the caching layer is supposed to be used

Neither do I, but maybe I can review this based on how the cache code is working.

*/
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData)
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData)

tests/Cache/CachingResultTest.php Outdated Show resolved Hide resolved
tests/Functional/ResultCacheTest.php Show resolved Hide resolved
SenseException
SenseException previously approved these changes Sep 7, 2020
Copy link
Member

@SenseException SenseException left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doctrine/team-dbal

@greg0ire
Copy link
Member

greg0ire commented Sep 7, 2020

You can drop the "retry test" commit, and should squash the 2 remaining commits together. Be aware that retrying a build can be done by closing and reopening the PR

Copy link
Member

@morozov morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch requires additional work on tests. I understand that the author has put some effort in writing them and they pass and they cover the code but the way they mix the unit and functional concerns may become a problem during their maintenance.

Please have the unit and functional concerns separated.

tests/Functional/ResultCacheTest.php Show resolved Hide resolved

do {
$row = $cachingResult->fetchAssociative();
} while ($row !== false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does a test need flow control logic? Do we not know how many rows are expected to be fetched here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular test, we know that the 2nd fetchAssociative should be false. But that is not what is tested here. This block is a standard way to fetch all rows.

I may use Doctrine\DBAL\Driver\FetchUtils::fetchAllAssociative instead of this block if you like.

Co-authored-by: Claudio Zizza <[email protected]>
Co-authored-by: Grégoire Paris <[email protected]>
@k0ka k0ka force-pushed the 3.0.x-double-cache-fetch branch from b67dbe2 to 3365cbd Compare September 7, 2020 19:31
SenseException
SenseException previously approved these changes Sep 26, 2020
@greg0ire greg0ire requested review from morozov and removed request for morozov September 27, 2020 10:52
self::assertCount(
4,
$this->sqlLogger->queries,
'Deleting one cache key leads to deleting cache for both queris'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'Deleting one cache key leads to deleting cache for both queris'
'Deleting one cache key leads to deleting cache for both queries'

@morozov morozov closed this Apr 21, 2021
@morozov morozov deleted the branch doctrine:3.1.x April 21, 2021 16:26
@morozov morozov reopened this Apr 23, 2021
@morozov morozov changed the base branch from 3.0.x to 3.1.x April 23, 2021 16:13
@morozov morozov dismissed SenseException’s stale review April 23, 2021 16:13

The base branch was changed.

@morozov morozov closed this Nov 20, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants