-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not fetch result cache twice in case of cache miss #4189
Conversation
src/Cache/CachingResult.php
Outdated
*/ | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime) | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #4169 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$fetchedData
- is data, fetched from the result cache by $cacheKey
key. It is a hash of $realKey
to serialized $result
.
One result cache data may store more than 1 result caches, that are distinguished by realKey.
CachingResult
is updating one of these keys with the underlying Result
.
In order to keep all other keys, it should have the full data, stored in cache. It is fetching it now
dbal/src/Cache/CachingResult.php
Line 173 in 824d2eb
$data = $this->cache->fetch($this->cacheKey); |
and than appends its key to the fetched array
dbal/src/Cache/CachingResult.php
Line 179 in 824d2eb
$data[$this->realKey] = $this->data; |
However, this array was fetched from cache before
Line 1006 in 824d2eb
$data = $resultCache->fetch($cacheKey); |
and might be just passed to the constructor of
CachingResult
This PR does exactly this - removes 2nd fetching from CachingResult and passes fetched data from Connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@k0ka thank you for the details. I'm still trying to get to the bottom of this. The last time I touched this layer, I described the responsibility of this class as following (#4048 (comment)):
This class is responsible only for reading from the underlying statement and storing the result in the cache if there was no hit.
Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any event, as I asked earlier in #4169 (review), this change will require a unit test that will also help find the most optimal solution for this problem. Right now, it looks dirty that besides the Result
that contains the fetched data, the object needs to receive the data fetched from that result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I wrong, or, otherwise, why does the class try to fetch from the cache if there knowingly was no hit?
Because one cache key may contain more than one result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks dirty that besides the
Result
that contains the fetched data, the object needs to receive the data fetched from that result.
I have added unit and functional test to show you how it works now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added unit and functional test to show you how it works now.
Thanks for the test. I understand the logic better now.
Because one cache key may contain more than one result.
Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why we need to store more than one result in a cache entry?
I don't know exactly, but I think it's done to be able to easily clear cache entries. You don't have to know exact sql and all parameters to purge its result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) | |
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData) |
3e93697
to
e2b927f
Compare
public function testCacheQueriedOnlyOnceForCacheMiss(): void | ||
{ | ||
$layerCache = $this->createMock(ArrayCache::class); | ||
$layerCache->expects(self::once()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put this under the unit category since it meant to cover the logic of the unit that calls the fetch and it doesn't depend on the actual implementation of the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be a unit test as it covers the full use case. Right now, there are two invocations of fetch
- in different places of the library. This test checks, that fetch
from real cache is executed only once.
In my PR it is executed in the Connection::executeCacheQuery
method. However, it might be executed anywhere and moved somewhere else in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may change it to ArrayCache
, but:
- it will depend on the
ArrayCache
implementation, which is a part of another project -Doctrine/Common
. - essentially
ArrayCache
is a mock by itself. I can hardly imagine why this class might be used in the production environment. dbal
must accept any object, that implementsDoctrine\Common\Cache\Cache
interface and work with it. This class is external relative todbal
.
Functional testing performs tests of the whole library by sending input to its end-user endpoint and examining output (without checking internal structure). In our case, cache-object is just another type of the input. And we're checking whether dbal
uses it correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your take on "Don't mock what you don't own?"
To me, as you pointed out in 2. ArrayCache
is already a mock, but with a few differences:
a) it's easier to use, you don't have to write any expectations, only new ArrayCache()
b) it's closer to reality although it probably won't be used in production (or won't it? you don't know that, could be useful in a long-running process)
c) it is not controlled by you, and a change in it may break your tests; and if that's the case you want to know you don't want to wait until production to notice it breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this question and how it is related to this topic.
Let me explain one more time:
dbal
has the "contract": it will accept any object that implements Doctrine\Common\Cache\Cache
interface and will use it independenty of it's internal structure to store and retrieve cache entries. In most unit and functional tests Doctrine\Common\Cache\ArrayCache
is used. But it's done just for simplicity. dbal
must be able to work with any cache provider.
I'm mocking my own object, that implements Doctrine\Common\Cache\Cache
interface, because in this particular case it's easier.
Why Doctrine\Common\Cache\ArrayCache
is harder to use? The only way to get the number of fetches from it is to retrieve statistics by the getStats()
method. But consider running this code:
$cache = new ArrayCache();
$cache->fetch('test_key');
self::assertEquals(1, $cache->getStats()[Cache::STATS_MISSES]);
It will fail. The right value for Cache::STATS_MISSES
after one fetch()
is 2. It's because of their internal structure (the first cache miss is due to fetching namespace version - but it doesn't matter here). So instead of my current test code
$layerCache->expects(self::once())->method('fetch')
I will have to write the following code
self::assertEquals(2, $layerCache->getStats()[Cache::STATS_MISSES]);
It's completely unclear why I'm using 2 here, while the whole idea of this patch is to remove the double fetch. And if they remove this "namespace version" feature in the ArrayCache
(as it useless in this case), this test will fail.
TL;DR:
- This test is functional as it treats
dbal
as a whole and doesn't depend on its internal structure - Mock is used because it's easier to calculate number of real fetches. While
ArrayCache
is quite simple, its statistics is distorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, you really need a mock since you are adding expectations to it.
essentially ArrayCache is a mock by itself.
It's more like a stub in fact, that's what got me confused. It's easier to use as a stub, but can't be used as a mock since you can't know which methods were called, and instead just get some distorted stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does interaction between the units of code (the cache and the underlying statement) need to be tested in a functional way? It's obviously a unit-level requirement, not a functional requirement.
The contract you're talking about is a unit-level contract. A cache is a non-functional unit by definition since it's not meant to change the behavior of the system. It only changes the internal implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does interaction between the units of code
Any interaction is an "interaction between units of code". The question is where these units of code reside. Cache is not part of DBAL.
A cache is a non-functional unit by definition since it's not meant to change the behavior of the system.
Cache DOES change the behavior of the system. The system is working differently depending on what the cache object returns.
I see you don't understand (or don't want to understand) why this test is functional. I can delete it if you want. Or I can copy it to the "tests/Cache" directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you don't understand (or don't want to understand)
Please treat others more kindly, especially when they take that much time to answer to you. Assuming either ignorance or malice like this will not help.
As for your definition of behavior, it's debatable. To me, it would be more about direct inputs and outputs / the public interface of the system. Whether the system calls systems external systems to create the output should not be relevant in a functional test IMO.
$result->fetchAllAssociative(); | ||
} | ||
|
||
public function testDifferentQueriesMayBeSavedToOneCacheKey(): void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test passes even without the fix. Is it really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows how many queries may be stored in one cache key. This logic wasn't explicitly shown in other tests and you weren't aware of it, so I think it's better to leave this test. However, it might be taken out to another PR, as it doesn't directly relates to this one.
src/Cache/CachingResult.php
Outdated
*/ | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime) | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added unit and functional test to show you how it works now.
Thanks for the test. I understand the logic better now.
Because one cache key may contain more than one result.
Do you know why we need to store more than one result in a cache entry? I.e. why not store each key individually and use the cache key as a sort of namespace for hashing the keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@doctrine/team-dbal anyone wants to take over this? I don't really understand how the caching layer is supposed to be used nor I have time/desire to learn.
src/Cache/CachingResult.php
Outdated
*/ | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime) | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So it holds all the data that corresponds to the given cache profile in one entry… I don't know what to say.
public function testCacheQueriedOnlyOnceForCacheMiss(): void | ||
{ | ||
$layerCache = $this->createMock(ArrayCache::class); | ||
$layerCache->expects(self::once()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it should be either a functional test the tests certain functionality (hence not mocking any dependencies) or a unit test that covers a given unit and mocks the dependencies. The purpose of such a hybrid test is unclear. If it fails, it's unclear whether the case is broken or a specific unit is because the test doesn't cover either of the two.
This PR is staying without review and comments for more than a month. What can I do to push it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand how the caching layer is supposed to be used
Neither do I, but maybe I can review this based on how the cache code is working.
src/Cache/CachingResult.php
Outdated
*/ | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime) | ||
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, $fetchedData) | |
public function __construct(Result $result, Cache $cache, $cacheKey, $realKey, $lifetime, array $fetchedData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@doctrine/team-dbal
You can drop the "retry test" commit, and should squash the 2 remaining commits together. Be aware that retrying a build can be done by closing and reopening the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch requires additional work on tests. I understand that the author has put some effort in writing them and they pass and they cover the code but the way they mix the unit and functional concerns may become a problem during their maintenance.
Please have the unit and functional concerns separated.
|
||
do { | ||
$row = $cachingResult->fetchAssociative(); | ||
} while ($row !== false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does a test need flow control logic? Do we not know how many rows are expected to be fetched here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular test, we know that the 2nd fetchAssociative
should be false
. But that is not what is tested here. This block is a standard way to fetch all rows.
I may use Doctrine\DBAL\Driver\FetchUtils::fetchAllAssociative
instead of this block if you like.
Co-authored-by: Claudio Zizza <[email protected]> Co-authored-by: Grégoire Paris <[email protected]>
b67dbe2
to
3365cbd
Compare
self::assertCount( | ||
4, | ||
$this->sqlLogger->queries, | ||
'Deleting one cache key leads to deleting cache for both queris' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Deleting one cache key leads to deleting cache for both queris' | |
'Deleting one cache key leads to deleting cache for both queries' |
Summary
It is #4169 for 3.0.x version. BC Break doesn't really matters, as it's already broken by renaming of
CachingResult
class