Pagination CountWalker should run a COUNT(*) query instead of COUNT(tbl.id) when HINT_DISTINCT is false #11552
Comments
Do you have an explanation as to why the query plan differs from one query to the other, and are you sure that explanation will apply regardless of the database in use? I'm not sure we should introduce more complexity, especially if it is not possible to measure a substantial speed difference. But I agree that "Seq Scan" does not sound good.
I've done some research just now and the short answer appears to be that My other observations:
I'm not sure, but it's very plausible. For what it's worth, ChatGPT is under the impression (and bullet-points some examples) that most databases benefit from being told to do
After going down the rabbit hole here, I'm also less inclined now to make this change. The only standing argument for that change is what I said at the top: For this reason I'm fine to close this ticket now without any code change, unless you'd like (me) to make that change anyway due to the reason I explained above.
Thanks for the very detailed explanation. From what you are telling me, I'm starting to think that to observe a significant performance difference, we would have to add a where clause involving columns other than the primary key (I'm leaving aside the case of composite primary keys for the sake of simplicity). Is that what you observe?
Interesting. You're absolutely right. When I add a simple where-condition on a column that is indexed, the index-only scan is never used in the case of
The So yeah, it's still down to you whether you wish to carry on with this code change. I'm somewhat more inclined to it now. Edit: I just tried different where-conditions, and as one could imagine, if the where-condition filters out a significant amount of the table, the speed difference is significant. E.g. 0.1s vs 0.001s.
Great! Using Git, can you find out why we are not using
Git history shows that the code has behaved this way since the very beginning (2012), and no attempt was made to change it to conditionally doing
I made that change and no tests failed. In particular, I checked this code change in my webapp and it's effective (i.e.
Then please send a PR against the next minor branch, I think I'm OK with this change 👍
2.x is only open for bugfixes. This does not look like a bug at all. The next minor branch is 3.3.x (since the last tag is 3.2.1).
…se (doctrine#11552) This change makes CountWalker use COUNT(*) instead of COUNT(tbl.id), when the user declared that their query does not need to use (SELECT) DISTINCT, which is commonly the case when there are no JOINs in the query, or when the JOINs are only *ToOne. Research showed that COUNT(*) allows databases to use index(-only) scans more eagerly from any of the indexed columns, especially when the query is using a WHERE-condition that filters on an indexed column.
…nt-star-query-sometimes Make CountWalker use COUNT(*) when $distinct is explicitly set to false (#11552)
Feature Request
Summary
I'd like to propose that the Doctrine\ORM\Tools\Pagination\CountWalker should create a count query that selects COUNT(*) instead of COUNT(tbl.id) when a query's HINT_DISTINCT is set/declared false. Both "counts" result in the same number being produced, however COUNT(*) allows some databases (e.g. Postgres) to finish the query faster. Please see the following query plans.
Notice how in the case of COUNT(*), Postgres is counting using an "Index Only Scan". The speed difference isn't substantial (at least for the number of rows I tested with), but it's also quite cheap to obtain (by just making the code use COUNT(*)).
Thank you.
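For readers who want to check the "same number" claim, here is a minimal sketch in Python using the standard library's in-memory SQLite. It is not Doctrine code: the helper `count_expression`, the table layout, the `status` column and the index name are all hypothetical, and the `COUNT(DISTINCT ...)` branch is an assumption about what the walker emits when DISTINCT is required. It only shows that COUNT(*) and COUNT(tbl.id) agree when the id is a non-null primary key. It says nothing about Postgres query plans, which have to be checked with EXPLAIN on the real database.

```python
import sqlite3


def count_expression(distinct: bool, id_column: str = "tbl.id") -> str:
    """Hypothetical helper mirroring the proposal; not Doctrine's actual API."""
    if distinct:
        # Assumed form when the query may return duplicate root rows.
        return f"COUNT(DISTINCT {id_column})"
    # Proposed form when the caller has declared DISTINCT unnecessary.
    return "COUNT(*)"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_tbl_status ON tbl (status)")
conn.executemany(
    "INSERT INTO tbl (status) VALUES (?)",
    [("active" if i % 4 else "archived",) for i in range(1000)],
)

# A where-condition on an indexed, non-primary-key column, as in the thread.
where = "WHERE tbl.status = 'active'"

count_star = conn.execute(
    f"SELECT {count_expression(False)} FROM tbl {where}"
).fetchone()[0]
count_id = conn.execute(
    f"SELECT COUNT(tbl.id) FROM tbl {where}"
).fetchone()[0]
count_distinct = conn.execute(
    f"SELECT {count_expression(True)} FROM tbl {where}"
).fetchone()[0]

print(count_star, count_id, count_distinct)
```

The three counts only match here because the query has no joins and the id column can never be NULL; with a nullable column, COUNT(column) skips NULL rows and the equivalence no longer holds.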