You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have the following simple database query. Variable ids is received from the client in our real-world application, but I hard-coded it in this example for simplicity. Column id is the primary key of the ticket table.
--Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (96ms) [Parameters=[@__ids_0={ '1', '2', '3', '4', '5', ... } (DbType = Object)], CommandType='Text', CommandTimeout='30']SELECTt.id, t.assignee_id, t.description, t.priority, t.reporter_id, t.summaryFROMtick.ticketAS t
WHEREt.id= ANY (@__ids_0)
ORDER BYt.id
Output of EXPLAIN ANALYZE with ANY(...) operator:
Sort (cost=49.80..49.83 rows=11 width=75) (actual time=0.341..0.365 rows=11 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ticket t (cost=23.14..49.61 rows=11 width=75) (actual time=0.110..0.185 rows=11 loops=1)
Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
Heap Blocks: exact=6
-> Bitmap Index Scan on ticket_pkey (cost=0.00..23.11 rows=11 width=0) (actual time=0.084..0.086 rows=11 loops=1)
Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
Planning Time: 0.282 ms
Execution Time: 0.464 ms
I am wondering if the query would be faster if it used BETWEEN 1 AND 10 PostgreSQL syntax.
SELECTt.id, t.assignee_id, t.description, t.priority, t.reporter_id, t.summaryFROMtick.ticketAS t
WHEREt.id BETWEEN 1AND10ORDER BYt.id
Output of EXPLAIN ANALYZE with BETWEEN x AND y operator:
Sort (cost=31.05..31.08 rows=11 width=75) (actual time=0.298..0.322 rows=11 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ticket t (cost=4.39..30.86 rows=11 width=75) (actual time=0.103..0.180 rows=11 loops=1)
Recheck Cond: ((id >= 1) AND (id <= 10))
Heap Blocks: exact=6
-> Bitmap Index Scan on ticket_pkey (cost=0.00..4.39 rows=11 width=0) (actual time=0.051..0.052 rows=11 loops=1)
Index Cond: ((id >= 1) AND (id <= 10))
Planning Time: 0.276 ms
Execution Time: 0.414 ms
To me, execution plans look the same and should have equal performance. However, it needs to be checked if this is always the case (large tables, filtering is performed for non-indexed column, ...).
Proposal: Instead of using ANY operator for consecutive integers, maybe BETWEEN x AND y can be used. Of course, a thorough performance study has to be performed before jumping to the implementation of this proposal.
See also dotnet/efcore#12634 (comment), which is about transforming x >= y AND x <= z to x BETWEEN y AND z (not directly related).
Some remarks:
As you say, there's no clear indication that translating to BETWEEN here would have a significant advantage compared to the current translation.
In addition, this specific proposal would have us producing different SQL based on parameter values (since BETWEEN can only be used when the array being parameterized contains consecutive values); this isn't something we do unless there's a good reason to do it, since it means we can't cache the SQL within EF.
Finally, this proposal doesn't have anything to do with PostgreSQL specifically (i.e. could be done for any relational database), so if done, it would be done in EF itself rather than in the PG provider.
So while it's a nice idea, we'd need some more analysis/motivation that shows it's worth pursuing this. I'll go ahead and close the issue for now - if you believe it's worth doing and have more motivation for it, please reopen in the EF repo and we'll revisit.
Description
We have the following simple database query. Variable
ids
is received from the client in our real-world application, but I hard-coded it in this example for simplicity. Columnid
is the primary key of theticket
table.This generates the following SQL:
Output of
EXPLAIN ANALYZE
withANY(...)
operator:I am wondering if the query would be faster if it used
BETWEEN 1 AND 10
PostgreSQL syntax.Output of
EXPLAIN ANALYZE
withBETWEEN x AND y
operator:To me, execution plans look the same and should have equal performance. However, it needs to be checked if this is always the case (large tables, filtering is performed for non-indexed column, ...).
Proposal: Instead of using
ANY
operator for consecutive integers, maybeBETWEEN x AND y
can be used. Of course, a thorough performance study has to be performed before jumping to the implementation of this proposal.Further technical details
Npgsql version: 8.0.4
PostgreSQL version: 16.4
Operating system: Windows 10 64-bit
The text was updated successfully, but these errors were encountered: