Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate consecutive array of integers to "between x and y" #3272

Closed
aleksvujic opened this issue Sep 13, 2024 · 1 comment
Closed

Translate consecutive array of integers to "between x and y" #3272

aleksvujic opened this issue Sep 13, 2024 · 1 comment

Comments

@aleksvujic
Copy link

Description

We have the following simple database query. Variable ids is received from the client in our real-world application, but I hard-coded it in this example for simplicity. Column id is the primary key of the ticket table.

var ids = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var tickets = _tickDbContext.Tickets
    .Where(x => ids.Contains(x.Id))
    .OrderBy(x => x.Id)
    .ToList();

This generates the following SQL:

--Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (96ms) [Parameters=[@__ids_0={ '1', '2', '3', '4', '5', ... } (DbType = Object)], CommandType='Text', CommandTimeout='30']
SELECT t.id, t.assignee_id, t.description, t.priority, t.reporter_id, t.summary
FROM tick.ticket AS t
WHERE t.id = ANY (@__ids_0)
ORDER BY t.id

Output of EXPLAIN ANALYZE with ANY(...) operator:

Sort  (cost=49.80..49.83 rows=11 width=75) (actual time=0.341..0.365 rows=11 loops=1)
  Sort Key: id
  Sort Method: quicksort  Memory: 27kB
  ->  Bitmap Heap Scan on ticket t  (cost=23.14..49.61 rows=11 width=75) (actual time=0.110..0.185 rows=11 loops=1)
        Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
        Heap Blocks: exact=6
        ->  Bitmap Index Scan on ticket_pkey  (cost=0.00..23.11 rows=11 width=0) (actual time=0.084..0.086 rows=11 loops=1)
              Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
Planning Time: 0.282 ms
Execution Time: 0.464 ms

I am wondering if the query would be faster if it used BETWEEN 1 AND 10 PostgreSQL syntax.

SELECT t.id, t.assignee_id, t.description, t.priority, t.reporter_id, t.summary
FROM tick.ticket AS t
WHERE t.id BETWEEN 1 AND 10
ORDER BY t.id

Output of EXPLAIN ANALYZE with BETWEEN x AND y operator:

Sort  (cost=31.05..31.08 rows=11 width=75) (actual time=0.298..0.322 rows=11 loops=1)
  Sort Key: id
  Sort Method: quicksort  Memory: 27kB
  ->  Bitmap Heap Scan on ticket t  (cost=4.39..30.86 rows=11 width=75) (actual time=0.103..0.180 rows=11 loops=1)
        Recheck Cond: ((id >= 1) AND (id <= 10))
        Heap Blocks: exact=6
        ->  Bitmap Index Scan on ticket_pkey  (cost=0.00..4.39 rows=11 width=0) (actual time=0.051..0.052 rows=11 loops=1)
              Index Cond: ((id >= 1) AND (id <= 10))
Planning Time: 0.276 ms
Execution Time: 0.414 ms

To me, execution plans look the same and should have equal performance. However, it needs to be checked if this is always the case (large tables, filtering is performed for non-indexed column, ...).

Proposal: Instead of using ANY operator for consecutive integers, maybe BETWEEN x AND y can be used. Of course, a thorough performance study has to be performed before jumping to the implementation of this proposal.

Further technical details

Npgsql version: 8.0.4
PostgreSQL version: 16.4
Operating system: Windows 10 64-bit

@vonzshik vonzshik transferred this issue from npgsql/npgsql Sep 13, 2024
@roji
Copy link
Member

roji commented Sep 13, 2024

See also dotnet/efcore#12634 (comment), which is about transforming x >= y AND x <= z to x BETWEEN y AND z (not directly related).

Some remarks:

  • As you say, there's no clear indication that translating to BETWEEN here would have a significant advantage compared to the current translation.
  • In addition, this specific proposal would have us producing different SQL based on parameter values (since BETWEEN can only be used when the array being parameterized contains consecutive values); this isn't something we do unless there's a good reason to do it, since it means we can't cache the SQL within EF.
  • Finally, this proposal doesn't have anything to do with PostgreSQL specifically (i.e. could be done for any relational database), so if done, it would be done in EF itself rather than in the PG provider.

So while it's a nice idea, we'd need some more analysis/motivation that shows it's worth pursuing this. I'll go ahead and close the issue for now - if you believe it's worth doing and have more motivation for it, please reopen in the EF repo and we'll revisit.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants