Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple non-related dbset counts without filter executes as separate queries #27728

Open
joakimriedel opened this issue Mar 30, 2022 · 11 comments

Comments

@joakimriedel
Copy link
Contributor

Including multiple non-related dbset counts in one query will perform multiple SQL queries if the dbsets are not filtered by a dummy where clause. AsSingleQuery does not help. I would prefer not having to remember to add the .Where(x => true) part to prevent multiple queries.

See full gist here.

Include your code

                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT COUNT(*)
                //     FROM [Users] AS [u]
                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT COUNT(*)
                //     FROM [Animals] AS [a]
                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (8ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT TOP(1) 1
                //     FROM [Users] AS [u]

                var bad = await context.Users.Select(u => new {
                    userCount = context.Users.Count(),
                    animalCount = context.Animals.Count()
                }).FirstAsync();

                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //       Executed DbCommand (13ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //       SELECT TOP(1) (
                //           SELECT COUNT(*)
                //           FROM [Users] AS [u0]) AS [userCount], (
                //           SELECT COUNT(*)
                //           FROM [Animals] AS [a]) AS [animalCount]
                //       FROM [Users] AS [u]

                var good = await context.Users.Select(u => new {
                    userCount = context.Users.Where(x => true).Count(),
                    animalCount = context.Animals.Where(x => true).Count()
                }).FirstAsync();

Include provider and version information

EF Core version: 6.0.3
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 6.0

@smitpatel
Copy link
Contributor

context.Users.Count() can be evaluated in advance since it is not correlated with range variable in the query so rather than letting server compute count for each row, we compute it and send the computed value to server (as a parameter). Since Where is a queryable operator, regardless of the lambda inside it, it blocked the evaluation for us and query being combined together in single one.

@joakimriedel
Copy link
Contributor Author

@smitpatel for me as a user this is a very confusing default, since it fires a lot of extra queries to the sql server. Just because it can be evaluated in advance, I do not agree it is something ef core should do. The extra queries add latency and hurts performance. When would it ever be a good thing to evaluate something like this in advance?

In my real world example I found this when I tried to combined a query with 20+ counts, where only a few of them are filtered. Expected a single query to the server and instead got this crazy burst of requests.

I would expect everything rooted on the same context to be inlined. If I wanted to do extra roundtrip to the sql server I would have assigned a variable separately and included in the query such as

var userCount = await context.Users.CountAsync();
var bad = await context.Users.Select(u => new {
                    userCount = userCount,
                    animalCount = context.Animals.Count()
                }).FirstAsync();

Would it be possible to reconsider this for 7.0?

@smitpatel
Copy link
Contributor

Can you share actual query?
Looking at above query, it doesn't really make much sense to run it that way. You are counting number of users and not selecting any of data from user but just picking first (that means making sure there is at least 1 user) and then just putting count in that one result. Running FirstAsync query against the server is not useful in this case.

@joakimriedel
Copy link
Contributor Author

joakimriedel commented Mar 30, 2022

@smitpatel It doesn't matter what table I start off with, I just pick one that I know will contain at least one row. The real query looks like this

            var counts = await _context.Users.Select(u => new
            {
                offerCount = _context.Offers.Where(x => true).Count(),
                notificationCount = _context.Notifications.Where(x => true).Count(),
                messageCount = _context.Messages.Where(x => true).Count(),
                onlineCount = _context.HubConnections
                    .Where(HubConnection.IsOnline)
                    .Select(hc => hc.UserId)
                    .Distinct()
                    .Count(),
                contactCount = _context.Contacts.Where(x => true).Count(),
                // .. +15 other aggregations
            }).AsSingleQuery().FirstAsync();

Purpose is just to get some totals to show in admin pages for an application. The FirstAsync is just to not get more than one row of data, since each row will contain exactly the same information.

Even better would be to have some kind of possibility to do one-off queries without a root to project as a single anonymous or dto object, so that I could write something like

var counts = await _context.UnrootedQuery(q => new {
    rowCount = _context.SomeDbSet.Count()
});

but that is actually a bit out of scope for this issue, I just would like ef core to ENSURE that all aggregations will be inlined in a single sql server query. (without the ugly .Where(x => true) hack because if I forget it anywhere that aggregation will be run out-of-query and add a roundtrip, which is what I hoped the AsSingleQuery would prevent in this case)

@ajcvickers
Copy link
Contributor

Query batching (Tracked by #10879) would allow this to be done using LINQ in a relatively clear form with a single roundtrip. Until then, or if a specific form of SQL is required, the query could be written using FromSql.

@joakimriedel Is the issue here that you want a single round trip, or is it important that this be a single query?

@joakimriedel
Copy link
Contributor Author

@ajcvickers I haven't done any benchmarking, but I would assume the latency from roundtrips is worse than the additional query time from running 20+ simple scalar queries instead of one larger combined query where the columns do not relate to each other. Perhaps you have some better estimates on this for the work done on #10879 ?

Still, I think the current implementation where context.Users.Select(u => new { c = context.Animals.Count() }) generates a totally different query, with additional roundtrip, than context.Users.Select(u => new { c = context.Animals.Where(x => true).Count() }) is a bug, not a feature. I could have a perfect query plan, optimized and all, and then decide to skip the Where for some reason, and suddenly EF Core would add an extra roundtrip and totally change the generated query where the value would be sent as a parameter instead. It is illogical. Is this a complex and expensive code change for you?

@roji
Copy link
Member

roji commented Apr 5, 2022

Batching two SELECTs in a single roundtrip is likely to yield very similar performance to having the two selects as subqueries in a single SELECT query, i.e.:

SELECT COUNT(*) FROM [Users];
SELECT COUNT(*) FROM [Animals] AS [a];

... is very likely roughly equivalent to:

SELECT TOP(1) (
    SELECT COUNT(*)
    FROM [Users] AS [u0]) AS [userCount], (
    SELECT COUNT(*)
    FROM [Animals] AS [a]) AS [animalCount]
FROM [Users] AS [u]

(this is quite trivial to check). Batching two selects also has the advantage of possibly reusing query plans when the same SELECT COUNT(*) ... elsewhere (IIRC this isn't the case with SQL Server semicolon-based batching, but would be the case with DbBatch or on PostgreSQL).

So I do believe #10879 is the right general solution to this. I agree it's odd for us to do roundtrips only when Where is omitted, but whether that's worth fixing is an implementation complexity question (so @smitpatel).

@ziaulhasanhamim
Copy link

This a bit out of scope here. But it looks confusing for me.

var good = await context.Users
    .Select(u => new 
    { 
        userCount = context.Users.Where(x => true).Count(), 
        animalCount = context.Animals.Where(x => true).Count() 
    })
    .FirstAsync();

Because we are using Select on Users DbSet but not using anything from that. It's tricky. So isn't it better to have a Query or Select method on context it self.

var good = await context.Query(ctx => new 
{ 
    userCount = ctx.Users.Where(x => true).Count(), 
    animalCount = ctx.Animals.Where(x => true).Count() 
 })
    .ExecuteAsync();

This is more cleaner and easier to understand.

@roji
Copy link
Member

roji commented Sep 9, 2022

We've discussed something like context.Query() in other contexts (e.g. for top-level aggregate methods, see #28264).

@dotnet/efteam do we already have an issue tracking this?

@ajcvickers ajcvickers removed this from the Backlog milestone Sep 9, 2022
@ajcvickers
Copy link
Contributor

@divega Do you know if you ever created an issue for having a general-purpose context.Query method that could execute any query? I'm sure there must be/have been an issue, since it was something we discussed a lot, but I can't find one!

@ajcvickers
Copy link
Contributor

See #29484.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants