Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use single SQL query instead of split queries #12098

Closed
benmccallum opened this issue May 22, 2018 · 53 comments · Fixed by #16340
Closed

Use single SQL query instead of split queries #12098

benmccallum opened this issue May 22, 2018 · 53 comments · Fixed by #16340
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-enhancement
Milestone

Comments

@benmccallum
Copy link
Contributor

benmccallum commented May 22, 2018

I've been investigation an issue when using AutoMapper's Queryable projection extensions on top of EF Core 2.1-rc1-final. Full details including troubleshooting logs with the SQL statements are at the issue (AutoMapper/AutoMapper#2639) and EfAutomapperTroubleshooting.zip is a repro.

Basically I've got a Booking which has multiple BookingItems. I have a DTO for each type and have configured them to be used by AutoMapper. When I execute the query on EF6 the JOIN optimization occurs (it's even smart enough that I don't need to specific the .Include(b => b.BookingItems). On EF Core however, two DB queries fire (one for Booking and the other for BookingItems). Explicitly specifying the Include potentially would work, but it's ignored by the "IgnoredIncludes" cleverness, so I can't force that behaviour even if I wanted to.

So there's two issues running on EF Core I'm trying to get past:

  1. The JOIN optimization isn't "figured out" automatically.
  2. Trying to force the JOIN with .Include(b => b.BookingItems) doesn't work either as EF Core is being clever but not clever enough. I can't see a way to turn this off or opt out per execution/context.

Obviously understanding that EF Core is not EF 6, but just wondering if this would ever be supported as it's a common scenario these days. The AutoMapper projection extensions are awesome. I'll try do this without AutoMapper for now and see if that can be my workaround for now.

Steps to reproduce

See linked issue logged at AutoMapper and the downloadable .sln zip in my later comment.

Further technical details

EF Core version: 2.1.0-rc1-final
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows 10 Pro, Version: 1803, OS build: 17134.48
IDE: Visual Studio 2017 15.7.1

@benmccallum
Copy link
Contributor Author

I also tried it with a straight up LINQ query and hand-written projections and there is the same 2 queries generated. This isn't really an option for me though even if it did work, as I'm building a GraphQL service that passing required field names down from client and then I use the .ProjectTo<T>(null, requiredFieldNames) overload to basically have AutoMapper generate the perfect SQL query for me. It'll be magical if it can work end-to-end and produce the JOIN I'd like.

bookingDtos = db.Bookings
    .AsNoTracking()
    .Select(b => new BookingDto
    {
        Id = b.Id,
        Name = b.Name,
        BookingItems = b.BookingItems.Select(bi => new BookingItemDto
        {
            Id = bi.Id,
            Name = bi.Name
        }).ToList()
    })
    .ToArray();
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [b].[Id], [b].[Name]
      FROM [Booking] AS [b]
      ORDER BY [b].[Id]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (7ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [t].[Id], [b.BookingItems].[Id] AS [Id0], [b.BookingItems].[Name], [b.BookingItems].[BookingId]
      FROM [BookingItem] AS [b.BookingItems]
      INNER JOIN (
          SELECT [b0].[Id]
          FROM [Booking] AS [b0]
      ) AS [t] ON [b.BookingItems].[BookingId] = [t].[Id]
      ORDER BY [t].[Id]

@smitpatel
Copy link
Contributor

Its not N+1, its just 2 queries. 2 queries instead of 1 is by design. What is your specific issue with that?

@benmccallum benmccallum changed the title N+1 with projections, even with Include 2 queries with one:many projections, even with Include May 22, 2018
@benmccallum
Copy link
Contributor Author

Sorry, you're right. So just to clarify, you are saying it's by design to do 2 queries when it can be achieved with one using a JOIN? If that's the case then no worries, it's just surprising.

@smitpatel
Copy link
Contributor

@benmccallum - Yes it is by design. While it can be achieved with 1 query, it is not less likely to be performant. Single query for collection (include or join) causes dreaded cartesian product.
To give an example, let's assume you have 50 bookings, and each booking has 5 bookingitems. With 1 query, it will do a join between both and cause 250 rows returns from server. And 50 bookings' data will be repeated for 5 times exactly. If you add another collection include (assume 10 bookingdetails per bookings), now the total size of data is 2500 rows. With a lot of duplicated data present.
With split queries, we send 1 query for each collection include. So you get 3 queries with 50/250/500 results each with no duplication of data. Reference includes are still done using join in single query. Of course there is trade off based on cardinality which would make multiple queries efficient or single query efficient. Due to join explosion in case of multiple collection joins, we decided to adopt 2 query approach in EF Core.

@benmccallum
Copy link
Contributor Author

Thanks heaps for taking the time to explain this @smitpatel. That makes total sense. Closing this out.

@ghost
Copy link

ghost commented Jun 28, 2018

there are 2 independent queries which means the second query does nothing with the first query, so still I don't understand how running 2 queries prevents data duplication!

@smitpatel
Copy link
Contributor

@elyasnew - They are not independent queries. 2nd query fetches the related data (for collection include) for the records in the first query. If you want all data in one query then you need to a join which will duplicate data in first query for every related record in 2nd query.

@brainoverflow98
Copy link

brainoverflow98 commented Nov 10, 2018

@benmccallum - Yes it is by design. While it can be achieved with 1 query, it is not less likely to be performant. Single query for collection (include or join) causes dreaded cartesian product.
To give an example, let's assume you have 50 bookings, and each booking has 5 bookingitems. With 1 query, it will do a join between both and cause 250 rows returns from server. And 50 bookings' data will be repeated for 5 times exactly. If you add another collection include (assume 10 bookingdetails per bookings), now the total size of data is 2500 rows. With a lot of duplicated data present.
With split queries, we send 1 query for each collection include. So you get 3 queries with 50/250/500 results each with no duplication of data. Reference includes are still done using join in single query. Of course there is trade off based on cardinality which would make multiple queries efficient or single query efficient. Due to join explosion in case of multiple collection joins, we decided to adopt 2 query approach in EF Core.

Thanks for the clarification. I think it is really important to splitting the queries in some cases where you have too many duplicate data but can you make it optional in case some people doesn't want to use it that way ? Something like: _db.Blogs.Include(b=>b.Posts).ThenInclude(p=>p.Comments).Where(b=>b.Id == 1).SingleQuery().ToList(); That would be usefull especially if trying to return a single blog. We do not need to split queries for a single blog. I know some scenarios where duplicate data is more than the size of actual data but for the most part people might want to use single query especially when the fact that round trip time and processing the statements makes up the most of the query time taken into account.

@dpsenner
Copy link

@smitpatel I would like to reopen the discussion about this issue.

I just wrote a single linq query and expected the database to spit out a single sql query. See the referenced issue for more information. But I was astonished that entity framework core did something totally different and even sorted out to join tuples in memory. This was unexpected.

This said I do believe that when a developer writes one linq query he expresses the intention to run that linq query as a single sql query and writes it such that it can be run as a single sql query against the database. If the developer did not want to offload that operation to the database, he would write two linq queries and join the results in the application server.

I do believe that the developer should have a choice. It is the developers responsibility to write linq quieries that implement the functional requirements in a sane way. The decision to generate two queries makes it impossible to implement streaming techniques of several hundredthousands of records with their related records while it silently increases the memory requirements of the machine that runs the application server.

Can I somehow persuade the entity framework core team to rethink the original decision? I do not think that entity framework core team has to protect developers from writing bad linq queries. For instance, I would not mind if entity framework core threw an exception if it is unable generate a sensible sql query from a bad linq query.

@roji
Copy link
Member

roji commented Nov 20, 2018

@smitpatel see related discussion in npgsql/efcore.pg#712

@divega divega added this to the Discussions milestone Nov 20, 2018
@divega
Copy link
Contributor

divega commented Nov 21, 2018

@dpsenner @roji Thanks, this is indeed an interesting discussion. I would like to point out that although EF Core creates a separate query whenever you project or include collections, for each of these cases there is generally an equivalent query (that is, a query that returns the exact same rows) that is translated by EF Core to a single query with JOINs. E.g.:

// this will translate as three SQL queries
db.Blogs.Include(b=>b.Posts).ThenInclude(p=>p.Comments).Where(b=>b.Id == 1).ToList();
// this will translate as a single SQL query
db.Comments.Include(b=>b.Post).ThenInclude(b=>b.Blog).Where(b=>b.Post.Blog.BlogId == 1).ToList();

The difference between the two queries is that you are traversing the navigation properties in the opposite direction.

  • For the first query, we interpret that your main intent is to get a Blog that has a BlogId equal to 1. Your secondary intent is to get related information about the posts and the comments. Based on this, we translate to one query to get the blog, and two additional queries to get the related posts and comments. In this case, your query is traversing the navigation properties from the one end toward the many end. Had we tried to translate to a single query, we would have caused the number of times you get the data in the rows for blog to explode.

  • For the second query, we interpret that your primary intent is to get the comments for a specific blog, and we your secondary intent is to get the data for the related posts and blogs. Our hypothesis is that you already expect that the cardinality of the results will be the same as all the number of comments for that blog in the database. You are traversing navigations from the many ends towards the one ends, each time narrowing the cardinality of the additional data you are asking for. So when we translate this to a single query, the number of rows in the results is going to be the same as the number of rows if you were querying for the comments alone. However, we are duplicating the data in all the columns of post and blog.

You are welcome to think that the distinction between the two queries (the direction in which you are traversing relationships) is arbitrary, and we could have used the same translation. However I would argue that we are simply using the order in which navigations are being traversed as a hint on how we translate the query. A useful way to think about how EF Core decides whether to translate to one or multiple SQL queries is that it is a heuristic that we apply to avoid the explosion of the cardinality of the root of the query that you are consuming. In a way, it is an attempt to achieve least astonishment, but focused on that specific aspect of the query.

Also regarding the assertion that this doesn't allow streaming: for the first query, we actually still stream the results of Blog on the same query. We buffer the results from Posts and Comments, but if your database (and connection string) supports multiple active result sets (MARS), we only buffer them incrementally. E.g. we will buffer together all the posts for the first blog, then when you skip to the next blog, we will buffer all the posts for the second blog, etc.

In any case, we are reopening this issue to discuss if there are other ways we can do this in the future. Obviously, it becomes much harder or impossible for you to write a query that gets resolved as single query if the reverse navigation properties are absent in the model, or if one of the leaf entities is an owned entity.

@divega divega reopened this Nov 21, 2018
@divega divega removed this from the Discussions milestone Nov 21, 2018
@divega divega changed the title 2 queries with one:many projections, even with Include Two queries with one:many projections, even with Include Nov 21, 2018
@roji
Copy link
Member

roji commented Nov 21, 2018

@divega that's very useful information I didn't know - thanks! I'd suggest maybe adding this to the docs as I'm sure not people are aware of the distinction and the two SQL generation options. Maybe an "advanced query techniques" page would be appropriate for this kind of thing...

At least IMO that's quite a satisfactory state of affairs... Even if the "default" mode (traversal from one to many) only buffers the one end, it's understandable that in some cases users may wish to avoid it, even at the expense of a cartesian product explosion. You seem to already provide a way to express it, and requiring that reverse navigation be present doesn't seem like a big problem to me. Owned entities may present a more serious problem though.

One last somewhat-related note... Npgsql indeed does not support MARS - you can only have one reader open on a connection at a given point in time. However, PostgreSQL does support SQL server-side cursors, where you start a query with DECLARE and progressively fetch results with FETCH, allowing multiple query resultsets to be traversed simultaneously. At some point I considered implementing MARS support in Npgsql over cursors (via lower-level support in the protocol) but ended up deciding against it for several reasons - and in any case the option is there for users to do so themselves with DECLARE/OPEN. Treating this as a higher-level SQL-based MARS, in theory EF Core could use that to prevent buffering even when MARS isn't supported - I think it's even part of the SQL standard and implemented across databases. That may be a bit optimistic though, IIRC MySQL only allows cursors in stored procedures, and PostgreSQL itself requires cursors to be in a transaction, which also complicated things.

Just dumping a bit more context/options for us to think about...

@divega
Copy link
Contributor

divega commented Nov 21, 2018

@roji agreed. Other two “crazy ideas” that come to mind re streaming and MARS:

  1. Someday we could just change how DbContext works to open additional connections when MARS is required but not supported and we are only doing reads.
  2. Assuming you implement connection multiplexing, you could to something homologous thing at the DbConnection vs. physical connection. I.e. they don’t need to be 1:1 anymore.

@dpsenner
Copy link

dpsenner commented Nov 21, 2018

I'm glad to see that ef core team is open to discuss this issue.

@divega If a dbcontext begins to spawn connections I raise serious concerns regarding scalability.

As for the two queries mentioned above, I have totally different expectations to how this should behave.

// this will translate as three SQL queries
db.Blogs.Include(b=>b.Posts).ThenInclude(p=>p.Comments).Where(b=>b.Id == 1).ToList();
// this will translate as a single SQL query
db.Comments.Include(b=>b.Post).ThenInclude(b=>b.Blog).Where(b=>b.Post.Blog.BlogId == 1).ToList();

I expect both linq queries to map to one sql query each. If I did want to run this as separate queries I could implement it in the application in this way:

// first query
var blogs = await dbContext.Blogs.Include(b => b.Posts).ToListAsync();
foreach (var blogPost in blogs.SelectMany(b => b.Posts)) {
  // n queries
  var blogPostComments = await dbContext.Comments.Where(c => c.PostId == blogPost.Id).toListAsync();
  // do whatever needs to be done
}

or even do fancy stuff with batching:

// first query
var blogs = await .ToListAsync();
foreach (var blogs in dbContext.Blogs.InBatchesOf(100)) {
  // fetch all blog posts for the set of blogs in one query and aggregate them into a lookup
  var blogPostsLookup = dbContext.Posts.Where(p => blogs.Select(b => b.Id).ToArray().Contains(p.BlogId)).ToLookup(t => t.BlogId, t => t);
  // then process all blog posts for each blog
  foreach (var blog in blogs) {
    foreach (var blogPost in blogPostsLookup[blog.Id]) {
      // fancy stuff
    }
  }
}

This said, linq to sql should map linq to sql. Not do fancy logic.

@benmccallum
Copy link
Contributor Author

Thanks for the tips about reversing the query traversal to make a single query.

In my case, I'm translating graphql queries down to db queries via AutoMapper's .ProjectTo<TDto>(fieldsToMap), so the query has to be driven top-down meaning I'll always get the query splitting, even if I'm only requesting one top-level record (blog in examples above).

For my situation to be optimized, I'd need to be able to tell EF that I'm only retrieving one blog somehow, or is it smart enough to recognise the query passes a primary key and/or TOP 1? If not, feature request!

@roji
Copy link
Member

roji commented Nov 21, 2018

@divega

Someday we could just change how DbContext works to open additional connections when MARS is required but not supported and we are only doing reads.

Definitely an option! Although we'd have to watch out for transactions, which also affect reads...

Assuming you implement connection multiplexing, you could to something homologous thing at the DbConnection vs. physical connection. I.e. they don’t need to be 1:1 anymore.

Yeah - that resembles your 1st suggestion in a way: since every query can be dispatched to an arbitrary physical connection, there's no more constraint on having two queries on the same connection. The same caveat with transactions applies though.

But multiplexing introduces a problem the other way - if you have multiple queries queued up for different producers on the same connection, then if the 2nd producer wants to process results but the 1st one still hasn't, then we have a problem... We can either delay the 2nd one (seems unacceptable) or buffer the 1st resultset in memory - in which we've killed streaming... I still have to think about multiplexing a bit more, but it definitely creates some issues in this area.

@roji
Copy link
Member

roji commented Nov 21, 2018

@dpsenner, here's my take on your argument:

I expect both linq queries to map to one sql query each. If I did want to run this as separate queries I could implement it in the application in this way:

The problem, again, is that executing this kind of query in a single SQL query can be extremely bad for performance, because of the reasons that have already been explained above: duplication of principal entity fields, leading to tons of unnecessary data being transferred and thrown away. This simply doesn't seem like the right option for the default usecase in an average application.

To summarize, when deciding what EF Core should do by default, the decision seems to be between:

  1. A single SQL query with joins
    a. Pro: totally streaming in all scenarios and providers
    b. Possibly a pro: maintains the one-SQL-per-one-LINQ query idea. Up to now I'm not really convinced why this should be a principle of EF Core or what actually value it brings to users
    b. Con: potentially huge amounts of needless data transferred because of duplication of principal entity fields
  2. A query per include
    a. Pro: no data duplication
    b. Con: only partially streamable (i.e. non-MARS provider).

To me, the decision between the two options isn't trivial. However, since the streaming issue isn't universal (MARS is supported in some providers and could be supported in others), and I don't really subscribe to the idea that one LINQ should provide one SQL, the 2nd really seems superior.

One last point is whether the multiple queries produced in the 2nd option are batched. If they're not, we're adding more database network roundtrips, which isn't good - but this could be fixed in EF Core (#10878 tracks this).

Finally, the idea suggested by @divega above could indeed mitigate the streaming concern when provider MARS isn't available, but I agree with your point that it hogs more connections and so could be problematic.

@dpsenner
Copy link

I'm glad that you too share the opinion that entity framework core should not be allowed to spawn connections.

As said, for me entity framework is an object relational mapping framework. As such it is responsible to map linq to a sql command and map the relational results to an object model. If a linq expression is badly formulated, it results in a NotImplementedException or NotSupportedException. This behavior is comparable to a badly formulated sql query. Nobody would expect the database engine to autocomplete missing aggregate functions in combination with group by. When faced with an exception, the developer will figure out a way to formulate a linq expression that gets the data from the database and mapped into the application server. I know very well that this is a very strict viewpoint that not many people share. But in this viewpoint entity framework core does no longer have to implement guesswork on linq query trees like the mentioned ordering of property traversals in the comment of @divega. Further, developers get the benefit of being able to implement sane and stable application servers.

Of course some developers will write bad queries. So they will also be able to write code that crashes. Nobody expects from the compiler to protect them from doing so.

@roji
Copy link
Member

roji commented Nov 21, 2018

I personally don't really agree with much of what you wrote (my views are my own):

As said, for me entity framework is an object relational mapping framework. As such it is responsible to map linq to a sql command and map the relational results to an object model.

I don't think there's anything in the idea of an ORM that requires a single LINQ query to be translated to a single SQL query - ORMs map objects to relational databases, and can do so in various ways. LINQ specifically is a pretty .NET-specific concept, so it seems odd to say that the concept of ORMs somehow necessarily implies something about how LINQ should be translated.

If a linq expression is badly formulated, it results in a NotImplementedException or NotSupportedException. This behavior is comparable to a badly formulated sql query.

I don't really think the analogy holds well - there's a very big difference between an incorrect expression or query, and the question of how to translate a correct one.

I think it's more appropriate to compare to compilers (or more accurately transpilers), which translate from one language to another. These tools routinely perform optimizations and other manipulations to make sure your code runs in the best way possible - that's one of their main job descriptions. Following your logic we may say that method inlining is a bad idea, because a C# method must always be translated to an IL and assembly method...

Of course some developers will write bad queries.

Agreed, but we're not talking about a bad query, but rather about what happens with the default, standard EF Core join construct. At the end of the day, if the proposal is that EF Core translate the standard join syntax (Include()) via an SQL construct that results in extremely bad performance for the common scenario, then I think everyone will end up losing.

@dpsenner
Copy link

dpsenner commented Nov 21, 2018

everyone will end up losing

Of course this should not happen.

SQL construct that results in extremely bad performance

Bad performance can also happen when a queried database column is missing an index or if the index has bad selectivity.

what happens with the default, standard EF Core join construct

Entity framework core should behave consistently and not do unexpected things. To me it was and is still unexpected that one linq query runs as two sql queries because I, being a human, would write it as one sql query. Including hundreds of columns from different tables and joining all together will always end up with huge result sets. I would not expect entity framework core to do any magic on this. But is it really entity framework core's responsibility to deal with these broken badly designed queries by loading everything into memory and doing the operation in the application server?

@marekvse
Copy link

No, I did not in this case as I just found out that it issues so many queries, but going across processes to the database more often than necessary, compiling of the query by the database... I'd definitely expect that taking more time then compiling one query and returning let's say 40 bytes of additional data per row.
(Anyone on why my code is not being presented colorized and multiline here?)

@marekvse
Copy link

marekvse commented Apr 20, 2019

OK, so I've spent a couple of hours measuring the difference....
And sure enough, getting one resultset and then - in my case - manually populating the class hierarchy (something EF6 used to do by itself without a problem) is muuuuuuch faster than calling the database 9 times instead. The average time increase on 9 vs 1 call to get the same data is 337%.!! And if I ask my function to populate all my permission roles for me (I have 7 right now in there), the difference goes up to 499%, which is of course no surprise since doing it the same thing with 50!!! queries in EF Core versus how EF6 did it with just one query to get the same result is bound to have an impact. Not a good one either.
And I'm not even mentioning that doing the query backwards to be able to measure this is really not easy as it is a very unnatural way to design a query. Doing this and debugging it regularly for other queries will take lots of additional time and lots more code... Or could it be done easier? Here's my testing code from my real app:
(Sorry if it comes out not formatted again. I'm happy to fix it if anyone can point me to what I'm doing wrong in this comment editor.)

            var bytForRoleId = (byte)forRole;
            bytForRoleId = 0;
            var w = new System.Diagnostics.Stopwatch();
            w.Start();

            var q1 = (
                from r in DC.OrganizationUserRoles
                where bytForRoleId == 0 || (bytForRoleId > 0 && r.OrgUserRoleId == bytForRoleId)
                orderby r.OrgUserRoleId
                select new OrgRolePermissionInfo() {
                    RoleId = r.OrgUserRoleId,
                    RoleName = r.OrgUserRoleName,
                    RoleDescr = r.OrgUserRoleDescr,
                    Permissions = (
                        from dp in r.RolePermissions
                        from p in DC.RolePermissions.Where(x => x.OrgId == forOrgId && x.OrgUserRoleId == dp.OrgUserRoleId && x.PermissionId == dp.PermissionId).DefaultIfEmpty()
                        where dp.OrgId == 0
                        select new OrgRolePermission() {
                            Permission = dp.Permission,
                            DefaultValue = dp.PermissionValue,
                            Value = p.PermissionId > 0 ? p.PermissionValue : dp.PermissionValue,
                            Inherited = p.PermissionId > 0 ? false : true,
                            Types = dp.Permission.PermissionApplicables.ToList()
                        }
                    ).ToList()
                }
            ).ToList();

            w.Stop();
            var ms1 = w.ElapsedTicks;

            w.Restart();

            var q2 = (
                from a in DC.PermissionApplicables
                from rp in DC.RolePermissions.Where(x => x.PermissionId == a.PermissionId).DefaultIfEmpty()
                from p in DC.RolePermissions.Where(x => x.OrgId == forOrgId && x.OrgUserRoleId == rp.OrgUserRoleId && x.PermissionId == rp.PermissionId).DefaultIfEmpty()
                from pp in DC.Permissions.Where(x => x.PermissionId == rp.PermissionId)
                from r in DC.OrganizationUserRoles.Where(x => bytForRoleId == 0 || (bytForRoleId > 0 && x.OrgUserRoleId == bytForRoleId)).DefaultIfEmpty()
                where rp.OrgId == 0 && (pp.PermissionId != 0 && pp.PermissionId != 3)
                orderby r.OrgUserRoleId, a.PermissionId, a.CanWhat
                select new Test {
                    RoleId = r.OrgUserRoleId,
                    RoleName = r.OrgUserRoleName,
                    RoleDescr = r.OrgUserRoleDescr,
                    Perm = pp,
                    DefaultValue = rp.PermissionValue,
                    Value = (p.PermissionId > 0 ? p.PermissionValue : rp.PermissionValue),
                    Inherited = p.PermissionId > 0 ? false : true,
                    PermId = a.PermissionId,
                    CanWhat = a.CanWhat,
                    UiText = a.UiText,
                    UiExplanation = a.UiExplanation,
                    Includes = a.Includes,
                    ApplicableToAppUser = a.ApplicableToAppUser
                }
            ).ToList();

            w.Stop();
            var ms2 = w.ElapsedTicks;

            w.Restart();

            Test row;
            byte iLastRoleId = 0;
            byte iLastPermId = 0;
            short iLastCanWhat = 0;
            OrgRolePermissionInfo o = null;
            var lst = new List<OrgRolePermissionInfo>();
            OrgRolePermission rolePerm = null;

            for (var i = 0; i < q2.Count; i++) {
                row = q2[i];
                if(row.RoleId != iLastRoleId) {
                    iLastRoleId = row.RoleId;
                    iLastPermId = 0;
                    iLastCanWhat = 0;

                    o = new OrgRolePermissionInfo() {
                        RoleId = row.RoleId,
                        RoleName = row.RoleName,
                        RoleDescr = row.RoleDescr,
                        Permissions = new List<OrgRolePermission>()
                    };
                    lst.Add(o);
                }

                if(row.PermId != iLastPermId) {
                    iLastPermId = row.PermId;
                    iLastCanWhat = 0;
                    rolePerm = new OrgRolePermission() {
                        Permission = row.Perm,
                        DefaultValue = row.DefaultValue,
                        Value = row.Value,
                        Inherited = row.Inherited,
                        Types = new List<PermissionApplicable>()
                    };
                    o.Permissions.Add(rolePerm);
                }

                if(row.CanWhat != iLastCanWhat) {
                    iLastCanWhat = row.CanWhat;
                    rolePerm.Types.Add(new PermissionApplicable() {
                        PermissionId = row.PermId,
                        ApplicableToAppUser = row.ApplicableToAppUser,
                        CanWhat = row.CanWhat,
                        Includes = row.Includes,
                        UiExplanation = row.UiExplanation,
                        UiText = row.UiText
                    });
                }
            }

            w.Stop();
            var ms3 = w.ElapsedTicks;

@marekvse
Copy link

Comments to the above code:
First of all, it is really strange how the code block starts after several lines after its actual beginning. But at least it is readable now. Anyway...
The q1 query is the "natural" one - the way it feels naturally to be written and the way EF6 would issue one query to the database for and fill it correctly.
The q2 is the one I attempted to write backwards. As I mentioned, that was pretty hard to debug. It gives the same results now, which is enough for the measurement, but I'd not be completely confident with it for the production code in this form.
The second query is ordered by the entity keys of the involved entities and then iterated over to populate the list of object hierarchies to return.

I need to write queries like this all the time. Yet, according to @roji this is not a common case? Can I do it better? I'm not happy at all with a 5-fold decrease in performance or more with more return objects expected. With so many queries, the DB server will not be very scaleable. :-(

@roji
Copy link
Member

roji commented Apr 20, 2019

@marekvse you may have missed it, but the decision has already been made to move to a single SQL query (with classical join) for all cases - see this comment above.

FWIW whether a single query will perform better than one-query-per-joined-table depends on your specific query, various environmental concerns (e.g. network latency), etc. It's very easy to show scenarios in which the single query approach will be very bad for perf. However, it's true that single query has several non-perf advantages (always allows streaming regardless of MARS support, potential subtle transaction isolation issues..), and users should be able to get the previous behavior by coding for it explicitly.

@marekvse
Copy link

Yes, I indeed did miss that. Thank you @roji . I just updated from EF Core 2.2.0 to 2.2.4, but I get the same result - the same 50 queries instead of one for q1 query from the code above. Is there a setting I have to change or alter somehow the way the query is written?
Overnight I decided I was going to write a native SQL sproc and get data that way, which would at least let me write the query naturally and understandably. I'm glad I don't have to do that probably, but I'm not sure what's wrong with my LINQ in q1 to achieve the desired result.

@divega
Copy link
Contributor

divega commented Apr 21, 2019

@marekvse, the change @roji is referring to is coming in EF Core 3.0.

@marekvse
Copy link

I see. Thanks @roji. Will be looking forward to it then :).

@MgSam
Copy link

MgSam commented Apr 30, 2019

Glad to hear this is being fixed in EF Core 3.0.

@smitpatel smitpatel added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jun 28, 2019
smitpatel added a commit that referenced this issue Jun 28, 2019
Bugs fixed
- Allow ToList on collection projection to be called with less derived type
- Project out necessary identifiers for collection join when pushing down. (Except distinct case)
- Lift ordering from inner collection to generated ordered result (This would be improved further by #16226)
- Map Cast from Enumerable to Queryable
- Translate AsQueryable as no-op

Resolves #12098
Resolves #15043
Resolves #15611
Part of #15711
smitpatel added a commit that referenced this issue Jun 28, 2019
Bugs fixed
- Allow ToList on collection projection to be called with less derived type
- Project out necessary identifiers for collection join when pushing down. (Except distinct case)
- Lift ordering from inner collection to generated ordered result (This would be improved further by #16226)
- Map Cast from Enumerable to Queryable
- Translate AsQueryable as no-op

Resolves #12098
Resolves #15043
Resolves #15611
Part of #15711
smitpatel added a commit that referenced this issue Jun 28, 2019
Bugs fixed
- Allow ToList on collection projection to be called with less derived type
- Project out necessary identifiers for collection join when pushing down. (Except distinct case)
- Lift ordering from inner collection to generated ordered result (This would be improved further by #16226)
- Map Cast from Enumerable to Queryable
- Translate AsQueryable as no-op

Resolves #12098
Resolves #15043
Resolves #15611
Part of #15711
@ajcvickers ajcvickers modified the milestones: 3.0.0, 3.0.0-preview7 Jul 2, 2019
@ajcvickers ajcvickers modified the milestones: 3.0.0-preview7, 3.0.0 Nov 11, 2019
@asalaheldin
Copy link

I also tried it with a straight up LINQ query and hand-written projections and there is the same 2 queries generated. This isn't really an option for me though even if it did work, as I'm building a GraphQL service that passing required field names down from client and then I use the .ProjectTo<T>(null, requiredFieldNames) overload to basically have AutoMapper generate the perfect SQL query for me. It'll be magical if it can work end-to-end and produce the JOIN I'd like.

bookingDtos = db.Bookings
    .AsNoTracking()
    .Select(b => new BookingDto
    {
        Id = b.Id,
        Name = b.Name,
        BookingItems = b.BookingItems.Select(bi => new BookingItemDto
        {
            Id = bi.Id,
            Name = bi.Name
        }).ToList()
    })
    .ToArray();
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [b].[Id], [b].[Name]
      FROM [Booking] AS [b]
      ORDER BY [b].[Id]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (7ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [t].[Id], [b.BookingItems].[Id] AS [Id0], [b.BookingItems].[Name], [b.BookingItems].[BookingId]
      FROM [BookingItem] AS [b.BookingItems]
      INNER JOIN (
          SELECT [b0].[Id]
          FROM [Booking] AS [b0]
      ) AS [t] ON [b.BookingItems].[BookingId] = [t].[Id]
      ORDER BY [t].[Id]

@smitpatel @ajcvickers There is an order by Id exists in the resulted query above although the linq doesn't contains any order by. I'm asking about this as it could cause a performance issue specially in case this "Id" is used only as a primary key, but not clustered index

@smitpatel smitpatel mentioned this issue Dec 10, 2019
@simeyla
Copy link

simeyla commented Mar 2, 2021

Note: EF Core 5.0 allows you to choose from split or single queries.

https://docs.microsoft.com/en-us/ef/core/querying/single-split-queries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.