Outline guiding principles and decide on direction for queries in 3.0 #12795

ajcvickers · 2018-07-25T20:26:21Z

There are several query-related efforts being worked on or considered for the 3.0 release. This issue is a single place to track and discuss these efforts at a high level as we make progress and decide on direction.

Initial areas to consider:

Increase test coverage
Fix query bugs
Decide on architecture changes, if any. For example, internalize or remove re-linq.
Consolidate and document guiding principles for different types of translations and evaluations. For example:
    Should we be treating objects created by the user in a query differently from the ones we materialize? #12668
    Query: revisit design for function non-determinism and client evaluatability #12672
    Query: Consider adding method to force client evaluation of an expression #12736
    Query: Interpret path passed to Include() and allow partial matches #12737
    Warning when query materializes an entity outside the top projection #12667
    .Any() on collection navigations after projection can cause client-side evaluation #12728
    Make IQueryOptimizer/QueryOptimizer non-internal #12756
    Query: Enable translations that generate different SQL depending on a value to be cached #12764
    Query: Include for both navigations of One-To-Many relationship should use split query #12775
    Change split query implementation to fetch dependents by keys, without reevaluating principal query #12776
    Query: compilation error (dangling qsre) for query with SelectMany, order by and include collection #12794
    Support different SQL translation when store representation has changed due to value conversion #10434
    Query: consider expanding null safety beyond nav rewrite #12284 (null safety)
    ParameterExtractingExpressionVisitor::GetValue #13899
    Query: projecting naked correlated collection when tracking should return collection object of it's parent entity #14046
    Query: Top projection should fully materialize and track entities before they are passed to client methods #12761 (comment) (top projection rewrite)
    Query: correlated collection optimization doesn't get applied for subqueries with AsQueryable #12195
Decide on keeping full client-eval fallback, going to explicit client-eval only, or moving to a mixture. For example, see:
    Using Func<> for where predicate does not work #12765
    Provide a way to control client evaluation behavior on a per query basis #12844
    InMemory provider queries on tracked entities (unlike SQLServer provider) #12945
    Query: Allow non static methods in top level projection #13048
    Support client evaluation when store evaluation is not appropriate #10265

divega · 2018-08-03T15:25:15Z

Including here the seminal client-side evaluation proposal Colin Meek wrote:

Implicit boundaries in LINQ to Entities (client-side evaluation)

Overview

LINQ forces us to blur the boundary between the server and the client. For a provider like LINQ to Entities, this means that a query supplied to the stack is always partly evaluated in the client application and partly in the database server. For instance, the query

int productId = 1;
var q =
    from p in context.DbSet<Product>()
    where p.ProductId == productId
    where DetermineProductPriority(p) == "High"
    select new XElement(
        "Product",
        new XAttribute("Name", p.Name),
        new XAttribute("ID", p.ProductId));

includes a selection evaluated by the server (p.ProductId == @productId), but also client expressions, e.g. the binding of the free variable productId and the materialization of XML nodes in the projection. It also includes a call to a client-side method in a predicate over a server-correlated expression, something that is not supported by LINQ to Entities or by most other LINQ to * implementations.

While we have found it convenient to talk about LINQ to Entities as a strict implementation – all server or nothing – and other implementations such as LINQ to SQL as hybrid implementations – splitting the query into client and server expressions – these implementations really exist along a continuum, and it makes sense to examine this continuum in more detail to clarify the current behavior and how we could improve EF.

It is convenient to discuss this continuum with respect to the following expression scopes:

Independent sub-expressions: in the above query, the access to the free variable productId is compiled into something like Expression.Field(Expression.Constant(CS$<>8__locals6), fieldof(<>c__DisplayClass5.productId)), which does not depend on the current scope of the query (i.e. it is not correlated to the contents of any table in the database). The process of identifying and evaluating these independent sub-expressions has become known as expression funcletization at Microsoft.
Client sources: LINQ queries are typically bootstrapped by IQueryable roots, e.g. dbContext.DbSet<Product>() in the above example.
Client projections: while client sources introduce typed iterators as the root or roots for remote queries, client projections close the loop by shaping typed query results. In the above example, entity results are shaped into XML nodes, new XElement…
Dependent sub-expressions: certain sub-expressions can only be evaluated by the client, e.g. DetermineProductPriority…, but depend on intermediate results from the server, e.g. the argument p in DetermineProductPriority(p).
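To make the first category concrete, here is a minimal sketch (plain System.Linq.Expressions, not EF code) that inspects the expression tree for a predicate over a captured local and then evaluates the independent sub-expression on the client. The Program class and the variable names are illustrative assumptions, and the compiler-generated closure class name will differ from the one quoted above.

```csharp
using System;
using System.Linq.Expressions;

static class Program
{
    static void Main()
    {
        int productId = 1;

        // The C# compiler hoists productId into a closure object, so the
        // right-hand side of the comparison is a field access on a constant.
        Expression<Func<int, bool>> predicate = id => id == productId;

        var comparison = (BinaryExpression)predicate.Body;
        var fieldAccess = (MemberExpression)comparison.Right;
        var closure = (ConstantExpression)fieldAccess.Expression;

        Console.WriteLine(fieldAccess.Member.Name); // name of the captured variable
        Console.WriteLine(closure.Type.Name);       // compiler-generated closure class

        // "Funcletize": the sub-expression references no query parameter, so it
        // can be compiled and evaluated locally, e.g. to supply a SQL parameter value.
        int value = Expression.Lambda<Func<int>>(fieldAccess).Compile()();
        Console.WriteLine(value); // 1
    }
}
```

The point of the sketch is only that nothing in fieldAccess depends on the lambda parameter id, which is what makes it safe to evaluate before the query is sent anywhere.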

Note that these categories are somewhat arbitrary. A client source is a kind of independent sub-expression, and a dependent sub-expression is in some ways just a generalization of a client projection. The categories are still intuitively useful, however: independent sub-expressions often map to parameters while client sources mostly map to scans, and client projections are frequently benign while dependent sub-expressions are often cause for concern (consider high-selectivity filters).

Implicit boundaries are dangerous but also an essential feature of LINQ. Finding the appropriate balance is important. From feedback we have received over the years, we know that users expect magic and may be disappointed if we either throw because we’re overly strict or we end up streaming 1,000,000 rows from the database into the client to find the 10 rows matching a client predicate because we were too loose.

Independent sub-expressions and client sources

Options:

Necessary: free variables and constants are unavoidable. We need to allow expressions that represent access to fields, properties and constants in order to implement a viable LINQ provider.
Literals: the LINQ equivalent of value literals. Allow constants (1), primitive type constructors (new DateTime(2008, 5, 28)) and even array initialization patterns (new int[] {…}).
Root construction: currently LINQ to Entities only supports some forms of roots inside the query, e.g. context.Products is recognized, but unfortunately inline construction of query roots is not, e.g. dbContext.DbSet<T>() and context.CreateQuery<T>(string) are not recognized.
Server-unsupported expressions that can be converted to server parameters: currently LINQ to Entities will throw if it finds any expression in a query that it cannot evaluate on the server. If we could turn an independent expression into a query parameter and we know that no part of the expression could ever be evaluated by the server, at least with the same semantics, we could funcletize it. There is an interesting challenge in deciding how much of a sub-expression we can funcletize. For instance, consider the expression stringBuilder.ToString().Length. While the stringBuilder.ToString() part can only be evaluated on the client, .Length could either be evaluated on the client alongside the rest of the expression or translated to LEN(@param) in the store. LEN() in SQL Server has subtly different semantics from string.Length in the CLR in that it ignores trailing blanks. We have three options on what we can funcletize:
a. The minimal sub-expression that cannot be evaluated on the server
b. The minimal sub-expression that cannot be evaluated on the server plus any expression that can be evaluated on either the client or the server with identical semantics
c. The maximal sub-expression that can be evaluated on the client
Options (a) and (b) guarantee that, at least for a particular server, all occurrences of such expressions have consistent semantics regardless of where they appear in the query. The alternative is option (c), under which any sub-expression that can be turned into a parameter is evaluated on the client. This is the most flexible approach, but it means we apply inconsistent semantics to some operators depending on which side of the boundary they find themselves on.
Server-unsupported expressions that cannot be turned into server parameters: We should also explore these. There are interesting solutions where you pipe values through the query to the result, which works when the value is never cracked or cracked at need on the client. This greatly increases the cost of the feature for LINQ to Entities which would need to introduce its own intermediate metadata representation for such values.
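As a rough illustration of option (c), the sketch below replaces every maximal sub-expression that references no lambda parameter with a constant holding its client-evaluated value. IndependenceFinder and Funcletizer are hypothetical helper names built on the standard ExpressionVisitor class. They are not EF types, and a real provider would have to deal with many cases ignored here (nested lambdas, non-deterministic calls, query roots).

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Text;

// Hypothetical helper: records every node whose sub-tree references no lambda parameter.
sealed class IndependenceFinder : ExpressionVisitor
{
    public HashSet<Expression> Independent { get; } = new HashSet<Expression>();
    private bool _sawParameter;

    public override Expression Visit(Expression node)
    {
        if (node == null) return null;
        bool outer = _sawParameter;
        _sawParameter = false;
        base.Visit(node);                       // walk the children first
        if (!_sawParameter) Independent.Add(node);
        _sawParameter |= outer;                 // propagate dependence to the parent
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _sawParameter = true;
        return node;
    }
}

// Hypothetical helper: walking top-down, the first independent node found is the maximal one.
sealed class Funcletizer : ExpressionVisitor
{
    private readonly HashSet<Expression> _independent;
    public Funcletizer(HashSet<Expression> independent) { _independent = independent; }

    public override Expression Visit(Expression node)
    {
        if (node == null) return null;
        if (_independent.Contains(node) && node.NodeType != ExpressionType.Constant)
        {
            // Evaluate on the client and keep only the resulting value.
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.Visit(node);
    }
}

static class Program
{
    static void Main()
    {
        var stringBuilder = new StringBuilder("abc  "); // two trailing blanks
        Expression<Func<string, bool>> predicate =
            name => name.Length == stringBuilder.ToString().Length;

        var finder = new IndependenceFinder();
        finder.Visit(predicate.Body);
        Expression rewritten = new Funcletizer(finder.Independent).Visit(predicate.Body);

        // The whole right-hand side, including .Length, was evaluated with CLR semantics.
        var right = (ConstantExpression)((BinaryExpression)rewritten).Right;
        Console.WriteLine(right.Value); // 5
    }
}
```

Under option (a), only stringBuilder.ToString() would become a parameter and the server would compute LEN(@param), which for this input ignores the two trailing blanks and yields 3 rather than 5. That difference is the inconsistency the options trade off.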

Client projection

Options:

Composable: projections that can be composed within a query are supported as client projections. For instance, entities, complex types and “rows” can be projected but arbitrary method calls or constructors cannot.
Non-composable: top-level projections could include arbitrary method calls and constructors. For efficiency, reverse funcletization occurs for the client projection, e.g. select ClientMethod1(ClientMethod2(e.X, e.Y), e.Z) becomes select new { e.X, e.Y, e.Z } into f select ClientMethod1(ClientMethod2(f.X, f.Y), f.Z).
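The reverse funcletization rewrite can be written out by hand as follows. This is only a sketch of the shape of the rewritten query: an in-memory IQueryable stands in for a database-backed source, AsEnumerable() marks the client/server boundary explicitly, and the bodies of ClientMethod1 and ClientMethod2 are made up for illustration.

```csharp
using System;
using System.Linq;

static class Program
{
    // Stand-ins for arbitrary client-only methods (hypothetical bodies).
    static int ClientMethod2(int x, int y) => x + y;
    static string ClientMethod1(int sum, int z) => $"{sum}:{z}";

    static void Main()
    {
        // An in-memory IQueryable stands in for a database-backed source.
        var source = new[] { new { X = 1, Y = 2, Z = 3, Unused = "wide column" } }.AsQueryable();

        var results = source
            .Select(e => new { e.X, e.Y, e.Z })  // server part: fetch only the needed columns
            .AsEnumerable()                      // explicit client/server boundary
            .Select(f => ClientMethod1(ClientMethod2(f.X, f.Y), f.Z)); // client part

        foreach (var r in results)
            Console.WriteLine(r); // 3:3
    }
}
```

The benefit of the intermediate projection is that the Unused column never needs to leave the server, even though the final shape is produced entirely by client code.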

Note that method calls may introduce additional round-trips to the server. Before people start shouting about “nanny state APIs”, consider that users are not complaining that they wanted the round-trips but that we failed to crack the methods to figure out how to avoid them…

Dependent sub-expressions

What happens when a sub-expression cannot be evaluated by the server but depends on intermediate results?

Whenever an unsupported expression is encountered, we could simply split the query at that point (modulo the kinds of local optimizations described for client optimization).
We should attempt to push as much server logic “down” the tree as possible to minimize the amount of work in the client. This is critical where joins, selections and even some projections are involved.
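Applied to the query from the overview, splitting at the unsupported call while pushing the translatable filter down might look like the hand-written equivalent below. It is a sketch under assumptions: the Product class, its Price property and the body of DetermineProductPriority are invented for the example, and an in-memory list stands in for the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

sealed class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

static class Program
{
    // Client-only logic that a server cannot translate (hypothetical body).
    static string DetermineProductPriority(Product p) => p.Price > 100m ? "High" : "Low";

    static void Main()
    {
        // An in-memory IQueryable stands in for a database-backed source.
        IQueryable<Product> products = new List<Product>
        {
            new Product { ProductId = 1, Name = "Widget", Price = 250m },
            new Product { ProductId = 2, Name = "Gadget", Price = 20m },
        }.AsQueryable();

        int productId = 1;

        var names = products
            .Where(p => p.ProductId == productId)               // pushed down: server-evaluable and selective
            .AsEnumerable()                                     // split point
            .Where(p => DetermineProductPriority(p) == "High")  // remaining client predicate
            .Select(p => p.Name);

        foreach (var name in names)
            Console.WriteLine(name); // Widget
    }
}
```

If the split were placed before the ProductId filter instead, every product row would be streamed to the client first, which is the "1,000,000 rows for 10 matches" scenario described in the overview.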

Interface considerations

If we implement support for these patterns, we should also consider allowing the user to disable them. The user can exercise whatever level of control they want over the client-server partitioning of the query. In addition, we should make the partitioned plan visible to the user, either by using documented boundary expressions or through a debugger visualizer.

Implementation considerations

We can include a separate pass to identify supported and unsupported expressions in the query tree, similar to other LINQ implementations.

tuespetre · 2018-08-20T14:13:21Z

I love the example with new XElement(...) and immediately see the potential translation into FOR XML PATH. 😉

As for architectural changes and managing query bugs, I feel like there is a lot I could say but I don't know how effective I would be at communicating it.

ajcvickers · 2018-08-20T17:22:28Z

@tuespetre I'm pretty sure we will want to talk to you about some of this stuff, so stay tuned. :-)

pmiddleton · 2018-08-20T18:09:03Z

@ajcvickers - In regards to looking at ReLinq. Is #12048 the driving issue behind that, or are there other things driving it as well?

It seems like a major undertaking to remove or replace it, given how tightly the query system is architected around it, with a lot of risk of breaking things.

ajcvickers · 2018-08-20T18:13:47Z

@pmiddleton Other things too. And yes, I agree that it is risky; that is one of the considerations. Sorry for being a bit ambiguous here. Like I said, stay tuned. Don't be impatient. 😉

pmiddleton · 2018-08-20T21:45:56Z

@ajcvickers - I have the open PR for TVF and am currently working on a pivot feature. Both have tie-ins to ReLinq, so the possible change piqued my interest as there might be some rework required on my part. :)

ajcvickers · 2018-08-20T21:47:55Z

@pmiddleton Agreed.

tuespetre · 2018-08-20T22:20:05Z

@ajcvickers Heyyy, I’m onto you 👀


divega · 2019-09-07T01:24:33Z

@smitpatel, @ajcvickers any need to keep this open now?

smitpatel · 2019-09-07T01:56:07Z

I will write certain notes overall about things and close it. Or I can add it to the docs and close the issue with reference to it.

divega · 2019-09-07T07:40:07Z

Adding a version of this with your notes to a “query architecture” section in the docs sounds great. I guess we can create a docs issue for that and close this anyway.

jjxtra · 2019-10-12T15:26:13Z

Even a simple group by is broken in EF 3. Why is this? It seems like a group by should translate to a SQL statement just fine...

linkerro · 2019-10-17T12:31:56Z

I second that question and also wonder how this situation makes any sort of sense.
This makes me feel miffed, very miffed.

smitpatel · 2019-10-17T15:34:18Z

@jjxtra @linkerro - Did you guys look at #17068?

divega mentioned this issue Jul 27, 2018

Query: Top projection should fully materialize and track entities before they are passed to client methods #12761

Closed

ajcvickers assigned divega and smitpatel Jul 30, 2018

ajcvickers added this to the 3.0.0 milestone Jul 30, 2018

ajcvickers added type-enhancement needs-design labels Jul 30, 2018

This was referenced Aug 1, 2018

Add Pivot LINQ extension #11963

Open

Include() not working when filtered with Contains() and aggregating to list #12852

Closed

austindrenski mentioned this issue Aug 9, 2018

Update the plugin model for expression filters npgsql/efcore.pg#581

Merged

ajcvickers mentioned this issue Aug 9, 2018

UDFs and SqlFunction quoting and schema logic #12757

Closed

pmiddleton mentioned this issue Aug 14, 2018

Add support for table valued functions #11129

Closed

austindrenski mentioned this issue Aug 20, 2018

Expose PostgreSQL XML functions npgsql/efcore.pg#7

Open

ajcvickers mentioned this issue Aug 21, 2018

Query: Allow non static methods in top level projection #13048

Open

smitpatel mentioned this issue Aug 23, 2018

Include Expression could not handle extra join #13092

Closed

4 tasks

ajcvickers mentioned this issue Aug 28, 2018

Query: correlated collection optimization doesn't get applied for subqueries with AsQueryable #12195

Closed

smitpatel added a commit that referenced this issue Jun 3, 2019

Query: Remove Remotion.Linq dependency

0e57d98

Part of #12795 Part of #12048

smitpatel mentioned this issue Jun 3, 2019

Query: Remove Remotion.Linq dependency #15913

Merged

smitpatel added a commit that referenced this issue Jun 3, 2019

Query: Remove Remotion.Linq dependency

92ed212

Part of #12795 Part of #12048

divega mentioned this issue Jun 7, 2019

Entity not tracked by change tracker when selecting not mapped property #15989

Closed

petli mentioned this issue Jun 28, 2019

Intermittent NRE in SelectExpression.Tags #15710

Closed

ajcvickers modified the milestones: 3.0.0, Epics Jun 28, 2019

smitpatel removed the query-design label Jul 1, 2019

ajcvickers mentioned this issue Aug 23, 2019

Is the dependency on re-linq going away? #17397

Closed

divega removed their assignment Sep 18, 2019

smitpatel added the area-query label Nov 19, 2019

ajcvickers mentioned this issue Dec 17, 2019

Why remove Remotion.Linq dependency? #19339

Closed

ajcvickers mentioned this issue Jan 3, 2020

A column has been specified more than once in the order by list. Columns in the order by list must be unique #19442

Closed

ajcvickers modified the milestones: Epics, Backlog Jan 29, 2020

maumar mentioned this issue Mar 4, 2020

The LINQ expression could not be translated. #20175

Closed

smitpatel mentioned this issue Mar 16, 2020

GroupJoin is doing aggregate in memory instead of on DB #13887

Closed

ajcvickers modified the milestones: Backlog, Epics Aug 10, 2020

AndriySvyryd added the propose-close label Oct 19, 2022

ajcvickers closed this as completed Oct 26, 2022

ajcvickers removed this from the Epics milestone Dec 7, 2022

ajcvickers added the closed-no-further-action The issue is closed and no further action is planned. label May 15, 2024

ajcvickers removed the needs-design label Aug 20, 2024
