-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query: Optimize certain types of GroupJoin/SelectMany queries #7845
Conversation
@tuespetre, |
5effff4
to
67cb24c
Compare
39513aa
to
ac2038e
Compare
{ | ||
if (groupJoinClause.JoinClause.InnerSequence is SubQueryExpression innerSequenceSubQuery) | ||
{ | ||
if (!innerSequenceSubQuery.QueryModel.ResultOperators.Any(r => r is TakeResultOperator || r is SkipResultOperator)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add more tests? this case doesn't seem to be covered by any of the existing ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, need to ensure other bases are covered there too (group join's selector must be a QSRE, cannot have any Concat/Except/Intersect/Union/GroupBy, Distinct should work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... might punt on this part for now and skip the case altogether. There are problems with query compilation. Will file a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues myself when digging, this area is definitely not a bug-free zone ;)
&& qsre.ReferencedQuerySource is GroupJoinClause groupJoinClause | ||
&& queryModel.CountQuerySourceReferences(groupJoinClause) == 1 | ||
&& subQueryExpression.QueryModel.BodyClauses.Any() | ||
&& subQueryExpression.QueryModel.BodyClauses.All(c => c is WhereClause)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anything preventing us from translating more body clauses/result operators? e.g. with Skip/Take people often add OrderBys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example? An additional from clause using OrderBy().Take() is a lateral join, whereas OrderBy().Take() in the group join is not. WhereClause is the only reasonable thing to bother shifting into the group join clause (and only when Skip/Take/Concat/etc are not present on the group join's inner sequence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you are right
ac2038e
to
1431832
Compare
/// This API supports the Entity Framework Core infrastructure and is not intended to be used | ||
/// directly from your code. This API may change or be removed in future releases. | ||
/// </summary> | ||
public class AdditionalFromClauseOptimizingQueryModelVisitor : QueryModelVisitorBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is separate Visitor necessary? This does not seem to be doing anything specific which relates to any kind of state between qmv & ev. This should be part of QueryOptimizer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, checked out SubQueryFromClauseFlattener
source and it shouldn't hinder anything: https://github.com/re-motion/Relinq/blob/develop/Core/Transformations/SubQueryFromClauseFlattener.cs#L86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smitpatel scratch that. It's no good because then we miss the optimization on queries after rewriting navigations into them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do nav rewrite before query optimizer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That causes lots of problems. I think this is a chicken/egg scenario like the entity equality rewriting visitor. We could run it both before and after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smitpatel I'm going to put together another PR that runs query optimizer before and after nav rewrites so we can review the difference. I think doing so will make it much clearer that we can reliably place this optimization and any other ad-hoc optimizations into the QueryOptimizer in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entity-equality rewrite is not chicken/egg scenario. It is the bug (or bad assumptions) in nav rewrite. Both of them affects different kind of Expressions. EntityEquality should only deal with binary expressions where qsre are being equated whereas nav rewrite should only deal with member access expressions where members are navigation. The deterministic order is to run nav rewrite first and expand all navigations & then run entity equality rewrite to convert qsre to key property access (including new qsre introduced by nav rewrite). There is one tricky step that nav rewrite optimizes navigation key access by writing it in forum of FK property instead of principal key to avoid join (or may not still having join unsure). Though even if nav rewrite does that it should not affect entity equality because there is no qsre comparison anymore.
The overall structure of OptimizeQueryModel
is not very well defined. We should be running all custom stuff like nav rewrite, entity equality rewrite, the new include compiler @anpete introducing, possibly SubQueryMemberPushDownEV
(which may go in QueryOptimizer
too) and then run the QueryOptimizer on all of that to generate the optimal QM we can get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be so with entity equality but QueryOptimizer
is just going to be better off running before and after the navigation rewrites. There are subqueries that can only be flattened before the navigations are rewritten and there are subqueries that can only be flattened afterwards. Not just subquery flattening but optimizations in general (subquery member pushdown, optimizations like this one, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a bug for this - #7773 we can add this scenario to the pile
&& !(groupJoinClause.JoinClause.InnerSequence is SubQueryExpression) | ||
&& queryModel.CountQuerySourceReferences(groupJoinClause) == 1 | ||
&& subQueryExpression.QueryModel.BodyClauses.Any() | ||
&& subQueryExpression.QueryModel.BodyClauses.All(c => c is WhereClause)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also do the same for Order clauses - it's maybe not very useful to write orderby in this context but it should be safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
[NotNull] QueryModel queryModel, | ||
int index) | ||
{ | ||
if (fromClause.FromExpression is SubQueryExpression subQueryExpression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this optimization could be done as part of nav rewrite (RewriteNavigationIntoGroupJoin method - we are already in the right place so the bulk of the condition below can be removed, we just need to analyze the body clauses)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the navigation rewriter be responsible for general purpose optimizations like this, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh that's right. User could manually write GroupJoin with a filter and we would miss this optimization. Let's keep it where it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add test for that (that hits this scenario using manually created groupjoin, i.e. doesn't go thru nav rewrite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test and see an issue related to the other query compilation bug I found with additional from clauses. Looks like this optimization needs to run before any subquery flattening; once again, something that should be run both before and after navigation rewriting.
1431832
to
36ead2a
Compare
🆙📅 added several tests for manual group join scenarios. |
@@ -299,15 +299,24 @@ protected virtual void ExtractQueryAnnotations([NotNull] QueryModel queryModel) | |||
Check.NotNull(queryModel, nameof(queryModel)); | |||
Check.NotNull(includeResultOperators, nameof(includeResultOperators)); | |||
|
|||
_queryOptimizer.Optimize(QueryCompilationContext.QueryAnnotations, queryModel); | |||
var additionalFromClauseOptimizer | |||
= new AdditionalFromClauseOptimizingQueryModelVisitor(); | |||
|
|||
var entityEqualityRewritingExpressionVisitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move this bit lower (just before it's called)
base.GroupJoin_SelectMany_subquery_with_filter_orderby_and_DefaultIfEmpty(); | ||
|
||
Assert.Equal( | ||
@"SELECT [o0].[OrderID], [o0].[CustomerID], [o0].[EmployeeID], [o0].[OrderDate] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to be translated to a single query, like so:
SELECT [t].[OrderID], [t].[CustomerID], [t].[EmployeeID], [t].[OrderDate], [c].[ContactName]
FROM [Customers] AS [c]
LEFT JOIN (
SELECT [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate]
FROM [Orders] AS [o]
WHERE [o].[OrderID] > 5
ORDER BY [o].[OrderDate]
) AS [t] ON [c].[CustomerID] = [t].[CustomerID]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess we can address order by in a different commit if thats significant work - it's not like we are enabling some crucial scenario with it, just a nice-to-have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORDER BY cannot be in a subquery without TOP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, lets just do the 'where' - sorry for the confusion ;)
In some circumstances, a GroupJoin/SelectMany pattern (with or without DefaultIfEmpty) can be optimized by shifting body clauses from the AdditionalFromClause into the GroupJoinClause. These changes implement that optimization.
36ead2a
to
0fcbb5a
Compare
🆙📅 |
merged in 7441fc2, thank you for the contribution! |
In some circumstances, a GroupJoin/SelectMany pattern (with or
without DefaultIfEmpty) can be optimized by shifting body clauses
from the AdditionalFromClause into the GroupJoinClause. These changes
implement that optimization.