Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Optimize certain types of GroupJoin/SelectMany queries #7845

Closed

Conversation

tuespetre
Copy link
Contributor

In some circumstances, a GroupJoin/SelectMany pattern (with or
without DefaultIfEmpty) can be optimized by shifting body clauses
from the AdditionalFromClause into the GroupJoinClause. These changes
implement that optimization.

@dnfclas
Copy link

dnfclas commented Mar 10, 2017

@tuespetre,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by .NET Foundation. We will now review your pull request.
Thanks,
.NET Foundation Pull Request Bot

@tuespetre tuespetre force-pushed the additional-from-optimizations branch from 5effff4 to 67cb24c Compare March 11, 2017 17:59
@tuespetre tuespetre force-pushed the additional-from-optimizations branch 2 times, most recently from 39513aa to ac2038e Compare March 15, 2017 16:49
{
if (groupJoinClause.JoinClause.InnerSequence is SubQueryExpression innerSequenceSubQuery)
{
if (!innerSequenceSubQuery.QueryModel.ResultOperators.Any(r => r is TakeResultOperator || r is SkipResultOperator))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add more tests? this case doesn't seem to be covered by any of the existing ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, need to ensure other bases are covered there too (group join's selector must be a QSRE, cannot have any Concat/Except/Intersect/Union/GroupBy, Distinct should work)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... might punt on this part for now and skip the case altogether. There are problems with query compilation. Will file a separate issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some issues myself when digging, this area is definitely not a bug-free zone ;)

&& qsre.ReferencedQuerySource is GroupJoinClause groupJoinClause
&& queryModel.CountQuerySourceReferences(groupJoinClause) == 1
&& subQueryExpression.QueryModel.BodyClauses.Any()
&& subQueryExpression.QueryModel.BodyClauses.All(c => c is WhereClause))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything preventing us from translating more body clauses/result operators? e.g. with Skip/Take people often add OrderBys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example? An additional from clause using OrderBy().Take() is a lateral join, whereas OrderBy().Take() in the group join is not. WhereClause is the only reasonable thing to bother shifting into the group join clause (and only when Skip/Take/Concat/etc are not present on the group join's inner sequence)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you are right

@tuespetre tuespetre force-pushed the additional-from-optimizations branch from ac2038e to 1431832 Compare March 16, 2017 03:14
/// This API supports the Entity Framework Core infrastructure and is not intended to be used
/// directly from your code. This API may change or be removed in future releases.
/// </summary>
public class AdditionalFromClauseOptimizingQueryModelVisitor : QueryModelVisitorBase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is separate Visitor necessary? This does not seem to be doing anything specific which relates to any kind of state between qmv & ev. This should be part of QueryOptimizer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, checked out SubQueryFromClauseFlattener source and it shouldn't hinder anything: https://github.com/re-motion/Relinq/blob/develop/Core/Transformations/SubQueryFromClauseFlattener.cs#L86

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smitpatel scratch that. It's no good because then we miss the optimization on queries after rewriting navigations into them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do nav rewrite before query optimizer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That causes lots of problems. I think this is a chicken/egg scenario like the entity equality rewriting visitor. We could run it both before and after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smitpatel I'm going to put together another PR that runs query optimizer before and after nav rewrites so we can review the difference. I think doing so will make it much clearer that we can reliably place this optimization and any other ad-hoc optimizations into the QueryOptimizer in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entity-equality rewrite is not chicken/egg scenario. It is the bug (or bad assumptions) in nav rewrite. Both of them affects different kind of Expressions. EntityEquality should only deal with binary expressions where qsre are being equated whereas nav rewrite should only deal with member access expressions where members are navigation. The deterministic order is to run nav rewrite first and expand all navigations & then run entity equality rewrite to convert qsre to key property access (including new qsre introduced by nav rewrite). There is one tricky step that nav rewrite optimizes navigation key access by writing it in forum of FK property instead of principal key to avoid join (or may not still having join unsure). Though even if nav rewrite does that it should not affect entity equality because there is no qsre comparison anymore.

The overall structure of OptimizeQueryModel is not very well defined. We should be running all custom stuff like nav rewrite, entity equality rewrite, the new include compiler @anpete introducing, possibly SubQueryMemberPushDownEV (which may go in QueryOptimizer too) and then run the QueryOptimizer on all of that to generate the optimal QM we can get.

@maumar @anpete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be so with entity equality but QueryOptimizer is just going to be better off running before and after the navigation rewrites. There are subqueries that can only be flattened before the navigations are rewritten and there are subqueries that can only be flattened afterwards. Not just subquery flattening but optimizations in general (subquery member pushdown, optimizations like this one, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a bug for this - #7773 we can add this scenario to the pile

&& !(groupJoinClause.JoinClause.InnerSequence is SubQueryExpression)
&& queryModel.CountQuerySourceReferences(groupJoinClause) == 1
&& subQueryExpression.QueryModel.BodyClauses.Any()
&& subQueryExpression.QueryModel.BodyClauses.All(c => c is WhereClause))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also do the same for Order clauses - it's maybe not very useful to write orderby in this context but it should be safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

[NotNull] QueryModel queryModel,
int index)
{
if (fromClause.FromExpression is SubQueryExpression subQueryExpression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this optimization could be done as part of nav rewrite (RewriteNavigationIntoGroupJoin method - we are already in the right place so the bulk of the condition below can be removed, we just need to analyze the body clauses)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the navigation rewriter be responsible for general purpose optimizations like this, though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh that's right. User could manually write GroupJoin with a filter and we would miss this optimization. Let's keep it where it is.

Copy link
Contributor

@maumar maumar Mar 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test for that (that hits this scenario using manually created groupjoin, i.e. doesn't go thru nav rewrite)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test and see an issue related to the other query compilation bug I found with additional from clauses. Looks like this optimization needs to run before any subquery flattening; once again, something that should be run both before and after navigation rewriting.

@tuespetre tuespetre force-pushed the additional-from-optimizations branch from 1431832 to 36ead2a Compare March 17, 2017 14:19
@tuespetre
Copy link
Contributor Author

🆙📅 added several tests for manual group join scenarios.

@@ -299,15 +299,24 @@ protected virtual void ExtractQueryAnnotations([NotNull] QueryModel queryModel)
Check.NotNull(queryModel, nameof(queryModel));
Check.NotNull(includeResultOperators, nameof(includeResultOperators));

_queryOptimizer.Optimize(QueryCompilationContext.QueryAnnotations, queryModel);
var additionalFromClauseOptimizer
= new AdditionalFromClauseOptimizingQueryModelVisitor();

var entityEqualityRewritingExpressionVisitor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this bit lower (just before it's called)

base.GroupJoin_SelectMany_subquery_with_filter_orderby_and_DefaultIfEmpty();

Assert.Equal(
@"SELECT [o0].[OrderID], [o0].[CustomerID], [o0].[EmployeeID], [o0].[OrderDate]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be translated to a single query, like so:

SELECT [t].[OrderID], [t].[CustomerID], [t].[EmployeeID], [t].[OrderDate], [c].[ContactName]
 FROM [Customers] AS [c]
 LEFT JOIN (
     SELECT [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate]
     FROM [Orders] AS [o]
     WHERE [o].[OrderID] > 5
     ORDER BY [o].[OrderDate]
 ) AS [t] ON [c].[CustomerID] = [t].[CustomerID]

Copy link
Contributor

@maumar maumar Mar 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we can address order by in a different commit if thats significant work - it's not like we are enabling some crucial scenario with it, just a nice-to-have

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ORDER BY cannot be in a subquery without TOP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, lets just do the 'where' - sorry for the confusion ;)

In some circumstances, a GroupJoin/SelectMany pattern (with or
without DefaultIfEmpty) can be optimized by shifting body clauses
from the AdditionalFromClause into the GroupJoinClause. These changes
implement that optimization.
@tuespetre tuespetre force-pushed the additional-from-optimizations branch from 36ead2a to 0fcbb5a Compare March 18, 2017 20:04
@tuespetre
Copy link
Contributor Author

🆙📅

@maumar
Copy link
Contributor

maumar commented Mar 19, 2017

merged in 7441fc2, thank you for the contribution!

@maumar maumar closed this Mar 19, 2017
@tuespetre tuespetre deleted the additional-from-optimizations branch March 19, 2017 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants