-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline only expressions for filters that contains only ininling targe… #10860
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some tests for that, see TestPredicatePushdown
.
|
||
return rewrittenNode; | ||
} | ||
|
||
private Sets.SetView<Symbol> extractNonInliningTargets(List<Expression> conjuncts, ProjectNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any specific reason for the return type to be Sets.SetView<Symbol>
. If not, declare the return type as Set<Symbol>
since the fact that it's a SetView is an irrelevant implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm replacing the function to return boolean indicating whether an expression should be inlined. The previous implementation was based on InlineProjections and was not right for pushing down filters because it counts the dependencies across all expressions.
.collect(toSet()); | ||
} | ||
|
||
private Map<Boolean, List<Expression>> partitionConjuncts(List<Expression> conjuncts, ProjectNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a separate method for this doesn't buy much, especially since the name doesn't describe what it does. Inline it where its being used above.
} | ||
|
||
return rewrittenNode; | ||
} | ||
|
||
private boolean isInliningExpression(Expression expression, ProjectNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this isInliningCandidate
// change the semantics of those expressions | ||
Set<Symbol> tryArguments = extractTryArguments(expression).stream().collect(toSet()); | ||
|
||
long nonInliningSymbolsCount = dependencies.entrySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to count. You can just do:
return dependencies.entrySet().stream()
.noneMatch(...);
or, even better, invert the condition to make it easier to reason about and use allMatch
:
return dependencies.entrySet().stream()
.allMatch(entry -> (entry.getValue() == 1 || node.getAssignments().get(entry.getKey()) instanceof Literal)
&& !tryArguments.contains(entry.getKey()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh thanks!
{ | ||
// Non-singletons should not be pushed down | ||
assertPlan( | ||
"with t as (select orderkey * 2 x from orders) select * from t where x + x > 1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think placing the with
and the select
in separate lines will improve readability:
assertPlan(
"with t as (select orderkey * 2 x from orders) " +
"select * from t where x + x > 1",
anyTree(...));
tableScan("orders", ImmutableMap.of( | ||
"orderkey", "orderkey")))))); | ||
|
||
// composit singletons should be pushed down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: composite
41762bd
to
fbdec2b
Compare
// Non-singletons should not be pushed down | ||
assertPlan( | ||
"with t as (select orderkey * 2 x from orders) " + | ||
"select * from t where x + x > 1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use capital letters for SQL keywords, here would be SELECT * FROM t WHERE
@@ -267,4 +268,56 @@ public void testPredicatePushDownThroughMarkDistinct() | |||
project(assignUniqueId("unique", filter("A = 1", values("A")))), | |||
project(filter("1 = B", values("B")))))); | |||
} | |||
|
|||
@Test | |||
public void testPredicatePushDownOverProjection() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a test with TRY
// All non-deterministic conjuncts, if any, will be in the filter node. | ||
// We partition the expressions in the deterministicConjuncts into two lists, and only inline the | ||
// expressions that are in the inlining targets list. | ||
Map<Boolean, List<Expression>> partitionedConjuncts = deterministicConjuncts.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partitionedConjuncts
sounds like it might having related to data partitioning. Maybe inlineConjuncts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
// 2. references to complex expressions that | ||
// a. are not inputs to try() expressions | ||
// b. appear only once across all expressions | ||
// which come from the node, as opposed to an enclosing scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please put this comment to javadoc of this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no javadoc for any other method in the class. I'm not sure if I should do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you. In my opinion this comment explains the method behaviour so it should at the top level of the method.
} | ||
|
||
return rewrittenNode; | ||
} | ||
|
||
private boolean isInliningCandidate(Expression expression, ProjectNode node) | ||
{ | ||
Set<Symbol> childOutputSet = ImmutableSet.copyOf(node.getOutputSymbols()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/childOutputSet/input ?
Set<Symbol> tryArguments = extractTryArguments(expression); | ||
|
||
return dependencies.entrySet().stream() | ||
.allMatch(entry -> (entry.getValue() == 1 || node.getAssignments().get(entry.getKey()) instanceof Literal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not 1
too restrictive? Mentioned issue was about inlining when number of occurrences grew exponentially. Maybe we could have 10 here? Inlining expression 10 times should not affect planning but it should allow to handle more predicate pushdowns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you get exponential behavior with anything > 1 depending on the depth of the plan. In the example shown in #10455 the number of occurrences is 3 for each expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, thanks for explanation.
|
||
// All deterministic conjuncts that contains non-inlining targets, and non-deterministic conjuncts, | ||
// if any, will be in the filter node. | ||
List<Expression> nonInliningConjuncts = partitionedConjuncts.get(false); | ||
if (!conjuncts.get(false).isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is if
is not needed
|
||
// exclude any complex inputs to TRY expressions. Inlining them would potentially | ||
// change the semantics of those expressions | ||
Set<Symbol> tryArguments = extractTryArguments(expression); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a separate issue
This is to fix github issue prestodb#10455. We used to inline all predicate when doing push down predicates. This could have problems when the projection columns contains complex expressions like the following: WITH t1 (v) AS (VALUES 1), t2 AS( select if(v = 0, v, v) v from t1 ), t3 AS( select if(v = 0, v, v) v from t2 ), t4 AS( select if(v = 0, v, v) v from t3 ), t5 AS( select if(v = 0, v, v) v from t4 ), t6 AS( select if(v = 0, v, v) v from t5 ), t7 AS( select if(v = 0, v, v) v from t6 ), t8 AS( select if(v = 0, v, v) v from t7 ), t9 AS( select if(v = 0, v, v) v from t8 ), t10 AS( select if(v = 0, v, v) v from t9 ), t11 AS( select if(v = 0, v, v) v from t10 ), t12 AS( select if(v = 0, v, v) v from t11 ), t13 AS( select if(v = 0, v, v) v from t12 ), t14 AS( select if(v = 0, v, v) v from t13 ), t15 AS( select if(v = 0, v, v) v from t14 ), t16 AS( select if(v = 0, v, v) v from t15 ) select * from t16 where v = 0 This short-term fix is to adjust the inlining heuristics to only do it if the expressions are trivial or appear only once (similar to how the InlineProjections rule works)
Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired. Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status. |
This is to fix github issue #10455.
We used to inline all predicate when doing push down predicates. This could have problems when the projection columns contains complex expressions like the following:
WITH
t1 (v) AS (VALUES 1),
t2 AS( select if(v = 0, v, v) v from t1 ),
t3 AS( select if(v = 0, v, v) v from t2 ),
t4 AS( select if(v = 0, v, v) v from t3 ),
t5 AS( select if(v = 0, v, v) v from t4 ),
t6 AS( select if(v = 0, v, v) v from t5 ),
t7 AS( select if(v = 0, v, v) v from t6 ),
t8 AS( select if(v = 0, v, v) v from t7 ),
t9 AS( select if(v = 0, v, v) v from t8 ),
t10 AS( select if(v = 0, v, v) v from t9 ),
t11 AS( select if(v = 0, v, v) v from t10 ),
t12 AS( select if(v = 0, v, v) v from t11 ),
t13 AS( select if(v = 0, v, v) v from t12 ),
t14 AS( select if(v = 0, v, v) v from t13 ),
t15 AS( select if(v = 0, v, v) v from t14 ),
t16 AS( select if(v = 0, v, v) v from t15 )
select *
from t16
where v = 0
This short-term fix is to adjust the inlining heuristics to only do it if the expressions are trivial or appear only once (similar to how the InlineProjections rule works)