Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline only expressions for filters that contains only ininling targe… #10860

Merged
merged 1 commit into from
Aug 2, 2018

Conversation

yingsu00
Copy link
Contributor

This is to fix github issue #10455.

We used to inline all predicate when doing push down predicates. This could have problems when the projection columns contains complex expressions like the following:

WITH
t1 (v) AS (VALUES 1),
t2 AS( select if(v = 0, v, v) v from t1 ),
t3 AS( select if(v = 0, v, v) v from t2 ),
t4 AS( select if(v = 0, v, v) v from t3 ),
t5 AS( select if(v = 0, v, v) v from t4 ),
t6 AS( select if(v = 0, v, v) v from t5 ),
t7 AS( select if(v = 0, v, v) v from t6 ),
t8 AS( select if(v = 0, v, v) v from t7 ),
t9 AS( select if(v = 0, v, v) v from t8 ),
t10 AS( select if(v = 0, v, v) v from t9 ),
t11 AS( select if(v = 0, v, v) v from t10 ),
t12 AS( select if(v = 0, v, v) v from t11 ),
t13 AS( select if(v = 0, v, v) v from t12 ),
t14 AS( select if(v = 0, v, v) v from t13 ),
t15 AS( select if(v = 0, v, v) v from t14 ),
t16 AS( select if(v = 0, v, v) v from t15 )
select *
from t16
where v = 0

This short-term fix is to adjust the inlining heuristics to only do it if the expressions are trivial or appear only once (similar to how the InlineProjections rule works)

Copy link
Contributor

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some tests for that, see TestPredicatePushdown.


return rewrittenNode;
}

private Sets.SetView<Symbol> extractNonInliningTargets(List<Expression> conjuncts, ProjectNode node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any specific reason for the return type to be Sets.SetView<Symbol>. If not, declare the return type as Set<Symbol> since the fact that it's a SetView is an irrelevant implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm replacing the function to return boolean indicating whether an expression should be inlined. The previous implementation was based on InlineProjections and was not right for pushing down filters because it counts the dependencies across all expressions.

.collect(toSet());
}

private Map<Boolean, List<Expression>> partitionConjuncts(List<Expression> conjuncts, ProjectNode node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a separate method for this doesn't buy much, especially since the name doesn't describe what it does. Inline it where its being used above.

}

return rewrittenNode;
}

private boolean isInliningExpression(Expression expression, ProjectNode node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this isInliningCandidate

// change the semantics of those expressions
Set<Symbol> tryArguments = extractTryArguments(expression).stream().collect(toSet());

long nonInliningSymbolsCount = dependencies.entrySet().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to count. You can just do:

return dependencies.entrySet().stream()
    .noneMatch(...);

or, even better, invert the condition to make it easier to reason about and use allMatch:

return dependencies.entrySet().stream()
     .allMatch(entry -> (entry.getValue() == 1 || node.getAssignments().get(entry.getKey()) instanceof Literal)
                            && !tryArguments.contains(entry.getKey()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh thanks!

{
// Non-singletons should not be pushed down
assertPlan(
"with t as (select orderkey * 2 x from orders) select * from t where x + x > 1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think placing the with and the select in separate lines will improve readability:

assertPlan(
    "with t as (select orderkey * 2 x from orders) " + 
    "select * from t where x + x > 1",
    anyTree(...));

tableScan("orders", ImmutableMap.of(
"orderkey", "orderkey"))))));

// composit singletons should be pushed down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: composite

@yingsu00 yingsu00 force-pushed the InlineSymbols branch 3 times, most recently from 41762bd to fbdec2b Compare July 19, 2018 01:04
// Non-singletons should not be pushed down
assertPlan(
"with t as (select orderkey * 2 x from orders) " +
"select * from t where x + x > 1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use capital letters for SQL keywords, here would be SELECT * FROM t WHERE

@@ -267,4 +268,56 @@ public void testPredicatePushDownThroughMarkDistinct()
project(assignUniqueId("unique", filter("A = 1", values("A")))),
project(filter("1 = B", values("B"))))));
}

@Test
public void testPredicatePushDownOverProjection()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a test with TRY

// All non-deterministic conjuncts, if any, will be in the filter node.
// We partition the expressions in the deterministicConjuncts into two lists, and only inline the
// expressions that are in the inlining targets list.
Map<Boolean, List<Expression>> partitionedConjuncts = deterministicConjuncts.stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitionedConjuncts sounds like it might having related to data partitioning. Maybe inlineConjuncts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// 2. references to complex expressions that
// a. are not inputs to try() expressions
// b. appear only once across all expressions
// which come from the node, as opposed to an enclosing scope.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put this comment to javadoc of this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no javadoc for any other method in the class. I'm not sure if I should do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you. In my opinion this comment explains the method behaviour so it should at the top level of the method.

}

return rewrittenNode;
}

private boolean isInliningCandidate(Expression expression, ProjectNode node)
{
Set<Symbol> childOutputSet = ImmutableSet.copyOf(node.getOutputSymbols());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/childOutputSet/input ?

Set<Symbol> tryArguments = extractTryArguments(expression);

return dependencies.entrySet().stream()
.allMatch(entry -> (entry.getValue() == 1 || node.getAssignments().get(entry.getKey()) instanceof Literal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not 1 too restrictive? Mentioned issue was about inlining when number of occurrences grew exponentially. Maybe we could have 10 here? Inlining expression 10 times should not affect planning but it should allow to handle more predicate pushdowns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you get exponential behavior with anything > 1 depending on the depth of the plan. In the example shown in #10455 the number of occurrences is 3 for each expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks for explanation.


// All deterministic conjuncts that contains non-inlining targets, and non-deterministic conjuncts,
// if any, will be in the filter node.
List<Expression> nonInliningConjuncts = partitionedConjuncts.get(false);
if (!conjuncts.get(false).isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is if is not needed


// exclude any complex inputs to TRY expressions. Inlining them would potentially
// change the semantics of those expressions
Set<Symbol> tryArguments = extractTryArguments(expression);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a separate issue

This is to fix github issue prestodb#10455.

We used to inline all predicate when doing push down predicates. This could have problems when the projection columns contains complex expressions like the following:

WITH
t1 (v) AS (VALUES 1),
t2 AS( select if(v = 0, v, v) v from t1 ),
t3 AS( select if(v = 0, v, v) v from t2 ),
t4 AS( select if(v = 0, v, v) v from t3 ),
t5 AS( select if(v = 0, v, v) v from t4 ),
t6 AS( select if(v = 0, v, v) v from t5 ),
t7 AS( select if(v = 0, v, v) v from t6 ),
t8 AS( select if(v = 0, v, v) v from t7 ),
t9 AS( select if(v = 0, v, v) v from t8 ),
t10 AS( select if(v = 0, v, v) v from t9 ),
t11 AS( select if(v = 0, v, v) v from t10 ),
t12 AS( select if(v = 0, v, v) v from t11 ),
t13 AS( select if(v = 0, v, v) v from t12 ),
t14 AS( select if(v = 0, v, v) v from t13 ),
t15 AS( select if(v = 0, v, v) v from t14 ),
t16 AS( select if(v = 0, v, v) v from t15 )
select *
from t16
where v = 0

This short-term fix is to adjust the inlining heuristics to only do it if the expressions are trivial or appear only once (similar to how the InlineProjections rule works)
@facebook-github-bot
Copy link
Collaborator

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants