
[SPARK-12161][SQL] Ignore order of predicates in cache matching #10163

Closed · wants to merge 26 commits

Conversation

@codingjaguar

This PR improves LogicalPlan.sameResult so that semantically equivalent queries whose predicates appear in a different order are still matched.

Consider an example:
Query 1: CACHE TABLE first AS SELECT * FROM table A WHERE A.id > 100 AND A.id < 200;
Query 2: SELECT * FROM table A WHERE A.id < 200 AND A.id > 100;
Currently in Spark SQL, Query 2 cannot use the cached result of Query 1, even though the two queries are identical once the order of the predicates is ignored.
We modified the comparison function LogicalPlan.sameResult. The idea is to split a Filter's condition into a sequence of conjunct expressions and wrap them in a set. We can then compare the sets rather than literally compare the conditions, thereby ignoring the order of the predicates.
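The set-based comparison described above can be sketched as follows (a self-contained toy model with illustrative names, not the actual Catalyst Expression classes):

```scala
// Toy stand-ins for Catalyst's Expression tree (names are illustrative).
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class Pred(sql: String) extends Expr // a leaf predicate such as "A.id > 100"

// Flatten a conjunction into its leaf predicates.
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

// Order-insensitive comparison: wrap the conjuncts in sets before comparing.
def sameFilter(a: Expr, b: Expr): Boolean =
  splitConjuncts(a).toSet == splitConjuncts(b).toSet

// The two filters from the example match even though predicate order differs:
val q1 = And(Pred("A.id > 100"), Pred("A.id < 200"))
val q2 = And(Pred("A.id < 200"), Pred("A.id > 100"))
// sameFilter(q1, q2) == true
```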

@cloud-fan (Contributor)

This is a great feature! Can we implement it in the individual expressions instead of centralizing it in LogicalPlan.sameResult? A lot of commutative operators need it, like And, Multiply, Max, etc. Maybe we can improve Expression.semanticEquals?

@codingjaguar (Author)

Thanks for the feedback! We think it would be nice to support all commutative operators in Expression.semanticEquals, but that doesn't seem to directly help cache matching: LogicalPlan.sameResult doesn't use Expression.semanticEquals; instead it simply checks cleanRight.cleanArgs == cleanLeft.cleanArgs. Here we refactored the code by moving equivalentConditions to PredicateHelper.equivalentPredicates, so that this feature can also be used in other scenarios.
We didn't support other commutative operators because they rarely show up in a WHERE clause as the root expression. For example, WHERE Max(a.id, 5) is not a meaningful SQL statement.

@liancheng (Contributor)

ok to test

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47328 has finished for PR 10163 at commit da46b1c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with PredicateHelper with Logging

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47342 has finished for PR 10163 at commit c81aa46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with PredicateHelper with Logging

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47364 has finished for PR 10163 at commit cc8fdfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaIndexToStringExample
      • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with PredicateHelper with Logging

@@ -127,33 +127,41 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
cleanLeft.children.size == cleanRight.children.size && {
logDebug(
s"[${cleanRight.cleanArgs.mkString(", ")}] == [${cleanLeft.cleanArgs.mkString(", ")}]")
cleanRight.cleanArgs == cleanLeft.cleanArgs
Contributor

how about we just change this to:

cleanRight.cleanArgs.zip(cleanLeft.cleanArgs).forall {
  case (e1: Expression, e2: Expression) => e1 semanticEquals e2
  case (a1, a2) => a1 == a2
}

then we can just improve Expression.semanticEquals

@cloud-fan (Contributor)

We didn't support other commutative operators because they rarely show up in a WHERE clause as the root expression.

How about something like WHERE a + b = c? I think it's quite common that we have other expressions inside predicates.

@windscope

To improve semanticEquals, we tried to implement a template function Expression.splitWithCommutativeOperator[T: Manifest](): Seq[Expression] so that we wouldn't need to implement a separate split function for each commutative operator. However, we cannot perform pattern matching on T.
Should we simply call PredicateHelper.splitConjunctivePredicates in Expression.semanticEquals and implement a few similar split functions for the other commutative operators?
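For what it's worth, the erasure problem can be worked around by matching on a runtime class tag rather than the type parameter itself. The sketch below uses ClassTag (the modern replacement for Manifest) and toy expression types, not the actual Catalyst ones:

```scala
import scala.reflect.ClassTag

// Toy expression types (illustrative, not the Catalyst classes).
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case class Leaf(name: String) extends Expr

// Generic split over any binary node type T. The ClassTag context bound lets
// the `case t: T` pattern check the runtime class that erasure would hide.
def splitWith[T <: Expr : ClassTag](e: Expr)(children: T => (Expr, Expr)): Seq[Expr] =
  e match {
    case t: T =>
      val (l, r) = children(t)
      splitWith[T](l)(children) ++ splitWith[T](r)(children)
    case other => Seq(other)
  }

// splitWith[And](And(Leaf("a"), And(Leaf("b"), Leaf("c"))))(a => (a.left, a.right))
// flattens to Seq(Leaf("a"), Leaf("b"), Leaf("c"))
```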

@codingjaguar (Author)

In the last change we deleted equivalentPredicates and moved its functionality into Expression.semanticEquals.

checkSemantic(splitDisjunctivePredicates(left).toSet.toSeq,
splitDisjunctivePredicates(right).toSet.toSeq)
case _ => checkSemantic(elements1, elements2)
}
Contributor

Sorry, I didn't explain it clearly. I mean we can override semanticEquals in concrete expressions like Or, And, etc. And we don't need to support all commutative operators at once: you can finish just the predicate parts in this PR and open follow-up PRs for the other parts (like Add, Multiply). Let's do it step-by-step :)

@SparkQA

SparkQA commented Dec 9, 2015

Test build #47419 has finished for PR 10163 at commit 07128b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class Expression extends TreeNode[Expression] with PredicateHelper
      • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with PredicateHelper with Logging

@codingjaguar (Author)

We updated semanticEquals. Now And and Or override semanticEquals and ignore ordering.
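An order-insensitive semanticEquals along these lines might look roughly like the following (a self-contained toy sketch with illustrative names, not the PR's exact code; duplicates are handled by removing one matched conjunct at a time instead of using a set):

```scala
// Illustrative toy sketch; `Expr`, `Leaf`, `And` stand in for Catalyst classes.
sealed trait Expr {
  def semanticEquals(other: Expr): Boolean = this == other
}
case class Leaf(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr {
  // Flatten nested conjunctions, so And(a, And(b, c)) and And(And(a, b), c)
  // both yield Seq(a, b, c).
  private def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case leaf      => Seq(leaf)
  }
  override def semanticEquals(other: Expr): Boolean = other match {
    case o: And =>
      var remaining = conjuncts(this)
      val theirs = conjuncts(o)
      // Remove one semantically equal conjunct per element; if every element
      // finds a match, the conjunctions are equivalent regardless of order.
      theirs.size == remaining.size && theirs.forall { e =>
        val i = remaining.indexWhere(_.semanticEquals(e))
        if (i >= 0) { remaining = remaining.patch(i, Nil, 1); true } else false
      }
    case _ => false
  }
}
```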

@SparkQA

SparkQA commented Dec 9, 2015

Test build #47446 has finished for PR 10163 at commit 99626a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class Expression extends TreeNode[Expression]
      • case class And(left: Expression, right: Expression) extends BinaryOperator
      • case class Or(left: Expression, right: Expression) extends BinaryOperator

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47452 has finished for PR 10163 at commit 2efca2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class Expression extends TreeNode[Expression]
      • case class And(left: Expression, right: Expression) extends BinaryOperator
      • case class Or(left: Expression, right: Expression) extends BinaryOperator

// elements1. If they are semantically equivalent, elements1 should be empty at the end.
elements1.size == elements2.size && {
for (e <- elements2) elements1 = removeFirstSemanticEquivalent(elements1, e)
elements1.isEmpty
Contributor

Sorry, I may have missed something here, but can we just write:

override def semanticEquals(other: Expression): Boolean = other match {
  case And(otherLeft, otherRight) =>
    (left.semanticEquals(otherLeft) && right.semanticEquals(otherRight)) ||
    (left.semanticEquals(otherRight) && right.semanticEquals(otherLeft))
  case _ => false
}

Author

Consider this example:
e1 = And(a, And(b, c))
e2 = And(And(a, b), c)
They are semantically equivalent, but your code would return false for them.
splitConjunctivePredicates flattens the expression tree into the sequence (a, b, c).

Contributor

ah I see, this makes sense.
But I think a better way is to add an optimization rule that turns all predicates into CNF before we begin to check the semantics; otherwise it will be hard to cover all cases, like a || (b && c) == (a || b) && (a || c)

cc @liancheng
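A minimal CNF rewrite of the kind suggested here is sketched below (a textbook distribution of Or over And on toy types, not Spark code; note that this rewrite can grow the predicate exponentially, which is a practical concern for such a rule):

```scala
// Toy expression types (illustrative only).
sealed trait Expr
case class Leaf(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

// Push Or below And by distribution: a || (b && c) => (a || b) && (a || c).
def toCnf(e: Expr): Expr = e match {
  case And(l, r) => And(toCnf(l), toCnf(r))
  case Or(l, r) =>
    (toCnf(l), toCnf(r)) match {
      case (And(a, b), c) => And(toCnf(Or(a, c)), toCnf(Or(b, c)))
      case (a, And(b, c)) => And(toCnf(Or(a, b)), toCnf(Or(a, c)))
      case (a, b)         => Or(a, b)
    }
  case leaf => leaf
}

// toCnf(Or(Leaf("a"), And(Leaf("b"), Leaf("c"))))
//   == And(Or(Leaf("a"), Leaf("b")), Or(Leaf("a"), Leaf("c")))
```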

Author

It seems there is an open PR that implements CNF normalization. Is there any reason why it hasn't been merged?
#8200

cc @yjshen

@AmplabJenkins

Can one of the admins verify this patch?

@rxin (Contributor)

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

@asfgit asfgit closed this in 1a33f2e Jun 15, 2016
7 participants