[SPARK-12161][SQL] Ignore order of predicates in cache matching #10163
refactor cleanArgs so that we can reuse cleanExpression().
… jiang.filter-set
This is a great feature! Can we implement it in individual expressions instead of centralizing them in |
Thanks for giving feedback! We think it would be nice to support all commutative operators in |
ok to test |
Test build #47328 has finished for PR 10163 at commit
Test build #47342 has finished for PR 10163 at commit
Test build #47364 has finished for PR 10163 at commit
@@ -127,33 +127,41 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
    cleanLeft.children.size == cleanRight.children.size && {
      logDebug(
        s"[${cleanRight.cleanArgs.mkString(", ")}] == [${cleanLeft.cleanArgs.mkString(", ")}]")
      cleanRight.cleanArgs == cleanLeft.cleanArgs
How about we just change this to:
cleanRight.cleanArgs.zip(cleanLeft.cleanArgs).forall {
  case (e1: Expression, e2: Expression) => e1 semanticEquals e2
  case (a1, a2) => a1 == a2
}
Then we can just improve Expression.semanticEquals.
How about something like |
To improve semanticEquals, we tried to implement a template function |
In last change we deleted |
    checkSemantic(splitDisjunctivePredicates(left).toSet.toSeq,
      splitDisjunctivePredicates(right).toSet.toSeq)
  case _ => checkSemantic(elements1, elements2)
}
Sorry, I didn't explain that clearly. I mean we can override semanticEquals in concrete expressions like Or, And, etc. We also don't need to support all commutative operators at once: you can finish only the predicate parts in this PR and open follow-up PRs for the other parts (like Add, Multiply). Let's do it step by step :)
Test build #47419 has finished for PR 10163 at commit
We updated |
Test build #47446 has finished for PR 10163 at commit
Test build #47452 has finished for PR 10163 at commit
// elements1. If they are semantically equivalent, elements1 should be empty at the end.
elements1.size == elements2.size && {
  for (e <- elements2) elements1 = removeFirstSemanticEquivalent(elements1, e)
  elements1.isEmpty
Sorry, I may have missed something here. Can we just write:
override def semanticEquals(other: Expression): Boolean = other match {
case And(otherLeft, otherRight) =>
(left.semanticEquals(otherLeft) && right.semanticEquals(otherRight)) ||
(left.semanticEquals(otherRight) && right.semanticEquals(otherLeft))
case _ => false
}
Consider this example:
e1 = And(a, And(b, c))
e2 = And(And(a, b), c)
They are semantically equivalent, but your code will return false for them. splitConjunctivePredicates will flatten the expression tree into a sequence of (a, b, c).
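To make the flattening argument concrete, here is a minimal sketch on a toy expression tree. The `Expr`, `Attr`, `And`, and `Or` types and the `splitConjuncts` and `sameConjuncts` helpers are hypothetical stand-ins written for this example. They are not the Catalyst classes, and `splitConjuncts` only imitates what `splitConjunctivePredicates` is described as doing above.

```scala
object ConjunctFlattening {
  // Hypothetical toy expression tree, not Catalyst's Expression hierarchy.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr

  // Flatten nested And nodes into their leaf conjuncts, left to right.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Compare two conjunctions while ignoring both nesting and order.
  def sameConjuncts(x: Expr, y: Expr): Boolean =
    splitConjuncts(x).toSet == splitConjuncts(y).toSet
}
```

With this sketch, `And(a, And(b, c))` and `And(And(a, b), c)` both flatten to the conjuncts `a`, `b`, `c`, so `sameConjuncts` treats them as equal, whereas a pairwise comparison of left and right children would not.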
Ah, I see, this makes sense.
But I think a better way is to add an optimization rule that turns all predicates into CNF before we check semantic equality; otherwise it will be hard to cover cases like a || (b && c) == (a || b) && (a || c).
cc @liancheng
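As a rough illustration of the CNF idea, the sketch below distributes Or over And on a toy expression tree. All names here (`CnfSketch`, `Expr`, `Attr`, `toCnf`, `distribute`) are hypothetical and written for this example; this is not an existing Spark optimizer rule. It assumes negation has already been handled and covers only And and Or.

```scala
object CnfSketch {
  // Hypothetical toy expression tree, not Catalyst's Expression hierarchy.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr

  // Rewrite an And/Or tree so that no And appears below an Or.
  def toCnf(e: Expr): Expr = e match {
    case And(l, r) => And(toCnf(l), toCnf(r))
    case Or(l, r)  => distribute(toCnf(l), toCnf(r))
    case leaf      => leaf
  }

  // Build the CNF of (x || y), given that x and y are already in CNF.
  private def distribute(x: Expr, y: Expr): Expr = (x, y) match {
    case (And(p, q), _) => And(distribute(p, y), distribute(q, y))
    case (_, And(p, q)) => And(distribute(x, p), distribute(x, q))
    case _              => Or(x, y)
  }
}
```

Under these assumptions, `toCnf(Or(a, And(b, c)))` yields `And(Or(a, b), Or(a, c))`, so both sides of the example above end up with the same shape. Distribution can grow the expression exponentially in the worst case, which is one trade-off of normalizing before comparison.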
Can one of the admins verify this patch?
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.
This PR improves LogicalPlan.sameResult so that semantically equivalent queries whose predicates appear in a different order are still matched. Consider an example:
Query 1: CACHE TABLE first AS SELECT * FROM table A where A.id >100 AND A.id < 200;
Query 2: SELECT * FROM table A where A.id < 200 AND A.id > 100;
Currently in Spark SQL, Query 2 cannot use the cached result of Query 1, even though the two queries are identical apart from the order of their predicates.
We modified the comparison function LogicalPlan.sameResult. The idea is to split the filter condition into a sequence of expressions and wrap them in a set. We can then compare the sets instead of comparing the conditions literally, which ignores the order of the predicates.
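The set-based comparison described above can be sketched on the two example filter conditions. Everything in this block is a hypothetical stand-in written for illustration (the `Column`, `Literal`, `GreaterThan`, `LessThan`, and `And` types, plus `conjunctSet` and `sameFilter`); it does not show the actual change to `LogicalPlan.sameResult`.

```scala
object FilterSetSketch {
  // Hypothetical toy predicates, not Catalyst's Expression hierarchy.
  sealed trait Expr
  case class Column(name: String) extends Expr
  case class Literal(value: Int) extends Expr
  case class GreaterThan(left: Expr, right: Expr) extends Expr
  case class LessThan(left: Expr, right: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Split a filter condition on And and collect the conjuncts into a set.
  def conjunctSet(cond: Expr): Set[Expr] = cond match {
    case And(l, r) => conjunctSet(l) ++ conjunctSet(r)
    case other     => Set(other)
  }

  // Two filter conditions match when their conjunct sets are equal.
  def sameFilter(c1: Expr, c2: Expr): Boolean =
    conjunctSet(c1) == conjunctSet(c2)

  private val id = Column("id")
  // Query 1: A.id > 100 AND A.id < 200
  val query1: Expr = And(GreaterThan(id, Literal(100)), LessThan(id, Literal(200)))
  // Query 2: A.id < 200 AND A.id > 100
  val query2: Expr = And(LessThan(id, Literal(200)), GreaterThan(id, Literal(100)))
}
```

In this sketch `query1 == query2` is false because the trees differ structurally, while `sameFilter(query1, query2)` is true because both conditions reduce to the same set of two predicates.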