Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12161][SQL] Ignore order of predicates in cache matching #10163

Closed
wants to merge 26 commits into from
Closed
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
579b5a2
Add equivalentConditions()
codingjaguar Dec 5, 2015
0a09698
set comparison for projection
windscope Dec 5, 2015
85f1ebc
Fix set conversion bug
windscope Dec 5, 2015
1a2b534
Remove set comparison of projection
windscope Dec 5, 2015
5fcb85c
Add test case for filter condition order
windscope Dec 5, 2015
8f93c6a
Fix style error
windscope Dec 5, 2015
6eb6fdd
add testcase for SameResultSuite
windscope Dec 6, 2015
02fc878
add testcase for OR split filter condition
windscope Dec 6, 2015
bcb6df0
Supported expressions with disjunctive predicates;
codingjaguar Dec 6, 2015
360bb2b
Merge branch 'jiang.filter-set' of github.com:codingjaguar/spark into…
windscope Dec 6, 2015
94837d6
Merge branch 'jiang.filter-set'
codingjaguar Dec 6, 2015
0de3d7e
Merge branch 'jiang.filter-set' of github.com:codingjaguar/spark into…
windscope Dec 6, 2015
13ce03f
Removed dead code
codingjaguar Dec 6, 2015
9f6df41
Merge branch 'jiang.filter-set' of github.com:codingjaguar/spark into…
codingjaguar Dec 6, 2015
e63df88
Merge branch 'jiang.filter-set'
codingjaguar Dec 6, 2015
9043afd
Refactor change. Move equivalentConditions to PredicateHelper
codingjaguar Dec 7, 2015
da46b1c
Merge branch 'jiang.filter-set'
codingjaguar Dec 7, 2015
ba29f0b
Fix style issue
windscope Dec 8, 2015
c81aa46
Merge branch 'jiang.filter-set'
windscope Dec 8, 2015
cc8fdfe
Merge branch 'master' of https://github.com/apache/spark
codingjaguar Dec 8, 2015
b8ba919
Integrate the functionality of equivalentPredicate with semanticEquals
codingjaguar Dec 9, 2015
a0e8c4a
Refactor sameResult to utilize semanticEquals
codingjaguar Dec 9, 2015
07128b3
Merge branch 'jiang.filter-set'
codingjaguar Dec 9, 2015
4af3622
Rewrite semanticEquals in And and Or to ignore their ordering.
codingjaguar Dec 9, 2015
99626a4
Rewrite the code block that compares the equivalency of
codingjaguar Dec 9, 2015
2efca2f
Fix a bug in semanticEqual. Add some comment.
codingjaguar Dec 9, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,20 @@ trait PredicateHelper {
}
}

/**
* Returns true if two expressions are equivalent predicates. Equality check is tolerant of
* ordering different.
*/
def equivalentPredicates(left: Expression, right: Expression): Boolean = {
val leftAndPredicates = splitConjunctivePredicates(left).toSet
val rightAndPredicates = splitConjunctivePredicates(right).toSet
val leftOrPredicates = splitDisjunctivePredicates(left).toSet
val rightOrPredicates = splitDisjunctivePredicates(right).toSet
// We split the two conditions into conjunctive predicates and disjunctive predicates
// If either of them match, we consider them equivalent conditions
(leftAndPredicates == rightAndPredicates) || (leftOrPredicates == rightOrPredicates)
}

// Substitute any known alias from a map.
protected def replaceAlias(
condition: Expression,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.plans.QueryPlan
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode}


abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
abstract class LogicalPlan extends QueryPlan[LogicalPlan] with PredicateHelper with Logging {

private var _analyzed: Boolean = false

Expand Down Expand Up @@ -127,33 +127,41 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
cleanLeft.children.size == cleanRight.children.size && {
logDebug(
s"[${cleanRight.cleanArgs.mkString(", ")}] == [${cleanLeft.cleanArgs.mkString(", ")}]")
cleanRight.cleanArgs == cleanLeft.cleanArgs
(cleanRight.cleanArgs == cleanLeft.cleanArgs) || ((cleanLeft, cleanRight) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we just change this to:

cleanRight.zip(cleanArgs).forall {
  case (e1: Expression, e2: Expression) => e1 semanticEquals e2 
  caes (a1, a2) => a1 == a2
}

then we can just improve Expression.sentaicEquals

case (l: Filter, r: Filter) =>
equivalentPredicates(cleanExpression(l.condition, l.children.flatMap(_.output)),
cleanExpression(r.condition, r.children.flatMap(_.output)))
case _ => false
})
} &&
(cleanLeft.children, cleanRight.children).zipped.forall(_ sameResult _)
}

/** Clean an expression so that differences in expression id should not affect equality */
def cleanExpression(e: Expression, input: Seq[Attribute]): Expression = e match {
case a: Alias =>
// As the root of the expression, Alias will always take an arbitrary exprId, we need
// to erase that for equality testing.
val cleanedExprId = Alias(a.child, a.name)(ExprId(-1), a.qualifiers)
BindReferences.bindReference(cleanedExprId, input, allowFailures = true)
case other => BindReferences.bindReference(other, input, allowFailures = true)
}


/** Args that have cleaned such that differences in expression id should not affect equality */
protected lazy val cleanArgs: Seq[Any] = {
val input = children.flatMap(_.output)
def cleanExpression(e: Expression) = e match {
case a: Alias =>
// As the root of the expression, Alias will always take an arbitrary exprId, we need
// to erase that for equality testing.
val cleanedExprId = Alias(a.child, a.name)(ExprId(-1), a.qualifiers)
BindReferences.bindReference(cleanedExprId, input, allowFailures = true)
case other => BindReferences.bindReference(other, input, allowFailures = true)
}

productIterator.map {
// Children are checked using sameResult above.
case tn: TreeNode[_] if containsChild(tn) => null
case e: Expression => cleanExpression(e)
case e: Expression => cleanExpression(e, input)
case s: Option[_] => s.map {
case e: Expression => cleanExpression(e)
case e: Expression => cleanExpression(e, input)
case other => other
}
case s: Seq[_] => s.map {
case e: Expression => cleanExpression(e)
case e: Expression => cleanExpression(e, input)
case other => other
}
case other => other
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,8 @@ import org.apache.spark.sql.catalyst.util._
* Tests for the sameResult function of [[LogicalPlan]].
*/
class SameResultSuite extends SparkFunSuite {
val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
val testRelation2 = LocalRelation('a.int, 'b.int, 'c.int)
val testRelation = LocalRelation('a.int, 'b.int, 'c.int, 'd.int)
val testRelation2 = LocalRelation('a.int, 'b.int, 'c.int, 'd.int)

def assertSameResult(a: LogicalPlan, b: LogicalPlan, result: Boolean = true): Unit = {
val aAnalyzed = a.analyze
Expand Down Expand Up @@ -57,6 +57,21 @@ class SameResultSuite extends SparkFunSuite {

test("filters") {
assertSameResult(testRelation.where('a === 'b), testRelation2.where('a === 'b))
assertSameResult(testRelation.where('a === 'b && 'c === 'd),
testRelation2.where('c === 'd && 'a === 'b )
)
assertSameResult(testRelation.where('a === 'b || 'c === 'd),
testRelation2.where('c === 'd || 'a === 'b )
)

assertSameResult(testRelation.where('a === 'b && 'c === 'd),
testRelation2.where('a === 'c && 'b === 'd),
result = false
)
assertSameResult(testRelation.where('a === 'b || 'c === 'd),
testRelation2.where('a === 'c || 'b === 'd),
result = false
)
}

test("sorts") {
Expand Down