Skip to content

Commit

Permalink
[SPARK-22961][REGRESSION] Constant columns should generate QueryPlanC…
Browse files Browse the repository at this point in the history
…onstraints

## What changes were proposed in this pull request?

#19201 introduced the following regression: given something like `df.withColumn("c", lit(2))`, we're no longer picking up `c === 2` as a constraint and infer filters from it when joins are involved, which may lead to noticeable performance degradation.

This patch re-enables this optimization by picking up Aliases of Literals in Projection lists as constraints and making sure they're not treated as aliased columns.

## How was this patch tested?

Unit test was added.

Author: Adrian Ionescu <[email protected]>

Closes #20155 from adrian-ionescu/constant_constraints.

(cherry picked from commit 51c33bd)
Signed-off-by: gatorsmile <[email protected]>
  • Loading branch information
adrian-ionescu authored and gatorsmile committed Jan 5, 2018
1 parent f9dcdbc commit fd4e304
Show file tree
Hide file tree
Showing 3 changed files with 16 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -247,6 +247,8 @@ abstract class UnaryNode extends LogicalPlan {
protected def getAliasedConstraints(projectList: Seq[NamedExpression]): Set[Expression] = {
var allConstraints = child.constraints.asInstanceOf[Set[Expression]]
projectList.foreach {
case a @ Alias(l: Literal, _) =>
allConstraints += EqualTo(a.toAttribute, l)
case a @ Alias(e, _) =>
// For every alias in `projectList`, replace the reference in constraints by its attribute.
allConstraints ++= allConstraints.map(_ transform {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ trait QueryPlanConstraints { self: LogicalPlan =>
// we may avoid producing recursive constraints.
private lazy val aliasMap: AttributeMap[Expression] = AttributeMap(
expressions.collect {
case a: Alias => (a.toAttribute, a.child)
case a: Alias if !a.child.isInstanceOf[Literal] => (a.toAttribute, a.child)
} ++ children.flatMap(_.asInstanceOf[QueryPlanConstraints].aliasMap))
// Note: the explicit cast is necessary, since Scala compiler fails to infer the type.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -236,4 +236,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
comparePlans(optimized, originalQuery)
}
}

test("constraints should be inferred from aliased literals") {
val originalLeft = testRelation.subquery('left).as("left")
val optimizedLeft = testRelation.subquery('left).where(IsNotNull('a) && 'a === 2).as("left")

val right = Project(Seq(Literal(2).as("two")), testRelation.subquery('right)).as("right")
val condition = Some("left.a".attr === "right.two".attr)

val original = originalLeft.join(right, Inner, condition)
val correct = optimizedLeft.join(right, Inner, condition)

comparePlans(Optimize.execute(original.analyze), correct.analyze)
}
}

0 comments on commit fd4e304

Please sign in to comment.