Skip to content

Commit

Permalink
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters b…
Browse files Browse the repository at this point in the history
…eneath aggregates

## What changes were proposed in this pull request?

This patch fixes a minor correctness issue impacting the pushdown of filters beneath aggregates. Specifically, if a filter condition references no grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly pushed beneath an aggregate.

Intuitively, the only case where you can push a filter beneath an aggregate is when that filter is deterministic and is defined over the grouping columns / expressions, since in that case the filter is acting to exclude entire groups from the query (like a `HAVING` clause). The existing code would only push deterministic filters beneath aggregates when all of the filter's references were grouping columns, but this logic missed the case where a filter has no references. For example, `WHERE false` is deterministic but is independent of the actual data.

This patch fixes this minor bug by adding a new check to ensure that we don't push filters beneath aggregates when those filters don't reference any columns.

## How was this patch tested?

New regression test in FilterPushdownSuite.

Author: Josh Rosen <[email protected]>

Closes #15289 from JoshRosen/SPARK-17712.
  • Loading branch information
JoshRosen authored and hvanhovell committed Sep 29, 2016
1 parent 7dfad4b commit 37eb918
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -710,7 +710,7 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {

val (pushDown, rest) = candidates.partition { cond =>
val replaced = replaceAlias(cond, aliasMap)
replaced.references.subsetOf(aggregate.child.outputSet)
cond.references.nonEmpty && replaced.references.subsetOf(aggregate.child.outputSet)
}

val stayUp = rest ++ containingNonDeterministic
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -687,6 +687,23 @@ class FilterPushdownSuite extends PlanTest {
comparePlans(optimized, correctAnswer)
}

test("SPARK-17712: aggregate: don't push down filters that are data-independent") {
val originalQuery = LocalRelation.apply(testRelation.output, Seq.empty)
.select('a, 'b)
.groupBy('a)(count('a))
.where(false)

val optimized = Optimize.execute(originalQuery.analyze)

val correctAnswer = testRelation
.select('a, 'b)
.groupBy('a)(count('a))
.where(false)
.analyze

comparePlans(optimized, correctAnswer)
}

test("broadcast hint") {
val originalQuery = BroadcastHint(testRelation)
.where('a === 2L && 'b + Rand(10).as("rnd") === 3)
Expand Down

0 comments on commit 37eb918

Please sign in to comment.