[SPARK-22961][REGRESSION] Constant columns should generate QueryPlanConstraints #20155

adrian-ionescu · 2018-01-04T18:07:17Z

What changes were proposed in this pull request?

#19201 introduced the following regression: given something like `df.withColumn("c", lit(2))`, we no longer pick up `c === 2` as a constraint or infer filters from it when joins are involved, which may lead to noticeable performance degradation.

This patch re-enables this optimization by picking up Aliases of Literals in Projection lists as constraints and making sure they're not treated as aliased columns.
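As a rough illustration of the idea in the paragraph above, here is a minimal, self-contained Scala sketch. It is a toy model and not Spark's Catalyst code: `Expr`, `Attr`, `Lit`, `Alias`, `EqualTo`, and `ConstantConstraints` are hypothetical stand-ins invented for this example. It shows the two halves of the change: an alias of a literal in a projection list yields an equality constraint, and the same alias is kept out of the set of aliased columns.

```scala
// Toy expression model. These are NOT Spark's classes; all names are hypothetical.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class Alias(child: Expr, name: String) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

object ConstantConstraints {
  // For every `literal AS name` in the projection list, emit `name = literal`.
  def fromProjectList(projectList: Seq[Expr]): Set[EqualTo] =
    projectList.collect {
      case Alias(Lit(v), name) => EqualTo(Attr(name), Lit(v))
    }.toSet

  // Only aliases of attributes count as aliased columns (new name -> source name).
  // Aliases of literals do not match this pattern, so they are left out.
  def aliasedColumns(projectList: Seq[Expr]): Map[String, String] =
    projectList.collect {
      case Alias(Attr(source), name) => name -> source
    }.toMap
}
```

With a projection list such as `Seq(Attr("a"), Alias(Lit(2), "c"))`, `fromProjectList` yields the single constraint `c = 2`, which is the kind of fact a join optimizer could then turn into a filter on the other side of the join.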

How was this patch tested?

A unit test was added.

SparkQA · 2018-01-04T21:21:52Z

Test build #85687 has finished for PR 20155 at commit 9c8f5c2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

LGTM

gatorsmile · 2018-01-05T13:31:49Z

Thanks! Merged to master/2.3

…onstraints

## What changes were proposed in this pull request?

#19201 introduced the following regression: given something like `df.withColumn("c", lit(2))`, we no longer pick up `c === 2` as a constraint or infer filters from it when joins are involved, which may lead to noticeable performance degradation.

This patch re-enables this optimization by picking up Aliases of Literals in Projection lists as constraints and making sure they're not treated as aliased columns.

## How was this patch tested?

A unit test was added.

Author: Adrian Ionescu

Closes #20155 from adrian-ionescu/constant_constraints.

(cherry picked from commit 51c33bd)
Signed-off-by: gatorsmile

gengliangwang · 2018-01-05T17:56:00Z

A late LGTM!

## What changes were proposed in this pull request?

How to reproduce:

```scala
val df1 = spark.createDataFrame(Seq(
  (1, 1)
)).toDF("a", "b").withColumn("c", lit(null).cast("int"))
val df2 = df1.union(df1).withColumn("d", spark_partition_id).filter($"c".isNotNull)
df2.show
+---+---+----+---+
|  a|  b|   c|  d|
+---+---+----+---+
|  1|  1|null|  0|
|  1|  1|null|  1|
+---+---+----+---+
```

`filter($"c".isNotNull)` was transformed to `(null <=> c#10)` before #19201, but it is transformed to `(c#10 = null)` since #20155. This PR reverts it to `(null <=> c#10)` to fix this issue.

## How was this patch tested?

Unit tests.

Closes #22368 from wangyum/SPARK-25368.

Authored-by: Yuming Wang
Signed-off-by: gatorsmile
(cherry picked from commit 77c9964)
Signed-off-by: gatorsmile
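The difference between `(c#10 = null)` and `(null <=> c#10)` comes down to SQL's three-valued logic: plain `=` evaluates to NULL when either operand is NULL, while the null-safe `<=>` always evaluates to true or false. The sketch below models that distinction in plain Scala, with `Option[Int]` standing in for a nullable integer column. `NullSemantics`, `sqlEq`, and `nullSafeEq` are hypothetical names for this illustration, not Spark APIs.

```scala
// Toy model of SQL equality semantics; None plays the role of SQL NULL.
// All names here are hypothetical and not part of Spark.
object NullSemantics {
  // SQL `=`: the result is unknown (None) if either side is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // SQL `<=>`: never unknown; NULL <=> NULL is true, NULL <=> 1 is false.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b
}
```

Under this model, a constraint written as `c = null` never evaluates to true, whereas `null <=> c` is true exactly when `c` is NULL, which is why the two forms are not interchangeable for a constant NULL column.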


fixed

9c8f5c2

gatorsmile reviewed Jan 5, 2018


asfgit closed this in 51c33bd Jan 5, 2018

wangyum mentioned this pull request Sep 9, 2018

[SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result #22368

Closed
