[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely #22701

maryannxue · 2018-10-11T16:57:43Z

What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can keep generating new If nodes indefinitely, which causes problems such as missed SQL cache matches.
This was fixed in SPARK-24891 and then broken again by SPARK-25044.
The unit test added to AnalysisSuite in SPARK-24891 should have failed but did not, because it was not properly updated after the ScalaUDF constructor signature change. This PR therefore also updates the test to use the new ScalaUDF constructor.
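
To make the stabilization problem concrete, here is a minimal, self-contained sketch. It does not use Catalyst: the expression types, `transformUp`, `handleNulls`, and `passesUntilStable` are all hypothetical stand-ins invented for illustration. It assumes the rule works roughly as described above, wrapping a UDF call in an If with a null check, and that marking already-checked inputs with a KnownNotNull-style wrapper is what lets a later pass leave the tree alone.

```scala
object NullRuleSketch {
  // Toy expression tree. These types are hypothetical stand-ins, not Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  case object NullLit extends Expr
  final case class IsNull(child: Expr) extends Expr
  final case class KnownNotNull(child: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr
  final case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr
  // primitive(i) == true means input i cannot accept null and needs a guard.
  final case class Udf(inputs: List[Expr], primitive: List[Boolean]) extends Expr

  // Bottom-up rewrite: rebuild the children first, then try the rule on the node.
  def transformUp(e: Expr)(f: PartialFunction[Expr, Expr]): Expr = {
    val rebuilt = e match {
      case IsNull(c)       => IsNull(transformUp(c)(f))
      case KnownNotNull(c) => KnownNotNull(transformUp(c)(f))
      case Or(l, r)        => Or(transformUp(l)(f), transformUp(r)(f))
      case If(c, t, x)     => If(transformUp(c)(f), transformUp(t)(f), transformUp(x)(f))
      case Udf(in, p)      => Udf(in.map(child => transformUp(child)(f)), p)
      case leaf            => leaf
    }
    f.applyOrElse(rebuilt, (x: Expr) => x)
  }

  // Toy null-handling rule. With markChecked = false it re-wraps the same UDF on
  // every pass; with markChecked = true it tags checked inputs so it can skip them.
  def handleNulls(markChecked: Boolean): Expr => Expr = root =>
    transformUp(root) {
      case udf @ Udf(inputs, primitive) =>
        val unchecked = primitive.zip(inputs).collect {
          case (true, in) if !in.isInstanceOf[KnownNotNull] => in
        }
        unchecked.map(in => IsNull(in)).reduceLeftOption[Expr](Or(_, _)) match {
          case None => udf // nothing left to guard: the rule is a no-op
          case Some(anyNull) =>
            val guardedInputs =
              if (!markChecked) inputs
              else primitive.zip(inputs).map {
                case (true, in) if !in.isInstanceOf[KnownNotNull] => KnownNotNull(in)
                case (_, in)                                      => in
              }
            If(anyNull, NullLit, Udf(guardedInputs, primitive))
        }
    }

  // Number of passes that changed the tree before it stopped changing,
  // or None if it was still changing when the budget ran out.
  @annotation.tailrec
  def passesUntilStable(rule: Expr => Expr, current: Expr, budget: Int, done: Int = 0): Option[Int] =
    if (budget == 0) None
    else {
      val next = rule(current)
      if (next == current) Some(done)
      else passesUntilStable(rule, next, budget - 1, done + 1)
    }
}
```

In this toy model, a UDF with two primitive inputs never settles under `handleNulls(markChecked = false)` because every pass adds another If around the inner call, while `handleNulls(markChecked = true)` changes the tree once and then leaves it alone.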

How was this patch tested?

Updated the original unit test. The update is justified because the original test became invalid after SPARK-25044.

srowen · 2018-10-11T18:03:21Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
  test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
    val a = testRelation.output(0)
    val func = (x: Int, y: Int) => x + y
-    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)


So clearly we should make both of these changes. This change fixes the test here. But is the change above, involving KnownNotNull, important for correctness? That is, can some user code trigger the infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or is an important change that could wait for 2.4.1.

These are two separate issues.
If nullableTypes is not added here, HandleNullInputsForUDF will do nothing, which means the null checks will be missed. That is a problem in itself, and one that a user could potentially trigger.
As for the test: if the rule does nothing, the "doing something infinitely" bug cannot be reproduced. But the infinite-application issue is a theoretical one and is quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.
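
The point about the test can be illustrated with a small sketch. Everything in it is hypothetical (`UdfCall`, `buggyRule`, `stableAfterOnePass`, `ruleFires`), and it models the rule only loosely: a deliberately buggy toy rule adds a guard on every pass, yet an idempotence check still passes when the input carries no per-input null information, because the rule never fires. Note that the toy flag is true for inputs that need a guard, whereas the diff above passes nullableTypes = false :: false :: Nil for the Int inputs.

```scala
object VacuousTestSketch {
  // Hypothetical toy UDF call; `guards` counts the null guards wrapped around it.
  // primitiveFlags(i) == true means input i cannot accept null. Empty means "no info".
  final case class UdfCall(inputs: List[String], primitiveFlags: List[Boolean], guards: Int = 0)

  // Deliberately buggy toy rule: it adds a guard on every pass and never settles.
  def buggyRule(call: UdfCall): UdfCall =
    if (!call.primitiveFlags.contains(true)) call // nothing to guard: rule is a no-op
    else call.copy(guards = call.guards + 1)

  // The shape of an idempotence check: a second pass must change nothing.
  def stableAfterOnePass(call: UdfCall): Boolean = {
    val once = buggyRule(call)
    buggyRule(once) == once
  }

  // A guard against vacuous passes: the rule must change the input at least once.
  def ruleFires(call: UdfCall): Boolean = buggyRule(call) != call
}
```

With no flags, `stableAfterOnePass` returns true even though the rule is broken; with flags supplied, the same check fails and exposes the bug. Checking `ruleFires` alongside the idempotence check is one possible way to catch a test that has silently stopped exercising the rule.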

OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.

SparkQA · 2018-10-11T20:36:05Z

Test build #97273 has finished for PR 22701 at commit 736625b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-11T20:40:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -2151,7 +2151,7 @@ class Analyzer(
            // TODO: skip null handling for not-nullable primitive inputs after we can completely
            // trust the `nullable` information.
            val inputsNullCheck = nullableTypes.zip(inputs)
-              .filter { case (nullable, _) => !nullable }
+              .filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] }


Let's do this the original way and create a val needsNullCheck, as in PR #21851?

val needsNullCheck = ...

srowen · 2018-10-11T22:11:33Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -2150,8 +2150,10 @@ class Analyzer(

            // TODO: skip null handling for not-nullable primitive inputs after we can completely
            // trust the `nullable` information.
+            val needsNullCheck = (nullable: Boolean, expr: Expression) =>


Should this param be called something like cantBeNull? It receives !nullableType as its argument but is called nullable.

Yes, that's because "nullableType" is flipped around here. "nullableType" should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in another PR.

SparkQA · 2018-10-12T01:50:13Z

Test build #97283 has finished for PR 22701 at commit dfa301e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-12T03:44:28Z

LGTM

Thanks! Merged to master/2.4

…ilize and can be applied infinitely

## What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed. This was fixed in SPARK-24891 and was then broken by SPARK-25044. The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

Closes #22701 from maryannxue/spark-25690.

Authored-by: maryannxue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
(cherry picked from commit 3685130)
Signed-off-by: gatorsmile <[email protected]>

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stab…

736625b

…ilize and can be applied infinitely

maryannxue mentioned this pull request Oct 11, 2018

[SPARK-25044][SQL] (take 2) Address translation of LMF closure primitive args to Object in Scala 2.12 #22259

Closed

srowen approved these changes Oct 11, 2018

srowen reviewed Oct 11, 2018

gatorsmile reviewed Oct 11, 2018

Address review comments

dfa301e

srowen reviewed Oct 11, 2018

asfgit closed this in 3685130 Oct 12, 2018

maryannxue mentioned this pull request Oct 15, 2018

[SPARK-25691][SQL] Use semantic equality in AliasViewChild in order to compare attributes #22713

Closed
