Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely #22701

Closed
wants to merge 2 commits into from

Conversation

maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed.
This was fixed in SPARK-24891 and was then broken by SPARK-25044.
The unit test in AnalysisSuite added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the ScalaUDF constructor signature change. So this PR also updates the test accordingly based on the new ScalaUDF constructor.

How was this patch tested?

Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
val a = testRelation.output(0)
val func = (x: Int, y: Int) => x + y
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So clearly we should make both of these changes. This change fixes the test here. But is the change above, involving KnownNotNull, important for correctness? that is can some user code trigger this infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or an important change that could happen in 2.4.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's two separate issues.
If nullableTypes is not added here, the HandleNullInputsForUDF will do nothing, which means null checks will be missed. So it is itself a problem, which can be potentially triggered by a user.
As to test, if the rule is not doing anything, the "doing something infinitely" bug cannot be reproduced. But the infinite issue is one on a theoretical level and is quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.

@SparkQA
Copy link

SparkQA commented Oct 11, 2018

Test build #97273 has finished for PR 22701 at commit 736625b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2151,7 +2151,7 @@ class Analyzer(
// TODO: skip null handling for not-nullable primitive inputs after we can completely
// trust the `nullable` information.
val inputsNullCheck = nullableTypes.zip(inputs)
.filter { case (nullable, _) => !nullable }
.filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] }
Copy link
Member

@gatorsmile gatorsmile Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us use the original way? create a val needsNullCheck. in the PR #21851

val needsNullCheck = ... 

@@ -2150,8 +2150,10 @@ class Analyzer(

// TODO: skip null handling for not-nullable primitive inputs after we can completely
// trust the `nullable` information.
val needsNullCheck = (nullable: Boolean, expr: Expression) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this param be something like cantBeNull or something? this receives !nullableType as its arg but is called nullable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's because "nullableType" is flipped around here. "nullableType" should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in other PR.

@SparkQA
Copy link

SparkQA commented Oct 12, 2018

Test build #97283 has finished for PR 22701 at commit dfa301e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master/2.4

asfgit pushed a commit that referenced this pull request Oct 12, 2018
…ilize and can be applied infinitely

## What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed.
This was fixed in SPARK-24891 and was then broken by SPARK-25044.
The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

Closes #22701 from maryannxue/spark-25690.

Authored-by: maryannxue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
(cherry picked from commit 3685130)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in 3685130 Oct 12, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ilize and can be applied infinitely

## What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed.
This was fixed in SPARK-24891 and was then broken by SPARK-25044.
The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

Closes apache#22701 from maryannxue/spark-25690.

Authored-by: maryannxue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants