-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely #22701
Conversation
…ilize and can be applied infinitely
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers { | |||
test("SPARK-24891 Fix HandleNullInputsForUDF rule") { | |||
val a = testRelation.output(0) | |||
val func = (x: Int, y: Int) => x + y | |||
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil) | |||
val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil) | |||
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So clearly we should make both of these changes. This change fixes the test here. But is the change above, involving KnownNotNull
, important for correctness? that is can some user code trigger this infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or an important change that could happen in 2.4.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's two separate issues.
If nullableTypes
is not added here, the HandleNullInputsForUDF
will do nothing, which means null checks will be missed. So it is itself a problem, which can be potentially triggered by a user.
As to test, if the rule is not doing anything, the "doing something infinitely" bug cannot be reproduced. But the infinite issue is one on a theoretical level and is quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.
Test build #97273 has finished for PR 22701 at commit
|
@@ -2151,7 +2151,7 @@ class Analyzer( | |||
// TODO: skip null handling for not-nullable primitive inputs after we can completely | |||
// trust the `nullable` information. | |||
val inputsNullCheck = nullableTypes.zip(inputs) | |||
.filter { case (nullable, _) => !nullable } | |||
.filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us use the original way? create a val needsNullCheck. in the PR #21851
val needsNullCheck = ...
@@ -2150,8 +2150,10 @@ class Analyzer( | |||
|
|||
// TODO: skip null handling for not-nullable primitive inputs after we can completely | |||
// trust the `nullable` information. | |||
val needsNullCheck = (nullable: Boolean, expr: Expression) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this param be something like cantBeNull
or something? this receives !nullableType
as its arg but is called nullable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's because "nullableType" is flipped around here. "nullableType" should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in other PR.
Test build #97283 has finished for PR 22701 at commit
|
LGTM Thanks! Merged to master/2.4 |
…ilize and can be applied infinitely ## What changes were proposed in this pull request? The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed. This was fixed in SPARK-24891 and was then broken by SPARK-25044. The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor. ## How was this patch tested? Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044. Closes #22701 from maryannxue/spark-25690. Authored-by: maryannxue <[email protected]> Signed-off-by: gatorsmile <[email protected]> (cherry picked from commit 3685130) Signed-off-by: gatorsmile <[email protected]>
…ilize and can be applied infinitely ## What changes were proposed in this pull request? The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed. This was fixed in SPARK-24891 and was then broken by SPARK-25044. The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor. ## How was this patch tested? Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044. Closes apache#22701 from maryannxue/spark-25690. Authored-by: maryannxue <[email protected]> Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed.
This was fixed in SPARK-24891 and was then broken by SPARK-25044.
The unit test in
AnalysisSuite
added in SPARK-24891 should have failed but didn't because it wasn't properly updated after theScalaUDF
constructor signature change. So this PR also updates the test accordingly based on the newScalaUDF
constructor.How was this patch tested?
Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.