Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely #22701

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2151,7 +2151,7 @@ class Analyzer(
// TODO: skip null handling for not-nullable primitive inputs after we can completely
// trust the `nullable` information.
val inputsNullCheck = nullableTypes.zip(inputs)
.filter { case (nullable, _) => !nullable }
.filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] }
Copy link
Member

@gatorsmile gatorsmile Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us use the original way? create a val needsNullCheck. in the PR #21851

val needsNullCheck = ... 

.map { case (_, expr) => IsNull(expr) }
.reduceLeftOption[Expression]((e1, e2) => Or(e1, e2))
// Once we add an `If` check above the udf, it is safe to mark those checked inputs
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
val a = testRelation.output(0)
val func = (x: Int, y: Int) => x + y
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So clearly we should make both of these changes. This change fixes the test here. But is the change above, involving KnownNotNull, important for correctness? that is can some user code trigger this infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or an important change that could happen in 2.4.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's two separate issues.
If nullableTypes is not added here, the HandleNullInputsForUDF will do nothing, which means null checks will be missed. So it is itself a problem, which can be potentially triggered by a user.
As to test, if the rule is not doing anything, the "doing something infinitely" bug cannot be reproduced. But the infinite issue is one on a theoretical level and is quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.

val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil, nullableTypes = false :: false :: Nil)
val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
comparePlans(plan.analyze, plan.analyze.analyze)
}
Expand Down