-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18467][SQL] Extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when needNullCheck == true
.
#15901
Conversation
So the refactoring for simplification adds more code than it deletes? Did you introduce new features as well? |
@rxin I introduced features as follows:
|
cc @cloud-fan |
|
||
def propagateNull: Boolean | ||
|
||
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment for this method? It is not simple enough to skip the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = { | ||
|
||
val argsHaveNull = | ||
if (propagateNull && arguments.exists(_.nullable)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
propagateNull && arguments.exists(_.nullable)
is used many times, define a variable for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Test build #68700 has finished for PR 15901 at commit
|
This looks good overall. The more codes added are mostly used to deal with short circuit cases of nullability. |
@@ -109,7 +171,7 @@ case class Invoke( | |||
functionName: String, | |||
dataType: DataType, | |||
arguments: Seq[Expression] = Nil, | |||
propagateNull: Boolean = true) extends Expression with NonSQLExpression { | |||
propagateNull: Boolean = true) extends Expression with InvokeLike with NonSQLExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment of Invoke
doesn't have explanation for the param propagateNull
. Can you also add it too?
The title is not well explained this pr and can be updated to reflect its purpose. The explanation of the features you added can be moved to the pr description. |
|
||
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = { | ||
|
||
val argsHaveNull = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this variable name looks like a boolean flag, can you think of a better name?
@@ -33,6 +33,73 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData} | |||
import org.apache.spark.sql.types._ | |||
|
|||
/** | |||
* Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]]. | |||
*/ | |||
trait InvokeLike extends Expression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to extend Expression
?
propageteNull == true
.
@viirya @cloud-fan I addressed your comments and updated pr title and description. |
* Prepares codes for arguments. | ||
* | ||
* - generate codes for argument. | ||
* - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not to
-> to not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
* - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments. | ||
* - avoid some of nullabilty checking which are not needed because the expression is not | ||
* nullable. | ||
* - when progagateNull == true, short circuit if we found one of arguments is null because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
progagateNull
-> propagatingNull
? Actually can we think of better name than propagatingNull
? how about needNullCheck
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I used needNullCheck
for now.
val reset = s"$containsNullInArguments = false;" | ||
val argCodes = arguments.zipWithIndex.map { case (e, i) => | ||
val expr = e.genCode(ctx) | ||
s""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
val updateContainsNull = if (e.nullable) {
"$containsNull = ${expr.isNull};"
} else {
""
}
s"""
if (!$containsNull) {
${expr.code}
$updateContainsNull
${argValues(i)} = ${expr.value};
}
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll use it.
@@ -50,7 +132,7 @@ case class StaticInvoke( | |||
dataType: DataType, | |||
functionName: String, | |||
arguments: Seq[Expression] = Nil, | |||
propagateNull: Boolean = true) extends Expression with NonSQLExpression { | |||
propagateNull: Boolean = true) extends Expression with InvokeLike with NonSQLExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make InvokeLike
extends Expression
and NonSQLExpression
, then we can just extends InvokeLike
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll modify them.
if ($argIsNulls[idx]) { $isNull = true; break; } | ||
} | ||
""" | ||
val prepareIsNull = if (propagateNull && arguments.exists(_.nullable)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use propogatingNull
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I missed it, fixed.
Test build #68777 has finished for PR 15901 at commit
|
Test build #68778 has finished for PR 15901 at commit
|
LGTM cc @cloud-fan for another check. |
* @param ev an [[ExprCode]] with unique terms. | ||
* @return (code to prepare arguments, argument string, result of argument null check) | ||
*/ | ||
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ev
is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, removed.
if (!${obj.isNull}) { | ||
$argCode | ||
} | ||
boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assumes resultIsNull
is a member variable not local variable, can we avoid doing this? We can make the code clearer and more robust like:
javaType ${ev.value} = defaultValue;
boolean ${ev.isNull} = true;
if (!${obj.isNull}) {
$argCode
${ev.isNull} = $resultIsNull
...
$postNullCheck
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid not because if evaluating arguments is split to some methods, resultIsNull
can't be referred from the split methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know resultIsNull
must be a member variable now, but in the future we may make it a local variable if the argument code is not so long to exceed the 64kb limit. Here why not we assume resultIsNull
can be local variable and make us more robust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code as you mentioned.
But I'm sorry, I didn't understand what you want to do.
- You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)
- I think the argument code would easily become large enough to exceed the 64kb limit.
- You want to modify only
resultIsNull
to local variable and pass it and return the value to the split methods?- We need to extend the method
ctx.splitExpressions()
to be able to pass and return value and handle it as [SPARK-16845][SQL]GeneratedClass$SpecificOrdering
grows beyond 64 KB #15480 is trying.
- We need to extend the method
- You want to switch local variable or member variable by whether split or not?
- We need to know whether split or not but we can't now. Modify
ctx.splitExpressions()
to return split or not?
- We need to know whether split or not but we can't now. Modify
Could you let me know if you have other thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)
This is what I mean. We can generate code first and see if it exceeds the 64kb limitation, so logically we do have a chance to use local variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Let's see if it exceeds the limit or not, using local variables.
$prepareIsNull | ||
""" + | ||
(if (needNullCheck) { | ||
s"final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary? I'm pretty sure javac can optimize ... = false ? a : b
to ... = b
for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use janino instead of javac?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I mean "java compiler"... I think janino is smart enough about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I go to check janino source code. It has the optimization: https://github.com/janino-compiler/janino/blob/22a7787e62037049e337cb1ce4064c29a4856022/janino/src/main/java/org/codehaus/janino/UnitCompiler.java#L4247.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, updated. Thanks.
LGTM except some minor comments, thanks for working on it. |
Test build #68833 has finished for PR 15901 at commit
|
$javaType ${ev.value} = ${ctx.defaultValue(dataType)}; | ||
if (!${ev.isNull}) { | ||
$evaluate | ||
if (!${obj.isNull}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I think this new code looks clearer than before, what do you think? @ueshin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with you. Thank you for your suggestion.
} | ||
""" | ||
val prepareIsNull = if (needNullCheck) { | ||
s"boolean $isNull = $resultIsNull;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more optimization: we can just write ev.isNull = resultIsNull
, then if resultIsNull
is false
, the final evaluation will become ... = false ? aaa : bbb
and get optimized by janino.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean, modified. Thanks.
f8acda6 is a test to see if it exceeds the 64kb JVM limit or not, using only local variables. |
Test build #68858 has finished for PR 15901 at commit
|
This reverts commit f8acda6.
Test build #68877 has finished for PR 15901 at commit
|
${argGen.map(_.code).mkString("\n")} | ||
$setIsNull | ||
$argCode | ||
boolean ${ev.isNull} = $resultIsNull; | ||
final $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : $callFunc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ${ev.isNull}
at line 153 can be $resultIsNull
. So the optimization for false ? ${ctx.defaultValue(dataType)} : $callFunc;
can work here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. I modified. Thanks.
Test build #68881 has finished for PR 15901 at commit
|
LGTM cc @cloud-fan for another look. |
thanks, merging to master/2.1! |
…cInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when `needNullCheck == true`. ## What changes were proposed in this pull request? This pr extracts method for preparing arguments from `StaticInvoke`, `Invoke` and `NewInstance` and modify to short circuit if arguments have `null` when `propageteNull == true`. The steps are as follows: 1. Introduce `InvokeLike` to extract common logic from `StaticInvoke`, `Invoke` and `NewInstance` to prepare arguments. `StaticInvoke` and `Invoke` had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well. 2. Remove unneeded null checking and fix nullability of `NewInstance`. Avoid some of nullabilty checking which are not needed because the expression is not nullable. 3. Modify to short circuit if arguments have `null` when `needNullCheck == true`. If `needNullCheck == true`, preparing arguments can be skipped if we found one of them is `null`, so modified to short circuit in the case. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes #15901 from ueshin/issues/SPARK-18467. (cherry picked from commit 6585479) Signed-off-by: Wenchen Fan <[email protected]>
…cInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when `needNullCheck == true`. ## What changes were proposed in this pull request? This pr extracts method for preparing arguments from `StaticInvoke`, `Invoke` and `NewInstance` and modify to short circuit if arguments have `null` when `propageteNull == true`. The steps are as follows: 1. Introduce `InvokeLike` to extract common logic from `StaticInvoke`, `Invoke` and `NewInstance` to prepare arguments. `StaticInvoke` and `Invoke` had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well. 2. Remove unneeded null checking and fix nullability of `NewInstance`. Avoid some of nullabilty checking which are not needed because the expression is not nullable. 3. Modify to short circuit if arguments have `null` when `needNullCheck == true`. If `needNullCheck == true`, preparing arguments can be skipped if we found one of them is `null`, so modified to short circuit in the case. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <[email protected]> Closes apache#15901 from ueshin/issues/SPARK-18467.
What changes were proposed in this pull request?
This pr extracts method for preparing arguments from
StaticInvoke
,Invoke
andNewInstance
and modify to short circuit if arguments havenull
whenpropageteNull == true
.The steps are as follows:
Introduce
InvokeLike
to extract common logic fromStaticInvoke
,Invoke
andNewInstance
to prepare arguments.StaticInvoke
andInvoke
had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well.Remove unneeded null checking and fix nullability of
NewInstance
.Avoid some of nullabilty checking which are not needed because the expression is not nullable.
Modify to short circuit if arguments have
null
whenneedNullCheck == true
.If
needNullCheck == true
, preparing arguments can be skipped if we found one of them isnull
, so modified to short circuit in the case.How was this patch tested?
Existing tests.