Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18467][SQL] Extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when needNullCheck == true. #15901

Closed
wants to merge 29 commits into from
Closed
Changes from 21 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
9bf9aa9
Introduce `InvokeLike` to extract common logic from `StaticInvoke`, `…
ueshin Nov 15, 2016
28f6200
Refactor to remove unneeded null checking and fix nullability of `New…
ueshin Nov 15, 2016
2f30f53
Modify to short circuit if arguments have null when propageteNull == …
ueshin Nov 16, 2016
9ac3f28
Add a comment for `prepareArguments()`.
ueshin Nov 17, 2016
5ad6966
Define a variable for `propagateNull && arguments.exists(_.nullable)`.
ueshin Nov 17, 2016
a0ac177
Add a missing param comment for `propagateNull`.
ueshin Nov 17, 2016
757d33e
Rename variable from `argsHaveNull` to `containsNullInArguments`.
ueshin Nov 17, 2016
240fde4
Modify `InvokeLike` not to extend `Expression` because no need to ext…
ueshin Nov 17, 2016
6f6e0b3
Fix comments.
ueshin Nov 17, 2016
2bd6e50
Modify for readability.
ueshin Nov 17, 2016
bcb93db
Make `InvokeLike` extend `Expression` and `NonSQLExpression`.
ueshin Nov 17, 2016
d448b60
Use `propagatingNull`.
ueshin Nov 17, 2016
8894a96
Use `needNullCheck` instread of `propagatingNull`.
ueshin Nov 17, 2016
ca4558f
Modify `prepareArguments()` to return the result of argument null che…
ueshin Nov 17, 2016
4d9a037
Skip evaluate arguments if `obj.isNull == true` in `Invoke`.
ueshin Nov 17, 2016
ebb1241
Modify to prepare for further optimization.
ueshin Nov 17, 2016
99a59b2
Revert a part of last 2 commits.
ueshin Nov 17, 2016
243888a
Revert "Revert a part of last 2 commits."
ueshin Nov 17, 2016
e12a9bd
Revert "Modify to prepare for further optimization."
ueshin Nov 17, 2016
2c52d91
Use `obj.isNull` instead of `ev.isNull`.
ueshin Nov 17, 2016
43d2693
Revert and use a simple case.
ueshin Nov 17, 2016
bd9c09f
Remove unused argument.
ueshin Nov 18, 2016
501095f
Remove unneeded if.
ueshin Nov 18, 2016
1baac55
Modify generated code of `Invoke`.
ueshin Nov 18, 2016
831c521
Use resultIsNull as ev.isNull in `NewInstance`.
ueshin Nov 18, 2016
0b210f8
Remove unneeded zipWithIndex.
ueshin Nov 18, 2016
f8acda6
Use local variables for `resultIsNull`s and `argValue`s.
ueshin Nov 18, 2016
fe2871c
Revert "Use local variables for `resultIsNull`s and `argValue`s."
ueshin Nov 18, 2016
c88a1ff
Optimize a code in `StaticInvoke`.
ueshin Nov 19, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,79 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCo
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
import org.apache.spark.sql.types._

/**
* Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
*/
trait InvokeLike extends Expression with NonSQLExpression {

def arguments: Seq[Expression]

def propagateNull: Boolean

protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)

/**
* Prepares codes for arguments.
*
* - generate codes for argument.
* - use ctx.splitExpressions() to not exceed 64kb JVM limit while preparing arguments.
* - avoid some of nullabilty checking which are not needed because the expression is not
* nullable.
* - when needNullCheck == true, short circuit if we found one of arguments is null because
* preparing rest of arguments can be skipped in the case.
*
* @param ctx a [[CodegenContext]]
* @param ev an [[ExprCode]] with unique terms.
* @return (code to prepare arguments, argument string, result of argument null check)
*/
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment for this method? It is not simple enough to skip the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ev is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed.


val resultIsNull = if (needNullCheck) {
val resultIsNull = ctx.freshName("resultIsNull")
ctx.addMutableState("boolean", resultIsNull, "")
resultIsNull
} else {
"false"
}
val argValues = arguments.zipWithIndex.map { case (e, i) =>
val argValue = ctx.freshName("argValue")
ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
argValue
}

val argCodes = if (needNullCheck) {
val reset = s"$resultIsNull = false;"
val argCodes = arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
val updateResultIsNull = if (e.nullable) {
s"$resultIsNull = ${expr.isNull};"
} else {
""
}
s"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

val updateContainsNull = if (e.nullable) {
  "$containsNull = ${expr.isNull};"
} else {
  ""
}
s"""
  if (!$containsNull) {
      ${expr.code}
      $updateContainsNull
      ${argValues(i)} = ${expr.value};
  }
"""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll use it.

if (!$resultIsNull) {
${expr.code}
$updateResultIsNull
${argValues(i)} = ${expr.value};
}
"""
}
reset +: argCodes
} else {
arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
s"""
${expr.code}
${argValues(i)} = ${expr.value};
"""
}
}
val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)

(argCode, argValues.mkString(", "), resultIsNull)
}
}

/**
* Invokes a static function, returning the result. By default, any of the arguments being null
* will result in returning null instead of calling the function.
Expand All @@ -50,7 +123,7 @@ case class StaticInvoke(
dataType: DataType,
functionName: String,
arguments: Seq[Expression] = Nil,
propagateNull: Boolean = true) extends Expression with NonSQLExpression {
propagateNull: Boolean = true) extends InvokeLike {

val objectName = staticObject.getName.stripSuffix("$")

Expand All @@ -62,16 +135,10 @@ case class StaticInvoke(

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
val argGen = arguments.map(_.genCode(ctx))
val argString = argGen.map(_.value).mkString(", ")

val callFunc = s"$objectName.$functionName($argString)"
val (argCode, argString, resultIsNull) = prepareArguments(ctx, ev)

val setIsNull = if (propagateNull && arguments.nonEmpty) {
s"boolean ${ev.isNull} = ${argGen.map(_.isNull).mkString(" || ")};"
} else {
s"boolean ${ev.isNull} = false;"
}
val callFunc = s"$objectName.$functionName($argString)"

// If the function can return null, we do an extra check to make sure our null bit is still set
// correctly.
Expand All @@ -82,8 +149,8 @@ case class StaticInvoke(
}

val code = s"""
${argGen.map(_.code).mkString("\n")}
$setIsNull
$argCode
boolean ${ev.isNull} = $resultIsNull;
final $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : $callFunc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ${ev.isNull} at line 153 can be $resultIsNull. So the optimization for false ? ${ctx.defaultValue(dataType)} : $callFunc; can work here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. I modified. Thanks.

$postNullCheck
"""
Expand All @@ -103,13 +170,15 @@ case class StaticInvoke(
* @param functionName The name of the method to call.
* @param dataType The expected return type of the function.
* @param arguments An optional list of expressions, whos evaluation will be passed to the function.
* @param propagateNull When true, and any of the arguments is null, null will be returned instead
* of calling the function.
*/
case class Invoke(
targetObject: Expression,
functionName: String,
dataType: DataType,
arguments: Seq[Expression] = Nil,
propagateNull: Boolean = true) extends Expression with NonSQLExpression {
propagateNull: Boolean = true) extends InvokeLike {

override def nullable: Boolean = true
override def children: Seq[Expression] = targetObject +: arguments
Expand All @@ -131,8 +200,8 @@ case class Invoke(
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
val obj = targetObject.genCode(ctx)
val argGen = arguments.map(_.genCode(ctx))
val argString = argGen.map(_.value).mkString(", ")

val (argCode, argString, resultIsNull) = prepareArguments(ctx, ev)

val returnPrimitive = method.isDefined && method.get.getReturnType.isPrimitive
val needTryCatch = method.isDefined && method.get.getExceptionTypes.nonEmpty
Expand Down Expand Up @@ -164,23 +233,20 @@ case class Invoke(
"""
}

val setIsNull = if (propagateNull && arguments.nonEmpty) {
s"boolean ${ev.isNull} = ${obj.isNull} || ${argGen.map(_.isNull).mkString(" || ")};"
} else {
s"boolean ${ev.isNull} = ${obj.isNull};"
}

// If the function can return null, we do an extra check to make sure our null bit is still set
// correctly.
val postNullCheck = if (ctx.defaultValue(dataType) == "null") {
s"${ev.isNull} = ${ev.value} == null;"
} else {
""
}

val code = s"""
${obj.code}
${argGen.map(_.code).mkString("\n")}
$setIsNull
if (!${obj.isNull}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I think this new code looks clearer than before, what do you think? @ueshin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you. Thank you for your suggestion.

$argCode
}
boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes resultIsNull is a member variable not local variable, can we avoid doing this? We can make the code clearer and more robust like:

javaType ${ev.value} = defaultValue;
boolean ${ev.isNull} = true;
if (!${obj.isNull}) {
  $argCode
  ${ev.isNull} = $resultIsNull
  ...
  $postNullCheck
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not because if evaluating arguments is split to some methods, resultIsNull can't be referred from the split methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know resultIsNull must be a member variable now, but in the future we may make it a local variable if the argument code is not so long to exceed the 64kb limit. Here why not we assume resultIsNull can be local variable and make us more robust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code as you mentioned.

But I'm sorry, I didn't understand what you want to do.

  • You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)
    • I think the argument code would easily become large enough to exceed the 64kb limit.
  • You want to modify only resultIsNull to local variable and pass it and return the value to the split methods?
  • You want to switch local variable or member variable by whether split or not?
    • We need to know whether split or not but we can't now. Modify ctx.splitExpressions() to return split or not?

Could you let me know if you have other thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)

This is what I mean. We can generate code first and see if it exceeds the 64kb limitation, so logically we do have a chance to use local variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Let's see if it exceeds the limit or not, using local variables.

$javaType ${ev.value} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
$evaluate
Expand Down Expand Up @@ -223,10 +289,10 @@ case class NewInstance(
arguments: Seq[Expression],
propagateNull: Boolean,
dataType: DataType,
outerPointer: Option[() => AnyRef]) extends Expression with NonSQLExpression {
outerPointer: Option[() => AnyRef]) extends InvokeLike {
private val className = cls.getName

override def nullable: Boolean = propagateNull
override def nullable: Boolean = needNullCheck

override def children: Seq[Expression] = arguments

Expand All @@ -245,51 +311,35 @@ case class NewInstance(

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
val argIsNulls = ctx.freshName("argIsNulls")
ctx.addMutableState("boolean[]", argIsNulls,
s"$argIsNulls = new boolean[${arguments.size}];")
val argValues = arguments.zipWithIndex.map { case (e, i) =>
val argValue = ctx.freshName("argValue")
ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
argValue
}

val argCodes = arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
expr.code + s"""
$argIsNulls[$i] = ${expr.isNull};
${argValues(i)} = ${expr.value};
"""
}
val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)
val (argCode, argString, resultIsNull) = prepareArguments(ctx, ev)

val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx))

var isNull = ev.isNull
val setIsNull = if (propagateNull && arguments.nonEmpty) {
s"""
boolean $isNull = false;
for (int idx = 0; idx < ${arguments.length}; idx++) {
if ($argIsNulls[idx]) { $isNull = true; break; }
}
"""
val prepareIsNull = if (needNullCheck) {
s"boolean $isNull = $resultIsNull;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more optimization: we can just write ev.isNull = resultIsNull, then if resultIsNull is false, the final evaluation will become ... = false ? aaa : bbb and get optimized by janino.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean, modified. Thanks.

} else {
isNull = "false"
""
}

val constructorCall = outer.map { gen =>
s"""${gen.value}.new ${cls.getSimpleName}(${argValues.mkString(", ")})"""
s"${gen.value}.new ${cls.getSimpleName}($argString)"
}.getOrElse {
s"new $className(${argValues.mkString(", ")})"
s"new $className($argString)"
}

val code = s"""
$argCode
${outer.map(_.code).getOrElse("")}
$setIsNull
final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;
"""
$prepareIsNull
""" +
(if (needNullCheck) {
s"final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If needNullCheck is true, we don't need to evaluate arguments if $isNull == true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya I think we do need because if needNullCheck is true, the $isNull stores the result of evaluating arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or am I misunderstanding what you mean..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I see. This is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thanks.

Copy link
Contributor

@cloud-fan cloud-fan Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary? I'm pretty sure javac can optimize ... = false ? a : b to ... = b for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use janino instead of javac?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I mean "java compiler"... I think janino is smart enough about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, updated. Thanks.

} else {
s"final $javaType ${ev.value} = $constructorCall;"
})
ev.copy(code = code, isNull = isNull)
}

Expand Down