[SPARK-22750][SQL] Reuse mutable states when possible #19940
Test build #84695 has finished for PR 19940 at commit
@cloud-fan @kiszk @viirya could you please help review this? Thanks.
A high-level question: do we need to share mutable states if we can compact global variables into arrays later? Will sharing mutable states make codegen harder to debug in the future?
@viirya we have seen that using arrays affects performance, so it is better to reduce their usage where we can. I don't think debugging becomes harder: the variables I made shared are never assigned except during initialization. Do you have a different opinion, or are you thinking of something specific?
  def addSingleMutableState(
      javaType: String,
      variableName: String,
      initCode: String = ""): Unit = {
How can we support different initCode for the same name?
It is not supported.
      variableName: String,
      initCode: String = ""): Unit = {
    if (!singleMutableStates.contains(variableName)) {
      addMutableState(javaType, variableName, initCode)
Shall we add an assert here to make sure `initCode` is the same as the previous one?
if you want, I can add it.
Please also check that the Java type is the same. If an expression uses the same name with a different type, we should flag it early.
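The two checks requested above (same init code, same Java type) can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: `SharedStateRegistry`, `addSharedState` and `declared` are hypothetical names, and the init code is modelled as a plain string as in the early revision of the method shown in the diff.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a codegen context; not Spark's actual class.
class SharedStateRegistry {
  // name -> (javaType, initCode) for every shared state registered so far
  private val shared = mutable.Map.empty[String, (String, String)]
  // (javaType, name, initCode) triples that would be emitted as global variables
  val declared = mutable.ArrayBuffer.empty[(String, String, String)]

  def addSharedState(javaType: String, name: String, initCode: String = ""): Unit = {
    shared.get(name) match {
      case Some((prevType, prevInit)) =>
        // Fail early if the same name is reused with a different type or initializer.
        require(prevType == javaType,
          s"$name is already declared as $prevType, cannot redeclare it as $javaType")
        require(prevInit == initCode,
          s"$name is already declared with a different init code")
      case None =>
        shared(name) = (javaType, initCode)
        declared += ((javaType, name, initCode))
    }
  }
}
```

Registering the same name twice with identical arguments is a no-op, while a mismatch in either the type or the init code raises an `IllegalArgumentException` at registration time rather than producing confusing generated code later.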
@mgaido91 Do you mean the shared global variables are required to be assigned only once (at initialization) and never changed again?
@viirya this is the requirement I followed in this change. It ensures that sharing the variable across all the operators is safe, since every access is read-only and no operator can affect another. This might be relaxed in the future, but as long as we follow this requirement, we are sure that sharing is safe.
Shall we make them as |
Oh, the initialization does not happen right at the declaration.
Test build #84714 has finished for PR 19940 at commit
I have one question. We are implementing #19811 to compact mutable states. Once it is merged, can this PR reduce a large number of constant pool entries?
@kiszk for instance, it can remove one entry for every timestamp function (to_timestamp or from_utc_timestamp). Of course #19811 is the most important PR, because it solves the problem. But I think we all agree that it is better not to waste global variables, and there have been many PRs to avoid the usage of global variables. This is one of them.
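To make the saving concrete, here is a rough sketch that counts how many global variables get declared when each expression asks for its own variable versus when all of them share one name. `MiniContext`, `addFreshState`, `addSharedState` and `GlobalCount` are hypothetical names used only for this illustration and do not mirror Spark's real implementation.

```scala
import scala.collection.mutable

// Hypothetical mini context for illustration only.
class MiniContext {
  // Names of the global variables that would appear in the generated class.
  val globals = mutable.ArrayBuffer.empty[String]
  private val sharedNames = mutable.Set.empty[String]
  private var counter = 0

  // Always declares a new global, made unique with a numeric suffix.
  def addFreshState(prefix: String): String = {
    val name = s"${prefix}_$counter"
    counter += 1
    globals += name
    name
  }

  // Declares the global only the first time the name is seen.
  def addSharedState(name: String): String = {
    if (sharedNames.add(name)) globals += name
    name
  }
}

object GlobalCount {
  // n expressions, each declaring its own UTC time zone variable.
  def perExpression(n: Int): Int = {
    val ctx = new MiniContext
    (1 to n).foreach(_ => ctx.addFreshState("utc"))
    ctx.globals.size
  }

  // n expressions, all referring to one shared, read-only variable.
  def shared(n: Int): Int = {
    val ctx = new MiniContext
    (1 to n).foreach(_ => ctx.addSharedState("tzUTC"))
    ctx.globals.size
  }
}
```

In this toy model the per-expression approach grows linearly with the number of expressions, while the shared approach stays at a single declaration regardless of how many expressions use it.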
Jenkins, retest this please |
For 2., I noticed there are two types of initialization. One is in |
Test build #84775 has finished for PR 19940 at commit
I don't think so, it would be very risky, since |
Test build #84846 has finished for PR 19940 at commit
@cloud-fan @kiszk @viirya any more comments on this? |
@@ -170,6 +170,14 @@ class CodegenContext {
   val mutableStates: mutable.ArrayBuffer[(String, String, String)] =
     mutable.ArrayBuffer.empty[(String, String, String)]

   /**
    * A map containing the mutable states which have been defined so far using
    * `addSingleMutableState`. Each entry contains the name of the mutable state as key and its
Is `addSingleMutableState` the old name?
@@ -401,4 +401,16 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     ctx.addReferenceObj("foo", foo)
     assert(ctx.mutableStates.isEmpty)
   }

   test("SPARK-22750: addSingleMutableState") {
Is `addSingleMutableState` the old name?
yes, thanks, good catch!
Test build #84956 has finished for PR 19940 at commit
The test failure is not related to the PR. It looks like the same R failure we had some days ago, which I thought was solved.
Jenkins, retest this please |
Test build #84972 has finished for PR 19940 at commit
Test build #85202 has finished for PR 19940 at commit
Test build #85212 has finished for PR 19940 at commit
Jenkins, retest this please |
the test errors are unrelated to this change. Any other comments @cloud-fan @kiszk @viirya ? |
Test build #85252 has finished for PR 19940 at commit
Jenkins, retest this please |
Test build #85261 has finished for PR 19940 at commit
   * `addImmutableStateIfNotExists`. Each entry contains the name of the mutable state as key and
   * its Java type and init code as value.
   */
  val singleMutableStates: mutable.Map[String, (String, String)] =
immutableStates?
@@ -1193,7 +1196,8 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
     val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
     val tzTerm = ctx.addMutableState(tzClass, "tz",
       v => s"""$v = $dtu.getTimeZone("$tz");""")
-    val utcTerm = ctx.addMutableState(tzClass, "utc",
+    val utcTerm = "tzUTC"
+    ctx.addImmutableStateIfNotExists(tzClass, utcTerm,
       v => s"""$v = $dtu.getTimeZone("UTC");""")
Unrelated question: in the codebase we sometimes use UTC and sometimes GMT. Is that correct?
yes, there is no difference between them in practice. But I think that being consistent would be better for readability
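As a quick check of the claim that the two IDs behave the same, the sketch below compares the `java.util.TimeZone` objects returned for "UTC" and "GMT". The `UtcVsGmt` object and its member names are made up for this example; it calls the JDK directly rather than going through the `DateTimeUtils.getTimeZone` helper used in the diff.

```scala
import java.util.TimeZone

object UtcVsGmt {
  val utc: TimeZone = TimeZone.getTimeZone("UTC")
  val gmt: TimeZone = TimeZone.getTimeZone("GMT")

  // Offset from UTC, in milliseconds, that each zone reports at the given instant.
  def offsetsAt(epochMillis: Long): (Int, Int) =
    (utc.getOffset(epochMillis), gmt.getOffset(epochMillis))

  // Neither zone observes daylight saving time.
  def neitherUsesDst: Boolean = !utc.useDaylightTime && !gmt.useDaylightTime
}
```

Both zones report a zero offset and no daylight saving rules, so conversions through either one give the same result. The remaining difference is the ID string itself, which is why picking one spelling consistently is mainly a readability concern.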
LGTM |
Test build #85273 has finished for PR 19940 at commit
thanks, merging to master! |
What changes were proposed in this pull request?
The PR introduces a new method `addImmutableStateIfNotExists` to `CodeGenerator` to allow reusing and sharing the same global variable between different Expressions. This helps reduce the number of global variables needed, which is important to limit the impact on the constant pool.
How was this patch tested?
Added UTs.