[SPARK-16845][SQL] GeneratedClass$SpecificOrdering grows beyond 64 KB #15480
Changes from 3 commits
ecc6720
1ae9935
33b5fd8
0aedc47
3d31cb3
4aef473
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -72,7 +72,7 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR | |
* Generates the code for ordering based on the given order. | ||
*/ | ||
def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = { | ||
val comparisons = ordering.map { order => | ||
def comparisons(orderingGroup: Seq[SortOrder]) = orderingGroup.map { order => | ||
val eval = order.child.genCode(ctx) | ||
val asc = order.isAscending | ||
val isNullA = ctx.freshName("isNullA") | ||
|
@@ -118,7 +118,42 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR | |
} | ||
""" | ||
}.mkString("\n") | ||
comparisons | ||
|
||
/* | ||
* 40 = 7000 bytes / 170 (around 170 bytes per ordering comparison). | ||
how do you get the

Ah sorry, this is vague. Actually I ran all the 36 test cases in
I thought it'd be safer if we pick 40 (taking minor future changes into account). Thus 170 should be considered a kind of safe assumption (or not?). Would you share your thoughts on this, or on any way we can improve it? Thanks.

Test cases are not real-world workloads, so we can't estimate comparison code size from them. My idea is to first generate the comparison code as before and check the code size; if it exceeds 1024 (see #15620 (comment)), go to the splitting branch. In the splitting branch, we can generate a method for each ordering expression.

If you use

Thanks @cloud-fan @ueshin for your valuable comments. @ueshin, do you have ongoing work on refactoring
||
* The maximum byte code size to be compiled for HotSpot is 8000 bytes. | ||
* We should keep less than 8000 bytes. | ||
*/ | ||
val numberOfComparisonsThreshold = 40 | ||
|
||
if (ordering.size <= numberOfComparisonsThreshold) { | ||
comparisons(ordering) | ||
} else { | ||
val groupedOrderingItr = ordering.grouped(numberOfComparisonsThreshold) | ||
val funcNamePrefix = ctx.freshName("compare") | ||
val funcNames = groupedOrderingItr.zipWithIndex.map { case (orderingGroup, i) => | ||
val funcName = s"${funcNamePrefix}_$i" | ||
val funcCode = | ||
s""" | ||
|private int $funcName(InternalRow a, InternalRow b) { | ||
| InternalRow ${ctx.INPUT_ROW} = null; // Holds current row being evaluated. | ||
| ${comparisons(orderingGroup)} | ||
| return 0; | ||
if we make the comparison result a member variable, then we don't need

Yea that's right -- here we're returning ints because

For performance reasons, we should avoid using member variables. If there is no easy way to reuse

I'm interested in the refactoring approach, which would be useful for the more general case.

I am also interested in refactoring. However, it would be better to do it in another PR.
||
|} | ||
""".stripMargin | ||
ctx.addNewFunction(funcName, funcCode) | ||
funcName | ||
} | ||
|
||
funcNames.zipWithIndex.map { case (funcName, i) => | ||
s""" | ||
|int comp_$i = ${funcName}(a, b); | ||
|if (comp_$i != 0) { | ||
| return comp_$i; | ||
|} | ||
""".stripMargin | ||
}.mkString | ||
} | ||
} | ||
|
||
protected def create(ordering: Seq[SortOrder]): BaseOrdering = { | ||
|
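The grouping strategy in the hunk above can be tried outside Spark with a small standalone sketch. It is only an illustration under stated assumptions: `SplitComparisonsSketch`, `comparisonSnippet`, and the `int[]` row type are hypothetical stand-ins, and the emitted Java is far simpler than what `GenerateOrdering` really produces. What it shares with the PR is the shape: up to `threshold` comparisons stay inline, otherwise they are grouped into helper methods that the top-level body calls in order, returning on the first non-zero result.

```scala
object SplitComparisonsSketch {
  // Hypothetical stand-in for one generated comparison (not Spark's real output).
  def comparisonSnippet(i: Int): String =
    s"""|int c$i = Integer.compare(a[$i], b[$i]);
        |if (c$i != 0) { return c$i; }""".stripMargin

  /** Returns (helper method sources, body of the top-level compare method). */
  def genComparisons(numOrderings: Int, threshold: Int): (Seq[String], String) = {
    val snippets = (0 until numOrderings).map(comparisonSnippet)
    if (snippets.size <= threshold) {
      // Few enough comparisons: keep everything inline, as before the change.
      (Seq.empty, snippets.mkString("\n"))
    } else {
      // One helper method per group of `threshold` comparisons.
      val helpers = snippets.grouped(threshold).zipWithIndex.map { case (group, i) =>
        s"""|private int compare_$i(int[] a, int[] b) {
            |${group.mkString("\n")}
            |return 0;
            |}""".stripMargin
      }.toVector
      // The top-level body chains the helpers and stops at the first difference.
      val body = helpers.indices.map { i =>
        s"""|int comp_$i = compare_$i(a, b);
            |if (comp_$i != 0) { return comp_$i; }""".stripMargin
      }.mkString("\n")
      (helpers, body)
    }
  }
}
```

With 100 comparisons and a threshold of 40, this yields three helpers (40, 40, and 20 comparisons) and a three-step chain in the body.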
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,4 +127,17 @@ class OrderingSuite extends SparkFunSuite with ExpressionEvalHelper { | |
} | ||
} | ||
} | ||
|
||
test("SPARK-16845: GeneratedClass$SpecificOrdering grows beyond 64 KB") { | ||
val sortOrder = Literal("abc").asc | ||
|
||
// this is passing prior to SPARK-16845, and it should also be passing after SPARK-16845 | ||
GenerateOrdering.generate(Array.fill(40)(sortOrder)) | ||
|
||
// this is FAILING prior to SPARK-16845, but it should be passing after SPARK-16845 | ||
GenerateOrdering.generate(Array.fill(450)(sortOrder)) | ||
This is unnecessary, it's covered by the

Sure, let's remove the
||
|
||
// verify that we can support up to 10000 ordering comparisons, which should be sufficient | ||
GenerateOrdering.generate(Array.fill(10000)(sortOrder)) | ||
} | ||
} |
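One inline review comment in the diff above proposes a different trigger for splitting: generate the comparison code as before, check its size, and only switch to one method per ordering expression when it exceeds 1024. A minimal sketch of that idea follows. The object and method names are hypothetical, the snippets are assumed to `return` a non-zero int once a comparison is decided, and measuring size as string length (rather than bytes) is an assumption made here for simplicity.

```scala
object SizeCheckedSplitSketch {
  // Threshold taken from the review comment; treating it as a character count is an assumption.
  val MaxInlineLength = 1024

  /** Returns (helper method sources, body of the top-level compare method). */
  def genBody(snippets: Seq[String]): (Seq[String], String) = {
    val inlined = snippets.mkString("\n")
    if (inlined.length <= MaxInlineLength) {
      // Small enough: emit the comparisons inline, exactly as generated.
      (Seq.empty, inlined)
    } else {
      // Too large: wrap every ordering expression in its own method.
      val helpers = snippets.zipWithIndex.map { case (code, i) =>
        s"private int compare_$i(int[] a, int[] b) {\n$code\nreturn 0;\n}"
      }
      // Call the helpers in order and return on the first non-zero result.
      val body = snippets.indices.map { i =>
        s"int comp_$i = compare_$i(a, b);\nif (comp_$i != 0) { return comp_$i; }"
      }.mkString("\n")
      (helpers, body)
    }
  }
}
```

Compared with a fixed count of 40, this decides from the code actually generated, at the cost of producing many small methods once the threshold is crossed.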
Instead of this implementation, is it possible to use this function by adding a statement for return as a default argument?

Thanks @kiszk for bringing this up!
@ueshin @kiszk, any comments on whether we want to expand CodeGenerator#private splitExpressions to: numberOfComparisonsThreshold = 40 rather than the string length

I agree with @kiszk that we should use the function if possible, but I don't see a simple way to extend it to cover this case.

@lw-lin, good points.
For the second issue, can we use the string length as a proxy for numberOfComparisonsThreshold? I know this is not an exact estimate.
For the first issue, how about the following approach? Apologies in advance: I have not compiled it myself.

@kiszk @ueshin thanks!
I implemented a rough proof of concept (lw-lin@d0c1198) of how to extend splitExpressions to rewrite this PR. Could you comment?

@lw-lin I commented on the commit. Please take a look.

@davies could you take a quick look here?
This PR tries to break a huge generated method into smaller pieces. @kiszk, @ueshin and I were discussing whether we should extend CodeGenerator.splitExpressions(...) to support this kind of splitting in general, as in the POC lw-lin@d0c1198.
@davies could you advise on this? Thanks!
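To make the options in this thread concrete, here is a hedged sketch of a general splitter that uses string length as a proxy for code size and lets the caller decide how each helper is invoked (for ordering, "return if non-zero"). It is not Spark's `splitExpressions` and not the code from the POC commit; `LengthBucketedSplitSketch`, `split`, `makeHelper`, and `makeCall` are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object LengthBucketedSplitSketch {
  /**
   * Greedily packs snippets into buckets whose accumulated length stays within maxLen
   * (a snippet longer than maxLen gets a bucket of its own), emits one helper per bucket,
   * and builds the call sites with the caller-supplied template.
   */
  def split(
      snippets: Seq[String],
      maxLen: Int,
      makeHelper: (String, String) => String, // (method name, body) => method source
      makeCall: String => String              // method name => call-site source
  ): (Seq[String], String) = {
    val buckets = ArrayBuffer.empty[StringBuilder]
    for (s <- snippets) {
      // Start a new bucket when the current one would grow past the limit.
      if (buckets.isEmpty || buckets.last.length + s.length > maxLen) {
        buckets += new StringBuilder
      }
      buckets.last.append(s).append('\n')
    }
    val names = buckets.indices.map(i => s"compare_$i")
    val helpers = names.zip(buckets).map { case (n, b) => makeHelper(n, b.toString) }
    (helpers, names.map(makeCall).mkString("\n"))
  }

  // Ordering-specific templates: helpers return an int, callers return on non-zero.
  val orderingHelper: (String, String) => String =
    (name, body) => s"private int $name(int[] a, int[] b) {\n${body}return 0;\n}"
  val orderingCall: String => String =
    name => s"int r_$name = $name(a, b);\nif (r_$name != 0) { return r_$name; }"
}
```

Passing the call template as an argument is one way to read the suggestion of adding the return statement as a (default) argument: other callers could supply a plain `name => s"$name(a, b);"` and keep the previous behaviour.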