[SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB #15480

lw-lin · 2016-10-14T07:33:11Z

What changes were proposed in this pull request?

Prior to this patch, we generate compare(...) for GeneratedClass$SpecificOrdering as shown below, which leads to Janino exceptions saying the code grows beyond 64 KB.

/* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
/* ..... */   ...
/* 10969 */   private int compare(InternalRow a, InternalRow b) {
/* 10970 */     InternalRow i = null;  // Holds current row being evaluated.
/* 10971 */
/* 1.... */     code for comparing field0
/* 1.... */     code for comparing field1
/* 1.... */     ...
/* 1.... */     code for comparing field449
/* 15012 */
/* 15013 */     return 0;
/* 15014 */   }
/* 15015 */ }

This patch breaks compare(...) into smaller compare_xxx(...) methods when necessary, so the generated compare(...) looks like:

/* 001 */ public SpecificOrdering generate(Object[] references) {
/* 002 */   return new SpecificOrdering(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
/* 006 */
/* 007 */     ...
/* 1.... */
/* 11290 */   private int compare_0(InternalRow a, InternalRow b) {
/* 11291 */     InternalRow i = null;  // Holds current row being evaluated.
/* 11292 */
/* 11293 */     i = a;
/* 11294 */     boolean isNullA;
/* 11295 */     UTF8String primitiveA;
/* 11296 */     {
/* 11297 */
/* 11298 */       Object obj = ((Expression) references[0]).eval(null);
/* 11299 */       UTF8String value = (UTF8String) obj;
/* 11300 */       isNullA = false;
/* 11301 */       primitiveA = value;
/* 11302 */     }
/* 11303 */     i = b;
/* 11304 */     boolean isNullB;
/* 11305 */     UTF8String primitiveB;
/* 11306 */     {
/* 11307 */
/* 11308 */       Object obj = ((Expression) references[0]).eval(null);
/* 11309 */       UTF8String value = (UTF8String) obj;
/* 11310 */       isNullB = false;
/* 11311 */       primitiveB = value;
/* 11312 */     }
/* 11313 */     if (isNullA && isNullB) {
/* 11314 */       // Nothing
/* 11315 */     } else if (isNullA) {
/* 11316 */       return -1;
/* 11317 */     } else if (isNullB) {
/* 11318 */       return 1;
/* 11319 */     } else {
/* 11320 */       int comp = primitiveA.compare(primitiveB);
/* 11321 */       if (comp != 0) {
/* 11322 */         return comp;
/* 11323 */       }
/* 11324 */     }
/* 11325 */
/* 11326 */
/* 11327 */     i = a;
/* 11328 */     boolean isNullA1;
/* 11329 */     UTF8String primitiveA1;
/* 11330 */     {
/* 11331 */
/* 11332 */       Object obj1 = ((Expression) references[1]).eval(null);
/* 11333 */       UTF8String value1 = (UTF8String) obj1;
/* 11334 */       isNullA1 = false;
/* 11335 */       primitiveA1 = value1;
/* 11336 */     }
/* 11337 */     i = b;
/* 11338 */     boolean isNullB1;
/* 11339 */     UTF8String primitiveB1;
/* 11340 */     {
/* 11341 */
/* 11342 */       Object obj1 = ((Expression) references[1]).eval(null);
/* 11343 */       UTF8String value1 = (UTF8String) obj1;
/* 11344 */       isNullB1 = false;
/* 11345 */       primitiveB1 = value1;
/* 11346 */     }
/* 11347 */     if (isNullA1 && isNullB1) {
/* 11348 */       // Nothing
/* 11349 */     } else if (isNullA1) {
/* 11350 */       return -1;
/* 11351 */     } else if (isNullB1) {
/* 11352 */       return 1;
/* 11353 */     } else {
/* 11354 */       int comp = primitiveA1.compare(primitiveB1);
/* 11355 */       if (comp != 0) {
/* 11356 */         return comp;
/* 11357 */       }
/* 11358 */     }
/* 1.... */
/* 1.... */   ...
/* 1.... */
/* 12652 */     return 0;
/* 12653 */   }
/* 1.... */
/* 1.... */   ...
/* 15387 */
/* 15388 */   public int compare(InternalRow a, InternalRow b) {
/* 15389 */
/* 15390 */     int comp_0 = compare_0(a, b);
/* 15391 */     if (comp_0 != 0) {
/* 15392 */       return comp_0;
/* 15393 */     }
/* 15394 */
/* 15395 */     int comp_1 = compare_1(a, b);
/* 15396 */     if (comp_1 != 0) {
/* 15397 */       return comp_1;
/* 15398 */     }
/* 1.... */
/* 1.... */     ...
/* 1.... */
/* 15450 */     return 0;
/* 15451 */   }
/* 15452 */ }
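
The dispatching pattern in the generated compare(...) above can be sketched in plain Java. This is a minimal illustration, not the code in this PR: the class CompareDispatchSketch, its helpers grouped and dispatcher, and the group size passed in are all hypothetical names and values chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in Spark.
class CompareDispatchSketch {

    // Splits the per-field comparison snippets into consecutive groups of at most groupSize.
    static List<List<String>> grouped(List<String> comparisons, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < comparisons.size(); start += groupSize) {
            int end = Math.min(start + groupSize, comparisons.size());
            groups.add(new ArrayList<>(comparisons.subList(start, end)));
        }
        return groups;
    }

    // Emits the source of a compare(...) that calls compare_0 .. compare_{n-1} in order
    // and returns the first non-zero result.
    static String dispatcher(int numGroups) {
        StringBuilder sb = new StringBuilder();
        sb.append("public int compare(InternalRow a, InternalRow b) {\n");
        for (int i = 0; i < numGroups; i++) {
            sb.append("  int comp_").append(i).append(" = compare_").append(i).append("(a, b);\n");
            sb.append("  if (comp_").append(i).append(" != 0) {\n");
            sb.append("    return comp_").append(i).append(";\n");
            sb.append("  }\n");
        }
        sb.append("  return 0;\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> comparisons = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            comparisons.add("code for comparing field" + i);
        }
        // The group size of 40 is an assumed value for illustration.
        List<List<String>> groups = grouped(comparisons, 40);
        System.out.println(groups.size() + " helper methods");
        System.out.print(dispatcher(groups.size()));
    }
}
```

With 450 comparisons and an assumed group size of 40, the sketch produces 12 groups, the last one holding 10 comparisons, and a dispatcher with one call per group.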

How was this patch tested?

- a newly added test case which
  - would fail prior to this patch
  - would pass with this patch
- ordering correctness should already be covered by existing tests like those in OrderingSuite

Acknowledgement

A major part of this PR - the refactoring work of splitExpressions() - was done by @ueshin.

SparkQA · 2016-10-14T09:14:18Z

Test build #66954 has finished for PR 15480 at commit ecc6720.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-10-14T14:59:16Z

Maybe we could rerun the build for this if you are sure this failure is due to a flaky test.

lw-lin · 2016-10-14T23:51:22Z

I think it's a flaky test unrelated to this PR. Thanks, @HyukjinKwon!

lw-lin · 2016-10-14T23:52:35Z

Jenkins retest this please

lw-lin · 2016-10-15T00:16:16Z

@davies @kiszk it'd be great if you can take a look

SparkQA · 2016-10-15T02:10:40Z

Test build #66999 has finished for PR 15480 at commit ecc6720.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-10-15T07:40:10Z

cc @ueshin want to help review this?

kiszk · 2016-10-15T09:24:55Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

@@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
          }
      """
    }.mkString("\n")
-    comparisons
+
+    /*


Instead of this implementation, is it possible to use this function (splitExpressions) by adding the return statement as an argument with a default value?

thanks @kiszk for bringing this up!

@ueshin @kiszk any comments on whether we want to expand the private CodeGenerator#splitExpressions to:

- add one more argument to specify how we want to "return" things?

- add arguments to specify how we want to break up the expressions? E.g. in this PR's case, we might want to break them up according to numberOfComparisonsThreshold = 40 rather than the string length

I agree with @kiszk that we should use the function if possible, but I have no simple idea for how to expand the function to apply to this case.

@lw-lin , good points.
For the second issue, can we use the string length as a proxy for numberOfComparisonsThreshold? I know this is not an exact estimate.
For the first issue, how about the following approach? Apologies in advance: I have not compiled it myself.

// CodeGenerator.scala
def splitExpressions(
    expressions: Seq[String],
    funcName: String,
    arguments: Seq[(String, String)],
    returns: (String, String) = ("void", "")): String = {
  ..
  val code = s"""
    |private ${returns._1} $name(${arguments.map { case (t, name) => s"$t $name" }.mkString(", ")}) {
    |  $body
    |  ${returns._2}
    |}
  """.stripMargin
  ...
}

// GenerateOrdering.scala
val groupedOrderingItr = ordering.grouped(numberOfComparisonsThreshold)
// var groupedOrderingLength = 0
val functions = ctx.splitExpressions { .. }
val comp = freshName("comp")
functions.zipWithIndex.map { case (func, i) =>
  val name = s"${comp}_$i"
  s"""
    |int $name = $func(a, b);
    |if ($name != 0) {
    |  return $name;
    |}
  """.stripMargin
}.mkString

@kiszk @ueshin thanks!
I implemented a rough proof of concept (lw-lin@d0c1198) of how to extend splitExpressions to rewrite this PR.
Could you comment?

@lw-lin I commented on the commit. Please take a look.

@davies could you take a quick look here:

this PR tries to break a huge generated method into smaller pieces. @kiszk, @ueshin and I were discussing whether we should:

(a) just do the splitting case by case, like in this PR

or (b) expand CodeGenerator.splitExpressions(...) to support this kind of splitting in general, like in the POC lw-lin@d0c1198

@davies could you advise on this? thanks!

ueshin · 2016-10-15T11:01:12Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

+      var groupedOrderingLength = 0
+      groupedOrderingItr.zipWithIndex.foreach { case (orderingGroup, i) =>
+        groupedOrderingLength += 1
+        val funcName = s"compare_$i"


We need to use a fresh name for funcName or its prefix.

let me fix this, thanks!

ueshin · 2016-10-15T11:44:16Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

+        ctx.addNewFunction(funcName, funcCode)
+      }
+
+      (0 to groupedOrderingLength - 1).map { i =>


nit: use (0 until groupedOrderingLength).

let me fix this, thanks!

SparkQA · 2016-10-18T10:15:23Z

Test build #67116 has finished for PR 15480 at commit 1ae9935.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin · 2016-10-18T11:57:45Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -537,7 +537,6 @@ class CodegenContext {
      val funcCode: String =
        s"""
          public int $compareFunc(InternalRow a, InternalRow b) {
-            InternalRow i = null;


I think this line should not be moved into GenerateOrdering.genComparisons() unless there is a special reason.

Thanks, but keeping it here would declare a local variable i that we might not use when ordering.size > numberOfComparisonsThreshold. What do you think?

I think GenerateOrdering.genComparisons() is expected to generate a PART of a comparison method. If the declaration of the variable i is in that part, we can't use GenerateOrdering.genComparisons() twice or more for the same method.
Declaring an unused variable will not be a problem.

Oh I see, makes a lot of sense. Let me fix this, thanks!

lw-lin · 2016-11-04T06:36:16Z

Jenkins retest this please

SparkQA · 2016-11-04T09:11:39Z

Test build #68113 has finished for PR 15480 at commit 1ae9935.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-11-05T15:29:58Z

Test build #68210 has finished for PR 15480 at commit 33b5fd8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

lw-lin · 2016-11-06T02:18:44Z

Jenkins retest this please

SparkQA · 2016-11-06T03:14:48Z

Test build #68220 has finished for PR 15480 at commit 33b5fd8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2016-11-06T16:12:41Z

Jenkins retest this please

SparkQA · 2016-11-06T18:33:24Z

Test build #68244 has finished for PR 15480 at commit 33b5fd8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

lw-lin · 2016-11-08T13:28:42Z

@hvanhovell it'd be great if you can take a look at this, thanks!

lw-lin · 2016-11-23T03:09:33Z

@cloud-fan @hvanhovell would you take a look at this? Seems like it's targeted for 2.1. Thanks!

lw-lin · 2016-11-23T03:09:42Z

Jenkins retest this please

SparkQA · 2016-11-23T05:39:31Z

Test build #69044 has finished for PR 15480 at commit 33b5fd8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-11-23T15:33:03Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

+             |private int $funcName(InternalRow a, InternalRow b) {
+             |  InternalRow ${ctx.INPUT_ROW} = null;  // Holds current row being evaluated.
+             |  ${comparisons(orderingGroup)}
+             |  return 0;


if we make the comparison result a member variable, then we don't need return in the comparison code right?

Yea that's right -- here we're returning ints because comparisons(ordering/orderingGroup)(L75 of this same file) is returning ints. Should we change that all along?

For performance reasons, we should avoid using member variables. If there is no easy way to reuse splitExpressions, I'm OK with the current approach.

Ah then maybe this is ready to go? I recall @ueshin or @kiszk might be interested in refactoring splitExpressions in future PRs.

Thanks for reviewing this!

I'm interested in the refactoring approach, which would be useful for the more general case.

I am also interested in refactoring. However, it would be better to do it in another PR.

cloud-fan · 2016-11-28T14:22:58Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

-    comparisons
+
+    /*
+     * 40 = 7000 bytes / 170 (around 170 bytes per ordering comparison).


how do you get the 170?

Ah sorry this is vague.

Actually I ran all 36 test cases in OrderingSuite and logged the size of each generated comparisonXXX method. Each method was:

- < 4.9 KB when numberOfComparisonsThreshold = 40
- < 6.4 KB when numberOfComparisonsThreshold = 50
- < 7.8 KB when numberOfComparisonsThreshold = 60

I thought it'd be safer if we pick 40 (taking minor future changes into account). Thus 170 should be considered as some kind of a safe assumption (or not?).

Would you share your thoughts on this, or any way we can improve it? Thanks.

Test cases are not real-world workloads, so we can't estimate the comparison code size based on test cases. My idea is: first generate the comparison code as before and check the code size; if it exceeds 1024 (see #15620 (comment)), go to the splitting branch. In the splitting branch, we can generate a method for each ordering expression.

If you use the ctx.splitExpressions() approach, we don't need to calculate the size because the method will split based on a code size of 1024.
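
Splitting by accumulated code size, as described in the two comments above, can be sketched in Java as follows. This is an illustrative model only: SplitByLengthSketch and split are hypothetical names, the size is measured as string length, and whether ctx.splitExpressions() checks the threshold at exactly this point is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of size-based splitting; not Spark's implementation.
class SplitByLengthSketch {

    // Concatenates snippets into blocks, starting a new block once the current
    // block's length has grown past the threshold. Because the check runs before
    // appending, a block may overshoot the threshold by up to one snippet.
    static List<String> split(List<String> snippets, int threshold) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String snippet : snippets) {
            if (current.length() > threshold) {
                blocks.add(current.toString());
                current.setLength(0);
            }
            current.append(snippet);
        }
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Ten snippets of 300 characters each stand in for per-field comparison code.
        String snippet = String.join("", Collections.nCopies(300, "x"));
        List<String> blocks = split(Collections.nCopies(10, snippet), 1024);
        for (String block : blocks) {
            System.out.println("block of " + block.length() + " chars");
        }
    }
}
```

Each resulting block would become the body of one helper method, so no per-comparison size estimate is needed; the trade-off raised later in this thread is that a small threshold produces many more helper methods.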

thanks @cloud-fan @ueshin for your valuable comments.

@ueshin do you have ongoing work on refactoring ctx.splitExpressions()? If you do, I might update this PR to rely on your work. I don't want to step on your toes :)

I tried to refactor ctx.splitExpressions() in 0aedc47, based on @lw-lin's PoC.
Unfortunately, the maximum number of comparisons we can support dropped to around 5000 (the current approach can support 10000 or more) because ctx.splitExpressions() splits the comparisons into smaller chunks than the current rule does.

mallman · 2017-01-04T19:28:42Z

Hi @lw-lin. Just FYI we use this patch at VideoAmp and would love to see it merged in. I notice this PR has gone a little cold. I'm sorry I can't offer much concrete help, but I wanted to check with you to see if you'll be able to pick this up again soon. Cheers.

lw-lin · 2017-01-05T14:54:23Z

Hi @mallman, I'll pick this up within this week. Thanks for the feedback! :)

SparkQA · 2017-01-09T05:47:39Z

Test build #71063 has started for PR 15480 at commit 3d31cb3.

lw-lin · 2017-01-09T08:16:13Z

Jenkins retest this please

SparkQA · 2017-01-09T10:36:45Z

Test build #71073 has finished for PR 15480 at commit 3d31cb3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

lw-lin · 2017-01-10T02:42:26Z

I've cherry-picked the refactoring work of splitExpressions (from @ueshin -- thank you!) into this. The tests also passed.
So @cloud-fan would you take a look at this again at your convenience? Thanks!

cloud-fan · 2017-01-10T04:50:43Z

...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala

+        val comp = ctx.freshName("comp")
+        funCalls.zipWithIndex.map { case (funCall, i) =>
+          s"""
+            int ${comp}_$i = $funCall;


nit: ctx.freshName already adds postfix to the name, you don't need to add _$i again.

It's comp_0, comp_1 in the following:

/* 15388 */   public int compare(InternalRow a, InternalRow b) {
/* 15389 */
/* 15390 */     int comp_0 = compare_0(a, b);
/* 15391 */     if (comp_0 != 0) {
/* 15392 */       return comp_0;
/* 15393 */     }
/* 15394 */
/* 15395 */     int comp_1 = compare_1(a, b);
/* 15396 */     if (comp_1 != 0) {
/* 15397 */       return comp_1;
/* 15398 */     }
/* 1.... */
/* 1.... */     ...
/* 1.... */
/* 15450 */     return 0;
/* 15451 */   }

so maybe let's keep this _$i?

can you double check? the implementation of freshName is

if (freshNameIds.contains(fullName)) {
  val id = freshNameIds(fullName)
  freshNameIds(fullName) = id + 1
  s"$fullName$id"
} else {
  freshNameIds += fullName -> 1
  fullName
}

it already adds an id postfix.

Ah I misunderstood it. You meant moving ctx.freshName("comp") into the funCalls.zipWithIndex.map {...}, right? Like the following:

// val comp = ctx.freshName("comp") // this is moved into the map {...}
funCalls.zipWithIndex.map { case (funCall, i) =>
  s"""
    int ${ctx.freshName("comp")} = $funCall;
    ...
}

I mean

val comp = ctx.freshName("comp")
funCalls.zipWithIndex.map { case (funCall, i) =>
  s"""
    int $comp = $funCall;
    ...
}

just remove all the _$i here

thanks for clarifying on this. but we'll get something like (suppose we got a fresh name comp_1):

/* 15388 */   public int compare(InternalRow a, InternalRow b) {
/* 15389 */
/* 15390 */     int comp_1 = compare_0(a, b); // comp_1
/* 15391 */     if (comp_1 != 0) {
/* 15392 */       return comp_1;
/* 15393 */     }
/* 15394 */
/* 15395 */     int comp_1 = compare_1(a, b); // still comp_1
/* 15396 */     if (comp_1 != 0) {
/* 15397 */       return comp_1;
/* 15398 */     }
/* 1.... */
/* 1.... */     ...
/* 1.... */
/* 15450 */     return 0;
/* 15451 */   }

oh sorry, we should put the freshName in the map function

funCalls.zipWithIndex.map { case (funCall, i) =>
  val comp = ctx.freshName("comp")
  s"""
    int $comp = $funCall;
    ...
}

Ah i see - let me update this :-)
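
The naming issue settled in this thread can be reproduced outside Spark. The sketch below is a hypothetical Java model of the freshName logic quoted above: FreshNameSketch and declarations are made-up names, and the real CodegenContext may differ in details such as name prefixes. It shows that calling freshName inside the loop yields a distinct variable name per call, which is what the generated compare(...) needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the freshName logic quoted in this thread; not Spark's class.
class FreshNameSketch {
    private final Map<String, Integer> freshNameIds = new HashMap<>();

    // The first request for a name returns it unchanged; later requests append a counter.
    String freshName(String fullName) {
        Integer id = freshNameIds.get(fullName);
        if (id != null) {
            freshNameIds.put(fullName, id + 1);
            return fullName + id;
        }
        freshNameIds.put(fullName, 1);
        return fullName;
    }

    // freshName is called once per function call, so every declaration gets its own name.
    static List<String> declarations(List<String> funCalls) {
        FreshNameSketch ctx = new FreshNameSketch();
        List<String> out = new ArrayList<>();
        for (String funCall : funCalls) {
            String comp = ctx.freshName("comp");
            out.add("int " + comp + " = " + funCall + ";");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        calls.add("compare_0(a, b)");
        calls.add("compare_1(a, b)");
        for (String line : declarations(calls)) {
            System.out.println(line);
        }
    }
}
```

Hoisting the freshName call out of the loop would reuse one name for every declaration, which is the duplicate-variable problem pointed out above.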

cloud-fan · 2017-01-10T04:52:29Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala

+    GenerateOrdering.generate(Array.fill(40)(sortOrder))
+
+    // this is FAILING prior to SPARK-16845, but it should be passing after SPARK-16845
+    GenerateOrdering.generate(Array.fill(450)(sortOrder))


This is unnecessary, it's covered by the 5000 test case.

Sure, let's remove the 450 test case

cloud-fan · 2017-01-10T04:52:46Z

LGTM

SparkQA · 2017-01-10T07:43:37Z

Test build #71115 has started for PR 15480 at commit 4aef473.

lw-lin · 2017-01-10T08:24:15Z

Jenkins retest this please

SparkQA · 2017-01-10T10:50:07Z

Test build #71116 has finished for PR 15480 at commit 4aef473.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Author: Liwei Lin
Author: Takuya UESHIN
Author: Takuya Ueshin
Closes #15480 from lw-lin/spec-ordering-64k-.
(cherry picked from commit acfc5f3)
Signed-off-by: Wenchen Fan

cloud-fan · 2017-01-10T11:36:56Z

thanks, merging to master/2.1!


dsimmie · 2017-02-24T13:31:39Z

Is there any plan to apply this fix to 1.6? I am using the Cloudera version (spark-1.6.0-cdh5.9.0) and am seeing this problem when running a countDistinct over a DataFrame with 600 columns.

MGwynne · 2017-03-03T14:32:47Z

Same question as @dsimmie - we are having the same issues for 1.6.2 and can't upgrade to 2.X immediately - would be good to get this backported.


… beyond 64 KB

## What changes were proposed in this pull request?

This is a backport pr of #15480 into `branch-2.0`.

## How was this patch tested?

Existing tests.

Author: Liwei Lin
Closes #17157 from ueshin/issues/SPARK-16845_2.0.

… beyond 64 KB

## What changes were proposed in this pull request?

This is a backport pr of #15480 into `branch-1.6`.

## How was this patch tested?

Existing tests.

Author: Liwei Lin
Closes #17158 from ueshin/issues/SPARK-16845_1.6.

… beyond 64 KB

## What changes were proposed in this pull request?

This is a backport pr of apache#15480 into `branch-1.6`.

## How was this patch tested?

Existing tests.

Author: Liwei Lin
Closes apache#17158 from ueshin/issues/SPARK-16845_1.6.
(cherry picked from commit 23f9faa)

[SPARK-16845][SQL] GeneratedClass$SpecificOrdering grows beyond 64 KB

ecc6720

kiszk reviewed Oct 15, 2016

ueshin requested changes Oct 15, 2016

ueshin reviewed Oct 15, 2016

Address comments

1ae9935

ueshin reviewed Oct 18, 2016

Address more comments

33b5fd8

ueshin mentioned this pull request Nov 18, 2016

[SPARK-18467][SQL] Extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when needNullCheck == true. #15901

Closed

cloud-fan reviewed Nov 23, 2016

cloud-fan reviewed Nov 28, 2016

Refactor ctx.splitExpressions().

0aedc47

Add a comment for splitExpressions.

3d31cb3

danking mentioned this pull request Dec 15, 2016

out of memory error when writing on the cloud hail-is/hail#1186

Closed

cloud-fan reviewed Jan 10, 2017

@cloud-fan's comments

4aef473

asfgit closed this in acfc5f3 Jan 10, 2017

lw-lin deleted the spec-ordering-64k- branch January 11, 2017 02:00

This was referenced Mar 4, 2017

[SPARK-16845][SQL][BRANCH-2.0] GeneratedClass$SpecificOrdering grows beyond 64 KB #17157

Closed

[SPARK-16845][SQL][BRANCH-1.6] GeneratedClass$SpecificOrdering grows beyond 64 KB #17158

Closed

lw-lin commented Oct 14, 2016

SparkQA commented Oct 14, 2016

HyukjinKwon commented Oct 14, 2016

lw-lin commented Oct 14, 2016

lw-lin commented Oct 14, 2016

lw-lin commented Oct 15, 2016

SparkQA commented Oct 15, 2016

rxin commented Oct 15, 2016

kiszk Oct 15, 2016

lw-lin Oct 17, 2016

kiszk Oct 17, 2016

lw-lin Oct 18, 2016

SparkQA commented Oct 18, 2016

ueshin Oct 18, 2016

lw-lin commented Nov 4, 2016

SparkQA commented Nov 4, 2016

SparkQA commented Nov 5, 2016

lw-lin commented Nov 6, 2016

SparkQA commented Nov 6, 2016

kiszk commented Nov 6, 2016

SparkQA commented Nov 6, 2016

lw-lin commented Nov 8, 2016

lw-lin commented Nov 23, 2016

lw-lin commented Nov 23, 2016

SparkQA commented Nov 23, 2016

lw-lin Nov 24, 2016

lw-lin Nov 24, 2016

lw-lin Dec 1, 2016

cloud-fan Dec 1, 2016

mallman commented Jan 4, 2017

lw-lin commented Jan 5, 2017

SparkQA commented Jan 9, 2017

lw-lin commented Jan 9, 2017

SparkQA commented Jan 9, 2017

lw-lin commented Jan 10, 2017

lw-lin Jan 10, 2017

lw-lin Jan 10, 2017

lw-lin Jan 10, 2017

lw-lin Jan 10, 2017

MGwynne commented Mar 3, 2017