-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12951] [SQL] support spilling in generated aggregate #10998
Conversation
Can you add some documentation in the code somewhere appropriate to explain the high level flow (e.g. when X happens, we switch to Y through Z)? |
retest this please |
Test build #50460 has finished for PR 10998 at commit
|
Test build #50494 has finished for PR 10998 at commit
|
Can you include the generated code? |
For query:
will generate:
|
s""" | ||
// generate grouping key | ||
${keyCode.code} | ||
UnsafeRow $buffer = $hashMapTerm.getAggregationBufferFromUnsafeRow($key); | ||
UnsafeRow $buffer = null; | ||
if ($checkFallback) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this. You seem to do a look up twice and line (539) . Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm. I get it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is confusing we should consider adding documentations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments.
In the generated code, what is agg_count38 used for? |
This naming is getting really hard to follow. We should as a follow up make the variable names better. In this case for example, it should be clear if the value's are input values or intermediate result values. |
@nongli agg_count38 is used for checking fallback (only for testing). |
@davies If this debugging is generally useful to have, can comment it in the code and possibly update the variable to be more specific than "count". If this is a one off you used for this, let's remove it. |
Test build #50598 has finished for PR 10998 at commit
|
LGTM |
Test build #2497 has finished for PR 10998 at commit
|
Merged into master. |
This PR add spilling support for generated TungstenAggregate.
If spilling happened, it's not that bad to do the iterator based sort-merge-aggregate (not generated).
The changes will be covered by TungstenAggregationQueryWithControlledFallbackSuite