[SPARK-16844][SQL] Generate code for sort based aggregation #14481
Ok to test |
Test build #3202 has finished for PR 14481 at commit
|
retest this please |
@yucai could you post some benchmark results? I would think that the overall runtime of the sort-based aggregation path is dominated by the preceding exchange and sort operations, and that as a result this will not yield an enormous speed-up. Could you also post the generated code for a simple case? That helps during the review. |
@hvanhovell thanks very much for the advice, yes, I will post the benchmark results first. |
@hvanhovell Summary Workload
Example 2: aggregate with keys
In the above workload pattern, the sort actually takes little time; most of the time is spent in aggregation, which is the main reason the sort-aggregate codegen gives a speed-up. |
Generated code example, not for code review yet
|
Generated code example, not for code review yet.
|
@chenghao-intel Hao, kindly take a look. |
@yucai can you please rebase the code? |
@yucai thanks for posting the benchmarks and the code. One high-level comment would be to start with a properly sorted dataset for the second benchmark. I would like to know how much time is actually spent in aggregation. |
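To illustrate the measurement being asked for here, the following is a minimal sketch in plain Python, not Spark's implementation and not the code from this PR. It times the sort and the single-pass aggregation over sorted rows separately. The function names `sort_based_sum` and `time_sort_and_aggregate`, and the `(key, value)` row shape, are assumptions made for illustration only.

```python
import random
import time
from itertools import groupby
from operator import itemgetter


def sort_based_sum(sorted_rows):
    """Sum values per key in one pass; rows must already be sorted by key."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    ]


def time_sort_and_aggregate(rows):
    """Return (sort_seconds, aggregate_seconds, result) for the given rows."""
    start = time.perf_counter()
    sorted_rows = sorted(rows, key=itemgetter(0))
    sort_seconds = time.perf_counter() - start

    # Aggregation is timed on its own, starting from already sorted input.
    start = time.perf_counter()
    result = sort_based_sum(sorted_rows)
    aggregate_seconds = time.perf_counter() - start
    return sort_seconds, aggregate_seconds, result


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical workload: 100,000 rows spread over 1,000 keys.
    rows = [(random.randrange(1000), 1) for _ in range(100_000)]
    sort_s, agg_s, _ = time_sort_and_aggregate(rows)
    print(f"sort: {sort_s:.4f}s, aggregate: {agg_s:.4f}s")
```

Feeding the aggregation step input that is already sorted isolates its cost from the exchange and sort that precede it, which is the split this comment asks about. Timings from a toy like this say nothing about Spark itself.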
9048ff0 to 72a0c8a
461c737 to 958dc05
0a12860 to 2c22f81
@hvanhovell What's the status of this? If nobody takes this, I'll do it. |
@maropu, I have been doing some refactoring recently and will update it soon. |
@yucai okay, thanks! |
Can one of the admins verify this patch? |
Any update? |
gentle ping @yucai, let me propose to close this if it is still inactive. |
This PR is in internal review; I will ask for community review later.