[C++] Add ordered aggregation #32884
rtpsw added a commit to rtpsw/arrow that referenced this issue on Feb 23, 2023
rtpsw added a commit to rtpsw/arrow that referenced this issue on Feb 23, 2023
rtpsw added a commit to rtpsw/arrow that referenced this issue on Mar 3, 2023:
PR comments from apache#34311
Follow-ups listed in #34475
icexelloss added a commit that referenced this issue on Mar 10, 2023
This PR adds "Segmented Aggregation" to the existing aggregation node to improve aggregation on ordered data.

A segment group is defined as "a continuous chunk of data that have the same segment key value". E.g., if the input data looks like
```
[0, 0, 0, 1, 2, 2]
```
then there are three segment groups: `[0, 0, 0]`, `[1]`, `[2, 2]`.

(Note: the "group" in "segment group" is added to differentiate it from "segment", which is defined as "a continuous chunk of data within an ExecBatch".)

Segmented aggregation can replace the existing hash aggregation when the data are ordered. The benefits are:
(1) We can output an aggregation result earlier (as soon as a segment group is fully consumed).
(2) We only need to hold the partial aggregation for one segment group, which reduces memory usage.

See https://issues.apache.org/jira/browse/ARROW-17642

Replaces #14352
* Closes: #32884

Follow ups
=======
* #34475
* #34529

---------

Co-authored-by: Li Jin <[email protected]>
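To make the two benefits concrete, here is a minimal standalone sketch of a streaming sum over key-ordered input. It is not the Arrow implementation and uses no Arrow APIs: `KeySum`, `SegmentedSum`, `Consume`, and `Finish` are hypothetical names chosen for illustration, and each batch is assumed to be a plain vector of (key, value) rows already ordered by key.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names belong to the Arrow API.
struct KeySum {
  int64_t key;
  int64_t sum;
  bool operator==(const KeySum& other) const {
    return key == other.key && sum == other.sum;
  }
};

// Streaming sum over input that is assumed to be ordered by key.
// Partial state is held for the current segment group only.
class SegmentedSum {
 public:
  // Consume one batch of (key, value) rows and return the result of every
  // segment group that this batch showed to be complete.
  std::vector<KeySum> Consume(
      const std::vector<std::pair<int64_t, int64_t>>& batch) {
    std::vector<KeySum> finished;
    for (const auto& [key, value] : batch) {
      if (current_ && current_->key != key) {
        // The key changed, so the previous segment group is fully consumed.
        finished.push_back(*current_);
        current_.reset();
      }
      if (!current_) current_ = KeySum{key, 0};
      current_->sum += value;
    }
    return finished;
  }

  // Flush the last segment group once the input is exhausted.
  std::optional<KeySum> Finish() {
    std::optional<KeySum> last = current_;
    current_.reset();
    return last;
  }

 private:
  std::optional<KeySum> current_;
};
```

Feeding the batches `{(0,1), (0,2), (0,3), (1,10)}` and then `{(2,5), (2,7)}` emits the group for key 0 during the first call, the group for key 1 during the second call (it only becomes complete when key 2 appears), and the group for key 2 from `Finish()`. A segment group may therefore span batch boundaries, which is why the partial state lives in the aggregator and not in the batch.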
Ordered aggregation is similar to grouped aggregation except that one column in the grouping key is known to be ordered. Both types of aggregation produce the same result, but the ordered column enables optimizations such as emitting results earlier and holding less aggregation state in memory.
Reporter: Yaron Gvili / @rtpsw
Assignee: Yaron Gvili / @rtpsw
PRs and other links:
Note: This issue was originally created as ARROW-17642. Please see the migration documentation for further details.