[C++] Add ordered aggregation #32884

asfimport · 2022-09-07T11:10:19Z

Ordered aggregation is similar to grouped aggregation, except that one column in the grouping key is known to be ordered. Both types of aggregation produce the same result, but the presence of an ordered column enables optimizations.
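To make the equivalence concrete, here is a minimal standalone sketch. It does not use Arrow's API: the names `HashSum`, `OrderedSum`, and `KeySum` are hypothetical, and plain `std::vector` columns stand in for Arrow arrays. It assumes a single integer key column and a sum aggregate, and compares a hash-based grouped sum with a single-pass sum over keys that are already ordered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch: (key, sum of values).
using KeySum = std::pair<int64_t, int64_t>;

// Grouped (hash) aggregation: works for any key order, but keeps one
// partial sum per distinct key until all input has been seen.
std::vector<KeySum> HashSum(const std::vector<int64_t>& keys,
                            const std::vector<int64_t>& values) {
  assert(keys.size() == values.size());
  std::unordered_map<int64_t, int64_t> sums;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    sums[keys[i]] += values[i];
  }
  std::vector<KeySum> out(sums.begin(), sums.end());
  std::sort(out.begin(), out.end());  // make the output order deterministic
  return out;
}

// Ordered aggregation: assumes equal keys are adjacent (e.g. sorted input).
// Only the last entry of `out` is ever updated, so earlier entries are
// already final and could be handed downstream right away.
std::vector<KeySum> OrderedSum(const std::vector<int64_t>& keys,
                               const std::vector<int64_t>& values) {
  assert(keys.size() == values.size());
  std::vector<KeySum> out;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (out.empty() || out.back().first != keys[i]) {
      out.emplace_back(keys[i], 0);  // a new key starts a new group
    }
    out.back().second += values[i];
  }
  return out;
}
```

On input whose keys are sorted in ascending order, both functions return the same pairs; the ordered variant simply never needs a hash table or more than one open group at a time.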

Reporter: Yaron Gvili / @rtpsw
Assignee: Yaron Gvili / @rtpsw

PRs and other links:

GitHub Pull Request #14352

_Note: This issue was originally created as ARROW-17642. Please see the migration documentation for further details._

PR comments from apache#34311

Minor doc/rename changes

rtpsw · 2023-03-09T19:29:23Z

Follow-ups listed in #34475

This PR adds "segmented aggregation" to the existing aggregation node to improve aggregation on ordered data.

A segment group is defined as "a continuous chunk of data that has the same segment key value". For example, if the input data looks like

```
[0, 0, 0, 1, 2, 2]
```

then there are three segment groups: `[0, 0, 0]`, `[1]`, and `[2, 2]`.

(Note: the "group" in "segment group" is added to differentiate it from "segment", which is defined as "a continuous chunk of data within an ExecBatch".)

Segmented aggregation can replace the existing hash aggregation when the data are ordered. The benefits are:

(1) We can output an aggregation result earlier, as soon as a segment group is fully consumed.
(2) We only need to hold the partial aggregation for one segment group at a time, which reduces memory usage.

See https://issues.apache.org/jira/browse/ARROW-17642

Replaces #14352

* Closes: #32884

Follow ups
=======

* #34475
* #34529

---------

Co-authored-by: Li Jin <[email protected]>
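The distinction between a segment (within one batch) and a segment group (possibly spanning batches) can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `SegmentedCounter` and `KeyCount` are hypothetical names, a `std::vector<int>` of key values stands in for an ExecBatch, and the aggregate is a simple row count. The sketch keeps state for exactly one open segment group and emits its result as soon as a different key arrives.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch: (segment key, row count).
using KeyCount = std::pair<int, std::size_t>;

// Streaming counter that holds partial state for one segment group only.
class SegmentedCounter {
 public:
  // Consume one batch of segment-key values. Returns the results of every
  // segment group that this batch proved to be complete.
  std::vector<KeyCount> Consume(const std::vector<int>& batch) {
    std::vector<KeyCount> done;
    for (int key : batch) {
      if (has_open_ && key != open_key_) {
        // The key changed, so the open segment group is fully consumed.
        done.emplace_back(open_key_, open_count_);
        open_count_ = 0;
      }
      has_open_ = true;
      open_key_ = key;
      ++open_count_;
    }
    return done;
  }

  // Call once after the last batch to flush the final open segment group.
  std::vector<KeyCount> Finish() {
    std::vector<KeyCount> done;
    if (has_open_) {
      done.emplace_back(open_key_, open_count_);
    }
    has_open_ = false;
    open_count_ = 0;
    return done;
  }

 private:
  bool has_open_ = false;       // whether a segment group is in progress
  int open_key_ = 0;            // key of the open segment group
  std::size_t open_count_ = 0;  // partial aggregate for the open group
};
```

For example, if the input `[0, 0, 0, 1, 2, 2]` arrives as the batches `[0, 0]`, `[0, 1, 2]`, and `[2]`, the first batch emits nothing because the group for key 0 may continue; the second batch completes the groups for keys 0 and 1; and the group for key 2 is only emitted by `Finish()`, since it spans the last two batches.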

asfimport added this to the 11.0.0 milestone Jan 11, 2023

raulcd removed this from the 11.0.0 milestone Jan 11, 2023

rtpsw added a commit to rtpsw/arrow that referenced this issue Feb 23, 2023

apacheGH-32884: [C++] Add ordered aggregation

b2b10cf

github-actions bot mentioned this issue Feb 23, 2023

GH-32884: [C++] Add ordered aggregation #34311

Merged

rtpsw added a commit to rtpsw/arrow that referenced this issue Feb 23, 2023

Merge branch 'main' into apacheGH-32884

636701e

rtpsw added a commit to rtpsw/arrow that referenced this issue Mar 3, 2023

Merge pull request #1 from icexelloss/apacheGH-32884-ordered-agg

5ac8c65

PR comments from apache#34311

rtpsw added a commit to rtpsw/arrow that referenced this issue Mar 7, 2023

Merge pull request #2 from icexelloss/apacheGH-32884-ordered-agg

7b6e953

Minor doc/rename changes

drin mentioned this issue Mar 9, 2023

[C++][Python] A metadata standard for sorted datasets. #34451

Open

github-actions bot assigned rtpsw Mar 10, 2023

icexelloss closed this as completed in #34311 Mar 10, 2023

rtpsw mentioned this issue Mar 17, 2023

[C++] Add ordered/segmented aggregation Substrait extension #34626

Closed
