-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Array agg groups accumulator #10149
Array agg groups accumulator #10149
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know of a benchmark we can run to test these changes? 👀
@@ -96,6 +109,139 @@ impl AggregateExpr for ArrayAgg { | |||
fn name(&self) -> &str { | |||
&self.name | |||
} | |||
|
|||
fn groups_accumulator_supported(&self) -> bool { | |||
self.input_data_type.is_primitive() || self.input_data_type == DataType::Utf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to support LargeUtf8 and Binary as well? Given how similar they are to Utf8?
Thank you for this PR @lkt -- I will take a look at this one carefully shortly |
opt_filter: Option<&BooleanArray>, | ||
total_num_groups: usize, | ||
) -> Result<()> { | ||
assert_eq!(new_values.len(), 1, "single argument to update_batch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please return an internal_err!("") with more user friendly message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion. I would keep the assert_eq!
here to keep it consistent with other implementations of GroupsAccumulator
. Eg. see here among others:
https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/aggregate/groups_accumulator/prim_op.rs?plain=1#L93
Also this assertion is meant for developer to ensure the correct number of elements in the array between aggregation stages, not the end user. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to get rid of application crashes, returning error instead of panic will help. But thanks for pointing out we may want to revisit assertions and return errors wherever possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #10232
opt_filter: Option<&BooleanArray>, | ||
total_num_groups: usize, | ||
) -> Result<()> { | ||
assert_eq!(new_values.len(), 1, "single argument to update_batch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
opt_filter: Option<&BooleanArray>, | ||
total_num_groups: usize, | ||
) -> Result<()> { | ||
assert_eq!(values.len(), 1, "single argument to merge_batch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Thanks @lkt for your contribution, does this PR stand for |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Seems to be superceded by #11096 |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #10145.
Rationale for this change
See the issue.
What changes are included in this PR?
GroupsAccumulator for
array_agg
aggregation function for:Not included:
Are these changes tested?
Extended tests in
array_agg.rs
Are there any user-facing changes?
No, just a performance improvement.