Support ORDER BY in AggregateUDF #8984
Comments
This is a neat project to work on; I think it would largely be an API design question.
Interested in this.
I think we also need to update
@alamb I'm trying to add
/// Returns the lexical ordering requirements of the aggregate expression.
fn ordering_req(&self) -> &LexOrdering {
    &self.ordering_req
}
I'm testing with If
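One way to expose such a requirement on a user-defined aggregate is an extra trait method with a default that means "no ordering needed", so existing UDAFs keep compiling unchanged. The sketch below is self-contained and uses hypothetical stand-ins (`SortExpr`, a plain-`Vec` `LexOrdering`, and a trait named `OrderAwareAggregate`), not DataFusion's real types or the real `AggregateUDFImpl` trait.

```rust
// Hypothetical stand-ins for DataFusion types; NOT the real API.
#[derive(Debug, Clone, PartialEq)]
pub struct SortExpr {
    pub column: String,
    pub descending: bool,
}

/// A lexicographic ordering: compare by the first key, break ties with the next, and so on.
pub type LexOrdering = Vec<SortExpr>;

/// Hypothetical trait, loosely modelled on a UDAF implementation trait.
pub trait OrderAwareAggregate {
    fn name(&self) -> &str;

    /// The order in which this aggregate needs to see its input.
    /// Default: empty, meaning the aggregate is order-insensitive.
    fn ordering_req(&self) -> LexOrdering {
        Vec::new()
    }
}

/// An order-insensitive aggregate: relies on the default (no requirement).
struct MySum;
impl OrderAwareAggregate for MySum {
    fn name(&self) -> &str {
        "my_sum"
    }
}

/// An order-sensitive aggregate: reports the ordering it was configured with.
struct MyFirstValue {
    ordering: LexOrdering,
}
impl OrderAwareAggregate for MyFirstValue {
    fn name(&self) -> &str {
        "my_first_value"
    }
    fn ordering_req(&self) -> LexOrdering {
        self.ordering.clone()
    }
}

fn main() {
    let aggs: Vec<Box<dyn OrderAwareAggregate>> = vec![
        Box::new(MySum),
        Box::new(MyFirstValue {
            ordering: vec![SortExpr { column: "ts".to_string(), descending: false }],
        }),
    ];
    for agg in &aggs {
        println!("{} requires {:?}", agg.name(), agg.ordering_req());
    }
}
```

Returning an owned ordering (rather than `&LexOrdering` as in the snippet above) is a deliberate choice in this sketch: it lets the trait provide an empty default without forcing every implementor to store an ordering field.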
It seems that #8793 has a similar issue (see #8793 (comment)). But I'm not sure whether I can also move
Maybe we can have the AggregateUDF declare its needed ordering in terms of
Perhaps we can follow the model of pub file_sort_order: Vec<Vec<Expr>>, in ListingOptions: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html
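In that model, as I understand it, each inner `Vec` describes one lexicographic ordering (a list of sort keys applied left to right), and the outer `Vec` allows several equivalent orderings. The sketch below shows only what a single inner ordering means when applied to rows; `SortKey`, `compare_rows`, and `sort_rows` are hypothetical names, and rows are simplified to `Vec<i64>` indexed by column position.

```rust
use std::cmp::Ordering;

/// Hypothetical sort key: which column to sort on and in which direction.
#[derive(Debug, Clone, Copy)]
struct SortKey {
    col: usize,
    descending: bool,
}

/// Compare two rows lexicographically over `keys`: the first key decides,
/// later keys only break ties.
fn compare_rows(a: &[i64], b: &[i64], keys: &[SortKey]) -> Ordering {
    for k in keys {
        let ord = a[k.col].cmp(&b[k.col]);
        let ord = if k.descending { ord.reverse() } else { ord };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Sort rows by one lexicographic ordering (one "inner Vec").
fn sort_rows(rows: &mut [Vec<i64>], keys: &[SortKey]) {
    rows.sort_by(|a, b| compare_rows(a, b, keys));
}

fn main() {
    let mut rows = vec![vec![1, 5], vec![0, 7], vec![1, 3]];
    // ORDER BY col0 ASC, col1 DESC
    let keys = [
        SortKey { col: 0, descending: false },
        SortKey { col: 1, descending: true },
    ];
    sort_rows(&mut rows, &keys);
    println!("{:?}", rows); // [[0, 7], [1, 5], [1, 3]]
}
```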
@alamb Do you know the rationale of
For example:
let accumulator: AccumulatorFactoryFunction =
    Arc::new(move |_| Ok(Box::new(Self::new(Arc::clone(&captured_state)))));
// Directly construct the accumulator and pass it around:
let accumulator = Box::new(Self::new(Arc::clone(&captured_state)));
// or
let accumulator = Arc::new(Self::new(Arc::clone(&captured_state)));
Rewrite to:
/// Old
/// Return a new [`Accumulator`] that aggregates values for a specific
/// group during query execution.
fn accumulator(&self, arg: &DataType) -> Result<Box<dyn Accumulator>>;
/// New: no arguments, and replace Box with Arc to allow shared ownership
fn accumulator(&self) -> Result<Arc<dyn Accumulator>>;
The current
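A plausible reason for the factory design, offered as an assumption rather than a statement of DataFusion's rationale: a grouped aggregation needs one independent accumulator per group, and accumulator updates take `&mut self`, which a shared `Arc<dyn Accumulator>` cannot hand out without interior mutability. The sketch below uses a hypothetical `Accumulator` trait, `SumAcc`, and a `Factory` type alias (similar in spirit to, but not the same as, `AccumulatorFactoryFunction`) to show the factory producing fresh state for each group.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified accumulator trait (not DataFusion's).
trait Accumulator {
    fn update(&mut self, v: i64);
    fn evaluate(&self) -> i64;
}

#[derive(Default)]
struct SumAcc {
    sum: i64,
}

impl Accumulator for SumAcc {
    fn update(&mut self, v: i64) {
        self.sum += v;
    }
    fn evaluate(&self) -> i64 {
        self.sum
    }
}

/// Hypothetical factory: each call yields a brand-new, independently mutable accumulator.
type Factory = Arc<dyn Fn() -> Box<dyn Accumulator> + Send + Sync>;

fn sum_factory() -> Factory {
    Arc::new(|| Box::new(SumAcc::default()) as Box<dyn Accumulator>)
}

/// GROUP BY key, SUM(value): the factory is called once per new group key.
fn group_sum(rows: &[(&str, i64)], factory: &Factory) -> HashMap<String, i64> {
    let mut accs: HashMap<String, Box<dyn Accumulator>> = HashMap::new();
    for (key, v) in rows {
        accs.entry(key.to_string())
            .or_insert_with(|| factory())
            .update(*v);
    }
    accs.into_iter().map(|(k, acc)| (k, acc.evaluate())).collect()
}

fn main() {
    let rows = [("a", 1), ("b", 10), ("a", 2)];
    let sums = group_sum(&rows, &sum_factory());
    println!("a = {}, b = {}", sums["a"], sums["b"]); // a = 3, b = 10
}
```

If a single shared accumulator were used instead, the groups `a` and `b` would write into the same state, so the per-group results could not be separated.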
One thing that may be related is that the hash aggregator creates an And each
Completed with #9874
Is your feature request related to a problem or challenge?
Some built-in aggregates (such as FIRST_VALUE, LAST_VALUE and ARRAY_AGG) support an optional ORDER BY argument that defines the order in which they see their input. For example:
This is not supported today in user-defined aggregates.
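To make the semantics concrete, here is a sketch of what an order-sensitive aggregate such as FIRST_VALUE(x ORDER BY y) or ARRAY_AGG(x ORDER BY y) has to compute when rows arrive in arbitrary order. The types `FirstValue` and `ArrayAgg` are hypothetical and are not DataFusion's implementations; each input row is modelled as a `(value, order_key)` pair of `i64`.

```rust
/// Hypothetical FIRST_VALUE(value ORDER BY key): keeps the row with the
/// smallest key seen so far; on ties the first-seen row is kept.
#[derive(Default)]
struct FirstValue {
    best: Option<(i64, i64)>, // (order_key, value)
}

impl FirstValue {
    fn update(&mut self, value: i64, key: i64) {
        let replace = match self.best {
            Some((k, _)) => key < k,
            None => true,
        };
        if replace {
            self.best = Some((key, value));
        }
    }
    fn evaluate(&self) -> Option<i64> {
        self.best.map(|(_, v)| v)
    }
}

/// Hypothetical ARRAY_AGG(value ORDER BY key): buffers everything and
/// sorts by key at the end.
#[derive(Default)]
struct ArrayAgg {
    rows: Vec<(i64, i64)>, // (order_key, value)
}

impl ArrayAgg {
    fn update(&mut self, value: i64, key: i64) {
        self.rows.push((key, value));
    }
    fn evaluate(&self) -> Vec<i64> {
        let mut rows = self.rows.clone();
        // Stable sort: rows with equal keys keep their arrival order.
        rows.sort_by_key(|(k, _)| *k);
        rows.into_iter().map(|(_, v)| v).collect()
    }
}

fn main() {
    // (value, order_key) rows arriving out of order.
    let rows = [(10, 3), (20, 1), (30, 2)];
    let mut first = FirstValue::default();
    let mut agg = ArrayAgg::default();
    for (v, k) in rows {
        first.update(v, k);
        agg.update(v, k);
    }
    println!("{:?} {:?}", first.evaluate(), agg.evaluate()); // Some(20) [20, 30, 10]
}
```

Without a declared ordering, the same accumulators would simply report whatever order the input happened to arrive in, which is why a UDAF needs a way to state its requirement.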
Describe the solution you'd like
I would like to be able to create a user-defined aggregate that can specify its input order.
This would roughly require:
Changes to the AggregateUDFImpl trait to communicate the ordering somehow.
Here are some other places that likely need to be changed:
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L242-L252
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L1663-L1690
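On the planner side, the core question is whether the aggregate's input already arrives in the required order or needs a sort first. The sketch below is a deliberately simplified, hypothetical version of that check (`SortKey`, `InputPlan`, and `plan_aggregate_input` are illustrative names): it treats a requirement as satisfied only when it is a prefix of the existing ordering, and ignores the equivalence and constant-column reasoning a real planner would do.

```rust
/// Hypothetical sort key used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// What the planner decides to do with the aggregate's input.
#[derive(Debug, PartialEq)]
enum InputPlan {
    /// Input is already ordered as required.
    AsIs,
    /// A sort on these keys must be inserted before the aggregate.
    SortFirst(Vec<SortKey>),
}

/// Simplified rule: the requirement holds if it is a prefix of the existing
/// ordering (an empty requirement is always satisfied).
fn ordering_satisfied(existing: &[SortKey], required: &[SortKey]) -> bool {
    required.len() <= existing.len()
        && existing.iter().zip(required).all(|(e, r)| e == r)
}

fn plan_aggregate_input(existing: &[SortKey], required: &[SortKey]) -> InputPlan {
    if ordering_satisfied(existing, required) {
        InputPlan::AsIs
    } else {
        InputPlan::SortFirst(required.to_vec())
    }
}

fn key(column: &str, descending: bool) -> SortKey {
    SortKey { column: column.to_string(), descending }
}

fn main() {
    // Input is known to be sorted by (a ASC, b ASC).
    let existing = vec![key("a", false), key("b", false)];
    println!("{:?}", plan_aggregate_input(&existing, &[key("a", false)])); // AsIs
    // Requiring b ASC alone is not implied by (a, b), so a sort is needed.
    println!("{:?}", plan_aggregate_input(&existing, &[key("b", false)]));
}
```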
Maybe looking at how OrderSensitiveArrayAgg is implemented can help: https://github.com/apache/arrow-datafusion/blob/5d70c32a9a4accf21e9f27ff5ed62666cbbcbe54/datafusion/physical-expr/src/aggregate/array_agg_ordered.rs#L45
Describe alternatives you've considered
No response
Additional context
No response