Remove `AggregateState` wrapper #4582
@@ -519,7 +519,7 @@ fn create_batch_from_map(
     accumulators.group_states.iter().map(|group_state| {
         group_state.accumulator_set[x]
             .state()
-            .and_then(|x| x[y].as_scalar().map(|v| v.clone()))
+            .map(|x| x[y].clone())
Removing the extra layer of indirection simplifies the code significantly.
cc @crepererum and @tustvold, as I believe you mentioned plans to simplify aggregates and grouping state, perhaps
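To make the simplification in the hunk above concrete, here is a minimal std-only sketch. The `ScalarValue` and `AggregateState` types below are simplified stand-ins written for illustration (the real ones live in DataFusion and carry Arrow types), and `state_value_before` / `state_value_after` are hypothetical helper names, not functions from the PR.

```rust
// Stand-in for DataFusion's `ScalarValue`; only one variant for brevity.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Float64(Option<f64>),
}

// Stand-in for the removed wrapper: state was either a scalar or an array,
// so every reader had to unwrap it fallibly.
#[derive(Debug, Clone)]
enum AggregateState {
    Scalar(ScalarValue),
    Array(Vec<ScalarValue>), // placeholder for an Arrow `ArrayRef`
}

impl AggregateState {
    fn as_scalar(&self) -> Result<&ScalarValue, String> {
        match self {
            AggregateState::Scalar(v) => Ok(v),
            AggregateState::Array(_) => Err("expected scalar state, got array".to_string()),
        }
    }
}

// Before: two fallible steps (fetch the state, then unwrap the scalar).
fn state_value_before(
    state: Result<Vec<AggregateState>, String>,
    y: usize,
) -> Result<ScalarValue, String> {
    state.and_then(|x| x[y].as_scalar().map(|v| v.clone()))
}

// After: the state already is a `ScalarValue`, so a plain `map` suffices.
fn state_value_after(
    state: Result<Vec<ScalarValue>, String>,
    y: usize,
) -> Result<ScalarValue, String> {
    state.map(|x| x[y].clone())
}
```

With the wrapper gone, the "array where a scalar was expected" error path in `state_value_before` no longer exists at all.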
@@ -108,10 +107,10 @@ impl Accumulator for GeometricMean {
     // This function serializes our state to `ScalarValue`, which DataFusion uses
     // to pass this state between execution stages.
     // Note that this can be arbitrary data.
-    fn state(&self) -> Result<Vec<AggregateState>> {
+    fn state(&self) -> Result<Vec<ScalarValue>> {
Eventually it might be nice to change this to be `ArrayRef` or something, but we will see what shakes out of #2723
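As a rough illustration of what a user-defined aggregate looks like once `state()` returns `Vec<ScalarValue>` directly, here is a std-only sketch. The `Accumulator` trait and `ScalarValue` enum below are simplified stand-ins (the real trait has different method signatures and uses DataFusion's own `Result`), and the `update` helper plus the field names `n` and `prod` are assumptions made for the example.

```rust
// Stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Float64(Option<f64>),
    UInt32(Option<u32>),
}

// Simplified stand-in for the `Accumulator` trait.
trait Accumulator {
    fn state(&self) -> Result<Vec<ScalarValue>, String>;
    fn merge(&mut self, states: &[Vec<ScalarValue>]) -> Result<(), String>;
    fn evaluate(&self) -> Result<ScalarValue, String>;
}

struct GeometricMean {
    n: u32,
    prod: f64,
}

impl GeometricMean {
    fn new() -> Self {
        Self { n: 0, prod: 1.0 }
    }

    // Hypothetical helper: fold one input value into the running product.
    fn update(&mut self, value: f64) {
        self.prod *= value;
        self.n += 1;
    }
}

impl Accumulator for GeometricMean {
    // Serialize the partial state as plain scalars; no wrapper type needed.
    fn state(&self) -> Result<Vec<ScalarValue>, String> {
        Ok(vec![
            ScalarValue::Float64(Some(self.prod)),
            ScalarValue::UInt32(Some(self.n)),
        ])
    }

    // Combine partial states produced by other accumulator instances.
    fn merge(&mut self, states: &[Vec<ScalarValue>]) -> Result<(), String> {
        for s in states {
            match s.as_slice() {
                [ScalarValue::Float64(Some(prod)), ScalarValue::UInt32(Some(n))] => {
                    self.prod *= *prod;
                    self.n += *n;
                }
                other => return Err(format!("unexpected state: {:?}", other)),
            }
        }
        Ok(())
    }

    // The geometric mean is the n-th root of the product; NULL when empty.
    fn evaluate(&self) -> Result<ScalarValue, String> {
        if self.n == 0 {
            return Ok(ScalarValue::Float64(None));
        }
        Ok(ScalarValue::Float64(Some(self.prod.powf(1.0 / self.n as f64))))
    }
}
```

Two partial accumulators can be combined by passing one's `state()` to the other's `merge`, which is how state moves between execution stages in this sketch.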
I plan to merge this tomorrow unless anyone else would like more time to review or has additional comments
-        .map(|s| vec![s.clone()])
-        .and_then(ScalarValue::iter_to_array)
-    })
+    .map(|s| ScalarValue::iter_to_array(vec![s.clone()]))
Just an observation, this doesn't need to be a vec?
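A sketch of what this observation could mean in practice, assuming `ScalarValue::iter_to_array` accepts any `IntoIterator<Item = ScalarValue>`: a single value can then be passed with `std::iter::once` and no intermediate `Vec`. The `iter_to_array` below is a std-only stand-in that collects into a `Vec<Option<f64>>` rather than an Arrow array, and `with_vec` / `with_once` are hypothetical names for the two call styles.

```rust
// Stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Float64(Option<f64>),
}

// Stand-in for `ScalarValue::iter_to_array`: takes any iterable of scalars.
// Rejecting empty input is an assumption of this sketch.
fn iter_to_array(
    scalars: impl IntoIterator<Item = ScalarValue>,
) -> Result<Vec<Option<f64>>, String> {
    let values: Vec<Option<f64>> = scalars
        .into_iter()
        .map(|s| match s {
            ScalarValue::Float64(v) => v,
        })
        .collect();
    if values.is_empty() {
        return Err("empty iterator".to_string());
    }
    Ok(values)
}

// As written in the diff: allocate a one-element Vec first.
fn with_vec(s: &ScalarValue) -> Result<Vec<Option<f64>>, String> {
    iter_to_array(vec![s.clone()])
}

// Alternative: a one-element iterator, with no temporary Vec allocation.
fn with_once(s: &ScalarValue) -> Result<Vec<Option<f64>>, String> {
    iter_to_array(std::iter::once(s.clone()))
}
```

Both forms produce the same result; the second only avoids one small allocation per call.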
Benchmark runs are scheduled for baseline = 84d3ae8 and contender = 5d424ef. 5d424ef is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
As promised on #4488
Rationale for this change
All actual aggregate implementations use `AggregateState::Scalar`.
It is bad to have `AggregateState::Array` because:
What changes are included in this PR?
Are these changes tested?
yes, covered by existing tests
Are there any user-facing changes?
yes -- User defined aggregates are now simpler to write / less error prone
cc @andygrove as he originally wrote this code in #3009
Note that @tustvold is contemplating more significant changes in the Grouping code, so this PR should hopefully help him