-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-6657: [Rust] [DataFusion] Add Count Aggregate Expression #5513
Conversation
Hi @sinistersnare thanks for this! Just let us know when you think it is ready for review (or if you have any questions)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Please add SQL tests to context.rs based on the ones for SUM:
Those are exactly the tests I was looking for. Thanks, I will push an update tonight! |
@sinistersnare I see you merged master into your branch .. that can lead to issues because we don't use a merging model on this repo. See https://andygrove.io/apache_arrow_git_tips/ for more info. |
b93c61a
to
7820840
Compare
Took a bit longer than expected (moving currently), but I added some SQL tests! Aside from my worry from above, I think I am ready for this. |
7820840
to
2e85443
Compare
Fixed the style errors too, @andygrove @paddyhoran this should be good-to-go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @sinistersnare
|
||
impl Accumulator for CountAccumulator { | ||
fn accumulate(&mut self, batch: &RecordBatch, row_index: usize) -> Result<()> { | ||
let array = self.expr.evaluate(batch)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just spotted an issue with this, and I have the same issue with the SumExpr
implementation ... we are evaluating the expression against the whole batch multiple times (once for every row in the batch). This is a design flaw in the accumulator trait I guess. I'll give this some thought today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best if we merge this in without this optimization/fix, so you can simply fix both instances at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the proposed fix in #5542 and let me know what you think. I'd prefer to get this reviewed and merged first, then you can rebase this PR and implement the changes.
@sinistersnare Please rebase against the latest master and I can approve and merge |
2e85443
to
895f2ca
Compare
@andygrove updated! |
895f2ca
to
6c2d823
Compare
6c2d823
to
12d0c2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI
Hi, I added this code, and the tests pass. I still need to actually test it using a real example, so I would say its not completely ready for merge yet. Closes #5513 from sinistersnare/ARROW-6657 and squashes the following commits: 64d0c00 <Andy Grove> formatting 12d0c2c <Davis Silverman> Add Count Aggregate Expression Lead-authored-by: Davis Silverman <[email protected]> Co-authored-by: Andy Grove <[email protected]> Signed-off-by: Andy Grove <[email protected]>
Hi, I added this code, and the tests pass. I still need to actually test it using a real example, so I would say its not completely ready for merge yet.