ARROW-6657: [Rust] [DataFusion] Add Count Aggregate Expression #5513

Closed
wants to merge 2 commits into from

sinistersnare
Contributor

Hi, I added this code, and the tests pass. I still need to actually test it using a real example, so I would say it's not completely ready for merge yet.

@paddyhoran
Contributor

Hi @sinistersnare, thanks for this! Just let us know when you think it is ready for review, or if you have any questions.

Member

@andygrove left a comment

This looks great! Please add SQL tests to context.rs based on the ones for SUM:

https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/context.rs#L616-L642

@sinistersnare
Contributor Author

Those are exactly the tests I was looking for. Thanks, I will push an update tonight!

@andygrove
Member

@sinistersnare I see you merged master into your branch. That can lead to issues because we don't use a merging model on this repo. See https://andygrove.io/apache_arrow_git_tips/ for more info.

@sinistersnare
Contributor Author

Took a bit longer than expected (moving currently), but I added some SQL tests! Aside from my worry from above, I think I am ready for this.

@sinistersnare
Contributor Author

Fixed the style errors too, @andygrove @paddyhoran this should be good-to-go!

Member

@andygrove left a comment

LGTM! Thanks @sinistersnare


impl Accumulator for CountAccumulator {
    fn accumulate(&mut self, batch: &RecordBatch, row_index: usize) -> Result<()> {
        let array = self.expr.evaluate(batch)?;
Member

I just spotted an issue with this, and I have the same issue with the SumExpr implementation ... we are evaluating the expression against the whole batch multiple times (once for every row in the batch). This is a design flaw in the accumulator trait I guess. I'll give this some thought today.
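
To make the cost concrete, here is a minimal sketch of the problem. It uses stand-in types, not the actual DataFusion `Accumulator` trait or Arrow `RecordBatch`: `Batch`, `ColumnExpr`, `accumulate_row`, and `accumulate_batch` are all hypothetical names, and counting only non-null values is an assumption made for illustration. The point is only that a row-at-a-time `accumulate` evaluates the expression once per row, while a batch-at-a-time shape would evaluate it once.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a record batch: one nullable integer column.
struct Batch {
    column: Vec<Option<i64>>,
}

// Hypothetical stand-in for an expression; it records how often it is evaluated.
struct ColumnExpr {
    evaluations: Cell<usize>,
}

impl ColumnExpr {
    fn new() -> Self {
        ColumnExpr { evaluations: Cell::new(0) }
    }

    // Evaluation always produces the whole column, whichever row the caller wants.
    fn evaluate(&self, batch: &Batch) -> Vec<Option<i64>> {
        self.evaluations.set(self.evaluations.get() + 1);
        batch.column.clone()
    }
}

struct CountAccumulator {
    expr: ColumnExpr,
    count: u64,
}

impl CountAccumulator {
    fn new() -> Self {
        CountAccumulator { expr: ColumnExpr::new(), count: 0 }
    }

    // Row-at-a-time shape: the expression is re-evaluated for every row.
    fn accumulate_row(&mut self, batch: &Batch, row_index: usize) {
        let array = self.expr.evaluate(batch);
        // Assumption: only non-null values are counted.
        if array[row_index].is_some() {
            self.count += 1;
        }
    }

    // Batch-at-a-time alternative: evaluate once, then scan the result.
    fn accumulate_batch(&mut self, batch: &Batch) {
        let array = self.expr.evaluate(batch);
        self.count += array.iter().filter(|v| v.is_some()).count() as u64;
    }
}

fn main() {
    let batch = Batch { column: vec![Some(1), None, Some(3), Some(4)] };

    let mut per_row = CountAccumulator::new();
    for row_index in 0..batch.column.len() {
        per_row.accumulate_row(&batch, row_index);
    }

    let mut per_batch = CountAccumulator::new();
    per_batch.accumulate_batch(&batch);

    println!(
        "per-row: count={} evaluations={}",
        per_row.count,
        per_row.expr.evaluations.get()
    );
    println!(
        "per-batch: count={} evaluations={}",
        per_batch.count,
        per_batch.expr.evaluations.get()
    );
}
```

In this sketch both shapes arrive at the same count, but the per-row version evaluates the expression once for each of the four rows, so the work grows quadratically with batch size whenever evaluation itself is linear in the batch.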

Contributor Author

I think it would be best if we merge this in without this optimization/fix, so you can simply fix both instances at the same time?

Member

Please take a look at the proposed fix in #5542 and let me know what you think. I'd prefer to get this reviewed and merged first, then you can rebase this PR and implement the changes.

@andygrove
Member

@sinistersnare Please rebase against the latest master and I can approve and merge

@sinistersnare
Contributor Author

@andygrove updated!

Member

@andygrove left a comment

LGTM pending CI

@andygrove closed this in 368562b Oct 4, 2019
@sinistersnare deleted the ARROW-6657 branch October 4, 2019 22:20
kszucs pushed a commit that referenced this pull request Oct 5, 2019
Hi, I added this code, and the tests pass. I still need to actually test it using a real example, so I would say its not completely ready for merge yet.

Closes #5513 from sinistersnare/ARROW-6657 and squashes the following commits:

64d0c00 <Andy Grove> formatting
12d0c2c <Davis Silverman> Add Count Aggregate Expression

Lead-authored-by: Davis Silverman
Co-authored-by: Andy Grove
Signed-off-by: Andy Grove