Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6736: [Rust] [DataFusion] Evaluate the input to the aggregate expression just once per batch #5542

Closed
wants to merge 1 commit into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Sep 30, 2019

The current implementation of aggregate expressions in the new physical plan had a flaw where the input to the aggregate expression was repeatedly being evaluated (once per row instead of once per batch). This PR fixes this.

@github-actions
Copy link

@andygrove
Copy link
Member Author

@sinistersnare Could you review this? This is blocking your PRs being merged. Thanks!

@andygrove
Copy link
Member Author

@alippai Could you review this? This is blocking your PRs being merged. Thanks!

@andygrove
Copy link
Member Author

Also could could I get a review from a committer please @paddyhoran @sunchao @nevi-me

@alippai
Copy link
Contributor

alippai commented Oct 3, 2019

@andygrove looks good! Nit: was adding input as a parameter for create_accumulator() considered? It would keep the iterations&maps easier to read later (the downside it's adding extra state to the accumulator)

@sinistersnare
Copy link
Contributor

I also think it looks good for the sum/count use-case, and also for new cases.

@andygrove
Copy link
Member Author

@alippai A new input is created for each batch and the same accumulator is used across batches, so I think it needs to be this way.

@andygrove
Copy link
Member Author

@paddyhoran I have two LGTMs from contributors (not commiters) so if you're OK rubber stamping this I can go ahead and merge and this will allow me to review the other three pending PRs which build on this.

@andygrove andygrove closed this in 399ab8f Oct 4, 2019
kszucs pushed a commit that referenced this pull request Oct 5, 2019
…xpression just once per batch

The current implementation of aggregate expressions in the new physical plan had a flaw where the input to the aggregate expression was repeatedly being evaluated (once per row instead of once per batch). This PR fixes this.

Closes #5542 from andygrove/ARROW-6736 and squashes the following commits:

f0fadaf <Andy Grove> Evaluate the input to the aggregate expression just once per batch

Authored-by: Andy Grove <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants