ARROW-6736: [Rust] [DataFusion] Evaluate the input to the aggregate expression just once per batch #5542
@sinistersnare Could you review this? This is blocking your PRs being merged. Thanks!
@alippai Could you review this? This is blocking your PRs being merged. Thanks!
Also, could I get a review from a committer please @paddyhoran @sunchao @nevi-me
@andygrove looks good! Nit: was adding
I also think it looks good for the sum/count use-case, and also for new cases.
@alippai A new input is created for each batch and the same accumulator is used across batches, so I think it needs to be this way.
@paddyhoran I have two LGTMs from contributors (not committers), so if you're OK rubber-stamping this, I can go ahead and merge. That will allow me to review the other three pending PRs that build on this.
…xpression just once per batch

The current implementation of aggregate expressions in the new physical plan had a flaw where the input to the aggregate expression was repeatedly being evaluated (once per row instead of once per batch). This PR fixes this.

Closes #5542 from andygrove/ARROW-6736 and squashes the following commits:

f0fadaf <Andy Grove> Evaluate the input to the aggregate expression just once per batch

Authored-by: Andy Grove <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
The current implementation of aggregate expressions in the new physical plan had a flaw where the input to the aggregate expression was repeatedly being evaluated (once per row instead of once per batch). This PR fixes this.
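For readers skimming the thread, here is a rough, self-contained sketch of the difference between evaluating the aggregate input once per row and once per batch. It is not the DataFusion code from this PR: `Batch`, `CountingExpr`, `SumAccumulator`, `sum_per_row`, and `sum_per_batch` are hypothetical stand-ins that use plain `Vec<i64>` instead of Arrow arrays, and the "expression" simply doubles each value so the evaluation count can be observed.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a record batch: a single column of i64 values.
struct Batch {
    values: Vec<i64>,
}

/// Hypothetical input expression that counts how often it is evaluated.
struct CountingExpr {
    evaluations: Cell<usize>,
}

impl CountingExpr {
    fn new() -> Self {
        CountingExpr { evaluations: Cell::new(0) }
    }

    /// Evaluate the expression against a whole batch, producing one array.
    fn evaluate(&self, batch: &Batch) -> Vec<i64> {
        self.evaluations.set(self.evaluations.get() + 1);
        batch.values.iter().map(|v| v * 2).collect()
    }
}

/// Accumulator whose state is shared across batches.
struct SumAccumulator {
    sum: i64,
}

impl SumAccumulator {
    fn accumulate(&mut self, value: i64) {
        self.sum += value;
    }
}

/// Flawed shape: the input expression is re-evaluated for every row.
fn sum_per_row(batches: &[Batch], expr: &CountingExpr) -> i64 {
    let mut acc = SumAccumulator { sum: 0 };
    for batch in batches {
        for row in 0..batch.values.len() {
            let input = expr.evaluate(batch); // evaluated once per row
            acc.accumulate(input[row]);
        }
    }
    acc.sum
}

/// Fixed shape: a new input is produced once per batch, and the same
/// accumulator consumes it row by row.
fn sum_per_batch(batches: &[Batch], expr: &CountingExpr) -> i64 {
    let mut acc = SumAccumulator { sum: 0 };
    for batch in batches {
        let input = expr.evaluate(batch); // evaluated once per batch
        for value in input {
            acc.accumulate(value);
        }
    }
    acc.sum
}
```

Both functions return the same sum; the only difference is how many times `evaluate` runs. With batches of 3 and 2 rows, the per-row shape evaluates the expression 5 times and the per-batch shape evaluates it twice, which is where the savings would come from on realistically sized batches.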