Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add running window fixer for first and last #9299

Closed
revans2 opened this issue Sep 26, 2023 · 3 comments · Fixed by #9623
Closed

[FEA] Add running window fixer for first and last #9299

revans2 opened this issue Sep 26, 2023 · 3 comments · Fixed by #9623
Assignees
Labels
reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments

@revans2
Copy link
Collaborator

revans2 commented Sep 26, 2023

Is your feature request related to a problem? Please describe.
Running window, unbounded preceding to current row, is a very common operation. First and last do show up here, but not that often. However, they should be very trivial to support so we should do it. Just make sure to take into account the null-ability of first and last.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify reliability Features to improve reliability or bugs that severly impact the reliability of the plugin labels Sep 26, 2023
@revans2
Copy link
Collaborator Author

revans2 commented Sep 26, 2023

First and last do not work for scan aggregations so this will only be a memory optimization.

@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Sep 26, 2023
@sameerz sameerz removed the feature request New feature or request label Oct 16, 2023
@mythrocks
Copy link
Collaborator

At a future date, there will be value in adding support for FIRST and LAST to be implemented as groupby_scan_aggregation in libcudf. I'll leave that out of scope for this issue.

I'm tackling FIRST as a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW window aggregation first. RANGE support can follow.

@mythrocks
Copy link
Collaborator

mythrocks commented Oct 20, 2023

I'm tackling FIRST as a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW...

#9489 should address the ROWS BETWEEN... case.

I'm currently working on the same, for RANGE queries.

mythrocks added a commit that referenced this issue Oct 27, 2023
* Batching support for ROW-based FIRST() window function

Partially addresses #9299.

This commit adds support for batched execution of `FIRST()` aggregations row-based window functions.  This helps relax the erstwhile requirement that the window aggregation be executed over a single in-memory batch, thereby alleviating memory pressure and out-of-memory failures.

Signed-off-by: MithunR <[email protected]>
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Nov 3, 2023
Fixes NVIDIA#9520.
Fixes NVIDIA#9299.

This is a followup to the changes in NVIDIA#9489, which adds the running
window optimization to the `FIRST()` window fuction.

This commit adds running window support for the `LAST()` window
function, with support for both `ROWS` and `RANGE` based window
specifications.
This change should allow the `LAST()` aggregation to run in multiple
batches if the window is `[UNBOUNDED PRECEDING, CURRENT ROW`]. This
will allow for much larger window and group sizes, because the group
no longer needs to fit in GPU memory. This should help mitigate
out-of-memory errors with `LAST()` window aggregations.

Signed-off-by: MithunR <[email protected]>
mythrocks added a commit that referenced this issue Nov 6, 2023
* Running window optimization for LAST()

Fixes #9520.
Fixes #9299.

This is a followup to the changes in #9489, which adds the running
window optimization to the `FIRST()` window fuction.

This commit adds running window support for the `LAST()` window
function, with support for both `ROWS` and `RANGE` based window
specifications.
This change should allow the `LAST()` aggregation to run in multiple
batches if the window is `[UNBOUNDED PRECEDING, CURRENT ROW`]. This
will allow for much larger window and group sizes, because the group
no longer needs to fit in GPU memory. This should help mitigate
out-of-memory errors with `LAST()` window aggregations.

Signed-off-by: MithunR <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reliability Features to improve reliability or bugs that severly impact the reliability of the plugin
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants