Batching support for ROW-based `FIRST()` window function
#9489
This commit adds support for `FIRST()` window functions running in a `ROWS` context. It does not yet support the option to ignore or include nulls.
Signed-off-by: MithunR <[email protected]>
Added tests for partitioned and unpartitioned windows, both keeping and ignoring nulls.
Also removed an unused import.
It looks okay to me, but it would be nice not to have First as a special case in `GpuWindowExec`. Also, because you are not doing `LAST()`, could you file a follow-up issue for it?
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuWindowExec.scala
Signed-off-by: MithunR <[email protected]>
Agreed. The latest commit addresses that. The snag was that …
I've filed #9520. I'd be happy to tackle it right after the …
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/aggregate/aggregateFunctions.scala
The failure last time around was a Parquet test for timestamp pushdown.
That is a known issue, and the test has been disabled while cuDF works on a fix. Please upmerge and then rerun your tests.
This test appears to be failing because the …
I've merged this change. Thank you for the reviews. Focusing on #9544 now, for …
Fixes #9569. #9489 added `NTH_VALUE()` tests with option to `IGNORE NULLS`, but mistakenly enabled `IGNORE NULLS` for Spark versions prior to `3.2.1`. This commit restricts tests for `IGNORE NULLS` to only Spark versions exceeding `3.1.x`, where the feature is available. Signed-off-by: MithunR <[email protected]>
* Running window optimization for `LAST()`. Fixes #9520. Fixes #9299. This is a follow-up to the changes in #9489, which added the running window optimization to the `FIRST()` window function. This commit adds running window support for the `LAST()` window function, for both `ROWS`- and `RANGE`-based window specifications. This should allow the `LAST()` aggregation to run in multiple batches if the window is `[UNBOUNDED PRECEDING, CURRENT ROW]`. That permits much larger windows and groups, because a group no longer needs to fit in GPU memory, and should help mitigate out-of-memory errors with `LAST()` window aggregations. Signed-off-by: MithunR <[email protected]>
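Why a running `LAST()` can be batched is easiest to see in a small CPU-side model. The Python below is a minimal sketch, not the plugin's Scala/GPU code. It assumes rows arrive already sorted by partition key and ordering key, split into arbitrary batches, with a `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame; the function name `running_last_batched` and the `(partition, value)` row shape are hypothetical. Because the frame ends at the current row, `LAST()` respecting nulls is just the current row's value, and `LAST()` ignoring nulls is the latest non-null value in the partition, which is the only state carried across a batch boundary. `RANGE` frames, which the change also covers, additionally require treating rows with equal ordering keys (peers) as one group, possibly spanning batches; that is not modelled here.

```python
from typing import Any, Iterable, Iterator, List, Optional, Tuple

# Hypothetical row model: (partition key, value); None stands in for SQL NULL.
Row = Tuple[Any, Optional[Any]]

# Sentinel meaning "no partition seen yet" (distinct from a NULL partition key).
_NO_PARTITION = object()


def running_last_batched(
    batches: Iterable[List[Row]], ignore_nulls: bool = False
) -> Iterator[List[Optional[Any]]]:
    """LAST(value) OVER (PARTITION BY key ORDER BY ...
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), one batch at a time.

    Assumes rows are already sorted by partition key and ordering key.
    """
    carry_part: Any = _NO_PARTITION   # partition of the last row processed
    carry_last: Optional[Any] = None  # latest non-null value in carry_part
    for batch in batches:
        out: List[Optional[Any]] = []
        for part, value in batch:
            if part != carry_part:
                # A new partition starts here: forget the previous one.
                carry_part, carry_last = part, None
            if value is not None:
                carry_last = value
            # Respecting nulls, the frame's last row is the current row itself.
            out.append(carry_last if ignore_nulls else value)
        yield out  # only carry_part / carry_last survive into the next batch


rows = [("a", None), ("a", 1), ("a", None), ("b", None), ("b", 4)]
batches = [rows[:1], rows[1:4], rows[4:]]  # partition "a" spans two batches
print([v for b in running_last_batched(batches) for v in b])
# [None, 1, None, None, 4]
print([v for b in running_last_batched(batches, ignore_nulls=True) for v in b])
# [None, 1, 1, None, 4]
```

Splitting the same rows into different batches gives the same results, which is the property that lets a group exceed a single batch.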
Partially addresses #9299.
This commit adds support for batched execution of `FIRST()` aggregations in row-based window functions. This relaxes the erstwhile requirement that the window aggregation be executed over a single in-memory batch, thereby reducing memory pressure and the resulting out-of-memory failures.
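The batching idea for `FIRST()` can be modelled on the CPU as follows. This is a minimal Python sketch, not the plugin's Scala/GPU implementation. It assumes rows arrive already sorted by partition key and ordering key, split into arbitrary batches, with a `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame; the function name `running_first_batched` and the `(partition, value)` row shape are hypothetical. It shows that only a small amount of state (the current partition key and whether its first value has been fixed) needs to cross a batch boundary, so no partition has to fit in a single batch.

```python
from typing import Any, Iterable, Iterator, List, Optional, Tuple

# Hypothetical row model: (partition key, value); None stands in for SQL NULL.
Row = Tuple[Any, Optional[Any]]

# Sentinel meaning "no partition seen yet" (distinct from a NULL partition key).
_NO_PARTITION = object()


def running_first_batched(
    batches: Iterable[List[Row]], ignore_nulls: bool = False
) -> Iterator[List[Optional[Any]]]:
    """FIRST(value) OVER (PARTITION BY key ORDER BY ...
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), one batch at a time.

    Assumes rows are already sorted by partition key and ordering key.
    """
    carry_part: Any = _NO_PARTITION  # partition of the last row processed
    carry_found = False              # has FIRST() been fixed for that partition?
    carry_value: Optional[Any] = None
    for batch in batches:
        out: List[Optional[Any]] = []
        for part, value in batch:
            if part != carry_part:
                # A new partition starts here: forget the previous one.
                carry_part, carry_found, carry_value = part, False, None
            if not carry_found and (value is not None or not ignore_nulls):
                # First qualifying row of the partition; its value sticks.
                carry_found, carry_value = True, value
            out.append(carry_value)
        yield out  # only the carry_* variables survive into the next batch


rows = [("a", None), ("a", 1), ("a", 2), ("b", 3), ("b", None)]
batches = [rows[:2], rows[2:4], rows[4:]]  # partition "a" spans two batches
print([v for b in running_first_batched(batches) for v in b])
# [None, None, None, 3, 3]
print([v for b in running_first_batched(batches, ignore_nulls=True) for v in b])
# [None, 1, 1, 3, 3]
```

An unpartitioned window corresponds to giving every row the same partition key. The sketch processes rows one at a time for clarity; a columnar implementation would instead compute each batch independently and then patch the rows that continue the previous batch's partition.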