-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Add support for LEAD/LAG window functions for fixed-width types [redux] #6277
[REVIEW] Add support for LEAD/LAG window functions for fixed-width types [redux] #6277
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
06af3db
to
2caa5e6
Compare
Initial implementation for LEAD/LAG window functions, for fixed-width types. This version is implemented within grouped_rolling_window().
2caa5e6
to
a56a106
Compare
Codecov Report
@@ Coverage Diff @@
## branch-0.16 #6277 +/- ##
============================================
Coverage 83.00% 83.00%
============================================
Files 95 95
Lines 14901 14901
============================================
Hits 12368 12368
Misses 2533 2533 Continue to review full report at Codecov.
|
@jrhemstad, does this PR look like a more acceptable approach than #6261? Also, @rgsl888prabhu, could I please bother you to have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review
1. Fixed includes 2. Un-templatized DeviceLeadLag 3. Documentation fixes
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a comment to handle minor stuff, other than this looks good.
1. Moved Lead/Lag specific error checks out of rolling_window() 2. Moved Lead/Lag specific short-circuit evaluation out of rolling_window() 3. Added tests for error-checking
@jrhemstad, have the concerns raised here been addressed satisfactorily? |
1. Remove redundant checks for LEAD/LAG in non-LEAD/LAG SFINAE path 2. Remove hardcode for stream id
Thank you for the review, and the great catches, @jrhemstad. |
This commit extends the [LEAD()/LAG() offset window functions](https://learnsql.com/blog/lead-and-lag-functions-in-sql/) introduced in #6277 to work with more than just fixed-width data types. The functions are defined as follows: 1. `LEAD(N)`: Returns the row from the input column, at the specified offset past the current row. If the offset crosses the grouping boundary or column boundary for a given row, a "default" value is returned if available. The "default" value is null, by default. 1. `LAG(N)`: returns the row from the input column at the specified offset preceding the current row. If the offset crosses the grouping boundary or column boundary for a given row, a "default" value is returned if available. The "default" value is null, by default. As an illustration, consider the following example input array input column, with two groups (`G1` and `G2`): ```c++ [ [1,1,1], [2], [3,3,3], [4,4,4,4], [66,66], [], [88,88], [99] ] <--------------G1--------------> <------------G2-----------> ``` `LEAD(col, 1)` yields: ```c++ [ [2], [3,3,3], [4,4,4,4], ∅, [], [88,88], [99], ∅ ] ``` `LAG(input_col, 2)` yields: ```c++ [ ∅, ∅, [2], [3,3,3], ∅, ∅, [], [88,88] ] ``` If a `defaults` column is specified with contents: ```c++ [ [999], [999], [999], [999], [999], [999], [999], [999] ] ``` then, `LEAD(col, 1, defaults)` yields: ```c++ [ [2], [3,3,3], [4,4,4,4], [999], [], [88,88], [99], [999] ] ``` Note that in the cases where the offset (`1`) would cross the column/group boundary (i.e. indices `3` and `7`), the corresponding entry from the `defaults` column is returned instead of `∅` (i.e. `null`). Authors: - MithunR (https://github.com/mythrocks) Approvers: - Nghia Truong (https://github.com/ttnghia) - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu) - Karthikeyan (https://github.com/karthikeyann) URL: #8062
This is an alternative implementation for what #6261 attempts to solve, i.e.
LEAD()
/LAG()
offset window functions for fixed-width data types.Offset-based window functions include LEAD and LAG:
LEAD(N)
: Returns the row from the input column, at the specified offset past thecurrent row. If the offset crosses the grouping boundary or column boundary for
a given row, a "default" value is returned if available. The "default" value is
null, by default.
LAG(N)
: returns the row from the input column at the specified offset precedingthe current row. If the offset crosses the grouping boundary or column boundary for
a given row, a "default" value is returned if available. The "default" value is
null, by default.
E.g. Consider an input column with the following values and grouping:
[10, 11, 12, 13, 20, 21, 22, 23]
<------G1-----> <------G2------>
LEAD(input_col, 1)
yields:[11, 12, 13, null, 21, 22, 23, null]
LAG(input_col, 2)
yields:[null, null, 11, 12, null, null, 21, 22]
LEAD(input_col, 1, 99)
(where 99 indicates the default) yields:[11, 12, 13, 99, 21, 22, 23, 99]