Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend LEAD/LAG to work with non-fixed-width types #8062

Merged
merged 9 commits into from
May 3, 2021

Conversation

mythrocks
Copy link
Contributor

This commit extends the LEAD()/LAG() offset window functions introduced in #6277 to work with more than just fixed-width data types. The functions are defined as follows:

  1. LEAD(N): Returns the row from the input column, at the specified offset past the
    current row. If the offset crosses the grouping boundary or column boundary for
    a given row, a "default" value is returned if available. The "default" value is
    null, by default.
  2. LAG(N): returns the row from the input column at the specified offset preceding
    the current row. If the offset crosses the grouping boundary or column boundary for
    a given row, a "default" value is returned if available. The "default" value is
    null, by default.

As an illustration, consider the following example input array input column, with two groups (G1 and G2):

[ [1,1,1], [2], [3,3,3], [4,4,4,4],  [66,66], [], [88,88], [99] ]
  <--------------G1-------------->   <------------G2----------->

LEAD(col, 1) yields:

[ [2], [3,3,3], [4,4,4,4], ∅, [], [88,88], [99], ∅ ]

LAG(input_col, 2) yields:

[ ∅, ∅, [2], [3,3,3], ∅, ∅, [], [88,88] ]

If a defaults column is specified with contents:

[ [999], [999], [999], [999], [999], [999], [999], [999] ]

then, LEAD(col, 1, defaults) yields:

[ [2], [3,3,3], [4,4,4,4], [999], [], [88,88], [99], [999] ]

Note that in the cases where the offset (1) would cross the column/group boundary (i.e. indices 3 and 7), the corresponding entry from the defaults column is returned instead of (i.e. null).

@mythrocks mythrocks requested a review from a team as a code owner April 26, 2021 17:37
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 26, 2021
@mythrocks mythrocks self-assigned this Apr 26, 2021
@mythrocks mythrocks added feature request New feature or request non-breaking Non-breaking change labels Apr 26, 2021
@mythrocks
Copy link
Contributor Author

Looks to be a transient test failure:

11:21:25 FATAL: Non-zero status 255: [nvidia-smi, -L]
11:21:25 java.lang.RuntimeException: Non-zero status 255: [nvidia-smi, -L]
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.config.NvidiaGpuDevicesConfigItem.executeWithOutput(NvidiaGpuDevicesConfigItem.java:130)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.config.NvidiaGpuDevicesConfigItem.addCreateArgs(NvidiaGpuDevicesConfigItem.java:83)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.job.DockerImageConfiguration.lambda$addCreateArgs$0(DockerImageConfiguration.java:140)
11:21:25 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
11:21:25 	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.job.DockerImageConfiguration.addCreateArgs(DockerImageConfiguration.java:140)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.DockerState.getlaunchArgs(DockerState.java:275)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.DockerState.launchContainer(DockerState.java:285)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.DockerState.launchContainers(DockerState.java:199)
11:21:25 	at com.gpuopenanalytics.jenkins.remotedocker.RemoteDockerBuildWrapper.setUp(RemoteDockerBuildWrapper.java:175)
11:21:25 	at hudson.model.Build$BuildExecution.doRun(Build.java:157)
11:21:25 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
11:21:25 	at hudson.model.Run.execute(Run.java:1907)
11:21:25 	at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
11:21:25 	at hudson.model.ResourceController.execute(ResourceController.java:97)
11:21:25 	at hudson.model.Executor.run(Executor.java:429)
11:21:25 Recording test results

@codecov
Copy link

codecov bot commented Apr 26, 2021

Codecov Report

Merging #8062 (16eb271) into branch-0.20 (51336df) will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #8062      +/-   ##
===============================================
+ Coverage        82.88%   82.90%   +0.01%     
===============================================
  Files              103      103              
  Lines            17668    17877     +209     
===============================================
+ Hits             14645    14821     +176     
- Misses            3023     3056      +33     
Impacted Files Coverage Δ
python/cudf/cudf/core/abc.py 91.66% <ø> (+0.17%) ⬆️
python/cudf/cudf/core/algorithms.py 82.35% <ø> (ø)
python/cudf/cudf/core/column/__init__.py 100.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 92.37% <ø> (+0.13%) ⬆️
python/cudf/cudf/core/column/column.py 88.20% <ø> (-0.44%) ⬇️
python/cudf/cudf/core/column/datetime.py 88.03% <ø> (-1.88%) ⬇️
python/cudf/cudf/core/column/decimal.py 90.29% <ø> (-2.64%) ⬇️
python/cudf/cudf/core/column/interval.py 91.66% <ø> (+0.55%) ⬆️
python/cudf/cudf/core/column/lists.py 86.98% <ø> (-0.43%) ⬇️
python/cudf/cudf/core/column/numerical.py 94.72% <ø> (+0.29%) ⬆️
... and 44 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27ae8c1...16eb271. Read the comment docs.

cpp/src/rolling/lead_lag_nested_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Show resolved Hide resolved
1. Renamed NULL_INDEX to _null_index.
2. Removed an unused variable.
3. Used memory resource and stream to construct output column.
@mythrocks mythrocks requested a review from ttnghia April 28, 2021 22:47
1. Struct -> Class
2. Uniform class member prefixes.
@mythrocks mythrocks requested a review from ttnghia April 29, 2021 04:35
@ttnghia
Copy link
Contributor

ttnghia commented Apr 29, 2021

Rerun tests.

@mythrocks
Copy link
Contributor Author

Rerun tests

cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Outdated Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Show resolved Hide resolved
cpp/src/rolling/lead_lag_nested_detail.cuh Outdated Show resolved Hide resolved
1. Removed unnecessary headers.
2. Switched gather_map to mark nulls with `size`
   instead of `size+1`.
3. Minor header rearrangement.
@mythrocks
Copy link
Contributor Author

Thanks for the reviews, chaps! I'll wait for CI to pass before checking this in.

@mythrocks mythrocks added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 3, 2021
@harrism
Copy link
Member

harrism commented May 3, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5d50cde into rapidsai:branch-0.20 May 3, 2021
@mythrocks mythrocks deleted the lead_lag_nested branch May 3, 2021 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants