Refactor scatter for list columns #8255

Merged
merged 23 commits into from
Jun 7, 2021

Conversation

isVoid

@isVoid isVoid commented May 14, 2021

This PR refactors scatter for LIST-type columns. Previously, child columns were constructed with nested for_each_n calls: the outer loop iterated over the rows and the inner loop over the elements of each row. Because the offsets of the column being constructed are already known, both loops can be replaced with a single transform.

For each element, we first look up the unbound_list_view it belongs to by binary searching the offsets vector. The element to copy is then retrieved by dereferencing the bound list_view at the proper intra-list index.
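The lookup described above can be sketched on the CPU with standard algorithms. This is a simplified illustration, not the code in this PR: `row_ref` and `build_child` are hypothetical names, `row_ref` stands in for an unbound_list_view that always refers to a single source column, and `std::transform` stands in for the device-side transform.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an unbound list view: it only records which
// source row a given output row should be copied from.
struct row_ref {
  std::size_t source_row;
};

// Builds the child elements of the output lists column in one flat pass.
// out_offsets must hold (number of output rows + 1) entries, starting at 0;
// src_offsets and src_child describe the source lists column.
std::vector<double> build_child(std::vector<row_ref> const& rows,
                                std::vector<int> const& out_offsets,
                                std::vector<int> const& src_offsets,
                                std::vector<double> const& src_child)
{
  // One index per output child element.
  std::vector<int> indices(out_offsets.back());
  std::iota(indices.begin(), indices.end(), 0);

  std::vector<double> out(indices.size());
  // Each output element is computed independently of the others, which is
  // what allows a single transform to replace the nested loops.
  std::transform(indices.begin(), indices.end(), out.begin(), [&](int i) {
    // The first offset greater than i belongs to the next row, so step back one.
    auto it  = std::upper_bound(out_offsets.begin(), out_offsets.end(), i);
    auto row = static_cast<std::size_t>(std::distance(out_offsets.begin(), it)) - 1;
    // Position of element i inside its row.
    int intra = i - out_offsets[row];
    return src_child[src_offsets[rows[row].source_row] + intra];
  });
  return out;
}
```

Using upper_bound rather than lower_bound means empty rows, which appear as repeated offsets, are skipped correctly.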

The struct-type refactor is different. Currently, the implementation wraps every child in a lists column and dispatches to the list-type specialization. This works, but the wrapping step deep copies the list offsets and the child column just to dispatch. We can simplify it by wrapping them in a view instead.
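The difference between the two wrapping strategies can be illustrated with a minimal host-side sketch. All names here (`owning_lists`, `lists_view`, `wrap_by_copy`, `wrap_by_view`) are hypothetical and are not libcudf types; the point is only that a view records pointers while a copy duplicates the buffers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical owning lists column: holds its own offsets and child data.
struct owning_lists {
  std::vector<int> offsets;
  std::vector<double> child;
};

// Hypothetical non-owning view: only points at existing buffers.
struct lists_view {
  int const* offsets;
  std::size_t num_rows;
  double const* child;
};

// Deep copy: allocates and copies both buffers, O(n) time and memory.
owning_lists wrap_by_copy(std::vector<int> const& offsets, std::vector<double> const& child)
{
  return owning_lists{offsets, child};
}

// View: O(1), no allocation. The caller must keep the buffers alive
// for as long as the view is in use.
lists_view wrap_by_view(std::vector<int> const& offsets, std::vector<double> const& child)
{
  std::size_t rows = offsets.empty() ? 0 : offsets.size() - 1;
  return lists_view{offsets.data(), rows, child.data()};
}
```

The trade-off is lifetime management: the view is only valid while the underlying buffers exist, which is acceptable when the wrapper is used solely for dispatch.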

Since scatter.cuh is included in many other files, separating out the scatter implementation details helps reduce compilation time while refactoring the code. Most helper functions are moved into scatter_helper.cu.

Benchmarking code for scattering lists has been added. A benchmark snapshot is below:

Benchmark                                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------
ScatterLists/double_type_colesce_o/1024/64/manual_time                      -0.1073         -0.0926        110648         98781        129731        117724
ScatterLists/double_type_colesce_o/4096/64/manual_time                      -0.1177         -0.1015        113393        100045        132412        118971
ScatterLists/double_type_colesce_o/32768/64/manual_time                     -0.3785         -0.3391        167288        103962        185599        122663
ScatterLists/double_type_colesce_o/262144/64/manual_time                    -0.3175         -0.2834        171123        116785        188191        134865
ScatterLists/double_type_colesce_o/2097152/64/manual_time                   -0.2581         -0.2426        270225        200472        290363        219934
ScatterLists/double_type_colesce_o/16777216/64/manual_time                  -0.8464         -0.8438       6205089        953139       6224867        972548
ScatterLists/double_type_colesce_o/33554432/64/manual_time                  -0.8437         -0.8423      12087712       1889483      12107066       1909170
ScatterLists/double_type_colesce_o/1024/512/manual_time                     -0.3487         -0.3111        150169         97810        169463        116736
ScatterLists/double_type_colesce_o/4096/512/manual_time                     -0.3499         -0.3116        151978         98794        170918        117661
ScatterLists/double_type_colesce_o/32768/512/manual_time                    -0.4337         -0.3901        196663        111364        215048        131162
ScatterLists/double_type_colesce_o/262144/512/manual_time                   -0.8083         -0.7844        590691        113251        607891        131089
ScatterLists/double_type_colesce_o/2097152/512/manual_time                  -0.7018         -0.6815        641149        191192        661107        210559
ScatterLists/double_type_colesce_o/16777216/512/manual_time                 -0.6893         -0.6842       2581320        802057       2601542        821602
ScatterLists/double_type_colesce_o/33554432/512/manual_time                 -0.8277         -0.8259       9150244       1576769       9169846       1596137
ScatterLists/double_type_colesce_o/1024/2048/manual_time                    -0.6584         -0.6178        284006         97008        303179        115869
ScatterLists/double_type_colesce_o/4096/2048/manual_time                    -0.6648         -0.6250        289209         96934        308413        115647
ScatterLists/double_type_colesce_o/32768/2048/manual_time                   -0.7433         -0.7089        386115         99120        404566        117774
ScatterLists/double_type_colesce_o/262144/2048/manual_time                  -0.8214         -0.7984        611876        109305        629110        126803
ScatterLists/double_type_colesce_o/2097152/2048/manual_time                 -0.9107         -0.9024       2098263        187417       2118254        206798
ScatterLists/double_type_colesce_o/16777216/2048/manual_time                -0.6869         -0.6816       2527109        791306       2546819        810805
ScatterLists/double_type_colesce_o/33554432/2048/manual_time                -0.5102         -0.5070       3018595       1478458       3038315       1497923
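The Time and CPU columns appear to be relative changes between the old and new measurements, (new - old) / old; the Time Old and Time New values are consistent with that reading. A small sketch, with hypothetical helper names, for reproducing a row or converting it to a speedup factor:

```cpp
#include <cmath>

// Relative change between an old and a new measurement: (new - old) / |old|.
// Negative values mean the new code is faster.
double relative_change(double old_time, double new_time)
{
  return (new_time - old_time) / std::fabs(old_time);
}

// Equivalent speedup factor: how many times faster the new code is.
double speedup(double old_time, double new_time) { return old_time / new_time; }
```

For the 16777216/64 row, relative_change(6205089, 953139) is about -0.846, matching the table, which corresponds to a speedup of roughly 6.5x.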

@github-actions bot added the label "libcudf: Affects libcudf (C++/CUDA) code." on May 14, 2021
@isVoid added the labels "improvement: Improvement / enhancement to an existing function" and "2 - In Progress: Currently a work in progress" on May 14, 2021
@isVoid self-assigned this on May 14, 2021
@mythrocks

Thanks for picking this up, @isVoid. I had started to address this in #6791, but got distracted on other PRs. I'm looking forward to reviewing this one. :]

@mythrocks mythrocks self-requested a review May 15, 2021 00:17
@codecov

codecov bot commented May 15, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@5b8895d). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 1689b37 differs from pull request most recent head 3c8cb01. Consider uploading reports for the commit 3c8cb01 to get more accurate results

@@               Coverage Diff               @@
##             branch-21.08    #8255   +/-   ##
===============================================
  Coverage                ?   82.83%           
===============================================
  Files                   ?      109           
  Lines                   ?    17901           
  Branches                ?        0           
===============================================
  Hits                    ?    14828           
  Misses                  ?     3073           
  Partials                ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b8895d...3c8cb01. Read the comment docs.

@github-actions bot added the labels "CMake: CMake build issue" and "conda" on May 21, 2021
@isVoid

isVoid commented May 22, 2021

Here are some initial benchmark results. The two control variables are base_row_number and elements_per_row. The performance improvement is most noticeable when each row holds a relatively large number of elements (>512).

The benchmark code has been uploaded.

Benchmark                                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------
ScatterLists/double_type_colesce_o/1024/64/manual_time                      -0.1073         -0.0926        110648         98781        129731        117724
ScatterLists/double_type_colesce_o/4096/64/manual_time                      -0.1177         -0.1015        113393        100045        132412        118971
ScatterLists/double_type_colesce_o/32768/64/manual_time                     -0.3785         -0.3391        167288        103962        185599        122663
ScatterLists/double_type_colesce_o/262144/64/manual_time                    -0.3175         -0.2834        171123        116785        188191        134865
ScatterLists/double_type_colesce_o/2097152/64/manual_time                   -0.2581         -0.2426        270225        200472        290363        219934
ScatterLists/double_type_colesce_o/16777216/64/manual_time                  -0.8464         -0.8438       6205089        953139       6224867        972548
ScatterLists/double_type_colesce_o/33554432/64/manual_time                  -0.8437         -0.8423      12087712       1889483      12107066       1909170
ScatterLists/double_type_colesce_o/1024/512/manual_time                     -0.3487         -0.3111        150169         97810        169463        116736
ScatterLists/double_type_colesce_o/4096/512/manual_time                     -0.3499         -0.3116        151978         98794        170918        117661
ScatterLists/double_type_colesce_o/32768/512/manual_time                    -0.4337         -0.3901        196663        111364        215048        131162
ScatterLists/double_type_colesce_o/262144/512/manual_time                   -0.8083         -0.7844        590691        113251        607891        131089
ScatterLists/double_type_colesce_o/2097152/512/manual_time                  -0.7018         -0.6815        641149        191192        661107        210559
ScatterLists/double_type_colesce_o/16777216/512/manual_time                 -0.6893         -0.6842       2581320        802057       2601542        821602
ScatterLists/double_type_colesce_o/33554432/512/manual_time                 -0.8277         -0.8259       9150244       1576769       9169846       1596137
ScatterLists/double_type_colesce_o/1024/2048/manual_time                    -0.6584         -0.6178        284006         97008        303179        115869
ScatterLists/double_type_colesce_o/4096/2048/manual_time                    -0.6648         -0.6250        289209         96934        308413        115647
ScatterLists/double_type_colesce_o/32768/2048/manual_time                   -0.7433         -0.7089        386115         99120        404566        117774
ScatterLists/double_type_colesce_o/262144/2048/manual_time                  -0.8214         -0.7984        611876        109305        629110        126803
ScatterLists/double_type_colesce_o/2097152/2048/manual_time                 -0.9107         -0.9024       2098263        187417       2118254        206798
ScatterLists/double_type_colesce_o/16777216/2048/manual_time                -0.6869         -0.6816       2527109        791306       2546819        810805
ScatterLists/double_type_colesce_o/33554432/2048/manual_time                -0.5102         -0.5070       3018595       1478458       3038315       1497923

@github-actions bot added the labels "Java: Affects Java cuDF API." and "Python: Affects Python cuDF API." on May 25, 2021
@robertmaynard robertmaynard left a comment

CMake code LGTM

@revans2

revans2 commented May 26, 2021

When I try to build this I keep getting errors about

../include/cudf/lists/detail/scatter.cuh(128): error: identifier "build_list_child_column_recursive" is undefined

Am I missing something?

@isVoid

isVoid commented May 26, 2021

@revans2 that results from a bad rename attempt. Sorry for the noise! Updating.

#include <cudf/detail/get_value.cuh>
#include <cudf/detail/valid_if.cuh>
#include <cudf/lists/detail/copying.hpp>
#include <cudf/lists/detail/scatter_helper.cuh>
Contributor

Why do you put the header in lists/detail but the cu file in lists/copying?

Contributor Author

There doesn't seem to be a separation between detail and non-detail files in the src directory.

Member

In fact, unless the header is needed by more than one module, it can go in the src directory rather than in lists/detail.

@isVoid

isVoid commented Jun 2, 2021

rerun tests


@mythrocks mythrocks left a comment

A couple of minor nitpicks. But this looks good to me.

(Sorry, it took a while to review.)

@isVoid isVoid requested a review from mythrocks June 7, 2021 17:25

@mythrocks mythrocks left a comment

LGTM. Good job, sir.

@isVoid added the label "5 - Ready to Merge: Testing and reviews complete, ready to merge" and removed the label "3 - Ready for Review: Ready for review by team" on Jun 7, 2021
@isVoid

isVoid commented Jun 7, 2021

@gpucibot merge

Labels
5 - Ready to Merge: Testing and reviews complete, ready to merge
CMake: CMake build issue
improvement: Improvement / enhancement to an existing function
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
9 participants