Fix fallback to sort aggregation for grouping only hash aggregate (21.12) #9898

Merged

Conversation

abellina
Contributor

This is a cherry-pick of 335862b for the 21.12 branch.

I'll run a few tests with this, so I'm posting it as-is (a clean cherry-pick) just to get the ball rolling.

Fix fallback to sort aggregation for grouping only hash aggregate (rapidsai#9891)

The following fixes what looks like an unintended fallback to the sort aggregation, introduced in rapidsai#9545, for the grouping-only case (no aggregation requests).

In that PR, `std::all_of` is used to determine whether the aggregation requests are all for struct types. However, when there are no aggregation requests, `std::all_of` returns true, causing a fallback to the sort aggregation (relevant code: https://github.com/rapidsai/cudf/pull/9545/files#diff-e409f72ddc11ad10fa0099e21b409b92f12bfac8ba1817266696c34a620aa081R645-R650).
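
For context, `std::all_of` over an empty range is vacuously true, which is why a grouping-only call passed the "all requests are struct-typed" check. A minimal standalone illustration (the predicate is a stand-in, not cudf's actual type check):

```
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
  // Stand-in for the list of aggregation requests; empty in the
  // grouping-only case.
  std::vector<int> requests;

  // std::all_of over an empty range is vacuously true, so a check like
  // "all requests are struct-typed" passes even when there are none.
  bool const all_structs =
    std::all_of(requests.begin(), requests.end(), [](int) { return true; });

  std::cout << std::boolalpha << all_structs << std::endl;  // prints "true"
}
```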

I added a benchmark, `group_no_requests_benchmark.cu`, mostly by copying `group_sum_benchmark.cu`, but with one critical change: the `groupby` object is re-created on every iteration of the benchmark loop:

```
  for (auto _ : state) {
    cuda_event_timer timer(state, true);
    // construct a fresh groupby on every iteration
    cudf::groupby::groupby gb_obj(cudf::table_view({keys}));
    auto result = gb_obj.aggregate(requests);
  }
```

This shows what happens in the scenario where the `groupby` instance is created each time an aggregate is issued, which re-creates the `helper` each time in the sorted case.

If the `groupby` object is not re-created each time, the difference in performance between the before/after cases is negligible (see the sketch below). We never recycle a `groupby` instance when using the groupby API from Spark.
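
For contrast, a minimal sketch of the reuse case, using the same names as the benchmark snippet above (this variant is illustrative, not code from the PR):

```
  // Construct the groupby once; the sorted path's helper is then built
  // only on the first aggregate() call and reused afterwards.
  cudf::groupby::groupby gb_obj(cudf::table_view({keys}));
  for (auto _ : state) {
    cuda_event_timer timer(state, true);
    auto result = gb_obj.aggregate(requests);
  }
```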

Posting this as a draft for feedback, as I am not sure whether I handled the benchmark part correctly.

This was executed on a T4 GPU.
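
For reference, the `Groupby/BasicNoRequest/.../manual_time` rows in the results below are consistent with a Google Benchmark fixture registered along these lines (the fixture and benchmark names come from the output; `BM_basic_no_requests` is a hypothetical helper wrapping the timed loop above):

```
BENCHMARK_DEFINE_F(Groupby, BasicNoRequest)(::benchmark::State& state)
{
  BM_basic_no_requests(state);  // hypothetical helper containing the timed loop
}
BENCHMARK_REGISTER_F(Groupby, BasicNoRequest)
  ->UseManualTime()
  ->Unit(benchmark::kMillisecond)
  ->Arg(10000)
  ->Arg(1000000)
  ->Arg(10000000)
  ->Arg(100000000);
```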

Before the patch:

```
Groupby/BasicNoRequest/10000/manual_time               0.158 ms        0.184 ms         4420
Groupby/BasicNoRequest/1000000/manual_time              1.72 ms         1.74 ms          408
Groupby/BasicNoRequest/10000000/manual_time             18.9 ms         18.9 ms           37
Groupby/BasicNoRequest/100000000/manual_time             198 ms          198 ms            3
```
<details>
<summary>Full output</summary>
<p>

```
2021-12-12T13:41:08+00:00
Running ./GROUPBY_BENCH
Run on (64 X 2801.89 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x32)
  L1 Instruction 32 KiB (x32)
  L2 Unified 1024 KiB (x32)
  L3 Unified 22528 KiB (x2)
Load Average: 1.01, 0.78, 0.42
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
Groupby/Basic/10000/manual_time                        0.117 ms        0.143 ms         5906
Groupby/Basic/1000000/manual_time                      0.524 ms        0.542 ms         1352
Groupby/Basic/10000000/manual_time                      4.41 ms         4.43 ms          159
Groupby/Basic/100000000/manual_time                     50.1 ms         50.1 ms           13
Groupby/PreSorted/1000000/manual_time                  0.332 ms        0.350 ms         2118
Groupby/PreSorted/10000000/manual_time                  2.22 ms         2.23 ms          315
Groupby/PreSorted/100000000/manual_time                 22.2 ms         22.2 ms           30
Groupby/PreSortedNth/1000000/manual_time               0.160 ms        0.188 ms         4381
Groupby/PreSortedNth/10000000/manual_time              0.890 ms        0.917 ms          774
Groupby/PreSortedNth/100000000/manual_time              8.43 ms         8.46 ms           68
Groupby/Shift/1000000/manual_time                      0.764 ms        0.785 ms          902
Groupby/Shift/10000000/manual_time                      9.51 ms         9.53 ms           63
Groupby/Shift/100000000/manual_time                      145 ms          145 ms            4
Groupby/Aggregation/10000/manual_time                   1.56 ms         1.58 ms          442
Groupby/Aggregation/16384/manual_time                   1.59 ms         1.62 ms          435
Groupby/Aggregation/65536/manual_time                   1.73 ms         1.76 ms          404
Groupby/Aggregation/262144/manual_time                  2.95 ms         2.98 ms          237
Groupby/Aggregation/1048576/manual_time                 9.20 ms         9.23 ms           73
Groupby/Aggregation/4194304/manual_time                 36.3 ms         36.3 ms           19
Groupby/Aggregation/10000000/manual_time                92.0 ms         92.1 ms            7
Groupby/Scan/10000/manual_time                          1.56 ms         1.58 ms          447
Groupby/Scan/16384/manual_time                          1.62 ms         1.65 ms          429
Groupby/Scan/65536/manual_time                          1.85 ms         1.88 ms          378
Groupby/Scan/262144/manual_time                         3.54 ms         3.56 ms          197
Groupby/Scan/1048576/manual_time                        12.0 ms         12.0 ms           57
Groupby/Scan/4194304/manual_time                        48.6 ms         48.6 ms           14
Groupby/Scan/10000000/manual_time                        126 ms          126 ms            4
Groupby/BasicNoRequest/10000/manual_time               0.158 ms        0.184 ms         4420
Groupby/BasicNoRequest/1000000/manual_time              1.72 ms         1.74 ms          408
Groupby/BasicNoRequest/10000000/manual_time             18.9 ms         18.9 ms           37
Groupby/BasicNoRequest/100000000/manual_time             198 ms          198 ms            3
Groupby/PreSortedNoRequests/1000000/manual_time        0.194 ms        0.214 ms         3624
Groupby/PreSortedNoRequests/10000000/manual_time        1.25 ms         1.27 ms          571
Groupby/PreSortedNoRequests/100000000/manual_time       12.6 ms         12.7 ms           50
```

</details>

After the patch:

```
Groupby/BasicNoRequest/10000/manual_time               0.058 ms        0.085 ms        11991
Groupby/BasicNoRequest/1000000/manual_time             0.282 ms        0.301 ms         2478
Groupby/BasicNoRequest/10000000/manual_time             2.42 ms         2.44 ms          291
Groupby/BasicNoRequest/100000000/manual_time            29.2 ms         29.2 ms           21
```

<details>
<summary>Full output</summary>
<p>

```
2021-12-12T13:37:50+00:00
Running ./GROUPBY_BENCH
Run on (64 X 2654.22 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x32)
  L1 Instruction 32 KiB (x32)
  L2 Unified 1024 KiB (x32)
  L3 Unified 22528 KiB (x2)
Load Average: 0.64, 0.50, 0.26
--------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations
--------------------------------------------------------------------------------------------
Groupby/Basic/10000/manual_time                        0.116 ms        0.142 ms         5918
Groupby/Basic/1000000/manual_time                      0.523 ms        0.542 ms         1374
Groupby/Basic/10000000/manual_time                      4.37 ms         4.39 ms          162
Groupby/Basic/100000000/manual_time                     51.4 ms         51.5 ms           10
Groupby/PreSorted/1000000/manual_time                  0.331 ms        0.350 ms         2121
Groupby/PreSorted/10000000/manual_time                  2.21 ms         2.23 ms          316
Groupby/PreSorted/100000000/manual_time                 22.2 ms         22.2 ms           27
Groupby/PreSortedNth/1000000/manual_time               0.160 ms        0.188 ms         4384
Groupby/PreSortedNth/10000000/manual_time              0.888 ms        0.915 ms          775
Groupby/PreSortedNth/100000000/manual_time              8.36 ms         8.39 ms           70
Groupby/Shift/1000000/manual_time                      0.764 ms        0.785 ms          904
Groupby/Shift/10000000/manual_time                      9.50 ms         9.52 ms           63
Groupby/Shift/100000000/manual_time                      146 ms          146 ms            4
Groupby/Aggregation/10000/manual_time                   1.53 ms         1.55 ms          446
Groupby/Aggregation/16384/manual_time                   1.58 ms         1.61 ms          437
Groupby/Aggregation/65536/manual_time                   1.72 ms         1.75 ms          405
Groupby/Aggregation/262144/manual_time                  2.93 ms         2.96 ms          236
Groupby/Aggregation/1048576/manual_time                 9.18 ms         9.21 ms           74
Groupby/Aggregation/4194304/manual_time                 36.2 ms         36.3 ms           19
Groupby/Aggregation/10000000/manual_time                91.5 ms         91.6 ms            7
Groupby/Scan/10000/manual_time                          1.55 ms         1.57 ms          452
Groupby/Scan/16384/manual_time                          1.60 ms         1.62 ms          434
Groupby/Scan/65536/manual_time                          1.84 ms         1.87 ms          379
Groupby/Scan/262144/manual_time                         3.54 ms         3.56 ms          198
Groupby/Scan/1048576/manual_time                        12.0 ms         12.0 ms           57
Groupby/Scan/4194304/manual_time                        48.4 ms         48.4 ms           14
Groupby/Scan/10000000/manual_time                        125 ms          125 ms            4
Groupby/BasicNoRequest/10000/manual_time               0.058 ms        0.085 ms        11991
Groupby/BasicNoRequest/1000000/manual_time             0.282 ms        0.301 ms         2478
Groupby/BasicNoRequest/10000000/manual_time             2.42 ms         2.44 ms          291
Groupby/BasicNoRequest/100000000/manual_time            29.2 ms         29.2 ms           21
Groupby/PreSortedNoRequests/1000000/manual_time        0.195 ms        0.215 ms         3604
Groupby/PreSortedNoRequests/10000000/manual_time        1.25 ms         1.27 ms          575
Groupby/PreSortedNoRequests/100000000/manual_time       12.7 ms         12.8 ms           50
```

</details>

Authors:
  - Alessandro Bellina (https://github.com/abellina)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Nghia Truong (https://github.com/ttnghia)
  - Conor Hoekstra (https://github.com/codereport)

URL: rapidsai#9891
github-actions bot added the CMake and libcudf labels on Dec 14, 2021
abellina added the bug, non-breaking, and Performance labels and removed the libcudf and CMake labels on Dec 14, 2021
@abellina
Contributor Author

Given CI is blocked because of:

Branch Checker — Base branch is not under active development

I am not sure whether someone needs to pull this patch in manually, or if this PR is the right approach.

jrhemstad added the `! - Hotfix` label on Dec 14, 2021
@codecov

codecov bot commented Dec 14, 2021

Codecov Report

Merging #9898 (8fbc469) into branch-21.12 (0930f71) will increase coverage by 0.02%.
The diff coverage is n/a.


```
@@               Coverage Diff                @@
##           branch-21.12    #9898      +/-   ##
================================================
+ Coverage         10.58%   10.60%   +0.02%
================================================
  Files               118      118
  Lines             19686    20082     +396
================================================
+ Hits               2083     2130      +47
- Misses            17603    17952     +349
```

| Impacted Files | Coverage Δ |
| --- | --- |
| python/cudf/cudf/io/hdf.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/orc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/_version.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/abc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/api/types.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/dlpack.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/feather.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/parquet.py | 0.00% <0.00%> (ø) |

... and 47 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

abellina marked this pull request as ready for review on December 14, 2021
abellina requested a review from a team as a code owner on December 14, 2021
abellina requested reviews from mythrocks and karthikeyann and removed the request for a team on December 14, 2021
@abellina
Contributor Author

abellina commented Dec 14, 2021

One thing this PR is not doing is changing the version of the cudf jar artifact.

@ajschmidt8 should I change the version, or does that happen separately? There's the version in the jar itself (https://github.com/rapidsai/cudf/blob/branch-21.12/java/pom.xml#L24), and at least one on the cpp side (https://github.com/rapidsai/cudf/blob/branch-22.02/cpp/CMakeLists.txt#L28).

@ajschmidt8
Member

> One thing this PR is not doing is changing the version of the cudf jar artifact.
>
> @ajschmidt8 should I change the version, or does that happen separately? There's the version in the jar itself (https://github.com/rapidsai/cudf/blob/branch-21.12/java/pom.xml#L24), and at least one on the cpp side (https://github.com/rapidsai/cudf/blob/branch-22.02/cpp/CMakeLists.txt#L28).

No need to touch the version strings. We will update any necessary strings during the hotfix release.

@abellina
Contributor Author

I built a temporary 21.12.1 jar and used it with our plugin (21.12.0) as staged. I ran our hash aggregate integration tests that touch this code, and they passed. I'll also run the full test suite and some benchmark code on our side to double-check.

@abellina
Contributor Author

We ran more tests, including our Spark RAPIDS CI, against the would-be 21.12.1 jar and did not find issues. Thanks for the reviews!

raydouglass merged commit fc5661f into rapidsai:branch-21.12 on Dec 16, 2021