
Fix edge case in tdigest scalar generation for groups containing all nulls. #9551

Merged

nvdbaranec
Contributor

For the scalar input case, groups containing only nulls generated empty digests (digests with no clusters). However, the final reduce_by_key call still visited those groups and therefore expected a place to store their unused reduced values, which caused an overrun later on.

The fix is to leave a single "stub" centroid for each such group and then strip the stubs out in a post-processing step.
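A minimal C++ sketch of the stub idea, written with the standard library rather than libcudf or Thrust and using a hypothetical flattened layout. `Groups`, `Centroids`, `make_centroids`, `total_weight_by_group` and the zero-weight sentinel `kStubWeight` are illustrative assumptions, not names from this PR. The sketch shows why a keyed, reduce_by_key-style pass wants every group to contribute at least one entry: it emits one output per run of equal keys, so in this sketch an all-null group with no entries would produce no output at all, leaving fewer results than groups.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical input: one vector of scalars per group; std::nullopt marks a null.
using Groups = std::vector<std::vector<std::optional<double>>>;

// Hypothetical flattened layout: one entry per centroid, tagged with its group id.
struct Centroids {
  std::vector<int> group;  // group id of each centroid, in ascending order
  std::vector<double> mean;
  std::vector<double> weight;
};

// Assumption: a stub is marked by weight 0. Centroids built from scalars have weight 1.
constexpr double kStubWeight = 0.0;

// Turn each valid scalar into a weight-1 centroid; give every all-null group one stub.
Centroids make_centroids(const Groups& groups) {
  Centroids c;
  for (std::size_t g = 0; g < groups.size(); ++g) {
    bool any_valid = false;
    for (const auto& v : groups[g]) {
      if (!v) continue;  // skip nulls
      c.group.push_back(static_cast<int>(g));
      c.mean.push_back(*v);
      c.weight.push_back(1.0);
      any_valid = true;
    }
    if (!any_valid) {  // the stub keeps this group visible to the keyed reduction
      c.group.push_back(static_cast<int>(g));
      c.mean.push_back(0.0);
      c.weight.push_back(kStubWeight);
    }
  }
  return c;
}

// reduce_by_key-style pass: one output (total weight) per run of equal group ids.
std::vector<double> total_weight_by_group(const Centroids& c) {
  std::vector<double> out;
  for (std::size_t i = 0; i < c.group.size(); ++i) {
    if (i == 0 || c.group[i] != c.group[i - 1]) out.push_back(0.0);  // new key run
    out.back() += c.weight[i];
  }
  return out;
}
```

For the groups `{1.0, 2.0}`, `{null, null}` and `{3.0}`, the pass yields three totals (2, 0 and 1), one per group. Without the stub, the all-null group would contribute no output. The stubs still have to be stripped afterwards so that the group ends up as an empty digest.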

nvdbaranec added labels on Oct 28, 2021: bug (Something isn't working); libcudf (Affects libcudf (C++/CUDA) code); non-breaking (Non-breaking change)
nvdbaranec requested a review from a team as a code owner on October 28, 2021 18:55
@codecov

codecov bot commented Oct 28, 2021

Codecov Report

Merging #9551 (f3356bb) into branch-21.12 (ab4bfaa) will decrease coverage by 0.14%.
The diff coverage is n/a.

❗ Current head f3356bb differs from pull request most recent head c2efde7. Consider uploading reports for the commit c2efde7 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.12    #9551      +/-   ##
================================================
- Coverage         10.79%   10.64%   -0.15%     
================================================
  Files               116      117       +1     
  Lines             18869    19339     +470     
================================================
+ Hits               2036     2059      +23     
- Misses            16833    17280     +447     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/reshape.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/dtypes.py 0.00% <0.00%> (ø)
... and 42 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfcf90f...c2efde7.

@nvdbaranec
Contributor Author

@gpucibot merge

rapids-bot merged commit 275f5fc into rapidsai:branch-21.12 on Nov 3, 2021
thrust::make_counting_iterator(0),
_weights->mutable_view().begin<double>(),
is_stub_weight);
auto _means = remove_stubs(*means, num_stubs);
Contributor

👍
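The post-processing step in the diff excerpt above filters entries with an `is_stub_weight` predicate and calls `remove_stubs` on the means. What follows is a rough host-side C++ analogue of that stripping, not the PR's code. It assumes a stub is marked by a zero weight and uses a hypothetical `Digests` struct with per-group offsets; `is_stub` and `strip_stubs` are illustrative names. Means and weights are compacted with the same test so they stay aligned, and the per-group offsets are rebuilt so an all-null group ends up as an empty digest.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel test: a stub centroid carries weight 0.
inline bool is_stub(double w) { return w == 0.0; }

// Hypothetical layout: group g owns centroids [offsets[g], offsets[g + 1]).
struct Digests {
  std::vector<double> mean;
  std::vector<double> weight;
  std::vector<std::size_t> offsets;
};

// Drop stub centroids, keeping means/weights aligned and recomputing group offsets.
Digests strip_stubs(const Digests& in) {
  Digests out;
  out.offsets.push_back(0);
  for (std::size_t g = 0; g + 1 < in.offsets.size(); ++g) {
    for (std::size_t i = in.offsets[g]; i < in.offsets[g + 1]; ++i) {
      if (is_stub(in.weight[i])) continue;  // same test for both columns
      out.mean.push_back(in.mean[i]);
      out.weight.push_back(in.weight[i]);
    }
    out.offsets.push_back(out.mean.size());  // end of group g after compaction
  }
  return out;
}
```

With means `{1, 2, 0, 3}`, weights `{1, 1, 0, 1}` and offsets `{0, 2, 3, 4}`, the stub in group 1 is dropped and the offsets become `{0, 2, 2, 3}`, so group 1 is now empty rather than holding a placeholder. A GPU version would more likely use a stream-compaction primitive such as `thrust::remove_if` instead of this sequential loop.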

3 participants