Add support for unique groupby aggregation #7726

Merged 2 commits on Mar 26, 2021

Contributor

@shwina shwina commented Mar 25, 2021

Adds support for SeriesGroupBy.unique(). Also adds support for DataFrameGroupBy.unique(), but that is untested because pandas doesn't support it (yet?).
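For reference, a minimal sketch of the pandas behaviour that `SeriesGroupBy.unique()` is meant to mirror. The frame and its column names are made up for illustration. The assumption is that the same call works on a `cudf.DataFrame` once this PR is in; that is not verified here, and the ordering and container type of each group's values may differ in cuDF.

```python
import pandas as pd

# Illustrative data: two groups, with repeated values inside each group.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "b"], "val": [1, 1, 2, 3, 2]}
)

# SeriesGroupBy.unique() returns one array of distinct values per group.
result = df.groupby("key")["val"].unique()

# Convert to plain Python containers for easy inspection.
unique_per_group = {key: vals.tolist() for key, vals in result.items()}
print(unique_per_group)
```

In pandas each group's values come back in order of first appearance, so group `b` yields `[2, 3]` here.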

Resolves #2973

@shwina shwina requested a review from a team as a code owner March 25, 2021 19:17
@github-actions github-actions bot added the "Python" (Affects Python cuDF API) label Mar 25, 2021
@pytest.mark.parametrize(
    "by,data",
    [
        # ([], []), # error?
Contributor Author

libcudf raises here, but maybe we should early-return anyway. Looking into this.

Contributor Author

@shwina shwina Mar 25, 2021

Same issue as #7611. Likely should handle in a separate fix.

Collaborator

Should we add a catch on the Python side temporarily to at least raise a nice NotImplementedError?

Contributor Author

Fixed. This is another instance of us catching the libcudf error string.
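The pattern discussed in this thread, catching a low-level error by its message and re-raising something friendlier, can be sketched as below. Every name and the error text are hypothetical stand-ins: `_backend_collect_set` is not a libcudf function, and the message is not libcudf's actual wording.

```python
def _backend_collect_set(keys, values):
    # Hypothetical stand-in for the compiled backend call.
    # The error text is invented for this sketch.
    if len(keys) == 0:
        raise RuntimeError("backend failure: empty input not supported")
    out = {}
    for key, value in zip(keys, values):
        bucket = out.setdefault(key, [])
        if value not in bucket:
            bucket.append(value)
    return out


def collect_set(keys, values):
    """Call the backend and translate one known failure into a clearer error."""
    try:
        return _backend_collect_set(keys, values)
    except RuntimeError as err:
        # Matching on the message string is brittle, which is the concern
        # raised in this thread; unrelated errors are re-raised untouched.
        if "empty input" in str(err):
            raise NotImplementedError(
                "unique() on an empty groupby is not supported yet"
            ) from err
        raise
```

Calling `collect_set([], [])` raises `NotImplementedError`, while non-empty input passes straight through to the backend.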

@shwina shwina added the "feature request" (New feature or request) and "non-breaking" (Non-breaking change) labels Mar 25, 2021
@ttnghia
Contributor

ttnghia commented Mar 25, 2021

Wait, is this the same as #7664? Or does it have any overlapping work with that PR?

@kkraus14
Collaborator

> Wait, is this the same as #7664? Or does it have any overlapping work with that PR?

This is different. That PR enables elementwise unique; this one just adds the expected Python API that maps to the libcudf groupby collect_set aggregation.
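To make the distinction concrete, here is a pure-Python sketch of the two operations. Both helper names are hypothetical, and the sketch models the semantics only, not the libcudf implementation. It keeps values in order of first appearance, which is an assumption of the sketch and not a claim about libcudf's ordering.

```python
from collections import defaultdict


def groupby_collect_set(keys, values):
    """Group-level: gather the distinct values seen under each key."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)


def elementwise_unique(list_rows):
    """Row-level: de-duplicate inside each list row, with no grouping."""
    return [list(dict.fromkeys(row)) for row in list_rows]


keys = ["a", "a", "b", "b", "b"]
values = [1, 1, 2, 3, 2]

grouped = groupby_collect_set(keys, values)
rows = elementwise_unique([[1, 1, 2], [3, 3]])
print(grouped)
print(rows)
```

The first call reduces many rows to one result per group; the second keeps the row count and only shrinks each list.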

@codecov

codecov bot commented Mar 25, 2021

Codecov Report

Merging #7726 (5444ab7) into branch-0.19 (7871e7a) will increase coverage by 0.65%.
The diff coverage is n/a.


@@               Coverage Diff               @@
##           branch-0.19    #7726      +/-   ##
===============================================
+ Coverage        81.86%   82.52%   +0.65%     
===============================================
  Files              101      101              
  Lines            16884    17458     +574     
===============================================
+ Hits             13822    14407     +585     
+ Misses            3062     3051      -11     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/buffer.py | 84.21% <ø> (+4.96%) ⬆️ |
| python/cudf/cudf/core/column/categorical.py | 91.97% <ø> (+0.58%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.61% <ø> (-0.15%) ⬇️ |
| python/cudf/cudf/core/column/datetime.py | 89.73% <ø> (+0.63%) ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.95% <ø> (-1.92%) ⬇️ |
| python/cudf/cudf/core/column/lists.py | 87.68% <ø> (-3.72%) ⬇️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <ø> (-0.20%) ⬇️ |
| python/cudf/cudf/core/column/string.py | 86.79% <ø> (+0.30%) ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | 90.68% <ø> (+2.45%) ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 96.13% <ø> (+0.82%) ⬆️ |
| ... and 55 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1417297...5444ab7.

@kkraus14 kkraus14 added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) label Mar 26, 2021
@kkraus14
Collaborator

@gpucibot merge

@ttnghia
Contributor

ttnghia commented Mar 26, 2021

Rerun tests.

@rapids-bot rapids-bot bot merged commit bf2e96c into rapidsai:branch-0.19 Mar 26, 2021
Successfully merging this pull request may close these issues.

[FEA] Support collect_set