Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for group_keys in groupby #11659

Merged
merged 8 commits into from
Sep 8, 2022

Conversation

galipremsagar
Copy link
Contributor

Description

  • This PR adds support for group_keys in groupby. Starting pandas 1.5.0, issues around group_keys have been resolved:

pandas-dev/pandas#34998
pandas-dev/pandas#47185

  • This PR defaults group_keys to False which is the same as what pandas is going to be defaulting to in the future version.
  • Required to unblock pandas-1.5.0 upgrade in cudf: [REVIEW] Upgrade pandas to 1.5 #11617

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 7, 2022
@galipremsagar galipremsagar requested a review from shwina September 7, 2022 00:02
@galipremsagar galipremsagar self-assigned this Sep 7, 2022
@galipremsagar galipremsagar requested a review from a team as a code owner September 7, 2022 00:02
@galipremsagar galipremsagar requested a review from vyasr September 7, 2022 00:02
@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@66b5a0c). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head a20593e differs from pull request most recent head 86b3c97. Consider uploading reports for the commit 86b3c97 to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11659   +/-   ##
===============================================
  Coverage                ?   86.42%           
===============================================
  Files                   ?      145           
  Lines                   ?    23000           
  Branches                ?        0           
===============================================
  Hits                    ?    19877           
  Misses                  ?     3123           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me! Multi-indexing is always complicated but I'm happy with this approach. Thank you for the helpful notes in the docs, as well.

@galipremsagar galipremsagar requested a review from shwina September 7, 2022 16:08
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Sep 8, 2022
@galipremsagar
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d3e8f6d into rapidsai:branch-22.10 Sep 8, 2022
@galipremsagar galipremsagar mentioned this pull request Sep 12, 2022
14 tasks
rapids-bot bot pushed a commit that referenced this pull request Sep 21, 2022
This PR introduces `pandas-1.5` support in `cudf`. The changes include:

- [x] Requires `group_keys` support in `groupby` for `dask_cudf` to work: #11659
- [x] Requires `zfill` updates to match `pandas-1.5` behavior: #11634
- [x] `where` API: Ability to inspect a scalar value if it can be fit into the existing dtype, similar to: pandas-dev/pandas#48373
- [x] Switches `ValueError` to `TypeError` when an unknown category is being set to a `CategoricalColumn`
- [x] Handles breaking change of an `ArrowIntervalType` related import that has resulted in `cudf` to error on import itself.
- [x] Fix an issue with `IntervalColumn.to_pandas`.
- [x] Raises error when an object of `boolean` dtype is being set to a `NumericalColumn`.
- [x] Raises error when `pat` is None in `Series.str.startswith` & `Series.str.endswith`.
- [x] Add `IntervalDtype.to_pandas` with appropriate versioning.
- [x] Handle `get_window_bounds` signature changes.
- [x] Fix and version a bunch of pytests.

```python
branch-22.10:

== 4275 failed, 79837 passed, 2049 skipped, 1193 xfailed, 1923 xpassed, 6597 warnings, 4 errors in 1103.52s (0:18:23) ==
== 803 failed, 106 passed, 14 skipped, 14 xfailed, 324 warnings, 17 errors in 148.46s (0:02:28) ==

This PR:

== 84041 passed, 2049 skipped, 1199 xfailed, 1710 xpassed, 6599 warnings in 359.27s (0:05:59) ==
== 954 passed, 14 skipped, 7 xfailed, 3 xpassed, 580 warnings in 54.75s ==
```

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Matthew Roeschke (https://github.com/mroeschke)
  - Mark Sadang (https://github.com/msadang)

URL: #11617
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants