-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix some cases of groupby(...).transform with dropna=True #45953
Conversation
…ndas into transform_dropna_1 � Conflicts: � pandas/core/groupby/groupby.py � pandas/core/groupby/ops.py
pandas/core/groupby/ops.py
Outdated
Get the original integer locations of result_index in the input. | ||
""" | ||
# Original indices are where group_index would go via sorting. | ||
# But when dropna is true, we need to remove null values whilst accounting for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"whilst" may be tough for non-native speakers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - changed to while
@@ -193,10 +191,6 @@ class TestPrinting(base.BasePrintingTests): | |||
|
|||
class TestGroupBy(base.BaseGroupbyTests): | |||
def test_groupby_extension_transform(self, data_for_grouping, request): | |||
if data_for_grouping.dtype.storage == "pyarrow" and pa_version_under2p0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh surprised that this solves it. neat.
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -57,6 +57,39 @@ Styler | |||
|
|||
- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`) | |||
|
|||
.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna: | |||
|
|||
Using ``dropna=True`` with ``groupby`` transformers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we use "transformer" elsewhere? Could use "transform"/"transforms"? Is there a risk of confusion for ML people for whom "transformer" means something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do mention transformer once in the docs, but agreed with sticking to transform here.
doc/source/whatsnew/v1.5.0.rst
Outdated
|
||
A transformer is an operation whose result has the same size as its input. When the | ||
result is a :class:`DataFrame` or :class:`Series`, it is also required that the | ||
index of the result is equal to that of the input. In pandas 1.4, using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is equal" -> "matches"?
I think there was a recent issue with something like df.groupby(foo).transform(DataFrame.sort_index) where there was some question as to whether the index should be retained. Did that get sorted out? (Or is it not relevant here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't yet get sorted out. In fact, I went to improve the docs because of that issue, and found that dropna
was not well supported for transforms. Wanted to first improve this before returning to the docs.
I agree matches is better than equal here - and plan to clarify that groupby(...).transform
will align when possible.
Thanks @jbrockmendel - changes made and green (so far, at least) |
…ndas into transform_dropna_1
thanks @rhshadrach very nice! |
Part of #17093
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Plan to expand the whatsnew note as more parts of #45839 are merged.
Ran groupby ASVs;
groupby.Nth.time_frame_nth
showed a performance decrease, then increase, then twice not significantly changed. No other ASVs had significant changes.