BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35078

arw2019 · 2020-07-01T02:50:08Z

closes BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35014
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd

Can you add test(s)?

arw2019 · 2020-07-07T04:27:47Z

A note: in addition to the missing-values handling, this PR also fixes a version of #14466 in the SeriesGroupBy version of transform.

I solved it similarly to #12559

pep8speaks · 2020-07-09T17:17:16Z

Hello @arw2019! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 19:26:30 UTC

arw2019 · 2020-07-09T19:42:52Z

Ready to go modulo any comments

arw2019 · 2020-07-16T18:25:36Z

@TomAugspurger what do you think? This aims to resolve the SeriesGroupBy.__getitem__ bug you pointed out a few weeks ago

jreback

we want to make the simplest change possible here - so it means make it at a lower level

jreback · 2020-07-17T10:31:40Z

pandas/core/groupby/generic.py

@@ -548,8 +548,10 @@ def _transform_general(
        # we will only try to coerce the result type if
        # we have a numeric dtype, as these are *always* user-defined funcs
        # the cython take a different path (and casting)
+        # make sure we don't accidentally upcast (GH35014)


how is this related?

arw2019 · 2020-07-22

without this change, equivalent results for SeriesGroupBy and DataFrameGroupBy are cast differently:

In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

In [3]: gb = df.groupby("A", dropna=False)

In [4]: gb[['B']].transform(len)
Out[4]:
   B
0  2
1  2
2  1
3  1

In [5]: gb['B'].transform(len)
Out[5]:
0    2.0
1    2.0
2    1.0
3    1.0
Name: B, dtype: float64

I tracked this down to SeriesGroupBy._selected_obj which for some reason upcasts:

In [9]: gb['B']._selected_obj
Out[9]:
0    1.0
1    2.0
2    3.0
3    NaN
Name: B, dtype: float64
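One possible reading of the output above, not confirmed anywhere in this thread, is that column B is already float64 when the frame is built, because the `None` entry is stored as NaN. Casting an integer result such as `len` back to the input dtype would then yield floats. The sketch below uses only public pandas calls; the names `b_dtype`, `int_only` and `recast` are illustrative.

```python
import pandas as pd

# Same frame as in the session above: the None in "B" is stored as NaN.
df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# A column holding NaN cannot keep a plain integer dtype.
b_dtype = df["B"].dtype

# For comparison, the same values without a missing entry stay integer.
int_only = pd.Series([1, 2, 3]).dtype

# Casting integer group sizes back to the input dtype produces floats,
# which matches the SeriesGroupBy output shown above.
recast = pd.Series([2, 2, 1, 1]).astype(b_dtype)

print(b_dtype, int_only, recast.dtype)
```

Under this assumption, guarding the cast-back step would be enough to keep the two code paths consistent.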

jreback · 2020-07-17T10:33:05Z

pandas/core/groupby/groupby.py

@@ -624,7 +625,10 @@ def _get_index(self, name):
        """
        Safe get index, translate keys for datelike to underlying repr.
        """
-        return self._get_indices([name])[0]
+        if isna(name):
+            return self._get_indices([pd.NaT])[0]


we would want _get_indices to handle a null rather than this way

arw2019 · 2020-07-22

ok! moved this
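As a rough illustration of handling a null key inside the lookup itself rather than in the caller, the sketch below treats every null-like name (`None`, NaN, `pd.NaT`) as the same group. `get_index` and the `indices` dictionary are hypothetical stand-ins, not the pandas internals, and the sketch assumes scalar group keys.

```python
import pandas as pd


def get_index(indices, name):
    """Hypothetical helper: return row positions for a scalar group key.

    All null-like names are matched against whichever null key is stored.
    """
    if pd.isna(name):
        # NaN never compares equal to itself, so scan for a null key
        # instead of relying on a direct dictionary lookup.
        for key, rows in indices.items():
            if pd.isna(key):
                return rows
        return []
    return indices.get(name, [])


# Illustrative mapping from group key to row positions.
indices = {0.0: [0, 1], 1.0: [2], float("nan"): [3]}

nan_rows = get_index(indices, None)
nat_rows = get_index(indices, pd.NaT)
one_rows = get_index(indices, 1.0)
missing_rows = get_index(indices, 5.0)
```

Keeping the null check in one low-level function means callers do not need to know which null sentinel the mapping happens to store.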

jreback · 2020-08-03T23:36:16Z

I think #35444 is a more general soln here.

rhshadrach · 2020-08-04T23:01:59Z

@jreback: Unfortunately my PR is not sufficient here. The root issue lies with the use of a dictionary for self.indices within the _GroupBy class. Trying to make a key be None (or np.nan or pd.NaT) causes issues. AFAICT, dropna is never used once the GroupBy object is created.

I think this isn't an issue with propagating dropna, but rather with the SeriesGroupBy transform function itself. Somehow DataFrameGroupBy avoids these issues, although I need to spend more time looking through the code to understand how.
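The difficulty with null dictionary keys mentioned above can be reproduced in plain Python, independent of pandas. The sketch below only demonstrates general language behavior; the `indices` mapping is an illustrative stand-in for the real attribute.

```python
# Two distinct NaN objects: they never compare equal to each other.
nan_a = float("nan")
nan_b = float("nan")

# Illustrative mapping from group key to row positions.
indices = {0.0: [0, 1], 1.0: [2], nan_a: [3]}

# Lookup with the very same object succeeds (dicts check identity first).
same_object_found = nan_a in indices

# Lookup with a different NaN object fails, because nan_a != nan_b.
fresh_nan_found = nan_b in indices

# None is yet another spelling of "missing" and is not found either.
none_found = None in indices

print(same_object_found, fresh_nan_found, none_found)
```

Any code that rebuilds the key before the lookup, for example by converting between `None`, `np.nan` and `pd.NaT`, therefore risks missing the group.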

jreback · 2020-08-05T02:14:33Z

you are making a lot of changes here, pls try to simplify.

arw2019 · 2020-08-05T04:50:31Z

you are making a lot of changes here, pls try to simplify.

@jreback Ok!

I redid the solution by copying the logic in DataFrameGroupBy._transform_general, which is very similar and avoids this problem (thanks @rhshadrach for the suggestion).
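A minimal check of the behavior this rework aims for, assuming a pandas version that includes the fix (the PR is on the 1.2 milestone): with `dropna=False`, the Series and DataFrame selections should report the same group sizes, including for the NaN group. The variable names are illustrative, and the sketch deliberately does not assert on the result dtype.

```python
import pandas as pd

# Frame from the discussion: one row has a missing group key.
df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
gb = df.groupby("A", dropna=False)

# SeriesGroupBy path (single-column selection).
series_result = gb["B"].transform(len)

# DataFrameGroupBy path (list selection).
frame_result = gb[["B"]].transform(len)

# Both should give a group size of 2, 2, 1, 1 row by row.
print(series_result.tolist(), frame_result["B"].tolist())
```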

jreback · 2020-08-07T21:33:10Z

thanks @arw2019

arw2019 · 2020-08-07T21:53:39Z

thanks @jreback for reviewing

WillAyd reviewed Jul 1, 2020

WillAyd added the Groupby label Jul 1, 2020

arw2019 mentioned this pull request Jul 7, 2020

ENH: should pandas.core.arrays.Categorical have a dropna=False option? #35162

Closed

arw2019 requested a review from WillAyd July 9, 2020 19:42

jreback requested changes Jul 17, 2020

jreback requested changes Aug 5, 2020

arw2019 added 16 commits August 7, 2020 01:05

add values.dtype.kind==f branch to array_with_unit_datetime

23c05f7

revert pandas/_libs/tslib.pyx

f564d48

added cast_from_unit definition for float

afe1869

revert accidental changes

c2594e0

revert changes

ef36084

revert accidental changes

cd92bc7

update Grouping.indicies to return for nan values

077bd8e

updated _GroupBy._get_index to return for nan values

72f66d4

revert accidental changes

c5c3a28

revert accidental changes

0214470

revert accidental changes

2a3a86b

styling change

cf71b75

added tests

9051166

fixed groupby/groupby.py's _get_indicies

84e04c0

removed debug statement

86ce781

fixed naming error in test

7090e2d

arw2019 added 14 commits August 7, 2020 01:09

rewrite for loop as list comprehension

736ac69

rewrote if statement as dict comp + ternary

68902eb

fixed small bug in list comp in groupby/groupby.py

c6668f0

deleted debug statement in groupby/groupby.py

46949ea

rewrite _get_index using next_iter to set default value

e16a495

update exepcted test_groupby_nat_exclude for new missing values handling

e00d71d

remove print statement

6d5a441

reworked solution

9c24cf2

fixed PEP8 issue

5637c3e

run pre-commit checks

29c13f6

styling fix

2ea68af

update whatnew + styling improvements

3f5c6d6

move NaN handling to _get_indicies

10147b0

removed 1.1 release note

c9f6f7e

arw2019 force-pushed the groupby-getitem-dropna-doesnt-propagate branch 2 times, most recently from 516d474 to fa2d90a Compare August 7, 2020 01:30

redo solution - modify SeriesGroupBy._transform_general only

9b536dd

arw2019 force-pushed the groupby-getitem-dropna-doesnt-propagate branch from fa2d90a to 9b536dd Compare August 7, 2020 01:32

Merge remote-tracking branch 'upstream/master' into groupby-getitem-dropna-doesnt-propagate

2c9de8e

jreback requested changes Aug 7, 2020

rewrite case + rewrite tests w fixtures

8d991d5

arw2019 mentioned this pull request Aug 7, 2020

BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna=True #35612

Closed

3 tasks

fix mypy error

570ce21

jreback added this to the 1.2 milestone Aug 7, 2020

jreback requested changes Aug 7, 2020

jreback approved these changes Aug 7, 2020

jreback merged commit f194094 into pandas-dev:master Aug 7, 2020

arw2019 mentioned this pull request Oct 21, 2020

df.groupby(by=["b"], dropna=False).sum() returns "groupby() got an unexpected keyword argument 'dropna'" #37323

Closed
