BUG: Copy categorical codes if empty (fixes #18051) #18279

ghasemnaddaf · 2017-11-14T05:34:18Z

If old_categories is empty (i.e. every value is NaN), then _recode_for_categories
should return codes.copy() so that the writeable flag of the result is True.

closes #18051 (BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError)
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

ghasemnaddaf · 2017-11-14T05:36:31Z

pandas/core/categorical.py

@@ -2279,7 +2279,7 @@ def _recode_for_categories(codes, old_categories, new_categories):

    if len(old_categories) == 0:
        # All null anyway, so just retain the nulls
-        return codes
+        return codes.copy()


Returning codes directly leaves the writeable flag set to False, hence the error reported in #18051.
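The effect described here can be reproduced with plain NumPy. The sketch below is not the pandas source: `recode_returning_input` and `recode_returning_copy` are hypothetical stand-ins for the early-return branch of `_recode_for_categories`, and it assumes the incoming codes array is read-only, as the comment above implies.

```python
import numpy as np


def recode_returning_input(codes, old_categories):
    # Hypothetical stand-in for the pre-fix branch: the caller's array is
    # returned as-is, so it keeps whatever flags it already had.
    if len(old_categories) == 0:
        return codes
    raise NotImplementedError("only the empty-categories branch is sketched")


def recode_returning_copy(codes, old_categories):
    # Hypothetical stand-in for the patched branch: copy() allocates a new
    # array, and a fresh copy is writeable.
    if len(old_categories) == 0:
        return codes.copy()
    raise NotImplementedError("only the empty-categories branch is sketched")


codes = np.array([-1], dtype="int8")  # -1 is the code pandas uses for NaN
codes.flags.writeable = False         # assumption: the input is read-only

aliased = recode_returning_input(codes, [])
copied = recode_returning_copy(codes, [])

try:
    aliased[0] = 0                    # writes through to the read-only input
    in_place_write_failed = False
except ValueError:
    in_place_write_failed = True

copied[0] = 0                         # succeeds and leaves `codes` untouched
```

Any later step that writes into the returned codes in place would fail on the aliased array with a ValueError, which is consistent with the error type reported in #18051.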

codecov · 2017-11-14T06:59:37Z

Codecov Report

Merging #18279 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18279      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.02%     
==========================================
  Files         164      164              
  Lines       49878    49878              
==========================================
- Hits        45590    45581       -9     
- Misses       4288     4297       +9

Flag	Coverage Δ
#multiple	`89.19% <ø> (ø)`	⬆️
#single	`39.41% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/categorical.py	`95.75% <ø> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 69472f9...b35f16e.

codecov · 2017-11-14T06:59:47Z

Codecov Report

Merging #18279 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18279      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.02%     
==========================================
  Files         164      164              
  Lines       49880    49880              
==========================================
- Hits        45592    45583       -9     
- Misses       4288     4297       +9

Flag	Coverage Δ
#multiple	`89.19% <100%> (ø)`	⬆️
#single	`39.42% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/categorical.py	`95.75% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 148ed63...21734aa.

jorisvandenbossche · 2017-11-14T12:18:33Z

pandas/tests/test_categorical.py

@@ -1227,6 +1227,10 @@ def test_set_categories(self):
        exp_categories = Index(["a", "b", "c", "d"])
        tm.assert_index_equal(cat.categories, exp_categories)

+        # all-nan categories GH 18051
+        cat_nan = Categorical([np.nan])
+        assert cat_nan.unique()._codes.flags.writeable


These are all tests about set_categories, so it feels a bit strange to put this one in here.

jreback

Please add a whatsnew note in 0.21.1.

jreback · 2017-11-14T13:03:27Z

pandas/tests/test_categorical.py

@@ -1227,6 +1227,10 @@ def test_set_categories(self):
        exp_categories = Index(["a", "b", "c", "d"])
        tm.assert_index_equal(cat.categories, exp_categories)

+        # all-nan categories GH 18051
+        cat_nan = Categorical([np.nan])
+        assert cat_nan.unique()._codes.flags.writeable


Just assert the result of nunique.

There is no nunique for Categorical (?). What is being tested makes sense to me: the root cause was that the codes in the output of unique() were not writeable.

ghasemnaddaf · 2017-11-15T00:33:43Z

synced off master and rebased to remove conflict in whatsnew

jorisvandenbossche · 2017-11-15T10:41:07Z

I would maybe add the originally reported case of pd.Series(pd.Categorical([np.nan])).nunique() as a test as well? (e.g. in test_categorical.py::TestCategoricalAsBlock)

topper-123 · 2017-11-16T00:14:44Z

Works for me.

This issue is a bit too much under the hood for me to understand, so I can't speak to whether this is the best solution, but it works fine for me.

jreback · 2017-11-16T00:16:27Z

pandas/tests/test_categorical.py

@@ -1673,6 +1673,10 @@ def test_unique(self):
        exp_cat = Categorical(["b", np.nan, "a"], categories=["b", "a"])
        tm.assert_categorical_equal(res, exp_cat)

+        # GH 18051 unique()._codes should be writeable


You just need to compare the result of .unique(); that's the user-visible thing we are testing here. The non-writeable issue is an implementation detail.

ghasemnaddaf · 2017-11-20T20:46:11Z

pandas/tests/test_categorical.py

+        cat = Categorical([np.nan])
+        res = cat.unique()
+        exp_cat = Categorical([np.nan], categories=[])
+        tm.assert_categorical_equal(res, exp_cat)


@jreback, do you mean like this? (This is not failing on 0.21.0, though.)

There should be a test for

assert pd.Series(pd.Categorical([np.nan])).nunique() == 0

Note that nunique is a method on Series, not Categorical.

@topper-123 I have added that test in https://github.com/pandas-dev/pandas/pull/18279/files#diff-ed4f442894a2f521dfac3193a3a8d8a0R2185 (L2185) below.

That looks great IMO.

Also, you mention above that Categorical([np.nan]).unique() doesn't fail in 0.21.0, but pd.Series(pd.Categorical([np.nan])).unique() does fail. So if you could add a test for that as well in the test for series, then IMO the PR is good (but let @jreback make the final call on that).
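To make the three calls under discussion concrete, here is a minimal sketch that runs them side by side. It assumes a pandas release that already contains the fix (on 0.21.0 the Series calls are reported to fail), and `summarize` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def summarize(values):
    """Run the calls discussed in the thread on one input."""
    cat = pd.Categorical(values)
    ser = pd.Series(cat)
    return {
        # Categorical.unique(): reported to work even on 0.21.0
        "categorical_unique_len": len(cat.unique()),
        # Series.unique(): the call reported to fail on 0.21.0
        "series_unique_len": len(ser.unique()),
        # Series.nunique(): the originally reported case; NaN is not counted
        "series_nunique": ser.nunique(),
    }


all_nan = summarize([np.nan])
empty = summarize([])

if __name__ == "__main__":
    print("all-NaN:", all_nan)
    print("empty:  ", empty)
```

Only public methods are used here, so the checks do not depend on the private _codes attribute or on its writeable flag.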

topper-123 · 2017-11-21T20:59:03Z

Hi,

I've looked into this again and IMO the tests should go in tests/series/test_analytics.py, as that's where Series.unique() and Series.nunique() are tested. These test failures are not actually happening on Categoricals themselves, so the tests shouldn't go into test_categorical.py.

So in method test_value_counts_nunique I'd add these lines:

# GH 18051
s = pd.Series(pd.Categorical([]))
assert s.nunique() == 0
s = pd.Series(pd.Categorical([np.nan]))
assert s.nunique() == 0

and in method test_unique I'd add:

# GH 18051
s = pd.Series(pd.Categorical([]))
tm.assert_categorical_equal(s.unique(), pd.Categorical([]),
                            check_dtype=False)
s = pd.Series(pd.Categorical([np.nan]))
tm.assert_categorical_equal(s.unique(), pd.Categorical([np.nan]),
                            check_dtype=False)

and have no tests in test_categorical.py.

@jreback, do you agree?

jreback · 2017-11-22T00:09:42Z

@topper-123 suggestions look reasonable.

topper-123 · 2017-11-22T22:17:24Z

@jreback, is it too late to get this in 0.21.1?

@ghasemnaddaf, if you don't have time to finish this up, I could open a new PR in order to get this into 0.21.1. Alternatively, I'd be very happy if you could finish this up for 0.21.1.

ghasemnaddaf · 2017-11-22T22:18:43Z

Sorry @topper-123, I'm busy till the weekend. Go for it, thanks.

jreback · 2017-11-22T22:42:37Z

superseded by #18436

ghasemnaddaf mentioned this pull request Nov 14, 2017

BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError #18051

Closed

jorisvandenbossche added the Categorical (Categorical Data Type) and Regression (Functionality that used to work in a prior pandas version) labels Nov 14, 2017

jorisvandenbossche added this to the 0.21.1 milestone Nov 14, 2017

jorisvandenbossche reviewed Nov 14, 2017

jreback requested changes Nov 14, 2017

BUG: Copy categorical codes if empty (fixes pandas-dev#18051)

6c751c0

rebased to remove conflicts in whats new

ghasemnaddaf force-pushed the catDtype_copy_nan_codes branch from 9189c78 to 6c751c0 Compare November 15, 2017 00:32

jreback requested changes Nov 16, 2017

Unit tests

21734aa

jreback removed this from the 0.21.1 milestone Nov 22, 2017

jreback added the Needs Backport label Nov 22, 2017

topper-123 mentioned this pull request Nov 22, 2017

BUG: Copy categorical codes if empty (fixes #18051) #18436

Merged

jreback closed this Nov 22, 2017

jreback added this to the No action milestone Nov 22, 2017

jorisvandenbossche removed the Needs Backport label Nov 23, 2017
