BUG: Copy categorical codes if empty (fixes #18051) #18279
@@ -2279,7 +2279,7 @@ def _recode_for_categories(codes, old_categories, new_categories):

     if len(old_categories) == 0:
         # All null anyway, so just retain the nulls
-        return codes
+        return codes.copy()
`return codes` causes the writeable flag to be False, hence the error reported in #18051.
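A minimal sketch of the idea behind the fix, assuming a simplified recoding routine: `recode_for_categories_sketch` is a hypothetical stand-in for pandas' private `_recode_for_categories`, and its dict-based lookup replaces whatever indexer machinery pandas actually uses. The point it illustrates is the empty-categories branch: returning a copy gives a fresh array that is writeable even when the input codes are read-only.

```python
import numpy as np


def recode_for_categories_sketch(codes, old_categories, new_categories):
    """Hypothetical, simplified stand-in for pandas' _recode_for_categories.

    ``codes`` holds positions into ``old_categories`` (-1 marks NaN); the
    result holds positions of the same values in ``new_categories``, with
    -1 for NaN or for values absent from ``new_categories``.
    """
    if len(old_categories) == 0:
        # All null anyway: return a copy rather than ``codes`` itself, so the
        # result is writeable even when the input array is read-only.
        return codes.copy()
    new_pos = {value: i for i, value in enumerate(new_categories)}
    # indexer[i] is the new position of old_categories[i], or -1 if absent
    indexer = np.array([new_pos.get(v, -1) for v in old_categories],
                       dtype=codes.dtype)
    # NaN codes (-1) stay -1; every other code is looked up in the indexer
    return np.where(codes == -1, -1, indexer[codes]).astype(codes.dtype)


# A read-only codes array for an all-NaN categorical (hypothetical input)
codes = np.array([-1, -1], dtype=np.int8)
codes.flags.writeable = False

recoded = recode_for_categories_sketch(codes, [], ["a"])
print(recoded.flags.writeable)  # True: the copy owns its own data
```

Returning `codes` unchanged in the empty branch would hand the caller the very same read-only array, which is the behaviour the PR changes.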
Codecov Report
@@ Coverage Diff @@
## master #18279 +/- ##
==========================================
- Coverage 91.4% 91.38% -0.02%
==========================================
Files 164 164
Lines 49878 49878
==========================================
- Hits 45590 45581 -9
- Misses 4288 4297 +9
Codecov Report
@@ Coverage Diff @@
## master #18279 +/- ##
==========================================
- Coverage 91.4% 91.38% -0.02%
==========================================
Files 164 164
Lines 49880 49880
==========================================
- Hits 45592 45583 -9
- Misses 4288 4297 +9
pandas/tests/test_categorical.py
@@ -1227,6 +1227,10 @@ def test_set_categories(self):
         exp_categories = Index(["a", "b", "c", "d"])
         tm.assert_index_equal(cat.categories, exp_categories)

+        # all-nan categories GH 18051
+        cat_nan = Categorical([np.nan])
+        assert cat_nan.unique()._codes.flags.writeable
These are all tests about `set_categories`, so it feels a bit strange to put this one in here.
sure
pls add a whatsnew note in 0.21.1
just assert the results of nunique
There is no `nunique` for `Categorical` (?). What is being tested makes sense to me: the root cause was that the codes in the output of `unique()` were not writeable.
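Why a non-writeable codes array is a problem can be shown with plain NumPy. This is a generic illustration, not the actual pandas code path from #18051: `fill_codes_in_place` is a hypothetical stand-in for any downstream step that writes into the codes array in place.

```python
import numpy as np


def fill_codes_in_place(codes, value):
    """Hypothetical downstream step that writes into the codes array in place."""
    codes[:] = value
    return codes


read_only = np.array([-1], dtype=np.int8)
read_only.flags.writeable = False  # mimics codes returned without a copy

try:
    fill_codes_in_place(read_only, 0)
    failed = False
except ValueError:
    # NumPy refuses to assign into an array whose writeable flag is False
    failed = True

# A copy owns its own data and is writeable, so the same step succeeds
fixed = fill_codes_in_place(read_only.copy(), 0)
```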
Rebased to remove conflicts in whatsnew: 9189c78 to 6c751c0.
Synced off master and rebased to remove the conflict in whatsnew.
I would maybe add the original reported case of
Works for me. This issue is a bit too much under the hood for me to understand, so I can't speak to whether this is the best solution, but it works fine for me.
pandas/tests/test_categorical.py
@@ -1673,6 +1673,10 @@ def test_unique(self):
         exp_cat = Categorical(["b", np.nan, "a"], categories=["b", "a"])
         tm.assert_categorical_equal(res, exp_cat)

+        # GH 18051 unique()._codes should be writeable
You just need to compare the result of `.unique()`; that's the user-visible thing we are testing here. The non-writeable issue is a detail.
+        cat = Categorical([np.nan])
+        res = cat.unique()
+        exp_cat = Categorical([np.nan], categories=[])
+        tm.assert_categorical_equal(res, exp_cat)
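A variant of the test above that checks the user-visible result through public attributes only might look like the following sketch. It assumes a recent pandas where `Categorical.isna()` and `Categorical.codes` are available; the test name is hypothetical, and the sketch deliberately does not compare the dtype of the (empty) categories.

```python
import numpy as np
import pandas as pd


def test_unique_all_nan():
    # GH 18051: an all-NaN Categorical has no categories at all
    cat = pd.Categorical([np.nan])
    res = cat.unique()
    # The single NaN survives unique(); it is stored as code -1
    assert len(res) == 1
    assert res.isna().all()
    assert res.codes.tolist() == [-1]
    assert len(res.categories) == 0
```

Under pytest this is collected automatically; it can also be called directly.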
@jreback, do you mean like this? (This does not fail on 0.21.0, though.)
There should be a test for `assert pd.Series(pd.Categorical([np.nan])).nunique() == 0`. Note that `nunique` is a method on `Series`, not `Categorical`.
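A test along those lines could look like this sketch (the test name is hypothetical). It also covers the empty case and shows the effect of `nunique`'s `dropna` parameter, which defaults to True and therefore excludes the NaN.

```python
import numpy as np
import pandas as pd


def test_nunique_empty_and_all_nan():
    # GH 18051: with the default dropna=True, NaN is not counted
    assert pd.Series(pd.Categorical([])).nunique() == 0
    assert pd.Series(pd.Categorical([np.nan])).nunique() == 0
    # With dropna=False the lone NaN counts as one distinct value
    assert pd.Series(pd.Categorical([np.nan])).nunique(dropna=False) == 1
```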
@topper-123 I have added that test in https://github.com/pandas-dev/pandas/pull/18279/files#diff-ed4f442894a2f521dfac3193a3a8d8a0R2185 (L2185) below.
That looks great IMO.
Also, you mention above that `Categorical([np.nan]).unique()` doesn't fail in 0.21.0, but `pd.Series(pd.Categorical([np.nan])).unique()` does fail. So if you could add a test for that as well in the tests for Series, then IMO the PR is good (but let @jreback make the final call on that).
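A Series-level test for `unique()` might be sketched as follows (hypothetical test name). For categorical data `Series.unique()` returns a `Categorical`, so the sketch checks its type, length and NaN content rather than comparing category dtypes.

```python
import numpy as np
import pandas as pd


def test_series_unique_all_nan_categorical():
    # GH 18051: all-NaN categorical Series
    res = pd.Series(pd.Categorical([np.nan])).unique()
    assert isinstance(res, pd.Categorical)
    assert len(res) == 1
    assert res.isna().all()

    # Empty categorical Series
    empty = pd.Series(pd.Categorical([])).unique()
    assert isinstance(empty, pd.Categorical)
    assert len(empty) == 0
```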
Hi, I've looked into this again and IMO the tests should go in … So in method …:

    # GH 18051
    s = pd.Series(pd.Categorical([]))
    assert s.nunique() == 0
    s = pd.Series(pd.Categorical([np.nan]))
    assert s.nunique() == 0

and in method …:

    # GH 18051
    s = pd.Series(pd.Categorical([]))
    tm.assert_categorical_equal(s.unique(), pd.Categorical([]),
                                check_dtype=False)
    s = pd.Series(pd.Categorical([np.nan]))
    tm.assert_categorical_equal(s.unique(), pd.Categorical([np.nan]),
                                check_dtype=False)

and have no tests in …; @jreback, do you agree?
@topper-123 suggestions look reasonable.
@jreback, is it too late to get this in 0.21.1? @ghasemnaddaf, if you don't have time to finish this up, I could open a new PR in order to get this into 0.21.1. Alternatively, I'd be very happy if you could finish this up for 0.21.1.
Sorry @topper-123, I'm busy till the weekend. Go for it, thanks.
Superseded by #18436.
If `old_categories` is empty (all-NaN categories), then `_recode_for_categories` should return `codes.copy()` so that the writeable flag is True.
`git diff upstream/master -u -- "*.py" | flake8 --diff`