BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError #18051

Closed
topper-123 opened this issue Oct 31, 2017 · 6 comments · Fixed by #18436
Labels
Bug Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@topper-123
Contributor

topper-123 commented Oct 31, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> ser = pd.Series(pd.Categorical([np.nan]))
>>> ser.nunique()
ValueError: buffer source array is read-only

Problem description

The above code returned 0 in v0.20.3 and is expected to return 0 in v0.21 as well. The problem is independent of whether I set some categories.

EDIT: Actually, this doesn't raise an error if I set categories, so it only happens when no categories are set. The use case for no categories, in my case, is programmatically reading in data where some columns are empty and of categorical dtype.
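To make the distinction in the EDIT concrete, here is a small sketch of the two cases. The category values "a" and "b" are made up for illustration. In both cases the NaN is stored as code -1; the only difference is whether the categories index is empty.

```python
import numpy as np
import pandas as pd

# No categories given: pandas infers them from the data, and NaN is
# never treated as a category, so the categories index ends up empty.
no_cats = pd.Categorical([np.nan])

# Categories set explicitly (illustrative values, not from the report).
with_cats = pd.Categorical([np.nan], categories=["a", "b"])

# Both store the missing value as code -1.
print(len(no_cats.categories), list(no_cats.codes))
print(len(with_cats.categories), list(with_cats.codes))
```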

Expected Output

0 (zero)
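A minimal sketch of how the expected output could be checked. The helper name `count_unique` is hypothetical, and the assertion is only expected to hold on a pandas release that contains the fix; on v0.21.0 the first call would raise the ValueError shown above.

```python
import numpy as np
import pandas as pd

def count_unique(values):
    """Build a categorical Series from `values` and count its unique non-NaN entries."""
    ser = pd.Series(pd.Categorical(values))
    return ser.nunique()

# All-NaN categorical with no categories: the case from this report.
all_nan = count_unique([np.nan])

# Sanity check with real data: one distinct non-NaN value.
mixed = count_unique(["a", np.nan, "a"])

print(all_nan, mixed)
```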

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 8137209
python: 3.5.4.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.5.0.post20170922
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@topper-123 topper-123 changed the title BUG: method .nunique on categoricals in v0.21 with only NaNs gives ValueError BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError Oct 31, 2017
@jreback
Contributor

jreback commented Oct 31, 2017

this is a variation of #10043, fixed by #10070
and duplicated by #17192, though this example is simple, so it should be fixed.

PR's welcome!

@jreback jreback added Bug Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate labels Oct 31, 2017
@jreback jreback added this to the 0.21.1 milestone Oct 31, 2017
@topper-123
Contributor Author

topper-123 commented Oct 31, 2017

This individual issue can be fixed with a simple `if not len(self.cat.categories): return 0`, but that feels like bypassing the issue in the .unique method. Is that OK, or is something more involved required?

If this is something in cython or requires larger refactoring, this will be beyond my ability, I'm sorry.
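For anyone stuck on an affected version, the short-circuit described above could be sketched as a user-side helper. This is a workaround only, not the fix that went into pandas: `safe_nunique` is a hypothetical name, and the sketch only covers the default `dropna=True` behaviour.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

def safe_nunique(ser: pd.Series) -> int:
    """Hypothetical workaround: skip .nunique() for categoricals without categories."""
    # With no categories, every value must be NaN, so there is nothing
    # to count when NaN is excluded (the default for .nunique()).
    if isinstance(ser.dtype, CategoricalDtype) and not len(ser.cat.categories):
        return 0
    return ser.nunique()

empty_cat = pd.Series(pd.Categorical([np.nan]))
normal_cat = pd.Series(["a", "b", "a"], dtype="category")
plain = pd.Series([1, 1, 2])

print(safe_nunique(empty_cat), safe_nunique(normal_cat), safe_nunique(plain))
```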

@jreback
Contributor

jreback commented Oct 31, 2017

no, this should be fixed in cython.

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version and removed Compat pandas objects compatability with Numpy or Python functions labels Oct 31, 2017
@topper-123
Contributor Author

This is a regression from v0.20.3. Maybe this has something to do with the new CategoricalDtype, @TomAugspurger?

@TomAugspurger
Contributor

I think it's more likely to be the changes to take_nd, but I may be wrong.

@topper-123 topper-123 mentioned this issue Nov 13, 2017
58 tasks
ghasemnaddaf pushed a commit to ghasemnaddaf/pandas that referenced this issue Nov 14, 2017
If `old_categories` is empty (all nan categories) then `_recode_for_categories`
should return `codes.copy()` so that the writable flag is True.
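The point about the writable flag can be illustrated with plain NumPy. This sketch does not reproduce the pandas internals; it only simulates a read-only codes array with `setflags` and shows that `.copy()` hands back a writable array.

```python
import numpy as np

# Simulate codes that arrive read-only (an assumption for illustration;
# how pandas ends up with such an array is not shown here).
codes = np.array([-1, -1], dtype="int8")
codes.setflags(write=False)

# A copy owns fresh memory and is writable again.
copied = codes.copy()

print(codes.flags.writeable)   # False
print(copied.flags.writeable)  # True

# Writing to the copy works and leaves the original untouched.
copied[0] = 0
```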
@ghasemnaddaf
Contributor

ghasemnaddaf commented Nov 14, 2017

@topper-123 @TomAugspurger @jreback please review #18279

ghasemnaddaf pushed a commit to ghasemnaddaf/pandas that referenced this issue Nov 15, 2017
rebased to remove conflicts in whats new
topper-123 pushed a commit to topper-123/pandas that referenced this issue Nov 22, 2017
topper-123 pushed a commit to topper-123/pandas that referenced this issue Nov 22, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017
TomAugspurger pushed a commit that referenced this issue Dec 11, 2017
5 participants