API: Add dtype parameter to Categorical.from_codes #24398
b026e50 to e2543df
Codecov Report
@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%
==========================================
  Files         166      166
  Lines       52315    52323       +8
==========================================
+ Hits        48327    48337      +10
+ Misses       3988     3986       -2
Codecov Report
@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage    92.3%    92.3%   +<.01%
==========================================
  Files         162      162
  Lines       51875    51874       -1
==========================================
+ Hits        47883    47884       +1
+ Misses       3992     3990       -2
pandas/core/arrays/categorical.py (outdated)
    if dtype is not None:
        if categories is not None or ordered is not None:
            raise ValueError("Cannot specify both `dtype` and `categories`"
                             " or `ordered`.")
The error message confuses me. Are you saying "both `dtype` and (`categories` / `ordered`)"?
I think this will need to be reworded for clarity.
I copied that message from Categorical.__init__, but I agree, and have changed it in both locations.
Fantastic. Thanks for doing that!
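The rule this check enforces (pass either `dtype`, or `categories`/`ordered`, but never both) can be sketched in plain Python as below. `resolve_dtype_args` is a hypothetical helper written for illustration only; it is not the pandas implementation, the dict it returns merely stands in for a `CategoricalDtype`, and the message wording is an assumption.

```python
def resolve_dtype_args(categories=None, ordered=None, dtype=None):
    """Hypothetical sketch of the dtype-vs-(categories, ordered) rule.

    `dtype` is modelled as a plain dict with "categories" and "ordered"
    keys, standing in for a CategoricalDtype.
    """
    if dtype is not None:
        # Mixing both spellings is ambiguous, so reject it outright.
        if categories is not None or ordered is not None:
            raise ValueError(
                "Cannot specify `categories` or `ordered` together with `dtype`."
            )
        return dict(dtype)
    # Old-style spelling: build the "dtype" from the separate arguments.
    return {"categories": categories, "ordered": bool(ordered)}
```

Rejecting the combination, rather than letting one argument silently win, is what makes a clear error message important here.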
fcf731b to c790639
Hello @topper-123! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 08, 2019 at 14:54 UTC
6997fd8 to 31974af
096c7a5 to 229b474
pandas/io/packers.py (outdated)
@@ -59,6 +59,7 @@
    Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Float64Index,
    Index, Int64Index, Interval, IntervalIndex, MultiIndex, NaT, Panel, Period,
    PeriodIndex, RangeIndex, Series, TimedeltaIndex, Timestamp)
from pandas.api.types import CategoricalDtype as CDT
same style
pandas/io/pytables.py (outdated)
@@ -30,6 +30,7 @@
    DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel,
    PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat,
    concat, isna, to_datetime)
from pandas.api.types import CategoricalDtype
same
merge master
@topper-123 can you merge master and update?
d6d3f81 to 8a6ec5d
OK, I've reverted the deprecation.
I may be missing it, but did you add a test for Categorical.from_codes(codes, categories, dtype=dtype) raising?
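A test for the raising case asked about here could look like the sketch below. It uses only public API (`pd.Categorical.from_codes`, `pd.CategoricalDtype`) and checks the exception type only, since the exact message text may vary; the test name is hypothetical.

```python
import pandas as pd


def test_from_codes_with_dtype_and_categories_raises():
    # Hypothetical test: combining dtype= with categories= or ordered= must fail.
    dtype = pd.CategoricalDtype(categories=["a", "b"])
    for kwargs in ({"categories": ["a", "b"]}, {"ordered": True}):
        try:
            pd.Categorical.from_codes([0, 1], dtype=dtype, **kwargs)
        except ValueError:
            continue  # expected outcome for this combination
        raise AssertionError(f"no ValueError raised for {kwargs}")


test_from_codes_with_dtype_and_categories_raises()
```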
Categorical.from_codes([0, 1], Categorical(['a', 'b', 'a']))
codes = np.random.choice([0, 1], 5, p=[0.9, 0.1])
dtype = CategoricalDtype(categories=["train", "test"])
Categorical.from_codes(codes, dtype=dtype)
Yeah, this test is duplicative with earlier ones (even on master). I'd be OK with removing it.
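A self-contained, reproducible variant of the train/test snippet above might look like this. The seeded generator (`np.random.default_rng(0)`) is an assumption made for repeatability and is not part of the PR; the snippet itself calls the unseeded `np.random.choice`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Codes 0 and 1 index into the dtype's categories ("train", "test").
codes = rng.choice([0, 1], size=5, p=[0.9, 0.1])
dtype = pd.CategoricalDtype(categories=["train", "test"])

cat = pd.Categorical.from_codes(codes, dtype=dtype)
print(cat)
```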
Yes, +1 on not deprecating categories and ordered.
so ok with adding
see comments
Mainly balancing the volume of code that is out there using categories &
ordered against the (small) cost of supporting both.
On Tue, Jan 8, 2019 at 6:45 AM Jeff Reback wrote:
So OK with adding dtype in .from_codes, as that promotes consistency, but
why are folks not in favor of deprecating categories and ordered? This is
just moving code away from the single point of using CDT for all operations.
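The "single point" argument for `CategoricalDtype` can be illustrated as follows: defining the dtype once and passing it to every `from_codes` call keeps the categories identical across arrays, so concatenating them preserves the `category` dtype. This is a sketch; the variable names are illustrative only.

```python
import pandas as pd

# One dtype object shared by every array built from codes.
sizes = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)

a = pd.Categorical.from_codes([0, 2], dtype=sizes)
b = pd.Categorical.from_codes([1, 1, 0], dtype=sizes)

# Identical dtypes, so concatenation keeps the categorical dtype.
combined = pd.concat([pd.Series(a), pd.Series(b)], ignore_index=True)
print(combined.dtype)
```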
We have exactly the same pattern in the main
OK, I can see the argument for this then. But this is a tad confusing; maybe let's enhance the docstrings slightly on the constructor & from_codes to make it even clearer that you should pass either (categories, ordered) or dtype (yes, it errors, but a docstring note will help). @topper-123 can you raise an issue / PR for this?
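To make the "either `(categories, ordered)` or `dtype`" convention concrete: both spellings are accepted by the constructor and by `from_codes` and should produce the same result. A minimal check, assuming pandas 0.24 or later (where `from_codes` gained `dtype`):

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=["low", "high"], ordered=True)

# Constructor: separate arguments vs. a single dtype.
c1 = pd.Categorical(["low", "high", "low"], categories=["low", "high"], ordered=True)
c2 = pd.Categorical(["low", "high", "low"], dtype=dtype)

# from_codes: the same two spellings.
f1 = pd.Categorical.from_codes([0, 1, 0], categories=["low", "high"], ordered=True)
f2 = pd.Categorical.from_codes([0, 1, 0], dtype=dtype)

print(c1.dtype == c2.dtype == f1.dtype == f2.dtype)
```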
small comments and @TomAugspurger has some corrections
6008c08 has some changes.
Buglet when neither
In [1]: import pandas as pd

In [2]: pd.Categorical.from_codes([0, 1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e8a6a967ddf5> in <module>
----> 1 pd.Categorical.from_codes([0, 1])

~/sandbox/pandas/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype)
    661
    662         if len(codes) and (
--> 663                 codes.max() >= len(dtype.categories) or codes.min() < -1):
    664             raise ValueError("codes need to be between -1 and "
    665                              "len(categories)-1")

TypeError: object of type 'NoneType' has no len()

Fixing now.
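The traceback above comes from reaching the range check on `dtype.categories` when no categories were supplied. In current pandas releases that case raises a `ValueError` instead. The sketch below exercises it together with the existing code-range check and the `-1` sentinel for missing values; `raises_value_error` is a small illustrative helper, and only exception types are checked because message wording may differ between versions.

```python
import pandas as pd


def raises_value_error(func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises ValueError."""
    try:
        func(*args, **kwargs)
    except ValueError:
        return True
    return False


# Neither categories nor dtype: there is nothing to map the codes onto.
no_categories = raises_value_error(pd.Categorical.from_codes, [0, 1])
# Code 2 is out of range for two categories (valid codes are -1, 0, 1).
out_of_range = raises_value_error(pd.Categorical.from_codes, [0, 2], categories=["a", "b"])
# -1 is the sentinel for a missing value.
with_missing = pd.Categorical.from_codes([0, -1], categories=["a", "b"])

print(no_categories, out_of_range, with_missing.isna().tolist())
```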
6008c08 also had a bug with the tests for raising when both
+1 to the changes made by @TomAugspurger
Thanks. Merging in a few hours if no objections.
yeah this is ok.
* Add dtype to Categorical.from_codes
#6581 lists the categories and ordered kwargs as deprecated, but I don't see any warnings to that effect. Am I missing something?
That seems to be a mistake indeed, as those arguments are not deprecated. Judging by the last part of the comments above, there was some discussion about this, but it was finally decided not to deprecate them. I removed it from the list in #6581.
Yeah, the original proposal was to deprecate, but we decided to keep them after all, as we felt it was more convenient to allow a similar instantiation method as in
@topper-123 thanks for clarifying. Does this mean that either a) an entry can be removed from #6581 or b) some FutureWarnings can be removed from CategoricalDtype/Categorical? If so, are you up for taking point on that?
@jbrockmendel as I said above, I already removed it from #6581. There were no warnings introduced in this PR, so there is also nothing to remove (there are other categorical-related ones, but they are not related to this PR).
git diff upstream/master -u -- "*.py" | flake8 --diff
Added parameter dtype to Categorical.from_codes.