-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH union_categoricals supports ignore_order GH13410 #15219
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Added a few comments. Can you add a whatsnew note in v0.20.0.txt and add something to the docs about this (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning)
Can you also add a test case with differing categories (not only different order)? So a case that would raise in case of ignore_ordered=False (eg the first test case of test_union_categoricals_sort
but with ordered categories)
pandas/tools/tests/test_concat.py
Outdated
tm.assert_categorical_equal(res, exp) | ||
|
||
res = union_categoricals([c1, c1], ignore_order=True) | ||
exp = Categorical([1, 2, 3, 1, 2, 3], ordered=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference with this and the test case above? (ordered=False is the default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intended to test two ordered categoricals with identical categories and orders. The above was a mixed ordered and unordered with identical categories.
I did remove ordered=False from this test and other tests since it is the default.
pandas/tools/tests/test_concat.py
Outdated
tm.assert_categorical_equal(res, exp) | ||
|
||
c1 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True) | ||
c2 = Categorical([1, 2, 3], ordered=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can leave this out, and use the c1 and c2 from above (but just swap them in the code [c2, c1]
)
pandas/types/concat.py
Outdated
@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False): | |||
sort_categories : boolean, default False | |||
If true, resulting categories will be lexsorted, otherwise | |||
they will be ordered as they appear in the data. | |||
ignore_order: boolean, default False | |||
If true, ordered categories will be ignored. Results in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"ordered categories" -> "the ordered attribute of the categorical" / "whether the categorical is ordered or not" ? as the categories itself are not ignored, only its "orderedness"
pandas/types/concat.py
Outdated
raise TypeError("Cannot use sort_categories=True with " | ||
"ordered Categoricals") | ||
|
||
if sort_categories and not categories.is_monotonic_increasing: | ||
categories = categories.sort_values() | ||
indexer = categories.get_indexer(first.categories) | ||
new_codes = take_1d(indexer, new_codes, fill_value=-1) | ||
elif all(not c.ordered for c in to_union): | ||
elif ignore_order | all(not c.ordered for c in to_union): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| -> or
@@ -297,6 +300,9 @@ def _maybe_unwrap(x): | |||
else: | |||
raise TypeError('Categorical.ordered must be the same') | |||
|
|||
if ignore_order: | |||
ordered = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ordered is already False? (line 263) Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if statement on line 264 can be entered if the ordered categoricals have the same categories and order.
is_dtype_equal checks categories and ordering
Thanks for the comments. Pull request has been updated. |
Codecov Report
@@ Coverage Diff @@
## master #15219 +/- ##
==========================================
+ Coverage 90.37% 90.37% +<.01%
==========================================
Files 135 135
Lines 49464 49466 +2
==========================================
+ Hits 44702 44705 +3
+ Misses 4762 4761 -1
Continue to review full report at Codecov.
|
@jorisvandenbossche Any other actions on my part? |
@js3711 minor corrections and a flake issue you can see
ping when pushed / green. |
can you rebase / update |
I'll update this weekend. Apologies for the delay. |
6dc5bb8
to
d278d62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just a couple of minor doc changes, and you have a flake error. you can use git diff master | flake8 --diff
to see this
pandas/tools/tests/test_concat.py
Outdated
@@ -1666,6 +1666,42 @@ def test_union_categoricals_ordered(self): | |||
with tm.assertRaisesRegexp(TypeError, msg): | |||
union_categoricals([c1, c2]) | |||
|
|||
def test_union_categoricals_ignore_order(self): | |||
c1 = Categorical([1, 2, 3], ordered=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number here as a comment
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -146,6 +146,8 @@ Other enhancements | |||
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`) | |||
- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`) | |||
- ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`) | |||
|
|||
- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you reverse this
pd.types.concat.....union_categoricals(..)
has gained the ignore_ordered=False
argument.
@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False): | |||
sort_categories : boolean, default False | |||
If true, resulting categories will be lexsorted, otherwise | |||
they will be ordered as they appear in the data. | |||
ignore_order: boolean, default False | |||
If true, the ordered attribute of the Categoricals will be ignored. | |||
Results in an unordered categorical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a versionadded 0.20.0 tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the versionadded tag
pandas/tools/tests/test_concat.py
Outdated
c2 = Categorical([1, 2, 3], ordered=False) | ||
|
||
res = union_categoricals([c1, c2], ignore_order=True) | ||
exp = Categorical([1, 2, 3, 1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you an an explicit test with ignore_order=False
that raises (there are tests in other sections, but should have one that explicityly specifies)
pandas/tests/tools/test_concat.py
Outdated
@@ -1662,6 +1662,42 @@ def test_union_categoricals_ordered(self): | |||
with tm.assertRaisesRegexp(TypeError, msg): | |||
union_categoricals([c1, c2]) | |||
|
|||
def test_union_categoricals_ignore_order(self): | |||
c1 = Categorical([1, 2, 3], ordered=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment. pls add a tests with ignore_order=False
(explicity set), and additional one with no ignore_order
kw passed (to tests the defaults)
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -157,6 +157,7 @@ Other enhancements | |||
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`) | |||
|
|||
.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations | |||
- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add in a :ref:
to the docs you added.
@jreback thanks for the feedback. I made the suggested changes and submitted a new PR. |
thanks for the PR @js3711 |
xref pandas-dev#13410 (ignore_order portion) Author: Justin Solinsky <[email protected]> Closes pandas-dev#15219 from js3711/GH13410-ENHunion_categoricals and squashes the following commits: e9d00de [Justin Solinsky] GH15219 Documentation fixes based on feedback d278d62 [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410 9b827ef [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
xref #13410 (ignore_order portion)