-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify bool ops closures, improve tests #19795
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we do the tests with appropriate xfails separately. it is really confusing what you are actually changing.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -866,3 +866,4 @@ Other | |||
^^^^^ | |||
|
|||
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`) | |||
- Boolean operations ``&, |, ^`` between a ``Series`` and an ``Index`` will no longer raise ``TypeError`` (:issue:`19792`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a bug, rather an API breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will move.
pandas/core/ops.py
Outdated
x = _ensure_object(x) | ||
y = _ensure_object(y) | ||
result = lib.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is oddly specific. what about a tuple?
basicaly you are trying to catch ndarray or non-list-like right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is codifying existing behavior. ATM the line following this checks for isinstance(y, (np.ndarray, ABCSeries))
and the line above converts lists to ndarray. I removed the ABCSeries part, converted lists a step earlier, and added the assertion to make sure we don't accidentally miss either of these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this and let's instead qualify the below with
if isinstance(y, np.ndarray):
...
elif is_list_like(.....):
else:
# scalar
pandas/core/ops.py
Outdated
result = lib.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) | ||
if isinstance(y, np.ndarray): | ||
# This next assertion is here because there used to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a useful comment. imagine a new reader coming along with no history
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, not sure what to put here. ATM there is a non-reachable branch with a TODO comment specifically asking how it would be reached. This seemed worth documenting somehow since someone at some point found it worth special-casing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the assertion pretty much explains this, though maybe add some text that we are trying to compare non-bool things here
pandas/core/ops.py
Outdated
filler = (fill_int if is_self_int_dtype and | ||
is_integer_dtype(np.asarray(other)) else fill_bool) | ||
is_other_int_dtype = is_integer_dtype(np.asarray(other)) | ||
if isinstance(other, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a list only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM lists are handled inside na_op on 1066-1067. This is just moving the same two lines up the call stack (and outside of an exception-handling block)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment also includes tuple and np.array, should not this check for also for tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on if we want to treat tuples as "an array the user forgot to wrap in an array". For lists we do that pretty consistently (e.g. on line 1034 and 1172 in core.ops), but not so much for tuples. I don't have a strong opinion as to what the ideal behavior is, but would prefer to address it separately from this cleanup/testing PR. (it could go in the same Issue as the TODO comment just below this isinstance check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be is_list_like
OK |
Codecov Report
@@ Coverage Diff @@
## master #19795 +/- ##
==========================================
- Coverage 91.61% 91.6% -0.01%
==========================================
Files 150 150
Lines 48887 48891 +4
==========================================
+ Hits 44786 44789 +3
- Misses 4101 4102 +1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #19795 +/- ##
==========================================
+ Coverage 91.66% 91.66% +<.01%
==========================================
Files 150 150
Lines 49026 49031 +5
==========================================
+ Hits 44939 44945 +6
+ Misses 4087 4086 -1
Continue to review full report at Codecov.
|
needs a rebase and I have yet to actually review |
pandas/core/ops.py
Outdated
result = lib.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) | ||
if isinstance(y, np.ndarray): | ||
# This next assertion is here because there used to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the assertion pretty much explains this, though maybe add some text that we are trying to compare non-bool things here
pandas/core/ops.py
Outdated
filler = (fill_int if is_self_int_dtype and | ||
is_integer_dtype(np.asarray(other)) else fill_bool) | ||
is_other_int_dtype = is_integer_dtype(np.asarray(other)) | ||
if isinstance(other, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment also includes tuple and np.array, should not this check for also for tuple?
pandas/core/ops.py
Outdated
x = _ensure_object(x) | ||
y = _ensure_object(y) | ||
result = lib.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this and let's instead qualify the below with
if isinstance(y, np.ndarray):
...
elif is_list_like(.....):
else:
# scalar
pandas/core/ops.py
Outdated
filler = (fill_int if is_self_int_dtype and | ||
is_integer_dtype(np.asarray(other)) else fill_bool) | ||
is_other_int_dtype = is_integer_dtype(np.asarray(other)) | ||
if isinstance(other, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be is_list_like
Reverted some suggested changes because I'm not confident they handle Categorical correctly. Until the tests+docs for these methods are fleshed out, sticking to cleanup without changing behavior. |
needs a rebase on master. note always rebase when pushing. |
x = _ensure_object(x) | ||
y = _ensure_object(y) | ||
result = libops.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tuple here, or assert is_list_like
and not np.ndarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe just remove the assertion and make the else into an is_scalar
and make an else that raises an AssertionError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I’m not inclined to change any behavior until the docs/tests are fleshed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then let's do that first (in this PR or a pre-cursor)
unfilled = self._constructor(res_values, index=self.index) | ||
return filler(unfilled).__finalize__(self) | ||
is_other_int_dtype = is_integer_dtype(np.asarray(other)) | ||
if is_list_like(other) and not isinstance(other, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then thismust be a scalar, put this in another elsif rather than trying to catch the list-like case (ndarray, list, tuple) AND scalar case here
s_fff = Series([False, False, False], index=index) | ||
s_tff = Series([True, False, False], index=index) | ||
s_empty = Series([]) | ||
# We cannot wrap s111.astype(np.int32) with pd.Index because it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you exercise both lhs and rhs are scalars (with a Series/Index) (True, False) and also None/np.nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c you might have these cases below, if you do make add _scalars in the test name (and see if you can fully exercise them)
there is likely an issue for this already, see if you can find it |
x = _ensure_object(x) | ||
y = _ensure_object(y) | ||
result = libops.vec_binop(x, y, op) | ||
assert not isinstance(y, (list, ABCSeries, pd.Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then let's do that first (in this PR or a pre-cursor)
Series bool ops are not going to be possible to dispatch to their Index counterparts, so this one I'm mostly going to leave alone. This breaks up the tests, adds an xfail for op(Series, Categorical), and fixes+tests op(series, Index). It makes some headway in simplifying the closure so the possible paths through the code are more obvious.
There is a TODO comment on line 1113 where I think there is a potential bug, but need someone who understands the intended behavior better to weigh in.