Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify bool ops closures, improve tests #19795

Closed
wants to merge 12 commits into from

Conversation

jbrockmendel
Copy link
Member

Series bool ops are not going to be possible to dispatch to their Index counterparts, so this one I'm mostly going to leave alone. This breaks up the tests, adds an xfail for op(Series, Categorical), and fixes+tests op(series, Index). It makes some headway in simplifying the closure so the possible paths through the code are more obvious.

There is a TODO comment on line 1113 where I think there is a potential bug, but need someone who understands the intended behavior better to weigh in.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we do the tests with appropriate xfails separately. it is really confusing what you are actually changing.

@@ -866,3 +866,4 @@ Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Boolean operations ``&, |, ^`` between a ``Series`` and an ``Index`` will no longer raise ``TypeError`` (:issue:`19792`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a bug, rather an API breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move.

x = _ensure_object(x)
y = _ensure_object(y)
result = lib.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is oddly specific. what about a tuple?

basicaly you are trying to catch ndarray or non-list-like right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is codifying existing behavior. ATM the line following this checks for isinstance(y, (np.ndarray, ABCSeries)) and the line above converts lists to ndarray. I removed the ABCSeries part, converted lists a step earlier, and added the assertion to make sure we don't accidentally miss either of these cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this and let's instead qualify the below with

if isinstance(y, np.ndarray):
   ...
elif is_list_like(.....):

else:
    # scalar

result = lib.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
if isinstance(y, np.ndarray):
# This next assertion is here because there used to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a useful comment. imagine a new reader coming along with no history

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, not sure what to put here. ATM there is a non-reachable branch with a TODO comment specifically asking how it would be reached. This seemed worth documenting somehow since someone at some point found it worth special-casing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the assertion pretty much explains this, though maybe add some text that we are trying to compare non-bool things here

filler = (fill_int if is_self_int_dtype and
is_integer_dtype(np.asarray(other)) else fill_bool)
is_other_int_dtype = is_integer_dtype(np.asarray(other))
if isinstance(other, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a list only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM lists are handled inside na_op on 1066-1067. This is just moving the same two lines up the call stack (and outside of an exception-handling block)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment also includes tuple and np.array, should not this check for also for tuple?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on if we want to treat tuples as "an array the user forgot to wrap in an array". For lists we do that pretty consistently (e.g. on line 1034 and 1172 in core.ops), but not so much for tuples. I don't have a strong opinion as to what the ideal behavior is, but would prefer to address it separately from this cleanup/testing PR. (it could go in the same Issue as the TODO comment just below this isinstance check)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be is_list_like

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Feb 21, 2018
@jbrockmendel
Copy link
Member Author

how about we do the tests with appropriate xfails separately. it is really confusing what you are actually changing.

OK

@codecov
Copy link

codecov bot commented Feb 21, 2018

Codecov Report

Merging #19795 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19795      +/-   ##
==========================================
- Coverage   91.61%    91.6%   -0.01%     
==========================================
  Files         150      150              
  Lines       48887    48891       +4     
==========================================
+ Hits        44786    44789       +3     
- Misses       4101     4102       +1
Flag Coverage Δ
#multiple 89.98% <100%> (-0.01%) ⬇️
#single 41.8% <65.38%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 96.93% <100%> (+0.18%) ⬆️
pandas/util/testing.py 83.64% <0%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 740ad9a...c03cc55. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 21, 2018

Codecov Report

Merging #19795 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19795      +/-   ##
==========================================
+ Coverage   91.66%   91.66%   +<.01%     
==========================================
  Files         150      150              
  Lines       49026    49031       +5     
==========================================
+ Hits        44939    44945       +6     
+ Misses       4087     4086       -1
Flag Coverage Δ
#multiple 90.04% <100%> (ø) ⬆️
#single 41.83% <65.21%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 97.11% <100%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9958ce6...0d99c94. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

needs a rebase and I have yet to actually review

result = lib.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
if isinstance(y, np.ndarray):
# This next assertion is here because there used to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the assertion pretty much explains this, though maybe add some text that we are trying to compare non-bool things here

filler = (fill_int if is_self_int_dtype and
is_integer_dtype(np.asarray(other)) else fill_bool)
is_other_int_dtype = is_integer_dtype(np.asarray(other))
if isinstance(other, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment also includes tuple and np.array, should not this check for also for tuple?

x = _ensure_object(x)
y = _ensure_object(y)
result = lib.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this and let's instead qualify the below with

if isinstance(y, np.ndarray):
   ...
elif is_list_like(.....):

else:
    # scalar

filler = (fill_int if is_self_int_dtype and
is_integer_dtype(np.asarray(other)) else fill_bool)
is_other_int_dtype = is_integer_dtype(np.asarray(other))
if isinstance(other, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be is_list_like

@jbrockmendel
Copy link
Member Author

Reverted some suggested changes because I'm not confident they handle Categorical correctly. Until the tests+docs for these methods are fleshed out, sticking to cleanup without changing behavior.

@jreback
Copy link
Contributor

jreback commented Mar 1, 2018

needs a rebase on master. note always rebase when pushing.

x = _ensure_object(x)
y = _ensure_object(y)
result = libops.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tuple here, or assert is_list_like and not np.ndarray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe just remove the assertion and make the else into an is_scalar and make an else that raises an AssertionError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I’m not inclined to change any behavior until the docs/tests are fleshed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's do that first (in this PR or a pre-cursor)

unfilled = self._constructor(res_values, index=self.index)
return filler(unfilled).__finalize__(self)
is_other_int_dtype = is_integer_dtype(np.asarray(other))
if is_list_like(other) and not isinstance(other, np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then thismust be a scalar, put this in another elsif rather than trying to catch the list-like case (ndarray, list, tuple) AND scalar case here

s_fff = Series([False, False, False], index=index)
s_tff = Series([True, False, False], index=index)
s_empty = Series([])
# We cannot wrap s111.astype(np.int32) with pd.Index because it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you exercise both lhs and rhs are scalars (with a Series/Index) (True, False) and also None/np.nan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i c you might have these cases below, if you do make add _scalars in the test name (and see if you can fully exercise them)

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 2, 2018
@jreback
Copy link
Contributor

jreback commented Mar 2, 2018

there is likely an issue for this already, see if you can find it

x = _ensure_object(x)
y = _ensure_object(y)
result = libops.vec_binop(x, y, op)
assert not isinstance(y, (list, ABCSeries, pd.Index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's do that first (in this PR or a pre-cursor)

@jbrockmendel jbrockmendel deleted the bool_ops3 branch April 5, 2020 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants