
API: Index/Series/DataFrame op 1-d list-like coercion #13637

Closed
sinhrks opened this issue Jul 13, 2016 · 11 comments
Labels
API - Consistency Internal Consistency of API/Behavior Bug Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@sinhrks
Member

sinhrks commented Jul 13, 2016

xref #1134. There is another inconsistency related to ops. Index/Series/DataFrame ops can accept a 1-d list-like as input and coerce it to an Index/Series. However, the supported 1-d list-likes differ depending on the class and the kind of op.

Code Sample, a copy-pastable example if possible

import pandas as pd

# Series + equal length list, OK
pd.Series([1, 2, 3]) + [2, 2, 2]
#0    3
#1    4
#2    5
# dtype: int64

# Series + equal length Index, OK
pd.Series([1, 2, 3]) + pd.Index([2, 2, 2])
#0    3
#1    4
#2    5
# dtype: int64

# DataFrame + equal length list, OK
pd.DataFrame([[1, 2, 3]]) + [2, 2, 2]
#    0  1  2
#0  3  4  5

# DataFrame + equal length Index, NG (fails)
pd.DataFrame([[1, 2, 3]]) + pd.Index([2, 2, 2])
# ValueError: cannot evaluate a numeric op with unequal lengths

I've organized the results in the table below (o = works, x = fails). I think all of these should be supported and consistent.

Class   op          list  tuple  ndarray(1dim)  Index
Index   Arithmetic  x     x      o              o
Index   Comparison  o     o      o              o
Index   Boolean     o     o      o              o
Series  Arithmetic  o     o      o              o
Series  Comparison  o     o      o              o
Series  Boolean     o     o      o              x
Frame   Arithmetic  o     o      o              x
Frame   Comparison  o     o      o              o
Frame   Boolean     o     o      o              x

NOTE: The Index result may depend on its type.
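Since the results depend heavily on the pandas version, the table can be regenerated for whichever version is installed. The following is a minimal sketch, not part of pandas: the helper names make_left and probe_table are hypothetical, the operands mirror the examples in this issue, and any raised exception is counted as 'x'. Newer pandas versions may emit warnings for some combinations, so the sketch silences them while probing.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical probe: for each (class, op kind, list-like kind) combination,
# try the op and record "o" if it succeeds or "x" if it raises.
OPS = {
    "Arithmetic": lambda left, right: left + right,
    "Comparison": lambda left, right: left == right,
    "Boolean": lambda left, right: left & right,
}
LISTLIKES = {
    "list": list,
    "tuple": tuple,
    "ndarray(1dim)": np.array,
    "Index": pd.Index,
}


def make_left(cls_name, kind):
    # Boolean ops get bool operands; the other ops use small integers.
    values = [True, True, False] if kind == "Boolean" else [1, 2, 3]
    if cls_name == "Index":
        return pd.Index(values)
    if cls_name == "Series":
        return pd.Series(values)
    return pd.DataFrame([values])  # one row, three columns


def probe_table():
    table = {}
    for cls_name in ["Index", "Series", "Frame"]:
        for kind, op in OPS.items():
            other = [True, False, True] if kind == "Boolean" else [2, 2, 2]
            for like_name, box in LISTLIKES.items():
                with warnings.catch_warnings():
                    # Ignore deprecation or other warnings raised while probing.
                    warnings.simplefilter("ignore")
                    try:
                        op(make_left(cls_name, kind), box(other))
                        ok = True
                    except Exception:
                        ok = False
                table[(cls_name, kind, like_name)] = "o" if ok else "x"
    return table


if __name__ == "__main__":
    print(pd.__version__)
    for key, mark in probe_table().items():
        print(*key, mark)
```

Printing the version alongside the marks makes it easier to compare against the 0.18.1 table above.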

output of pd.show_versions()

0.18.1

@sinhrks sinhrks added API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 13, 2016
@sinhrks sinhrks added this to the 0.19.0 milestone Jul 13, 2016
@jorisvandenbossche
Member

@sinhrks Nice overview! Are the 'x' entries the ones that fail?
But the Series boolean op with Index is marked with 'x', yet it seems to work:

In [42]: pd.Series([1, 2, 3]) == pd.Index([2, 2, 2])
Out[42]:
0    False
1     True
2    False
dtype: bool

I agree these should probably all work consistently.

@sinhrks
Member Author

sinhrks commented Jul 13, 2016

@jorisvandenbossche I meant boolean ops in the sense of logical ops (&, |, ^). Though using a bool Index is not very useful, the behavior should be consistent (or it should at least raise an understandable error).

pd.Series([True, True, False]) & pd.Index([True, False, True])
# TypeError: cannot compare a dtyped [bool] array with a scalar of type [Index]
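On versions where the Index operand raises, one hedged workaround (assuming the 1-d ndarray path works, as the table reports for Series Boolean ops) is to convert the Index to a plain ndarray before the logical op:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False])
idx = pd.Index([True, False, True])

# np.asarray turns the bool Index into a plain 1-d ndarray, which Series
# logical ops combine positionally (no index alignment is involved).
result = s & np.asarray(idx)
print(result.tolist())  # [True, False, False]
```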

@jorisvandenbossche
Member

@sinhrks Whoops, yes, == is of course 'comparison' :-)

@sinhrks
Member Author

sinhrks commented Jul 16, 2016

I found 2 related issues that would break if we introduce the above rules:

#4576 Pandas 0.12 unexpected results when comparing dataframe to list or tuple

On current master, a list comparison is performed per column (the list must match the number of rows):

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])
df
#    0  1
# 0  1  2
# 1  3  4
# 2  5  6

df > [2, 2]
# ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

df > [2, 2, 2]
#        0      1
# 0  False  False
# 1   True   True
# 2   True   True

After the change, a list comparison is performed per row (the list must match the number of columns, as with a Series):

df > [2, 2]
#        0      1
# 0  False  False
# 1   True   True
# 2   True   True

df > [2, 2, 2]
# ValueError

# ref for comparison: UNCHANGED by the fix
df > pd.Series([2, 2])
#        0      1
# 0  False  False
# 1   True   True
# 2   True   True
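Whichever default the bare operators use, the flex comparison methods (DataFrame.gt, DataFrame.eq and friends) take an axis argument that states the orientation of a list explicitly. A short sketch; the values are chosen so that the two orientations give different results:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

# axis="columns": the list supplies one value per column.
by_column_labels = df.gt([1, 5], axis="columns")
# rows: [False, False], [True, False], [True, True]

# axis="index": the list supplies one value per row.
by_row_labels = df.gt([1, 3, 5], axis="index")
# rows: [False, True], [False, True], [False, True]
```

On recent pandas versions, the bare df > [1, 5] matches the axis="columns" call, i.e. the per-row behavior described as "after the change" above, and df > [2, 2, 2] raises a ValueError.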

#11339 Boolean comparison vs tuple fails in 0.17.0

On current master, the comparison may be broadcast by treating the tuple as a single element:

s = pd.Series([(1,1),(1,2)])
s
# 0    (1, 1)
# 1    (1, 2)
# dtype: object

s == (1, 1)
# 0     True
# 1    False
# dtype: bool

After the change, the tuple will no longer be treated as a single element; the result should be the same as s == pd.Series([1, 1]):

s == (1, 1)
# 0    False
# 1    False
# dtype: bool

# ref for comparison: UNCHANGED by the fix
s == pd.Series([1, 1])
# 0    False
# 1    False
# dtype: bool
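Code that means "compare each element to this one tuple" can avoid depending on either coercion rule by doing the comparison element-wise in Python. A minimal sketch using Series.map with a plain function (slower than a vectorized op, but unambiguous):

```python
import pandas as pd

s = pd.Series([(1, 1), (1, 2)])

# Compare every element to the single tuple (1, 1); pandas never sees the
# tuple as an operand, so it cannot be coerced to a list-like.
is_one_one = s.map(lambda value: value == (1, 1))
print(is_one_one.tolist())  # [True, False]
```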

@jbrockmendel
Member

@sinhrks I'm trying to figure out which arithmetic-related Issues are still outstanding, as a lot has been fixed recently. I'm not quite sure how to mark this Issue. Can you help me pin down exactly what this Issue calls for?

@sinhrks
Member Author

sinhrks commented Oct 25, 2018

@jbrockmendel I haven't caught up with the recent changes yet. Maybe we should maintain comprehensive ops tests to clarify and guarantee the current behaviour.

@jorisvandenbossche
Member

@jbrockmendel if you look at the table in the top post, there are cases ('x') in that table that still don't work.

The "Series, Boolean, Index" (pd.Series([True, True, False]) & pd.Index([True, False, True])) and "Frame, Arithmetic, Index" (pd.DataFrame([[1, 2, 3]]) + pd.Index([2, 2, 2])) cases now seem to work correctly.

But the "Index, Arithmetic, List" and "Index, Arithmetic, Tuple" cases still fail:

In [34]: pd.Index([1, 2, 3]) + [1, 2, 3]
...
TypeError: can only perform ops with scalar values

@mroeschke
Member

Looks like the rest of the cases in the table are fixed. Are these tested @jbrockmendel?

In [107]: pd.Index([1, 2, 3]) + [1, 2, 3]
Out[107]: Int64Index([2, 4, 6], dtype='int64')

In [108]: pd.Index([1, 2, 3]) + (1, 2, 3)
Out[108]: Int64Index([2, 4, 6], dtype='int64')

In [109]: pd.__version__
Out[109]: '0.26.0.dev0+593.g9d45934af'
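Following the question of whether these cases are tested, a dependency-free regression check could assert that every equal-length 1-d list-like gives the same Index result. This is a hypothetical sketch: the function name and the list of boxes are illustrative and not taken from pandas' own test suite.

```python
import numpy as np
import pandas as pd

# Hypothetical regression check: adding any equal-length 1-d list-like to an
# Index should give the same Index, whatever the container type.
LISTLIKE_BOXES = [list, tuple, np.array, pd.Index]


def check_index_add_listlike():
    idx = pd.Index([1, 2, 3])
    expected = pd.Index([2, 4, 6])
    for box in LISTLIKE_BOXES:
        result = idx + box([1, 2, 3])
        # The result should stay an Index and match element-wise.
        assert isinstance(result, pd.Index), box
        assert result.equals(expected), box


check_index_add_listlike()
```

The same loop extends naturally to the Series and DataFrame rows of the table by swapping the left operand and the expected result.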

@TomAugspurger
Contributor

@jbrockmendel can you confirm that all the cases are fixed / tested? I think things are good, but would be nice to have your input.

@jbrockmendel
Member

I think these are in good enough shape not to be a blocker, but not in good enough shape to close. Ideally I'd like to work this into something like unpack_zerodim_and_defer to make sure we have a consistent implementation.
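The idea of routing every op through one shared entry point can be sketched on a toy class. This only borrows the idea behind unpack_zerodim_and_defer named above and is not pandas code: the decorator coerce_listlike_and_defer and the Vec class are hypothetical, and the coercion rule (list, tuple and 1-d ndarray become an ndarray of matching length, zero-dim arrays become scalars, anything else defers) is an assumption chosen to match the table's goal.

```python
import functools
import operator

import numpy as np


def coerce_listlike_and_defer(method):
    """Hypothetical decorator: one shared place that coerces `other`."""

    @functools.wraps(method)
    def wrapper(self, other):
        if isinstance(other, np.ndarray) and other.ndim == 0:
            # Unpack zero-dim arrays to plain scalars.
            other = other.item()
        elif isinstance(other, (list, tuple, np.ndarray)):
            # Every 1-d list-like is coerced the same way, whatever the op.
            other = np.asarray(other)
            if other.ndim != 1 or len(other) != len(self.values):
                raise ValueError("list-like operand must be 1-d with matching length")
        elif not np.isscalar(other):
            # Unknown operand type: defer to the other object's reflected method.
            return NotImplemented
        return method(self, other)

    return wrapper


class Vec:
    """Toy 1-d container standing in for an Index or Series."""

    def __init__(self, values):
        self.values = np.asarray(values)

    @coerce_listlike_and_defer
    def __add__(self, other):
        return Vec(operator.add(self.values, other))

    @coerce_listlike_and_defer
    def __eq__(self, other):
        return Vec(operator.eq(self.values, other))

    @coerce_listlike_and_defer
    def __and__(self, other):
        return Vec(operator.and_(self.values, other))
```

Because arithmetic, comparison and logical methods all pass through the same wrapper, a list, a tuple and a 1-d ndarray are treated identically across the three op kinds, which is the consistency the table asks for.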

@TomAugspurger TomAugspurger modified the milestones: 1.0, 1.1 Jan 3, 2020
@TomAugspurger TomAugspurger modified the milestones: 1.1, Contributions Welcome Jul 6, 2020
@jbrockmendel jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Sep 20, 2020
@mroeschke mroeschke added Bug Needs Tests Unit test(s) needed to prevent regressions and removed API Design labels May 1, 2021
@jbrockmendel
Member

Closing as addressed.


6 participants