BUG: xs not working with slice #35301

Closed
2 of 3 tasks
wiso opened this issue Jul 15, 2020 · 13 comments · Fixed by #35411
Labels: Error Reporting (Incorrect or improved errors from pandas); Indexing (Related to indexing on series/frames, not to indexes themselves)

@wiso

wiso commented Jul 15, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

data = """
C1  C2 V
A1  0  10
A1  1  20
A2  0  2
A2  1  3
B1  0  2
B2  1  3
"""

import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(data), sep=' +').set_index(['C1', 'C2'])

df.xs(pd.IndexSlice['A1', :])

Problem description

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/***/lib/python3.7/site-packages/pandas/core/generic.py", line 3535, in xs
    loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
  File "/home/***/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2835, in get_loc_level
    raise TypeError(key)
TypeError: ('A1', slice(None, None, None))

Similar code also produces the same problem (df.xs(('A1', slice(None)))). Strangely, this works:

df = pd.DataFrame({'a': [1, 2, 3, 1], 'b': ['a', 'b', 'c', 'd'], 'v': [2, 3, 4, 5]}).set_index(['a', 'b'])
df.xs(pd.IndexSlice[1, :])

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.7-100.fc31.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : it_IT.UTF-8
LOCALE : it_IT.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2019.2
dateutil : 2.7.5
pip : 20.1.1
setuptools : 41.6.0
Cython : 0.29.15
pytest : 4.0.0
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.16.1
pandas_datareader: 0.8.0
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.0
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.0.0
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : None
tables : 3.5.2
tabulate : 0.8.5
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.1.2
xlsxwriter : None
numba : 0.48.0

@wiso wiso added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 15, 2020
@arw2019
Member

arw2019 commented Jul 16, 2020

+1 that this should work.

I've checked that this is a problem on the 1.1 master

Output of pd.versions()

INSTALLED VERSIONS
------------------
commit : b59831e
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-40-generic
Version : #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+2129.gb59831e97
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

I think the fix will involve changing

pandas/pandas/core/generic.py

Lines 3415 to 3574 in b687cd4

def xs(self, key, axis=0, level=None, drop_level: bool_t = True):
    """
    Return cross-section from the Series/DataFrame.

    This method takes a `key` argument to select data at a particular
    level of a MultiIndex.

    Parameters
    ----------
    key : label or tuple of label
        Label contained in the index, or partially in a MultiIndex.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis to retrieve cross-section on.
    level : object, defaults to first n levels (n=1 or len(key))
        In case of a key partially contained in a MultiIndex, indicate
        which levels are used. Levels can be referred by label or position.
    drop_level : bool, default True
        If False, returns object with same levels as self.

    Returns
    -------
    Series or DataFrame
        Cross-section from the original Series or DataFrame
        corresponding to the selected index levels.

    See Also
    --------
    DataFrame.loc : Access a group of rows and columns
        by label(s) or a boolean array.
    DataFrame.iloc : Purely integer-location based indexing
        for selection by position.

    Notes
    -----
    `xs` can not be used to set values.

    MultiIndex Slicers is a generic way to get/set values on
    any level or levels.
    It is a superset of `xs` functionality, see
    :ref:`MultiIndex Slicers <advanced.mi_slicers>`.

    Examples
    --------
    >>> d = {'num_legs': [4, 4, 2, 2],
    ...      'num_wings': [0, 0, 2, 2],
    ...      'class': ['mammal', 'mammal', 'mammal', 'bird'],
    ...      'animal': ['cat', 'dog', 'bat', 'penguin'],
    ...      'locomotion': ['walks', 'walks', 'flies', 'walks']}
    >>> df = pd.DataFrame(data=d)
    >>> df = df.set_index(['class', 'animal', 'locomotion'])
    >>> df
                               num_legs  num_wings
    class  animal  locomotion
    mammal cat     walks              4          0
           dog     walks              4          0
           bat     flies              2          2
    bird   penguin walks              2          2

    Get values at specified index

    >>> df.xs('mammal')
                       num_legs  num_wings
    animal locomotion
    cat    walks              4          0
    dog    walks              4          0
    bat    flies              2          2

    Get values at several indexes

    >>> df.xs(('mammal', 'dog'))
                num_legs  num_wings
    locomotion
    walks              4          0

    Get values at specified index and level

    >>> df.xs('cat', level=1)
                       num_legs  num_wings
    class  locomotion
    mammal walks              4          0

    Get values at several indexes and levels

    >>> df.xs(('bird', 'walks'),
    ...       level=[0, 'locomotion'])
             num_legs  num_wings
    animal
    penguin         2          2

    Get values at specified column and axis

    >>> df.xs('num_wings', axis=1)
    class   animal   locomotion
    mammal  cat      walks         0
            dog      walks         0
            bat      flies         2
    bird    penguin  walks         2
    Name: num_wings, dtype: int64
    """
    axis = self._get_axis_number(axis)
    labels = self._get_axis(axis)
    if level is not None:
        loc, new_ax = labels.get_loc_level(key, level=level, drop_level=drop_level)

        # create the tuple of the indexer
        _indexer = [slice(None)] * self.ndim
        _indexer[axis] = loc
        indexer = tuple(_indexer)

        result = self.iloc[indexer]
        setattr(result, result._get_axis_name(axis), new_ax)
        return result

    if axis == 1:
        return self[key]

    self._consolidate_inplace()

    index = self.index
    if isinstance(index, MultiIndex):
        loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
    else:
        loc = self.index.get_loc(key)

        if isinstance(loc, np.ndarray):
            if loc.dtype == np.bool_:
                (inds,) = loc.nonzero()
                return self._take_with_is_copy(inds, axis=axis)
            else:
                return self._take_with_is_copy(loc, axis=axis)

        if not is_scalar(loc):
            new_index = self.index[loc]

    if is_scalar(loc):
        new_values = self._data.fast_xs(loc)

        # may need to box a datelike-scalar
        #
        # if we encounter an array-like and we only have 1 dim
        # that means that their are list/ndarrays inside the Series!
        # so just return them (GH 6394)
        if not is_list_like(new_values) or self.ndim == 1:
            return com.maybe_box_datetimelike(new_values)

        result = self._constructor_sliced(
            new_values,
            index=self.columns,
            name=self.index[loc],
            dtype=new_values.dtype,
        )

    else:
        result = self.iloc[loc]
        result.index = new_index

    # this could be a view
    # but only in a single-dtyped view sliceable case
    result._set_is_copy(self, copy=not result._is_view)
    return result

or possibly some of the methods that chunk refers to. Happy to do a PR on this unless @wiso you'd like to?

@wiso
Author

wiso commented Jul 16, 2020

@arw2019 thanks for checking. I am just a user; it would take me ages to understand where to make changes in the code.

@arw2019
Member

arw2019 commented Jul 16, 2020

ok! In that case I'll take it

@zky001

zky001 commented Jul 17, 2020

> ok! In that case I'll take it

Hi, could I fix this bug? I have fixed this issue on my own computer!

@arw2019
Member

arw2019 commented Jul 17, 2020

> Hi, could I fix this bug? I have fixed this issue on my own computer!

Yeah go for it

zky001 added a commit to zky001/pandas that referenced this issue Jul 17, 2020
fix bugs be metioned on issue pandas-dev#35301
@zky001 zky001 mentioned this issue Jul 17, 2020
5 tasks
@jbrockmendel
Member

xs is intended for scalar lookups
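As a minimal sketch of what "scalar lookups" means here, the snippet below rebuilds the frame from the report (constructed directly rather than through read_csv, which is an assumption made only to keep the example self-contained) and uses only scalar labels or tuples of scalar labels as keys. The variable names are illustrative.

```python
import pandas as pd

# Same data as in the report, built directly instead of via read_csv.
df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])

# Scalar label on the first level: the C1 level is dropped from the result.
first_level = df.xs("A1")

# Tuple of scalar labels, one per level: selects the single matching row.
single_row = df.xs(("A1", 0))

# Scalar label on a named inner level.
inner_level = df.xs(0, level="C2")
```

None of these keys contains a slice, which is the case the traceback in the report complains about.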

@arw2019
Member

arw2019 commented Jul 19, 2020

@jbrockmendel ahh thanks! In fairness that is stated in the DataFrame.xs doc

@wiso Do you think it's worth adding a clarification there? If yes I'll do a PR for that

Barring that we can probably close this

@simonjayhawkins
Member

maybe a better error message; something like TypeError: expected label or tuple of label, got ('A1', slice(None, None, None))
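A rough sketch of the kind of check that could produce such a message is shown below. The helper name check_xs_key is hypothetical and is not part of pandas; it only illustrates rejecting keys that contain a slice, and says nothing about how the actual fix was implemented.

```python
def check_xs_key(key):
    """Hypothetical helper: reject keys that contain a slice.

    Returns the key unchanged when every part is a plain label.
    """
    parts = key if isinstance(key, tuple) else (key,)
    if any(isinstance(part, slice) for part in parts):
        # Message wording follows the suggestion in this thread.
        raise TypeError(f"expected label or tuple of label, got {key!r}")
    return key


# Usage: a tuple of labels passes, a tuple containing a slice does not.
check_xs_key(("A1", 0))
try:
    check_xs_key(("A1", slice(None)))
except TypeError as exc:
    message = str(exc)
```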

@simonjayhawkins simonjayhawkins added Usage Question Error Reporting Incorrect or improved errors from pandas and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 19, 2020
@zky001

zky001 commented Jul 20, 2020

> maybe a better error message; something like TypeError: expected label or tuple of label, got ('A1', slice(None, None, None))

I think it's a good idea!

@wiso
Author

wiso commented Jul 20, 2020

Sorry, I am not sure I understand everything. Are you saying I can't pass a pd.IndexSlice to xs? Why not? It works in one case and it is very useful.

By the way, in general I am searching for an alternative to loc that drops the columns automatically, since loc does that only if the index is in the first level, while xs has a specific option for this behaviour.
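The difference described above can be illustrated with a short sketch, assuming the same data as in the report (built directly and sorted, which is an assumption for self-containment). It compares xs with and without drop_level against loc, and uses droplevel as one possible way to drop a level after a loc selection.

```python
import pandas as pd

df = (
    pd.DataFrame(
        {
            "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
            "C2": [0, 1, 0, 1, 0, 1],
            "V": [10, 20, 2, 3, 2, 3],
        }
    )
    .set_index(["C1", "C2"])
    .sort_index()
)

# xs on the inner level drops that level by default ...
dropped = df.xs(0, level="C2")
# ... and keeps it when drop_level=False.
kept = df.xs(0, level="C2", drop_level=False)

# loc drops the level when selecting a label on the first level ...
via_first = df.loc["A1"]
# ... but a slicer on the inner level keeps every level.
via_loc = df.loc[pd.IndexSlice[:, 0], :]
# droplevel removes the inner level afterwards.
via_loc_dropped = via_loc.droplevel("C2")
```

Here via_loc_dropped selects the same rows as dropped, at the cost of an extra call.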

@simonjayhawkins
Member

the difference between xs and loc is that xs is a method on the DataFrame class whereas loc is an 'accessor' that uses __getitem__ under the hood to allow indexing using the square bracket operator.

pd.IndexSlice is a convenience to more easily perform multi-index slicing to get around issues using the bracket notation.

so at present, the working case is not documented, tested or supported.

I would therefore say, the issue is not a bug, but could maybe be an enhancement if it makes sense.
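The point about __getitem__ and pd.IndexSlice can be sketched as follows. KeyEcho is a hypothetical stand-in class written for this illustration; it shows that the colon syntax is only available inside square brackets, where Python turns it into a slice object before the key reaches the method.

```python
import pandas as pd


class KeyEcho:
    """Hypothetical stand-in: returns whatever the brackets received."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# Inside brackets, "A1", : becomes the tuple ("A1", slice(None)).
plain_key = echo["A1", :]

# pd.IndexSlice produces the same plain tuple.
pandas_key = pd.IndexSlice["A1", :]
```

This is why df.xs(pd.IndexSlice['A1', :]) and df.xs(('A1', slice(None))) in the report behave identically: xs receives an ordinary tuple containing a slice in both cases.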

@wiso
Author

wiso commented Jul 20, 2020

I see. So if it is not supported, I guess it should trigger an error if the user tries to pass pd.IndexSlice to xs.

As I said, one big difference between loc and xs is how the index levels used by the function are dropped. It would be good to have the same control you have in xs (drop_level: bool) in loc.

@simonjayhawkins
Member

> As I said, one big difference between loc and xs is how the index levels used by the function are dropped. It would be good to have the same control you have in xs (drop_level: bool) in loc.

Agreed. There has been some previous discussion on deprecating xs, but until this capability of xs is easily achieved with loc that's unlikely to happen.

so rather than enhancing xs to accept pd.IndexSlice, it would probably be time better spent addressing the loc functionality.

I'll leave this labelled as error reporting for now, as improving the error message raised would be beneficial in the meantime.

@jreback jreback added this to the 1.2 milestone Aug 6, 2020
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves and removed Usage Question labels Aug 6, 2020

6 participants