BUG: xs not working with slice #35301

wiso · 2020-07-15T23:09:33Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

data = """
C1  C2 V
A1  0  10
A1  1  20
A2  0  2
A2  1  3
B1  0  2
B2  1  3
"""

import pandas as pd
from io import StringIO
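# sep=' +' is a regular expression, so request the python parser engine explicitly (avoids a ParserWarning)
df = pd.read_csv(StringIO(data), sep=' +', engine='python').set_index(['C1', 'C2'])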

df.xs(pd.IndexSlice['A1', :])

Problem description

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/***/lib/python3.7/site-packages/pandas/core/generic.py", line 3535, in xs
    loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
  File "/home/***/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2835, in get_loc_level
    raise TypeError(key)
TypeError: ('A1', slice(None, None, None))
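For comparison, the rows the example is after (everything whose first level is A1) can be selected with a plain label instead of a slice. A minimal sketch, assuming pandas 1.x or later; the frame is rebuilt with the DataFrame constructor instead of read_csv so the snippet stands on its own:

```python
import pandas as pd

# Same data as the report, built without read_csv
df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])

by_xs = df.xs("A1")    # a label, not a slice: the C1 level is dropped
by_loc = df.loc["A1"]  # loc also drops C1 when the label is on the first level

pd.testing.assert_frame_equal(by_xs, by_loc)
print(by_xs)
```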

Similar code, such as df.xs(('A1', slice(None))), produces the same problem. Strangely, this works:

df = pd.DataFrame({'a': [1, 2, 3, 1], 'b': ['a', 'b', 'c', 'd'], 'v': [2, 3, 4, 5]}).set_index(['a', 'b'])
df.xs(pd.IndexSlice[1, :])

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.7-100.fc31.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : it_IT.UTF-8
LOCALE : it_IT.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2019.2
dateutil : 2.7.5
pip : 20.1.1
setuptools : 41.6.0
Cython : 0.29.15
pytest : 4.0.0
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.16.1
pandas_datareader: 0.8.0
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.0
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.0.0
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : None
tables : 3.5.2
tabulate : 0.8.5
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.1.2
xlsxwriter : None
numba : 0.48.0


arw2019 · 2020-07-16T08:45:56Z

+1 that this should work.

I've confirmed that this is a problem on the 1.1 master branch.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : b59831e
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-40-generic
Version : #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+2129.gb59831e97
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

I think the fix will involve changing

pandas/pandas/core/generic.py

Lines 3415 to 3574 in b687cd4

def xs(self, key, axis=0, level=None, drop_level: bool_t = True):
    """
    Return cross-section from the Series/DataFrame.

    This method takes a `key` argument to select data at a particular
    level of a MultiIndex.

    Parameters
    ----------
    key : label or tuple of label
        Label contained in the index, or partially in a MultiIndex.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis to retrieve cross-section on.
    level : object, defaults to first n levels (n=1 or len(key))
        In case of a key partially contained in a MultiIndex, indicate
        which levels are used. Levels can be referred by label or position.
    drop_level : bool, default True
        If False, returns object with same levels as self.

    Returns
    -------
    Series or DataFrame
        Cross-section from the original Series or DataFrame
        corresponding to the selected index levels.

    See Also
    --------
    DataFrame.loc : Access a group of rows and columns
        by label(s) or a boolean array.
    DataFrame.iloc : Purely integer-location based indexing
        for selection by position.

    Notes
    -----
    `xs` can not be used to set values.

    MultiIndex Slicers is a generic way to get/set values on
    any level or levels.
    It is a superset of `xs` functionality, see
    :ref:`MultiIndex Slicers <advanced.mi_slicers>`.

    Examples
    --------
    >>> d = {'num_legs': [4, 4, 2, 2],
    ...      'num_wings': [0, 0, 2, 2],
    ...      'class': ['mammal', 'mammal', 'mammal', 'bird'],
    ...      'animal': ['cat', 'dog', 'bat', 'penguin'],
    ...      'locomotion': ['walks', 'walks', 'flies', 'walks']}
    >>> df = pd.DataFrame(data=d)
    >>> df = df.set_index(['class', 'animal', 'locomotion'])
    >>> df
                               num_legs  num_wings
    class  animal  locomotion
    mammal cat     walks              4          0
           dog     walks              4          0
           bat     flies              2          2
    bird   penguin walks              2          2

    Get values at specified index

    >>> df.xs('mammal')
                       num_legs  num_wings
    animal locomotion
    cat    walks              4          0
    dog    walks              4          0
    bat    flies              2          2

    Get values at several indexes

    >>> df.xs(('mammal', 'dog'))
                num_legs  num_wings
    locomotion
    walks              4          0

    Get values at specified index and level

    >>> df.xs('cat', level=1)
                       num_legs  num_wings
    class  locomotion
    mammal walks              4          0

    Get values at several indexes and levels

    >>> df.xs(('bird', 'walks'),
    ...       level=[0, 'locomotion'])
             num_legs  num_wings
    animal
    penguin         2          2

    Get values at specified column and axis

    >>> df.xs('num_wings', axis=1)
    class   animal   locomotion
    mammal  cat      walks         0
            dog      walks         0
            bat      flies         2
    bird    penguin  walks         2
    Name: num_wings, dtype: int64
    """
    axis = self._get_axis_number(axis)
    labels = self._get_axis(axis)
    if level is not None:
        loc, new_ax = labels.get_loc_level(key, level=level, drop_level=drop_level)

        # create the tuple of the indexer
        _indexer = [slice(None)] * self.ndim
        _indexer[axis] = loc
        indexer = tuple(_indexer)

        result = self.iloc[indexer]
        setattr(result, result._get_axis_name(axis), new_ax)
        return result

    if axis == 1:
        return self[key]

    self._consolidate_inplace()

    index = self.index
    if isinstance(index, MultiIndex):
        loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
    else:
        loc = self.index.get_loc(key)

        if isinstance(loc, np.ndarray):
            if loc.dtype == np.bool_:
                (inds,) = loc.nonzero()
                return self._take_with_is_copy(inds, axis=axis)
            else:
                return self._take_with_is_copy(loc, axis=axis)

        if not is_scalar(loc):
            new_index = self.index[loc]

    if is_scalar(loc):
        new_values = self._data.fast_xs(loc)

        # may need to box a datelike-scalar
        #
        # if we encounter an array-like and we only have 1 dim
        # that means that their are list/ndarrays inside the Series!
        # so just return them (GH 6394)
        if not is_list_like(new_values) or self.ndim == 1:
            return com.maybe_box_datetimelike(new_values)

        result = self._constructor_sliced(
            new_values,
            index=self.columns,
            name=self.index[loc],
            dtype=new_values.dtype,
        )

    else:
        result = self.iloc[loc]
        result.index = new_index

    # this could be a view
    # but only in a single-dtyped view sliceable case
    result._set_is_copy(self, copy=not result._is_view)
    return result

or possibly some of the methods that this chunk calls. I'm happy to open a PR for this unless you'd like to, @wiso?
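For a plain label, the MultiIndex branch of the excerpt boils down to MultiIndex.get_loc_level followed by iloc. A rough sketch of that path using only public API on the report's frame; it mirrors the excerpt rather than reproducing pandas internals, and since loc may come back as a slice or a boolean mask depending on how the index is sorted, the code does not rely on either:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])

# Mirrors: loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
loc, new_index = df.index.get_loc_level("A1", drop_level=True)

# Mirrors the non-scalar branch: result = self.iloc[loc]; result.index = new_index
result = df.iloc[loc].copy()
result.index = new_index

print(type(loc).__name__, new_index.tolist())
print(result)
```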

wiso · 2020-07-16T08:49:12Z

@arw2019 thanks for checking. I am just a user; it would take me ages to figure out where to start in the code.

arw2019 · 2020-07-16T16:01:44Z

ok! In that case I'll take it

zky001 · 2020-07-17T01:07:47Z

> ok! In that case I'll take it

Hi, could I fix this bug? I have fixed this issue on my own computer!

arw2019 · 2020-07-17T02:20:38Z

> Hi, could I fix this bug? I have fixed this issue on my own computer!

Yeah, go for it.


jbrockmendel · 2020-07-17T20:27:51Z

xs is intended for scalar lookups
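To make "scalar lookups" concrete: per the docstring, key is a single label or a tuple of labels, optionally combined with level. A small sketch on the report's frame (the frame construction is the only assumption):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])

row = df.xs(("A1", 0))           # full tuple of labels: one row, returned as a Series
level_c2 = df.xs(1, level="C2")  # one label on a named level; C2 is dropped

print(row)
print(level_c2)
```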

arw2019 · 2020-07-19T03:54:12Z

@jbrockmendel ah, thanks! In fairness, that is stated in the DataFrame.xs docs.

@wiso do you think it's worth adding a clarification there? If so, I'll open a PR for that.

Barring that, we can probably close this.

simonjayhawkins · 2020-07-19T13:18:05Z

Maybe a better error message, something like TypeError: expected label or tuple of label, got ('A1', slice(None, None, None))
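A sketch of what such a check could look like. check_xs_key is a hypothetical helper, not pandas code and not the fix that was eventually merged; it only shows how slices in the key could be rejected up front with the suggested wording:

```python
def check_xs_key(key):
    """Hypothetical pre-check for an xs key: reject slices with a clearer message."""
    # Treat a non-tuple key as a one-element tuple so both cases share one loop
    parts = key if isinstance(key, tuple) else (key,)
    if any(isinstance(part, slice) for part in parts):
        raise TypeError(f"expected label or tuple of label, got {key}")
    return key


try:
    check_xs_key(("A1", slice(None)))
except TypeError as err:
    print(err)  # expected label or tuple of label, got ('A1', slice(None, None, None))
```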

zky001 · 2020-07-20T02:33:37Z

> maybe a better error message; something like TypeError: expected label or tuple of label, got ('A1', slice(None, None, None))

I think it's a good idea!

wiso · 2020-07-20T07:22:07Z

Sorry, I'm not sure I understand everything. Are you saying I can't pass a pd.IndexSlice to xs? Why not? It works in one case, and it is very useful.

By the way, more generally, I am looking for an alternative to loc that drops the selected index levels automatically: loc does that only when the label is on the first level, while xs has a specific option for this behaviour.
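The behaviour described here can be reproduced on the report's frame. A minimal sketch, assuming pandas 1.x or later: loc drops a level only for a first-level label, whereas xs drops the level it selects on, controlled by drop_level:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])
idx = pd.IndexSlice

first = df.loc["A1"]                           # label on the first level: C1 is dropped
second = df.loc[idx[:, 0], :]                  # label on the second level: both levels kept
via_xs = df.xs(0, level="C2")                  # xs drops C2 by default (drop_level=True)
kept = df.xs(0, level="C2", drop_level=False)  # ...unless asked to keep it

for name, frame in [("first", first), ("second", second), ("via_xs", via_xs), ("kept", kept)]:
    print(name, frame.index.nlevels)
```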

simonjayhawkins · 2020-07-20T10:09:17Z

The difference between xs and loc is that xs is a method on the DataFrame class, whereas loc is an 'accessor' that uses __getitem__ under the hood to allow indexing with the square-bracket operator.
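A toy illustration of that distinction, using hypothetical Frame and _Accessor classes (not pandas code). Inside square brackets Python turns : into a slice object and passes the whole key to __getitem__. An ordinary method call does not accept : at all, so the slice has to be spelled out:

```python
class _Accessor:
    """Hypothetical stand-in for an indexer such as .loc: returns the key it receives."""

    def __getitem__(self, key):
        return key


class Frame:
    """Hypothetical minimal object with a .loc-style accessor and an xs-style method."""

    @property
    def loc(self):
        return _Accessor()

    def xs(self, key):
        return key


f = Frame()
via_brackets = f.loc["A1", :]           # Python builds ('A1', slice(None, None, None))
via_method = f.xs(("A1", slice(None)))  # f.xs("A1", :) would be a SyntaxError
print(via_brackets)
print(via_brackets == via_method)
```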

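pd.IndexSlice is a convenience for performing MultiIndex slicing more easily; it works around limitations of the bracket notation (for example, df.loc[('A1', :), :] is a SyntaxError because : is only valid directly inside square brackets).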
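As far as I know, indexing pd.IndexSlice simply hands back the key, so the slice syntax can be written anywhere and then passed to loc. A short sketch on the report's frame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])
idx = pd.IndexSlice

key = idx["A1", :]
print(key)  # ('A1', slice(None, None, None))

a = df.loc[idx["A1", :], :]         # using IndexSlice
b = df.loc[("A1", slice(None)), :]  # the same key written out by hand
pd.testing.assert_frame_equal(a, b)
```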

So at present, the case that happens to work is not documented, tested, or supported.

I would therefore say that this issue is not a bug, but it could be an enhancement if it makes sense.

wiso · 2020-07-20T10:44:39Z

I see. If it is not supported, I guess xs should raise an error when the user tries to pass a pd.IndexSlice.

As I said, one big difference between loc and xs is how the index levels used for the selection are dropped. It would be good to have the same control in loc that xs offers (drop_level: bool).
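Until loc offers such an option, one workaround is to select with loc and then remove the level explicitly with DataFrame.droplevel. A sketch on the report's frame, assuming pandas 0.24 or later (where DataFrame.droplevel is available):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": ["A1", "A1", "A2", "A2", "B1", "B2"],
        "C2": [0, 1, 0, 1, 0, 1],
        "V": [10, 20, 2, 3, 2, 3],
    }
).set_index(["C1", "C2"])
idx = pd.IndexSlice

selected = df.loc[idx[:, 0], :]     # loc keeps both C1 and C2 here
dropped = selected.droplevel("C2")  # drop the level that was used for the selection

# Same result as xs with its default drop_level=True
pd.testing.assert_frame_equal(dropped, df.xs(0, level="C2"))
print(dropped)
```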

simonjayhawkins · 2020-07-20T10:52:07Z

> As I said, one big difference between loc and xs is how the index levels used for the selection are dropped. It would be good to have the same control in loc that xs offers (drop_level: bool).

Agreed. There has been some previous discussion about deprecating xs, but that's unlikely to happen until this capability of xs can be easily achieved with loc.

So rather than enhancing xs to accept pd.IndexSlice, time would probably be better spent addressing the loc functionality.

I'll leave this labelled as error reporting for now, as improving the error message raised would be beneficial in the meantime.

wiso added the Bug and Needs Triage labels Jul 15, 2020

zky001 added a commit to zky001/pandas that referenced this issue Jul 17, 2020

Update multi.py

9045f06

fix bugs be metioned on issue pandas-dev#35301

zky001 mentioned this issue Jul 17, 2020

Update multi.py #35318

Closed


zky001 added a commit to zky001/pandas that referenced this issue Jul 17, 2020

Update multi.py

bcc32a3

fix bugs be metioned on issue pandas-dev#35301

zky001 added a commit to zky001/pandas that referenced this issue Jul 17, 2020

Update multi.py

c14660a

fix bugs be metioned on issue pandas-dev#35301

zky001 mentioned this issue Jul 17, 2020

Update multi.py #35319

Closed


simonjayhawkins added the Usage Question and Error Reporting labels and removed the Bug and Needs Triage labels Jul 19, 2020

This was referenced Jul 25, 2020

CLN: clarify TypeError for IndexSlice argument to pd.xs #35411

Merged

ENH: implement drop_levels argument in loc #35418

Closed

jreback added this to the 1.2 milestone Aug 6, 2020

jreback added the Indexing label and removed the Usage Question label Aug 6, 2020

jreback closed this as completed in #35411 Aug 7, 2020
