bug or unhelpful error message in 0.12 and trunk: IndexError: Out of bounds on buffer access (axis 0) #4825

Closed
ruidc opened this issue Sep 12, 2013 · 29 comments · Fixed by #4833
Labels: Bug; Indexing (related to indexing on series/frames, not to indexes themselves)


@ruidc
Contributor

ruidc commented Sep 12, 2013

The following raises the error

pandas.Series([0.1, 0.2], index=[1, 2]).loc[[2, 3, 2]]

whilst

pandas.Series([0.1, 0.2], index=[1, 2]).loc[[2, 2, 3]]

does not. Or is there something invalid with this code?

@jreback
Contributor

jreback commented Sep 12, 2013

Seems OK in master (they only don't raise because you have at least 1 valid value)

In [1]: pandas.Series([0.1, 0.2], index=[1, 2]).loc[[2, 3, 2]]
Out[1]: 
2    0.2
3    NaN
2    0.2
dtype: float64

In [2]: pandas.Series([0.1, 0.2], index=[1, 2]).loc[[2, 2, 3]]
Out[2]: 
2    0.2
2    0.2
3    NaN
dtype: float64

@jreback jreback closed this as completed Sep 12, 2013
@ruidc
Contributor Author

ruidc commented Sep 13, 2013

Either I made a mistake in simplifying, or the above scenario has been fixed in the last 2 days of trunk.
I've refreshed from git and can reproduce the problem with:

pandas.Series([0.1, 0.2], index=[1, 2]).loc[[3, 2, 3]]

In [69]: pandas.__version__
Out[69]: '0.12.0-399-g1409049'

@jreback
Contributor

jreback commented Sep 13, 2013

Did you do a make build, e.g. to regenerate the Cython code?

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

I just ran:

python2 setup.py build_ext --inplace

Is that correct?

@jreback
Contributor

jreback commented Sep 13, 2013

Yep.
Run make clean, then run setup again.
Does the full test suite pass?

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

I've deleted, refetched, and rebuilt the pandas source. nosetests shows only skips and no errors, but the above still fails.

@jreback
Contributor

jreback commented Sep 13, 2013

Can you show the output of ci/print_versions.py?

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Linux 3.10.10-1-ARCH #1 SMP PREEMPT Fri Aug 30 11:30:06 CEST 2013 x86_64
byteorder: little
LC_ALL: None
LANG: en.US.UTF-8

Cython: 0.19.1
Numpy: 1.7.1
Scipy: Not installed
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 2.1
pytz: 2013d
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: Not installed
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: Not installed
html5lib: Not installed

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

It's running on a freshly purpose-built VM with an up-to-date Arch Linux x64, although it also fails on 0.12 on Windows. With no nightly Windows builds, and not being able to get pandas to compile with either MinGW or Win SDK 7.0, I had to build this VM just for this bug!

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

Are you having difficulty reproducing?

@jtratner
Contributor

@ruidc I can't reproduce this either.

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

Can you reproduce on 0.12 ?

@jtratner
Contributor

@ruidc yes, it produces an error for me in 0.12.0, but this no longer occurs on current master. I believe we're putting out either a point release or a new release version very soon. I am, however, going to add this to the test suite.

Error was:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "pandas/core/indexing.py", line 699, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "pandas/core/indexing.py", line 785, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "pandas/core/indexing.py", line 493, in _getitem_iterable
    result = result._reindex_with_indexers(*args, copy=False, fill_value=np.nan)
  File "pandas/core/series.py", line 2649, in _reindex_with_indexers
    new_values = com.take_1d(self.values, indexer, fill_value=fill_value)
  File "pandas/core/common.py", line 537, in take_nd
    func(arr, indexer, out, fill_value)
  File "generated.pyx", line 2560, in pandas.algos.take_1d_float64_float64 (pandas/algos.c:68204)
IndexError: Out of bounds on buffer access (axis 0)
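
For readers following the traceback: the failure surfaces in the final "take" step, which gathers values by integer position. The following is a minimal pure-Python sketch of that kind of step, not the pandas Cython code. The function name `take_1d_with_fill` and the sample indexers are hypothetical; the only convention assumed from pandas is that `-1` in an indexer marks a missing label. It shows how a well-formed indexer fills missing labels with NaN, while an indexer holding a position past the end of the buffer can only fail.

```python
import math


def take_1d_with_fill(values, indexer, fill_value=math.nan):
    """Gather values[pos] for each pos in indexer; -1 means 'label not found'.

    Hypothetical helper for illustration only (not the pandas implementation).
    """
    out = []
    for pos in indexer:
        if pos == -1:
            # Missing label: emit the fill value instead of reading the buffer.
            out.append(fill_value)
        elif 0 <= pos < len(values):
            out.append(values[pos])
        else:
            # Any other position cannot be read from the buffer.
            raise IndexError(
                f"position {pos} is out of bounds for length {len(values)}"
            )
    return out


values = [0.1, 0.2]  # data for a Series with index [1, 2]

# Labels [3, 2, 3]: 3 is absent (-1), 2 sits at position 1.
good_indexer = [-1, 1, -1]
result = take_1d_with_fill(values, good_indexer)

# An indexer that wrongly points past the end (assumed here for illustration).
bad_indexer = [2, 1, 2]
try:
    take_1d_with_fill(values, bad_indexer)
    bad_raised = False
except IndexError:
    bad_raised = True
```

Under this model, an "out of bounds" error for a label-based lookup suggests the indexer handed to the take step was malformed, rather than that the requested labels were invalid.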

@jreback
Contributor

jreback commented Sep 13, 2013

Can you post the full traceback you are getting?

@jtratner
Contributor

@jreback see my comment above for traceback from the error I got on 0.12 [@ruidc might be getting something different]

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

Thanks, I look forward to Windows binaries from head working again, or an RC binary for 0.13.

In [1]: import pandas

In [2]: pandas.Series([0.1, 0.2], index=[1, 2]).loc[[3, 2, 3]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-2abb8a62abc9> in <module>()
----> 1 pandas.Series([0.1, 0.2], index=[1, 2]).loc[[3, 2, 3]]

/home/src/pandas/pandas/core/indexing.pyc in __getitem__(self, key)
    933             return self._getitem_tuple(key)
    934         else:
--> 935             return self._getitem_axis(key, axis=0)
    936
    937     def _getitem_axis(self, key, axis=0):

/home/src/pandas/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1025                 raise ValueError('Cannot index with multidimensional key')
   1026
-> 1027             return self._getitem_iterable(key, axis=axis)
   1028         else:
   1029             return self._get_label(key, axis=axis)

/home/src/pandas/pandas/core/indexing.pyc in _getitem_iterable(self, key, axis)
    716                         raise AssertionError("invalid indexing error with non-unique index")
    717
--> 718                     result = result._reindex_with_indexers({ axis : [ new_labels, new_indexer ] }, copy=True, allow_dups=True)
    719
    720                 return result

/home/src/pandas/pandas/core/generic.pyc in _reindex_with_indexers(self, reindexers, method, fill_value, limit, copy, allow_dups)
   1214                 indexer = com._ensure_int64(indexer)
   1215                 new_data = new_data.reindex_indexer(index, indexer, axis=baxis,
-> 1216                                                     fill_value=fill_value, allow_dups=allow_dups)
   1217
   1218             elif baxis == 0 and index is not None and index is not new_data.axes[baxis]:

/home/src/pandas/pandas/core/internals.pyc in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups)
   2830
   2831         if axis == 0:
-> 2832             return self._reindex_indexer_items(new_axis, indexer, fill_value)
   2833
   2834         new_blocks = []

/home/src/pandas/pandas/core/internals.pyc in _reindex_indexer_items(self, new_items, indexer, fill_value)
   3178     def _reindex_indexer_items(self, new_items, indexer, fill_value):
   3179         # equiv to a reindex
-> 3180         return self.reindex(new_items, indexer=indexer, fill_value=fill_value, copy=False)
   3181
   3182     def reindex_axis0_with_method(self, new_axis, indexer=None, method=None, fill_value=None, limit=None, copy=True):

/home/src/pandas/pandas/core/internals.pyc in reindex(self, new_axis, indexer, method, fill_value, limit, copy)
   3171
   3172         block = self._block.reindex_items_from(new_axis, indexer=indexer, method=method,
-> 3173                                                fill_value=fill_value, limit=limit, copy=copy)
   3174         mgr = SingleBlockManager(block, new_axis)
   3175         mgr._consolidate_inplace()

/home/src/pandas/pandas/core/internals.pyc in reindex_items_from(self, new_ref_items, indexer, method, fill_value, limit, copy)
    234             if self.ndim == 1:
    235                 new_values = com.take_1d(self.values, indexer,
--> 236                                          fill_value=fill_value)
    237             else:
    238

/home/src/pandas/pandas/core/common.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
    551     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype,
    552                                  axis=axis, mask_info=mask_info)
--> 553     func(arr, indexer, out, fill_value)
    554     return out
    555

/home/src/pandas/pandas/algos.so in pandas.algos.take_1d_float64_float64 (pandas/algos.c:68204)()

IndexError: Out of bounds on buffer access (axis 0)

@jreback
Contributor

jreback commented Sep 13, 2013

No, this changed from 0.12.
This is partial setting, where the frame is expanded for the missing values.

@jreback
Contributor

jreback commented Sep 13, 2013

Fixed. Here's the behavior; does anything look odd? None of these raise: loc won't raise when presented with an iterable, even if some labels are missing (iloc with an out-of-bounds position still will).

In [1]: ser = Series([0.1, 0.2], index=[1, 2])

In [3]: ser
Out[3]: 
1    0.1
2    0.2
dtype: float64

In [4]: ser.loc[[3, 2, 3]]
Out[4]: 
3    NaN
2    0.2
3    NaN
dtype: float64

In [5]: ser.loc[[3, 3, 3]]
Out[5]: 
3   NaN
3   NaN
3   NaN
dtype: float64

In [6]: ser.loc[[2, 2, 3]]
Out[6]: 
2    0.2
2    0.2
3    NaN
dtype: float64

In [7]: ser.iloc[[1,1,0,0]]
Out[7]: 
2    0.2
2    0.2
1    0.1
1    0.1
dtype: float64

This is a unique index on the selector (effectively like a reindex)

In [12]: ser.loc[[1,2,3]]
Out[12]: 
1    0.1
2    0.2
3    NaN
dtype: float64

These raise as expected

In [10]: ser.iloc[3]
IndexError: index out of bounds

In [8]: ser.loc[3]
KeyError: 3
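
A short sketch for readers on later pandas releases, where (to my knowledge) `.loc` with a list containing missing labels raises `KeyError` instead of filling with NaN. Since the comment above describes the list case as "effectively like a reindex", the example uses `Series.reindex`, which accepts repeated and missing labels when the Series index itself is unique. The flag names `loc_raised` and `iloc_raised` are introduced here for illustration only.

```python
import pandas as pd

ser = pd.Series([0.1, 0.2], index=[1, 2])

# reindex accepts repeated and missing labels; absent labels become NaN.
out = ser.reindex([3, 2, 3])

# A scalar label that is absent raises KeyError.
try:
    ser.loc[3]
    loc_raised = False
except KeyError:
    loc_raised = True

# A scalar position past the end raises IndexError.
try:
    ser.iloc[3]
    iloc_raised = False
except IndexError:
    iloc_raised = True

print(out)
print(loc_raised, iloc_raised)
```

The scalar cases match the two "raise as expected" examples above; only the list-of-labels case is routed through `reindex` here.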

@jreback
Contributor

jreback commented Sep 13, 2013

@jtratner @ruidc thoughts?

@ruidc
Contributor Author

ruidc commented Sep 13, 2013

sorry, am away from the office now so can't test the patch until monday



@jtratner
Contributor

@jreback that looks fine to me...

@ruidc
Contributor Author

ruidc commented Sep 16, 2013

were you able to reproduce in the end?

applying this PR, I get the original passing but:

#passes:
pandas.Series([0.1, 0.2, 0.3], index=[1,2,3]).loc[[3,4,4]]
pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[5,3,3]]

#fails:
pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[5,4,4]]
pandas.Series([0.1, 0.2, 0.3, 0.4], index=[4,5,6,7]).loc[[7,2,2]]
pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[4,5,5]]
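
To state what each of the five cases above should return, here is a small pure-Python reference model, assuming the intended semantics are "look up each requested label, NaN when it is absent" and that the Series index is unique. The function `loc_list_reference` is hypothetical and is not pandas code; it only gives the expected label/value pairs to compare a build against.

```python
import math


def loc_list_reference(index, values, labels):
    """Expected (label, value) pairs for a list lookup on a unique index.

    Hypothetical reference model; absent labels map to NaN.
    """
    position = {label: i for i, label in enumerate(index)}
    return [
        (label, values[position[label]] if label in position else math.nan)
        for label in labels
    ]


# The five cases from this comment: (index, values, requested labels).
cases = [
    ([1, 2, 3], [0.1, 0.2, 0.3], [3, 4, 4]),
    ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], [5, 3, 3]),
    ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], [5, 4, 4]),
    ([4, 5, 6, 7], [0.1, 0.2, 0.3, 0.4], [7, 2, 2]),
    ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], [4, 5, 5]),
]

results = [loc_list_reference(*case) for case in cases]
for (index, values, labels), expected in zip(cases, results):
    print(labels, "->", expected)
```

Each case mixes one present label with a repeated or missing one, which is a useful shape for exercising the duplicate-selector path.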

@jreback
Contributor

jreback commented Sep 16, 2013

Updated... I was doing something dumb! Thanks for the tests.

In [2]: #passes:

In [3]: pandas.Series([0.1, 0.2, 0.3], index=[1,2,3]).loc[[3,4,4]]
Out[3]: 
3    0.3
4    NaN
4    NaN
dtype: float64

In [4]: pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[5,3,3]]
Out[4]: 
5    NaN
3    0.3
3    0.3
dtype: float64

In [5]: 

In [5]: #fails:

In [6]: pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[5,4,4]]
Out[6]: 
5    NaN
4    0.4
4    0.4
dtype: float64

In [7]: pandas.Series([0.1, 0.2, 0.3, 0.4], index=[4,5,6,7]).loc[[7,2,2]]
Out[7]: 
7    0.4
2    NaN
2    NaN
dtype: float64

In [8]: pandas.Series([0.1, 0.2, 0.3, 0.4], index=[1,2,3,4]).loc[[4,5,5]]
Out[8]: 
4    0.4
5    NaN
5    NaN
dtype: float64

@ruidc
Contributor Author

ruidc commented Sep 16, 2013

Let me know when I can test it again as our data seems to cover a few bases ;)

@jreback
Contributor

jreback commented Sep 16, 2013

the PR is updated

@ruidc
Contributor Author

ruidc commented Sep 16, 2013

Looks great now, thanks! Do I close the issue, or should it only be closed when the pull request is closed?

@ruidc
Contributor Author

ruidc commented Sep 16, 2013

... also, for future reference, were you able to reproduce the problem then? Or where was the disconnect that kept our results from matching?

@jreback
Contributor

jreback commented Sep 16, 2013

@ruidc

No. Your examples were showing errors; I added them as test cases.

This particular indexing point in the code was 'new' and just didn't have all the cases covered, as certain examples 'worked' even when the code was doing something 'wrong'. So we had a false positive before.

thanks for the help!

@ruidc
Contributor Author

ruidc commented Sep 16, 2013

the thanks is all mine!


