-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit warning for missing labels in Multiindex.loc[[...]] (and more) #20770
Emit warning for missing labels in Multiindex.loc[[...]] (and more) #20770
Conversation
@jreback @jorisvandenbossche There is some problem with warnings in Python 2.7. On my machine,
passes and
passes, but
fails with Any suggestion (apart from skipping those on Python 2.7)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. need a few more comments. must be not-catching a warning (maybe only in py2).
with catch_warnings(record=True): | ||
assert isna(df.ix[:, [-1]].values).all() | ||
# ix does label-based indexing when having an integer index | ||
with pytest.raises(KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test on the row dim (maybe make these 2 a separate test), may already be an existing test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
with catch_warnings(record=True): | ||
result = dfnu.ix[['E']] | ||
tm.assert_frame_equal(result, expected) | ||
with pytest.raises(KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you a comment here on what you are testing (maybe separate test)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
@@ -59,12 +59,22 @@ def test_get_nan(): | |||
|
|||
# ensure that fixing the above hasn't broken get | |||
# with multiple elements | |||
idx = [2, 30] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move into separate test
@@ -119,15 +119,15 @@ def test_loc_getitem_label_out_of_range(self): | |||
typs=['ints', 'uints', 'labels', 'mixed', 'ts'], | |||
fails=KeyError) | |||
self.check_result('label range', 'loc', 'f', 'ix', 'f', | |||
typs=['floats'], fails=TypeError) | |||
typs=['floats'], fails=KeyError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this reflected in the whatsnew?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a change in the API: I just fixed the test, which was checking nothing at all (because test objects for floats were missing).
pandas/core/indexing.py
Outdated
ax = o._get_axis(axis) | ||
indexer, keyarr = ax._convert_listlike_indexer(key, | ||
kind=self.name) | ||
if indexer is not None and (indexer != -1).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
pandas/core/indexing.py
Outdated
the list of keys was actually empty). | ||
""" | ||
ax = self.obj._get_axis(axis) | ||
# True indicates missing values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a blank line would help here (not sure what this comment refers)
pandas/core/indexing.py
Outdated
raise KeyError( | ||
u"None of [{key}] are in the [{axis}]".format( | ||
key=key, axis=self.obj._get_axis_name(axis))) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the else here
pandas/core/indexing.py
Outdated
@@ -1352,6 +1370,33 @@ def _has_valid_type(self, key, axis): | |||
|
|||
return True | |||
|
|||
def _convert_for_reindex(self, key, axis=None): | |||
if axis is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
pandas/core/indexing.py
Outdated
if com.is_bool_indexer(key): | ||
key = check_bool_indexer(labels, key) | ||
return labels[key] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else not needed
pandas/core/indexing.py
Outdated
elif is_integer(key): | ||
return self._is_valid_integer(key, axis) | ||
assert(self._is_valid_integer(key, axis)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, _validate_key
should raise an error if the keys are not valid (and they aren't if they are an integer but not a valid integer). Better to test and eventually raise a ValueError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this should go thru _validate_key
(the old _has_valid_type
). it should raise if indicated by that type of indexer, so yes test for that
Codecov Report
@@ Coverage Diff @@
## master #20770 +/- ##
==========================================
+ Coverage 91.77% 91.79% +0.01%
==========================================
Files 153 153
Lines 49313 49354 +41
==========================================
+ Hits 45259 45306 +47
+ Misses 4054 4048 -6
Continue to review full report at Codecov.
|
b1f8d41
to
ae6e177
Compare
Hello @toobaz! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 01, 2018 at 11:03 Hours UTC |
ae6e177
to
22cc773
Compare
22cc773
to
d035b52
Compare
@jreback ready for me |
Are we doing this for 0.23? |
I guess so! |
yes we should |
@@ -4881,6 +4881,9 @@ def _ensure_index(index_like, copy=False): | |||
if hasattr(index_like, 'name'): | |||
return Index(index_like, name=index_like.name, copy=copy) | |||
|
|||
if is_iterator(index_like): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this covers generators as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
pandas/core/indexing.py
Outdated
@@ -186,33 +186,21 @@ def __setitem__(self, key, value): | |||
indexer = self._get_setitem_indexer(key) | |||
self._setitem_with_indexer(indexer, value) | |||
|
|||
def _has_valid_type(self, k, axis): | |||
def _validate_key(self, k, axis): | |||
raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this an AbstractClassError, and add a doc-string here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this an AbstractClassError,
Pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractMethoderror in pandas/core/base.py
pandas/core/indexing.py
Outdated
@@ -1337,7 +1377,7 @@ def __init__(self, name, obj): | |||
DeprecationWarning, stacklevel=2) | |||
super(_IXIndexer, self).__init__(name, obj) | |||
|
|||
def _has_valid_type(self, key, axis): | |||
def _validate_key(self, key, axis): | |||
if isinstance(key, slice): | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string to these
pandas/core/indexing.py
Outdated
@@ -1656,7 +1739,7 @@ class _LocIndexer(_LocationIndexer): | |||
"index is integers), listlike of labels, boolean") | |||
_exception = KeyError | |||
|
|||
def _has_valid_type(self, key, axis): | |||
def _validate_key(self, key, axis): | |||
ax = self.obj._get_axis(axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
elif isinstance(key, tuple): | ||
# a tuple should already have been caught by this point | ||
# so don't treat a tuple as a valid indexer | ||
raise IndexingError('Too many indexers') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is hit in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure:
pandas/pandas/tests/frame/test_indexing.py
Line 1389 in 6cacdde
def test_getitem_setitem_fancy_exceptions(self): |
(this is old code I just moved around)
l = len(self.obj._get_axis(axis)) | ||
|
||
if len(arr) and (arr.max() >= l or arr.min() < -l): | ||
raise IndexError("positional indexers are out-of-bounds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is hit in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, moreover this is trivial refactoring.
|
||
if len(arr) and (arr.max() >= l or arr.min() < -l): | ||
raise IndexError("positional indexers are out-of-bounds") | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do w/o the else here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I would have to add a
else:
return
branch to the if
just above, because when the indexer is valid it does not return/raise
return self._get_slice_axis(key, axis=axis) | ||
|
||
if isinstance(key, list): | ||
try: | ||
key = np.asarray(key) | ||
except TypeError: # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this was ever needed. I don't think there is any way in which np.asarray([...])
could raise a TypeError
. np.asarray([0, [1, 2]])
raises ValueError
(catched elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right my point was that this was catching something, so let's see what it was (and if not, of course remove)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added in https://github.com/pandas-dev/pandas/pull/15504/files#diff-7489e290a23303c4db4f803011ecaf8eR1728 ... but I think it was an unnecessary precaution ( @jorisvandenbossche ?), to the best of my understanding it never catched anything.
@@ -131,6 +132,8 @@ def test_setitem_dtype_upcast(self): | |||
assert is_float_dtype(left['foo']) | |||
assert is_float_dtype(left['baz']) | |||
|
|||
@pytest.mark.skipif(PY2, reason=("Catching warnings unreliable with " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?? where does this come from? this means we are NOT catching a warning. need to pin this down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as I wrote: #20770 (review)
A bug in testing code only, which arises with Python 2 only, and only when two test files are run together, is in the intersection of my "no idea where to start" and "not high priority anyway" categories.
I can open an issue once merged - as there's no simple way (as far as I know) to reproduce otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said, this must be fixed before merging. This means you are not catching a warning. When you run the full suite does it not print out an uncaught warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, see #20770 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the full suite case behaves like the case with the two test files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this would be the only remaining issue
I'm afraid I won't have time for (inherited) docstrings and learning what is a AbstractClassError
soon.
I would make an exception in this case
To be honest I don't see to what is an exception been made. This PR fixes 3 bugs in code, and the fact that it identifies an additional, already existing, one in tests is an added bonus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz I am not questing the refactoring or that you fixed some bugs, just that something that was added is hiding a bug, maybe in an exception. This is much worse than known bugs and very very hard to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we all agree that the bug is in the test suite? If yes, I guess I don't have much to add: as you correctly point out, it is a nasty bug, which will be hard to isolate - even harder if we don't merge this (because none of us knows how to reproduce otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the point is that this refactor created the bug. i have no problem waiting to merge this until this is fixed. would rather have this correct, this is pretty tricky code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the point is that this refactor created the bug
This "but" has no logical meaning in this discussion. Either you agree with my reasoning that the bug is in the test suite, or you think that a bug in the code can lead a test to pass only when called in isolation, and then I ask you to explain me how. In the first case, this refactor didn't "create" any bug. In the second... I'm patiently waiting for an explanation (not for one more repetition of the same illogical claim).
Then, sure, there might be a bug in the code and a bug in the tests suite. That's why I have manually tried the code in the failing tests, and it emits the warning just as it should - feel free to try.
The point is not just 0.23.0, it is that we have no idea of what the problem is, it is certainly not caused by this branch, it will be most probably not fixed soon, this branch will soon become a rebasing nightmare (I know by experience), and the indexing code desperately needs cleanup (and it is already difficult enough).
And of course, the point is also that I still didn't see any logical argument for postponing. You know, I like when discussions are based on actual arguments (i.e. when they are not just a waste of time).
Not sure... the notice in 0.21.0 (in principle) already covered this case (it didn't mention special cases with specific kinds of |
@toobaz the issue is that I have seen the warnings pop up and signal an actual error multiple times |
In current master they are ignored. |
how’s that? |
To conclude myself: yes the added Anyhow, the full test is also a bit bogus. As it only checks that at least one FutureWarning is raised, but the |
71c9f00
to
5fd21fe
Compare
@jorisvandenbossche thanks, I should have applied your fix. |
@jorisvandenbossche let me know if you see any obvious mistake in my implementation of your fix... |
@jorisvandenbossche for sure your fix is good, but I'm afraid there's something deeper (also because your fix alone should in principle affect Python 2 and 3, and invocation of the entire test suite or of single files, in the same way) |
Too bad :-) There might be other similar cases where warnings are catched and hidden in the testing code. It's a bit strange that both are failing. Locally it was only one of two that are failing. (and with your updated branch none, at least when only running the two files) That said, I am totally fine with skipping those for now, given it is probably a bug in existing testing code. |
5fd21fe
to
48fb72e
Compare
Agree. So OK for 0.23.0? |
@jreback what's your opinion here? I was not yet able to look at the code changes in detail, but I think you said the changes in itself were fine, so that's ok for me. I think the extra warning for MultiIndex/FloatIndex would be really nice to include in 0.23.0, so I would favor merging this. I looked a bit in the failing cases for python 2, and the one we did find (unfortunately not all) was clearly an error in the testing code (a testing for Panel was catching and hiding all warnings, including one for the index, and hence it was not raised anymore for specific tests, not sure why it only was the case on python 2 though). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor changes, lgtm.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -873,6 +873,7 @@ Deprecations | |||
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`). | |||
- ``IntervalIndex.from_intervals`` is deprecated in favor of the :class:`IntervalIndex` constructor (:issue:`19263`) | |||
- ``DataFrame.from_items`` is deprecated. Use :func:`DataFrame.from_dict` instead, or ``DataFrame.from_dict(OrderedDict())`` if you wish to preserve the key order (:issue:`17320`, :issue:`17312`) | |||
- Indexing a ``MultiIndex`` or a ``FloatIndex`` with a list containing some missing keys will now show a ``FutureWarning``, consistently with other types of indexes (:issue:`17758`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add :class:
for these
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -1107,6 +1108,8 @@ Indexing | |||
- :func:`Index.to_series` now accepts ``index`` and ``name`` kwargs (:issue:`18699`) | |||
- :func:`DatetimeIndex.to_series` now accepts ``index`` and ``name`` kwargs (:issue:`18699`) | |||
- Bug in indexing non-scalar value from ``Series`` having non-unique ``Index`` will return value flattened (:issue:`17610`) | |||
- Bug in indexing with iterator containing only missing keys, which raised no error (:issue:`20748`) | |||
- Fixed inconsistency in ``.ix`` between list and scalar keys when index has integer dtype and does not include desired key(s) (:issue:`20753`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... scalar keys when the index has an integer dtype and does not include the desired ...
pandas/core/indexing.py
Outdated
axis : int | ||
Dimension on which the indexing is being made | ||
|
||
Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we dont' put this if it returns None, @TomAugspurger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically we don't.
pandas/core/indexing.py
Outdated
o._get_axis_number(axis)) | ||
d[axis] = (keyarr, indexer) | ||
return o._reindex_with_indexers(d, copy=True, allow_dups=True) | ||
except(KeyError, IndexingError) as detail: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if its worth it, but is it possible to surround only the op that could raise the IndexingError? (with the try/except)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't address this. I agree that'd be nice, but at a glance it looks like there are 3-4 places that could raise one of those errors. Seems tricky
@@ -1073,8 +1085,9 @@ def _getitem_axis(self, key, axis=None): | |||
if axis is None: | |||
axis = self.axis or 0 | |||
|
|||
if self._should_validate_iterable(axis): | |||
self._has_valid_type(key, axis) | |||
if is_iterator(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this is_iterator needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be moved at a later stage... but I would see it as a loss (more general argument in #20748)
pandas/core/indexing.py
Outdated
axis: int | ||
Dimension on which the indexing is being made | ||
|
||
Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this I think
@@ -225,6 +221,10 @@ def test_dups_fancy_indexing(self): | |||
result = df.loc[['A', 'A', 'E']] | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
if PY2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split this test, and use the skip decorator instead (and add the reason)
Fixed up all but two of the comments. I don't think the tighter exception catching is easy to do correctly. No idea about checking the key for an iterator. Running the tests now to see if it's hit. |
... and in any case, I plan to soon merge that code in the "non-multi" branch. (the "multi" branch should be peculiar in how it retrieves values, not in how it creates indexers) |
Is that going in this PR, or a different one? |
Future (it's the "future refactoring" I mention in the description) |
Great, thanks. Merging on green.
…On Tue, May 1, 2018 at 6:29 AM, Pietro Battiston ***@***.***> wrote:
Is that going in this PR, or a different one?
Future (it's the "future refactoring" I mention in the description
<#20770 (comment)>)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20770 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIgS-XrheV9Uxkk_JrTJ7-L18zkrDks5tuEcxgaJpZM4Td4lF>
.
|
Some follow-up questions:
|
Sure: #20914
Indeed, the warning for the partial indexing (including indexing by levels, even where each level is specified) requires a separate approach: see #20916 |
git diff upstream/master -u -- "*.py" | flake8 --diff
Main changes:
obj.reindex
was replaced withobj._reindex_with_indexers
_multi_take
is uglier than before... but it's ugly anyway, and must be removed in a future refactoring (in which first indexers are built in all possible code paths, then values are extracted)_has_valid_type
was mostly used in "assertion mode" (without looking at its return value), so I changed it to_validate_key
and now it is only used in "assertion mode"Asv (
--bench indexing
) givenotice that
FloatIndex
was not being checked at all (i.e. #17758 applied to it too), now it is.In any case, as I wrote above, there are several improvements still to be made, but I want to go gradually because the complexity of this PR is already pretty high for me.