Refactor Index hierarchy #9039
Rerun tests
rerun tests
…ndex tests passing.
…r expected behavior.
…e order so that Frame methods take precedence.
…ic implementation.
…ersion out of the loop.
The diff just got a lot bigger because of the move, but nothing else of substance changed in the last few commits.
+1 to what Michael suggested! The changes to Overall looks good to me! I had a few questions about some of the changes but I think it looks good!
Looks good!
python/cudf/cudf/core/index.py
Outdated
    def get_loc(self, key, method=None, tolerance=None):
        return self._as_int64().get_loc(
            key, method=method, tolerance=tolerance
        )
I feel like there's a more efficient way to implement get_loc for RangeIndex. But I'm ok to leave it as is to make it work.
Oh for sure, I think I thought about this earlier and forgot to implement. The formula should be very straightforward, can you request changes to prevent a merge?
@isVoid new get_loc is implemented. I hope I got all the edge cases right, let me know if anything looks off.
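For readers following this thread, here is a rough sketch of what an arithmetic lookup on a range could look like. It is not the code merged in this PR. The function name range_get_loc is hypothetical, it handles only exact matches on integer keys, and it ignores the method and tolerance arguments entirely.

```python
def range_get_loc(start, stop, step, key):
    """Return the position of `key` in range(start, stop, step).

    Hypothetical helper for illustration only; assumes integer inputs
    and exact matching (no `method` / `tolerance` handling).
    """
    if step == 0:
        raise ValueError("step must be non-zero")

    offset = key - start
    # The key must fall exactly on the grid defined by `step`.
    if offset % step != 0:
        raise KeyError(key)

    pos = offset // step
    # len(range(...)) is computed in constant time, without materializing data.
    length = len(range(start, stop, step))
    if not 0 <= pos < length:
        raise KeyError(key)
    return pos


print(range_get_loc(5, 50, 5, 20))
print(range_get_loc(10, 0, -2, 6))
```

The point of the sketch is that no column of values needs to be materialized: the position follows from the start, stop and step alone, which is why delegating to an Int64Index conversion is more work than necessary for this case.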
As requested above, blocking for get_loc.
@gpucibot merge
Until now the class hierarchy of index types has had numerous logical flaws for reasons of convenience: for instance, RangeIndex was always inheriting from Frame despite not actually being backed by data, and since #8115 MultiIndex has been a SingleColumnFrame even though it actually has multiple columns. This PR moves BaseIndex to the top of its own hierarchy, and uses multiple inheritance with Frame and SingleColumnFrame to create a more sensible hierarchy for its subclasses. BaseIndex is now effectively an ABC defining the interface that subclasses must define, but many of these methods are still inherited from Frame types (or in the case of RangeIndex, delegated to Int64Index).

These changes remove lots of broken behavior that was previously present in MultiIndex and RangeIndex; for instance, binary operations would previously fail in strange ways for MultiIndex, and various hacks were necessary for MultiIndex methods to bypass SingleColumnFrame. RangeIndex methods that delegate to Int64Index are now made explicit (rather than the previous implicit conversion via self._data). The new hierarchy also allows much more sensible type-checking by mypy, which revealed numerous additional conceptual issues. The bulk of this PR is actually moving functions around to make the type checker happy, some of which also fixed actual functional issues: for example, RangeIndex.get_loc was previously broken. The refactor will make it much easier to handle future changes to all classes in the index hierarchy.
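The shape of the hierarchy described above can be sketched with toy classes. This is only an illustration of the inheritance layout, not cudf's actual code: the class bodies, the dict-of-lists storage, and the to_list method are all simplifying assumptions. Only the class names and the _as_int64 helper appear in this PR.

```python
class BaseIndex:
    """Interface only: concrete subclasses must supply implementations."""

    def to_list(self):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError


class Frame:
    """Toy data-backed container: a dict mapping column name -> list."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(next(iter(self._data.values()), []))


class SingleColumnFrame(Frame):
    def to_list(self):
        (column,) = self._data.values()  # exactly one column is expected
        return list(column)


# Frame types are listed first so their methods take precedence in the MRO.
class Int64Index(SingleColumnFrame, BaseIndex):
    def __init__(self, values):
        super().__init__({"index": list(values)})


class MultiIndex(Frame, BaseIndex):  # multiple columns, so not single-column
    def __init__(self, levels):
        super().__init__(dict(levels))

    def to_list(self):
        return list(zip(*self._data.values()))


class RangeIndex(BaseIndex):  # not backed by data, so not a Frame
    def __init__(self, start, stop, step=1):
        self._range = range(start, stop, step)

    def __len__(self):
        return len(self._range)

    def _as_int64(self):
        return Int64Index(self._range)

    def to_list(self):
        # Explicit delegation instead of an implicit conversion.
        return self._as_int64().to_list()
```

In this layout, isinstance(RangeIndex(0, 3), Frame) is False and a MultiIndex is a Frame without being a SingleColumnFrame, which matches the two flaws the description calls out.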