Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Index hierarchy #9039

Merged
merged 30 commits into from
Aug 31, 2021

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Aug 14, 2021

Until now the class hierarchy of index types has had numerous logical flaws for reasons of convenience: for instance, RangeIndex was always inheriting from Frame despite not actually being backed by data, and since #8115 MultiIndex has been a SingleColumnFrame even though it actually has multiple columns. This PR moves BaseIndex to the top of its own hierarchy, and uses multiple inheritance with Frame and SingleColumnFrame to create a more sensible hierarchy for its subclasses. BaseIndex is now effectively an ABC defining the interface that subclasses must define, but many of these methods are still inherited from Frame types (or in the case of RangeIndex, delegated to Int64Index).

These changes remove lots of broken behavior that was previously present in MultiIndex and RangeIndex; for instance, binary operations would previously fail in strange ways for MultiIndex, and various hacks were necessary for MultiIndex methods to bypass SingleColumnFrame. RangeIndex methods that delegate to Int64Index are now made explicit (rather than the previous implicit conversion via self._data). The new hierarchy also allows much more sensible type-checking by mypy, which revealed numerous additional conceptual issues. The bulk of this PR is actually moving functions around to make the type checker happy, some of which also fixed actual functional issues: for example, RangeIndex.get_loc was previously broken. The refactor will make it much easier to handle future changes to all classes in the index hierarchy.

@vyasr vyasr added 2 - In Progress Currently a work in progress code quality Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 14, 2021
@vyasr vyasr added this to the CuDF Python Refactoring milestone Aug 14, 2021
@vyasr vyasr self-assigned this Aug 14, 2021
@vyasr
Copy link
Contributor Author

vyasr commented Aug 14, 2021

Rerun tests

@vyasr
Copy link
Contributor Author

vyasr commented Aug 16, 2021

rerun tests

@vyasr vyasr requested a review from isVoid August 24, 2021 00:23
@vyasr
Copy link
Contributor Author

vyasr commented Aug 24, 2021

The diff just got a lot bigger because of the move, but nothing else of substance changed in the last few commits.

@marlenezw
Copy link
Contributor

+1 to what Michael suggested! The changes to BaseIndex are good for avoiding circular imports but generally make the code easier to read, I think.

Overall looks good to me! I had a few questions about some of the changes but I think it looks good!

@vyasr vyasr requested a review from marlenezw August 25, 2021 23:16
Copy link
Contributor

@marlenezw marlenezw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@vyasr vyasr requested a review from isVoid August 27, 2021 17:28
Comment on lines 508 to 511
def get_loc(self, key, method=None, tolerance=None):
return self._as_int64().get_loc(
key, method=method, tolerance=tolerance
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's a more efficient way to implement get_loc for RangeIndex. But I'm ok to leave it as is to make it work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh for sure, I think I thought about this earlier and forgot to implement. The formula should be very straightforward, can you request changes to prevent a merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isVoid new get_loc is implemented. I hope I got all the edge cases right, let me know if anything looks off.

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per requested above, blocking for get_loc.

@vyasr vyasr requested a review from isVoid August 27, 2021 21:10
@vyasr vyasr requested a review from isVoid August 30, 2021 16:12
@vyasr
Copy link
Contributor Author

vyasr commented Aug 31, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 48bc39e into rapidsai:branch-21.10 Aug 31, 2021
@vyasr vyasr deleted the refactor/index_hierarchy branch January 14, 2022 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants