Create a common code path for 1d Frames #8115
This reverts commit 58f0a23.
# Conflicts:
# python/cudf/cudf/core/index.py
Codecov Report
@@             Coverage Diff              @@
##           branch-0.20    #8115   +/-   ##
===============================================
+ Coverage        82.88%   82.92%   +0.03%
===============================================
  Files              103      103
  Lines            17668    17814     +146
===============================================
+ Hits             14645    14773     +128
- Misses            3023     3041      +18
👍
@gpucibot merge
This PR builds on #8115, moving all binary operations from `Index` and `Series` into the `SingleColumnFrame` class. It really should be a negative LOC change, but it doesn't look like it for two reasons: 1) `Index` objects require some special handling due to the awkwardness of needing to return the right _type_ of `Index`, which is frequently not the type that is being operated on (e.g. `RangeIndex + RangeIndex` results in an `Int64Index`), and that's something we'll want to refactor in a future PR, and 2) I've added a significant number of comments both in the form of docstrings and to give context for the issues arising from (1).

This PR also significantly speeds up all binary operations for `Index` objects because it removes the round-tripping of data from `Index->Series->Index` that was previously being done to implement binary operations. The percent speedup depends on how expensive the operation itself is, but having tested for a number of data sizes it is >=15%, ranging up to 40% for simpler operations like `__ne__`. Benchmarks to follow.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Keith Kraus (https://github.com/kkraus14)

URL: #8166
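As a rough illustration of the approach described in #8166 (not cuDF's actual code), the sketch below puts a single binary-operation path on a shared parent and lets each subclass decide only how to wrap the result. The method names `_binaryop` and `_wrap_result` are hypothetical, the class bodies are toys, and plain Python lists stand in for GPU columns.

```python
import operator


class SingleColumnFrame:
    """Toy parent class holding exactly one column (a Python list here)."""

    def __init__(self, data):
        self._column = list(data)

    def _wrap_result(self, column):
        # Hypothetical hook: by default the result has the caller's type.
        return type(self)(column)

    def _binaryop(self, other, op):
        # Operate on the columns directly, with no Index -> Series -> Index
        # round trip. Scalars are broadcast to the column length.
        if isinstance(other, SingleColumnFrame):
            rhs = other._column
        else:
            rhs = [other] * len(self._column)
        if len(rhs) != len(self._column):
            raise ValueError("operands must have the same length")
        return self._wrap_result([op(a, b) for a, b in zip(self._column, rhs)])

    def __add__(self, other):
        return self._binaryop(other, operator.add)

    def __ne__(self, other):
        return self._binaryop(other, operator.ne)


class Series(SingleColumnFrame):
    pass


class Int64Index(SingleColumnFrame):
    pass


class RangeIndex(SingleColumnFrame):
    def __init__(self, stop):
        super().__init__(range(stop))

    def _wrap_result(self, column):
        # Arithmetic on a range does not generally produce a range, so the
        # result is a materialized index type rather than type(self).
        return Int64Index(column)
```

In this toy, `RangeIndex(3) + RangeIndex(3)` comes back as an `Int64Index`, while `Series([1, 2]) != 1` stays a `Series`. Only the wrapping step differs between subclasses, which is the special handling for `Index` result types that the description mentions.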
Continuation of #8115 and #8166. Moves more logic out of the Index/Series classes into the new common parent class to reduce code duplication and ensure feature parity.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- https://github.com/brandon-b-miller
- Michael Wang (https://github.com/isVoid)

URL: #8253
Until now the class hierarchy of index types has had numerous logical flaws for reasons of convenience: for instance, `RangeIndex` was always inheriting from `Frame` despite not actually being backed by data, and since #8115 `MultiIndex` has been a `SingleColumnFrame` even though it actually has multiple columns. This PR moves `BaseIndex` to the top of its own hierarchy, and uses multiple inheritance with `Frame` and `SingleColumnFrame` to create a more sensible hierarchy for its subclasses. `BaseIndex` is now effectively an ABC defining the interface that subclasses must define, but many of these methods are still inherited from `Frame` types (or in the case of `RangeIndex`, delegated to `Int64Index`).

These changes remove lots of broken behavior that was previously present in `MultiIndex` and `RangeIndex`; for instance, binary operations would previously fail in strange ways for `MultiIndex`, and various hacks were necessary for `MultiIndex` methods to bypass `SingleColumnFrame`. `RangeIndex` methods that delegate to `Int64Index` are now made explicit (rather than the previous implicit conversion via `self._data`).

The new hierarchy also allows much more sensible type-checking by mypy, which revealed numerous additional conceptual issues. The bulk of this PR is actually moving functions around to make the type checker happy, some of which also fixed actual functional issues: for example, `RangeIndex.get_loc` was previously broken. The refactor will make it much easier to handle future changes to all classes in the index hierarchy.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Marlene (https://github.com/marlenezw)
- Michael Wang (https://github.com/isVoid)

URL: #9039
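The hierarchy described in #9039 can be pictured with a minimal sketch like the one below. It reuses the class names from the description, but every method body, the `_as_int64` helper, and the constructor signatures are hypothetical simplifications rather than cuDF's implementation.

```python
from abc import ABC, abstractmethod


class Frame:
    """Toy stand-in for a data-backed container of named columns."""

    def __init__(self, columns):
        self._data = dict(columns)

    def __len__(self):
        return len(next(iter(self._data.values()), []))


class SingleColumnFrame(Frame):
    def __init__(self, values, name=None):
        super().__init__({name: list(values)})

    @property
    def _column(self):
        return next(iter(self._data.values()))


class BaseIndex(ABC):
    """Interface only: holds no data and inherits from no Frame type."""

    @abstractmethod
    def get_loc(self, key):
        ...

    @abstractmethod
    def __len__(self):
        ...


class Int64Index(SingleColumnFrame, BaseIndex):
    # __len__ is satisfied by Frame, which precedes BaseIndex in the MRO.
    def get_loc(self, key):
        return self._column.index(key)


class MultiIndex(Frame, BaseIndex):
    # Several columns, so it is a Frame but not a SingleColumnFrame.
    def get_loc(self, key):
        rows = list(zip(*self._data.values()))
        return rows.index(tuple(key))


class RangeIndex(BaseIndex):
    # Not backed by column data, so it does not inherit from Frame at all.
    def __init__(self, stop):
        self._range = range(stop)

    def __len__(self):
        return len(self._range)

    def _as_int64(self):
        # Hypothetical helper making the conversion explicit.
        return Int64Index(self._range)

    def get_loc(self, key):
        return self._as_int64().get_loc(key)
```

In this arrangement `BaseIndex` cannot be instantiated on its own, `Int64Index` picks up `__len__` from `Frame` through multiple inheritance, and `RangeIndex` delegates explicitly instead of pretending to hold data.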
This PR unifies many code paths for 1-dimensional frame objects, namely `Index` and `Series` types. The bulk of the PR is moving code around, but there are a few renames (in particular `Index._values` -> `Index._column`) that are necessary for unifying the APIs, as well as some additional removals of unnecessary functions. The unification also fixes a few bugs (for instance, `bool(cudf.Index(...))` should raise an exception but previously did not) and adds a few missing APIs that were present in one class but not the other.
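The `bool(...)` fix is a convenient example of why a common code path helps: a behavior defined once on the shared parent applies to both subclasses. The sketch below is a toy, not cuDF's code. The exception type and message are assumptions, and `truth_test_fails` is a hypothetical helper added only for demonstration.

```python
class SingleColumnFrame:
    """Toy shared parent for one-dimensional objects."""

    def __init__(self, data):
        self._column = list(data)

    def __len__(self):
        return len(self._column)

    def __bool__(self):
        # Defined once here so every subclass rejects truth-testing.
        # Without it, Python would fall back to __len__ and quietly
        # return True for any non-empty object.
        # TypeError is an assumption made for this sketch.
        raise TypeError(
            f"The truth value of a {type(self).__name__} is ambiguous."
        )


class Index(SingleColumnFrame):
    pass


class Series(SingleColumnFrame):
    pass


def truth_test_fails(obj):
    """Return True if bool(obj) raises TypeError."""
    try:
        bool(obj)
    except TypeError:
        return True
    return False
```

Here both `truth_test_fails(Index([1, 2]))` and `truth_test_fails(Series([1, 2]))` return `True`, because neither subclass needs its own `__bool__`.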