Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a common code path for 1d Frames #8115

Merged
merged 22 commits into from
May 3, 2021

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Apr 29, 2021

This PR unifies many code paths for 1-dimensional frame objects, namely Index and Series types. The bulk of the PR is moving code around, but there are a few renames (in particular Index._values->Index._column) that are necessary for unifying the APIs as well as some additional removals of unnecessary functions. The unification also fixes a few bugs (for instance, bool(cudf.Index(...)) should raise an exception but wasn't) and adds a few missing APIs that were present in one class but not the other.

@vyasr vyasr requested a review from a team as a code owner April 29, 2021 19:23
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 29, 2021
@vyasr vyasr self-assigned this Apr 29, 2021
@vyasr vyasr added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change tech debt labels Apr 29, 2021
@codecov
Copy link

codecov bot commented May 1, 2021

Codecov Report

Merging #8115 (2317164) into branch-0.20 (51336df) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head 2317164 differs from pull request most recent head 6c2ee13. Consider uploading reports for the commit 6c2ee13 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #8115      +/-   ##
===============================================
+ Coverage        82.88%   82.92%   +0.03%     
===============================================
  Files              103      103              
  Lines            17668    17814     +146     
===============================================
+ Hits             14645    14773     +128     
- Misses            3023     3041      +18     
Impacted Files Coverage Δ
python/cudf/cudf/core/abc.py 91.66% <ø> (+0.17%) ⬆️
python/cudf/cudf/core/algorithms.py 82.35% <ø> (ø)
python/cudf/cudf/core/column/__init__.py 100.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 92.37% <ø> (+0.13%) ⬆️
python/cudf/cudf/core/column/column.py 88.20% <ø> (-0.44%) ⬇️
python/cudf/cudf/core/column/datetime.py 88.03% <ø> (-1.88%) ⬇️
python/cudf/cudf/core/column/decimal.py 90.29% <ø> (-2.64%) ⬇️
python/cudf/cudf/core/column/interval.py 91.66% <ø> (+0.55%) ⬆️
python/cudf/cudf/core/column/lists.py 86.98% <ø> (-0.43%) ⬇️
python/cudf/cudf/core/column/numerical.py 94.72% <ø> (+0.29%) ⬆️
... and 49 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7623f39...6c2ee13. Read the comment docs.

@shwina
Copy link
Contributor

shwina commented May 3, 2021

Screen Shot 2021-05-03 at 7 54 45 AM

🔥

@vyasr vyasr requested a review from shwina May 3, 2021 15:42
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vyasr
Copy link
Contributor Author

vyasr commented May 3, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ad081ae into rapidsai:branch-0.20 May 3, 2021
rapids-bot bot pushed a commit that referenced this pull request May 6, 2021
This PR builds on #8115, moving all binary operations from `Index` and `Series` into the `SingleColumnFrame` class. It really should be a negative LOC change, but it doesn't look like it for two reasons: 1) `Index` objects require some special handling due to the awkwardness of needing to return the right _type_ of `Index`, which is frequently not the type that is being operated on (e.g. `RangeIndex + RangeIndex` results in an `Int64Index`), and that's something we'll want to refactor in a future PR, and 2) I've added a significant number of comments both in the form of docstrings and to give context for the issues arising from (1). This PR also significantly speeds up all binary operations for `Index` objects because it removes the round-tripping of data from `Index->Series->Index` that was previously being done to implement binary operations. The percent speedup depends on how expensive the operation itself is, but having tested for a number of data sizes it is >=15%, ranging up to 40% for simpler operations like `__ne__`. Benchmarks to follow.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: #8166
rapids-bot bot pushed a commit that referenced this pull request May 18, 2021
Continuation of #8115 and #8166. Moves more logic out of the Index/Series classes into the new common parent class to reduce code duplication and ensure feature parity.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - https://github.com/brandon-b-miller
  - Michael Wang (https://github.com/isVoid)

URL: #8253
@vyasr vyasr added this to the cuDF Python Refactoring milestone Jul 22, 2021
@vyasr vyasr mentioned this pull request Aug 14, 2021
rapids-bot bot pushed a commit that referenced this pull request Aug 31, 2021
Until now the class hierarchy of index types has had numerous logical flaws for reasons of convenience: for instance, `RangeIndex` was always inheriting from `Frame` despite not actually being backed by data, and since #8115 `MultiIndex` has been a `SingleColumnFrame` even though it actually has multiple columns. This PR moves `BaseIndex` to the top of its own hierarchy, and uses multiple inheritance with `Frame` and `SingleColumnFrame` to create a more sensible hierarchy for its subclasses. `BaseIndex` is now effectively an ABC defining the interface that subclasses must define, but many of these methods are still inherited from `Frame` types (or in the case of `RangeIndex`, delegated to `Int64Index`). 

These changes remove lots of broken behavior that was previously present in `MultiIndex` and `RangeIndex`; for instance, binary operations would previously fail in strange ways for `MultiIndex`, and various hacks were necessary for `MultiIndex` methods to bypass `SingleColumnFrame`. `RangeIndex` methods that delegate to `Int64Index` are now made explicit (rather than the previous implicit conversion via `self._data`). The new hierarchy also allows much more sensible type-checking by mypy, which revealed numerous additional conceptual issues. The bulk of this PR is actually moving functions around to make the type checker happy, some of which also fixed actual functional issues: for example, `RangeIndex.get_loc` was previously broken. The refactor will make it much easier to handle future changes to all classes in the index hierarchy.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Marlene  (https://github.com/marlenezw)
  - Michael Wang (https://github.com/isVoid)

URL: #9039
@vyasr vyasr deleted the refactor/oned_frames branch January 14, 2022 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants