PERF: copy cached attributes on extension index shallow_copy #32640

Merged: 2 commits, Mar 12, 2020

@topper-123 topper-123 commented Mar 11, 2020

Follow-up to #32568.

Also copies ._cache when copying via ._shallow_copy for:

  • CategoricalIndex
  • DatetimeIndex
  • PeriodIndex
  • TimedeltaIndex
  • IntervalIndex

After this PR, only MultiIndex._shallow_copy is missing this optimization. MultiIndex._shallow_copy is a bit special and might require a refactor, so I'd like to handle that in a separate PR.
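For readers unfamiliar with the idea, here is a minimal sketch of the technique, not the pandas implementation. `ToyIndex`, `engine_builds`, and the `copy_cache` flag are hypothetical names made up for illustration; the only thing taken from this PR is the idea that a shallow copy sharing the same values can also take over a copy of the `_cache` dict instead of starting with an empty one.

```python
class ToyIndex:
    """Hypothetical stand-in for an index class; not pandas code."""

    # Counts how often the expensive lookup table has been built.
    engine_builds = 0

    def __init__(self, values, name=None):
        self._values = list(values)
        self.name = name
        self._cache = {}  # lazily computed attributes are stored here

    @property
    def _engine(self):
        # Build the value -> position table once, then reuse it from the cache.
        if "_engine" not in self._cache:
            ToyIndex.engine_builds += 1
            self._cache["_engine"] = {v: i for i, v in enumerate(self._values)}
        return self._cache["_engine"]

    def get_loc(self, key):
        return self._engine[key]

    def _shallow_copy(self, name=None, copy_cache=True):
        result = ToyIndex.__new__(ToyIndex)
        result._values = self._values  # share the data, do not copy it
        result.name = self.name if name is None else name
        # Copy the dict itself so later cache writes on one index
        # do not leak into the other; the cached objects are shared.
        result._cache = self._cache.copy() if copy_cache else {}
        return result


idx = ToyIndex(range(100_000))
idx.get_loc(99_999)                   # first lookup builds the engine
idx._shallow_copy().get_loc(99_999)   # reuses the copied cache
builds_with_cache = ToyIndex.engine_builds

idx._shallow_copy(copy_cache=False).get_loc(99_999)  # has to rebuild
builds_without_cache = ToyIndex.engine_builds

print(builds_with_cache, builds_without_cache)  # 1 2
```

Reusing the cache is only safe here because the copy shares exactly the same values; a copy created with different values would need a fresh, empty cache.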

Example performance improvement:

>>> idx = pd.CategoricalIndex(np.arange(100_000))
>>> %timeit idx.get_loc(99_999)
4.46 µs ± 62.6 ns per loop  # master and this PR
>>> %timeit idx._shallow_copy().get_loc(99_999)
4.19 ms ± 117 µs per loop  # master
8.58 µs ± 254 ns per loop  # this PR

@topper-123 topper-123 force-pushed the perf_shallow_copy_II branch from a3d5f1f to 4558947 Compare March 11, 2020 23:30
@jreback jreback left a comment

lgtm. tiny doc comment, merge on green.

- The internal index method :meth:`~Index._shallow_copy` now copies cached attributes over to the new index,
avoiding creating these again on the new index. This can speed up many operations that depend on creating copies of
existing indexes (:issue:`28584`)

add this PR number here

@jreback jreback added Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses Performance Memory or execution speed performance labels Mar 12, 2020
@jreback jreback added this to the 1.1 milestone Mar 12, 2020
@topper-123 topper-123 force-pushed the perf_shallow_copy_II branch from a8facef to caccd83 Compare March 12, 2020 09:55
@topper-123 topper-123 merged commit 9060d88 into pandas-dev:master Mar 12, 2020
@topper-123 topper-123 deleted the perf_shallow_copy_II branch March 12, 2020 10:29
@topper-123 topper-123 mentioned this pull request Mar 12, 2020
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020