PERF: copy cached attributes on index shallow_copy #32568

Merged (2 commits) on Mar 11, 2020
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v1.1.0.rst
@@ -187,7 +187,9 @@ Performance improvements
 - Performance improvement in :class:`Timedelta` constructor (:issue:`30543`)
 - Performance improvement in :class:`Timestamp` constructor (:issue:`30543`)
 - Performance improvement in flex arithmetic ops between :class:`DataFrame` and :class:`Series` with ``axis=0`` (:issue:`31296`)
--
+- The internal :meth:`Index._shallow_copy` now copies cached attributes over to the new index,
+  avoiding creating these again on the new index. This can speed up many operations
+  that depend on creating copies of existing indexes (:issue:`28584`)

 .. ---------------------------------------------------------------------------

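To illustrate the whatsnew entry above, here is a minimal sketch of the idea using a hypothetical `ToyIndex` class rather than pandas itself. The class, its `is_unique` property, and the `computations` counter are all made up for this illustration; only the pattern (reuse the cache dictionary when the values are unchanged, start with an empty one otherwise) is meant to mirror the change in this PR.

```python
from typing import Any, Dict, Optional, Sequence


class ToyIndex:
    """Hypothetical stand-in for an index; this is not the pandas class."""

    def __init__(self, values: Sequence[int], name: Optional[str] = None) -> None:
        self._values = tuple(values)
        self.name = name
        self._cache: Dict[str, Any] = {}
        self.computations = 0  # how often this instance ran the expensive path

    @property
    def is_unique(self) -> bool:
        # Compute once, then serve the stored result from the cache dict.
        if "is_unique" not in self._cache:
            self.computations += 1
            self._cache["is_unique"] = len(set(self._values)) == len(self._values)
        return self._cache["is_unique"]

    def shallow_copy(
        self, values: Optional[Sequence[int]] = None, name: Optional[str] = None
    ) -> "ToyIndex":
        # Cached results are only valid for the same values, so the cache is
        # reused when no new values are passed and reset otherwise.
        cache = self._cache if values is None else {}
        values = self._values if values is None else values
        result = ToyIndex(values, name=self.name if name is None else name)
        result._cache = cache
        return result
```

With this sketch, `ToyIndex([1, 2, 3]).shallow_copy(name="b")` can answer `is_unique` without recomputing it if the original already did, while `shallow_copy(values=[1, 1])` starts from an empty cache. Note that the sketch, like the diff below, hands over the same dictionary object instead of a copy, so both objects see entries added later by either one; that is only safe because their values are identical.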
12 changes: 8 additions & 4 deletions pandas/core/indexes/base.py
@@ -1,7 +1,7 @@
 from datetime import datetime
 import operator
 from textwrap import dedent
-from typing import TYPE_CHECKING, Any, FrozenSet, Hashable, Union
+from typing import TYPE_CHECKING, Any, Dict, FrozenSet, Hashable, Union
 import warnings

 import numpy as np
@@ -250,6 +250,7 @@ def _outer_indexer(self, left, right):

     _typ = "index"
     _data: Union[ExtensionArray, np.ndarray]
+    _cache: Dict[str, Any]
     _id = None
     _name: Label = None
     # MultiIndex.levels previously allowed setting the index name. We
@@ -468,6 +469,7 @@ def _simple_new(cls, values, name: Label = None):
         # we actually set this value too.
         result._index_data = values
         result._name = name
+        result._cache = {}

         return result._reset_identity()

@@ -499,10 +501,12 @@ def _shallow_copy(self, values=None, name: Label = no_default):
         """
         name = self.name if name is no_default else name

-        if values is None:
-            values = self.values
+        cache = self._cache if values is None else {}

Review comment (Member): Can you keep the `if values is None:` check? I like it for inspecting coverage results.

+        values = self.values if values is None else values

-        return self._simple_new(values, name=name)
+        result = self._simple_new(values, name=name)
+        result._cache = cache
+        return result

     def _shallow_copy_with_infer(self, values, **kwargs):
         """
8 changes: 2 additions & 6 deletions pandas/core/indexes/numeric.py
@@ -104,15 +104,11 @@ def _maybe_cast_slice_bound(self, label, side, kind):

     @Appender(Index._shallow_copy.__doc__)
     def _shallow_copy(self, values=None, name: Label = lib.no_default):
-        name = name if name is not lib.no_default else self.name
-
         if values is not None and not self._can_hold_na and values.dtype.kind == "f":
+            name = self.name if name is lib.no_default else name
             # Ensure we are not returning an Int64Index with float data:
             return Float64Index._simple_new(values, name=name)
-
-        if values is None:
-            values = self.values
-        return type(self)._simple_new(values, name=name)
+        return super()._shallow_copy(values=values, name=name)

     def _convert_for_op(self, value):
         """
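The numeric.py change above keeps one special case in the subclass and hands everything else to the parent method, so the parent's logic applies to the subclass too. A rough sketch of that shape follows, with hypothetical toy classes (`BaseToyIndex`, `IntToyIndex`, `FloatToyIndex`) that are not pandas classes. As a simplification it uses `None` as the "not given" marker, where the PR uses a `no_default` sentinel.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


class BaseToyIndex:
    """Hypothetical base class holding the shared copy logic."""

    def __init__(self, values: Sequence[Number], name: Optional[str] = None) -> None:
        self.values = list(values)
        self.name = name

    def shallow_copy(
        self, values: Optional[Sequence[Number]] = None, name: Optional[str] = None
    ) -> "BaseToyIndex":
        values = self.values if values is None else values
        name = self.name if name is None else name
        return type(self)(values, name=name)


class FloatToyIndex(BaseToyIndex):
    """Hypothetical index type for float data."""


class IntToyIndex(BaseToyIndex):
    """Hypothetical index type for integer data."""

    def shallow_copy(
        self, values: Optional[Sequence[Number]] = None, name: Optional[str] = None
    ) -> BaseToyIndex:
        # Special case: never return an integer index that holds float data.
        if values is not None and any(isinstance(v, float) for v in values):
            name = self.name if name is None else name
            return FloatToyIndex(values, name=name)
        # Everything else goes through the shared base implementation.
        return super().shallow_copy(values=values, name=name)
```

Here `IntToyIndex([1, 2]).shallow_copy()` stays an `IntToyIndex`, while passing float values produces a `FloatToyIndex`. The benefit of delegating is that any later change to the base method reaches the subclass without duplicating code.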