PERF: Avoid np.divmod in maybe_sequence_to_range #57812

Merged: 16 commits, Mar 21, 2024

@mroeschke mroeschke commented Mar 11, 2024

xref #57534 (comment)

Added an is_range function (similar to is_range_indexer) that avoids an np.divmod operation.

In [1]: from pandas import *; import numpy as np
   ...: np.random.seed(123)
   ...: size = 1_000_000
   ...: ngroups = 1000
   ...: data = Series(np.random.randint(0, ngroups, size=size))
+ /opt/miniconda3/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command

In [2]: %timeit data.groupby(data).groups
14 ms ± 552 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # PR

In [3]: %timeit data.groupby(data).groups
17.8 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # main
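
To make the comparison concrete, here is a minimal NumPy sketch of the two strategies: a divmod-style check that builds temporary arrays, and a single pass that stops at the first mismatch. The function names are hypothetical and neither body is the actual pandas code; the real check is written in Cython in pandas/_libs/lib.pyx, and returning True for an empty input is an assumption made here for illustration.

```python
import numpy as np


def is_range_via_divmod(values: np.ndarray, step: int) -> bool:
    """Hypothetical divmod-style check; allocates temporary arrays."""
    if step == 0:
        return False
    if len(values) == 0:
        return True  # assumption: an empty sequence counts as a range
    # Offsets from the first element must be exactly 0, 1, 2, ... times step.
    quotient, remainder = np.divmod(values - values[0], step)
    return bool(
        not remainder.any()
        and np.array_equal(quotient, np.arange(len(values)))
    )


def is_range_via_loop(values: np.ndarray, step: int) -> bool:
    """Hypothetical single-pass check; stops at the first mismatch."""
    if step == 0:
        return False
    for i in range(len(values)):
        if values[i] != values[0] + i * step:
            return False
    return True
```

In pure Python the loop is slower than the vectorized version; the benefit reported above presumably comes from running the same loop as compiled Cython without allocating intermediate arrays.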

@mroeschke mroeschke added the Performance and Index labels Mar 11, 2024
@mroeschke mroeschke requested a review from WillAyd as a code owner March 11, 2024 18:34
@mroeschke mroeschke added this to the 3.0 milestone Mar 11, 2024
@@ -678,6 +678,26 @@ def is_range_indexer(ndarray[int6432_t, ndim=1] left, Py_ssize_t n) -> bool:
    return True


@cython.wraparound(False)
@cython.boundscheck(False)
Member

Don’t we only need int64?

Member Author

Sometimes we'll be calling self._shallow_copy(Index[int]._values) so I suppose the int type could be non-int64

Member

Maybe we should just use np.intp? That matches what range would use internally.

Member Author

Looks like np.intp didn't work for the 32-bit and Windows builds (0b484bd).

Member

Are you seeing build failures just doing what @jbrockmendel suggests with int64 then? The error message ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long long' on the 32 bit build would seemingly indicate the 32 bit part of the fused type is unused, since long long is by definition at least 64 bits

https://github.com/pandas-dev/pandas/actions/runs/8268872377/job/22622796389#step:4:43677

Member Author

@mroeschke mroeschke Mar 14, 2024

With just int64_t, 32 bit and Windows builds are failing with ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long' https://github.com/pandas-dev/pandas/actions/runs/8284194543/job/22669177062?pr=57812
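
Both errors quoted in this thread are consistent with C integer types having platform-dependent widths, while a typed buffer argument must match the array's dtype exactly. The sketch below, which is not part of the PR, reports the widths NumPy sees on the current machine. Both helper names are hypothetical, and the int64 cast is only one possible workaround; the PR itself went back to the fused int6432_t type.

```python
import numpy as np


def c_integer_widths() -> dict:
    """Report the byte width of several integer dtypes on this platform."""
    return {
        "int32": np.dtype(np.int32).itemsize,
        "int64": np.dtype(np.int64).itemsize,
        # Pointer-sized integer: 4 bytes on 32-bit builds, 8 on 64-bit builds.
        "intp": np.dtype(np.intp).itemsize,
        # C long: width depends on the platform and compiler.
        "long": np.dtype("l").itemsize,
        # C long long: at least 8 bytes.
        "long long": np.dtype("q").itemsize,
    }


def as_int64(values: np.ndarray) -> np.ndarray:
    """Hypothetical workaround: normalize to int64 before a typed call."""
    return values.astype(np.int64, copy=False)


if __name__ == "__main__":
    for name, width in c_integer_widths().items():
        print(f"{name}: {width} bytes")
```

Running this on each CI target would show where intp, long, and int64 stop being the same type.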

pandas/_libs/lib.pyx
@@ -678,6 +678,26 @@ def is_range_indexer(ndarray[int6432_t, ndim=1] left, Py_ssize_t n) -> bool:
    return True


@cython.wraparound(False)
@cython.boundscheck(False)
def is_range(ndarray[int6432_t, ndim=1] sequence, int64_t diff) -> bool:
Member

Can this have a more verbose-but-descriptive name? From this name alone I'd expect this to be a somehow-optimized isinstance(sequence, range).

Member Author

Sure thing. Renamed to is_sequence_range

@mroeschke mroeschke changed the title PERF: Avoid np.divmod in RangeIndex._shallow_copy PERF: Avoid np.divmod in maybe_sequence_to_range Mar 20, 2024
@mroeschke
Member Author

Any other feedback here @jbrockmendel @WillAyd?

Member

@WillAyd WillAyd left a comment

lgtm


    for i in range(n):

        if sequence[i] != sequence[0] + i * step:
Member

Could we get a slight improvement by accessing sequence[0] just once, outside the loop?

Member Author

Good idea. I'll implement this
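
As a rough illustration of this suggestion, the sketch below reads the first element once before the loop and then shows how a caller might turn a passing sequence into a range. It is plain Python, not the merged Cython code; the function name is hypothetical, and returning the input unchanged for short or constant sequences is an assumption made for this example.

```python
import numpy as np


def sequence_to_range_sketch(values: np.ndarray):
    """Return an equivalent range if values are evenly spaced, else values."""
    n = len(values)
    if n < 2:
        return values  # assumption: too short to infer a step
    first = int(values[0])  # read once, outside the loop
    step = int(values[1]) - first
    if step == 0:
        return values  # a constant sequence cannot be a range
    for i in range(n):
        if int(values[i]) != first + i * step:
            return values
    return range(first, first + n * step, step)
```

For example, `sequence_to_range_sketch(np.array([3, 5, 7, 9]))` yields `range(3, 11, 2)`, while an unevenly spaced input comes back as the original array.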

@mroeschke mroeschke merged commit bfaf917 into pandas-dev:main Mar 21, 2024
46 checks passed
@mroeschke mroeschke deleted the ref/is_range_indexer/step branch March 21, 2024 17:14
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* PERF: Avoid np.divmod in RangeIndex._shallow_copy

* Make is_range

* pyi error

* Use step

* Switch back to int6432

* try int64_t

* Revert "try int64_t"

This reverts commit b8ea98c.

* Adjust maybe_sequence_to_range

* Access first element once