-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Avoid np.divmod in maybe_sequence_to_range #57812
PERF: Avoid np.divmod in maybe_sequence_to_range #57812
Conversation
@@ -678,6 +678,26 @@ def is_range_indexer(ndarray[int6432_t, ndim=1] left, Py_ssize_t n) -> bool: | |||
return True | |||
|
|||
|
|||
@cython.wraparound(False) | |||
@cython.boundscheck(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t we only need int64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we'll be calling self._shallow_copy(Index[int]._values)
so I suppose the int
type could be non-int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should just use np.intp
? That matches what range would use internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like np.intp
didn't work for the 32 bit and Windows build 0b484bd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you seeing build failures just doing what @jbrockmendel suggests with int64 then? The error message ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long long'
on the 32 bit build would seemingly indicate the 32 bit part of the fused type is unused, since long long
is by definition at least 64 bits
https://github.com/pandas-dev/pandas/actions/runs/8268872377/job/22622796389#step:4:43677
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With just int64_t
, 32 bit and Windows builds are failing with ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long'
https://github.com/pandas-dev/pandas/actions/runs/8284194543/job/22669177062?pr=57812
@@ -678,6 +678,26 @@ def is_range_indexer(ndarray[int6432_t, ndim=1] left, Py_ssize_t n) -> bool: | |||
return True | |||
|
|||
|
|||
@cython.wraparound(False) | |||
@cython.boundscheck(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should just use np.intp
? That matches what range would use internally
pandas/_libs/lib.pyx
Outdated
@@ -678,6 +678,26 @@ def is_range_indexer(ndarray[int6432_t, ndim=1] left, Py_ssize_t n) -> bool: | |||
return True | |||
|
|||
|
|||
@cython.wraparound(False) | |||
@cython.boundscheck(False) | |||
def is_range(ndarray[int6432_t, ndim=1] sequence, int64_t diff) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this have a more verbose-but-descriptive name? from this name alone id expect this to be a somehow-optimized isinstance(sequence, range)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Renamed to is_sequence_range
This reverts commit b8ea98c.
Any other feedback here @jbrockmendel @WillAyd? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pandas/_libs/lib.pyx
Outdated
|
||
for i in range(n): | ||
|
||
if sequence[i] != sequence[0] + i * step: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could get slight improvement by accessing sequence[0] just once outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'll implement this
* PERF: Avoid np.divmod in RangeIndex._shallow_copy * Make is_range * pyi error * Use step * Switch back to int6432 * try int64_t * Revert "try int64_t" This reverts commit b8ea98c. * Adjust maybe_sequence_to_range * Access first element once
xref #57534 (comment)
Made a
is_range
method (likeis_range_indexer
) that avoids anp.divmod
operation