BUG/CoW: is_range_indexer can't handle very large arrays #53672

Merged
mroeschke merged 2 commits into pandas-dev:main from the bug-cow-bigarray branch
Jun 15, 2023

Conversation

lithomas1
Member

I don't have enough RAM to test that this actually works.

(But the test raised OverflowError before the change, and doesn't raise after it.)
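For readers without the context: the linked issue reports "OverflowError: value too large to convert to int". A minimal sketch of that boundary, assuming the failure came from converting the length argument to a signed 32-bit C integer (an inference from the error message, not something stated in this thread). It uses the standard library's struct module as a stand-in, not the actual Cython code, and the helper name fits_in_int32 is made up for illustration:

```python
import struct

INT32_MAX = 2**31 - 1  # same value as np.iinfo(np.int32).max


def fits_in_int32(value: int) -> bool:
    """Return True if value can be stored in a signed 32-bit integer."""
    try:
        # "<i" is a little-endian signed 4-byte int; out-of-range values
        # raise struct.error instead of silently wrapping.
        struct.pack("<i", value)
    except struct.error:
        return False
    return True


print(fits_in_int32(INT32_MAX))      # True
print(fits_in_int32(INT32_MAX + 1))  # False: 2**31 is one past the limit
```

This is why the test value is exactly 2**31: it is the smallest length that no longer fits.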

lithomas1 added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Copy / view semantics labels Jun 14, 2023
lithomas1 requested a review from phofl June 14, 2023 18:12
lithomas1 requested a review from WillAyd as a code owner June 14, 2023 18:12
@phofl
Member

phofl commented Jun 14, 2023

thx

# GH53616
left = np.arange(0, 100, dtype=dtype)

assert not lib.is_range_indexer(left, 2**31)
Member

Suggested change
- assert not lib.is_range_indexer(left, 2**31)
+ assert not lib.is_range_indexer(left, np.iinfo(np.int32).max)

To fix the failing 32 bit build

Member Author

Thanks for the catch. Unfortunately I can't change the size here: it needs to be 2**31 == np.iinfo(np.int32).max + 1 to trigger the OverflowError that was raised before the fix.

Gonna skip the test on 32-bit instead, since the issue can't happen there.
(The max array length should be less than the int32 max on 32-bit platforms.)
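A rough sketch of the reasoning above, in pure Python. Both names are hypothetical stand-ins: is_range_indexer_sketch only approximates what the thread implies lib.is_range_indexer checks (that the indexer has length n and equals 0, 1, ..., n - 1), and IS_64BIT is one common way to detect a 64-bit interpreter. The real test would presumably use a pytest skip marker, which isn't shown here:

```python
import sys

# Assumption: sys.maxsize exceeds 2**32 only on 64-bit builds of Python.
IS_64BIT = sys.maxsize > 2**32


def is_range_indexer_sketch(left, n):
    """Hypothetical stand-in: True if left is exactly 0, 1, ..., n - 1."""
    if len(left) != n:
        # A length mismatch is decided without looking at any element,
        # so a huge n never needs a huge array.
        return False
    return all(value == i for i, value in enumerate(left))


left = list(range(100))

if IS_64BIT:
    # Only meaningful where a length of 2**31 is representable at all.
    assert not is_range_indexer_sketch(left, 2**31)

assert is_range_indexer_sketch(left, 100)
```

The point of the 100-element array is that the length check alone exercises the large n, which is how the test can run without allocating gigabytes of memory.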

@mroeschke mroeschke added this to the 2.1 milestone Jun 15, 2023
@mroeschke mroeschke merged commit 905fe6b into pandas-dev:main Jun 15, 2023
@mroeschke
Member

Thanks @lithomas1

@lithomas1 lithomas1 deleted the bug-cow-bigarray branch June 15, 2023 00:21
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Jun 15, 2023
…53672)

* BUG: is_range_indexer can't handle very large arrays

* fix test on 32-bit
@phofl
Member

phofl commented Jun 15, 2023

@lithomas1 thoughts about backporting this? This is potentially very annoying for users.

@lithomas1
Member Author

Yeah, for sure. I'll move the whatsnew in a bit then.

@lithomas1 lithomas1 mentioned this pull request Jun 15, 2023
5 tasks
@lithomas1 lithomas1 modified the milestones: 2.1, 2.0.3 Jun 15, 2023
@lithomas1
Member Author

@meeseeksdev backport 2.0.x

@lumberbot-app

lumberbot-app bot commented Jun 15, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
     git checkout 2.0.x
     git pull
  2. Cherry pick the first parent branch of this PR on top of the older branch:
     git cherry-pick -x -m1 905fe6b0b90f5de334abb1585e15d987935a592e
  3. You will likely have some merge/cherry-pick conflict here, fix them and commit:
     git commit -am "Backport PR #53672: BUG/CoW: is_range_indexer can't handle very large arrays"
  4. Push to a named branch:
     git push YOURFORK 2.0.x:auto-backport-of-pr-53672-on-2.0.x
  5. Create a PR against branch 2.0.x, I would have named this PR:

"Backport PR #53672 on branch 2.0.x (BUG/CoW: is_range_indexer can't handle very large arrays)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

lithomas1 added a commit to lithomas1/pandas that referenced this pull request Jun 15, 2023
…53672)

* BUG: is_range_indexer can't handle very large arrays

* fix test on 32-bit

(cherry picked from commit 905fe6b)
lithomas1 added a commit that referenced this pull request Jun 15, 2023
…andle very large arrays) (#53691)

BUG/CoW: is_range_indexer can't handle very large arrays (#53672)

* BUG: is_range_indexer can't handle very large arrays

* fix test on 32-bit

(cherry picked from commit 905fe6b)
mroeschke added a commit that referenced this pull request Jun 21, 2023
* CI: Build pandas even if doctests fail

* BUG: groupby sum turning `inf+inf` and `(-inf)+(-inf)` into `nan` (#53623)

* DEPR: method, limit in NDFrame.replace (#53492)

* DEPR: method, limit in NDFrame.replace

* update test, docs

* suppress doctest warning

* doctests

* PERF: Series.str.get_dummies for ArrowDtype(pa.string()) (#53655)

* PERF: Series.str.get_dummies for ArrowDtype(pa.string())

* whatsnew

* typing

* TYP: core.missing (#53625)

* CI: Attempt to fix wheel builds (#53670)

* DOC: Fixing EX01 - Added examples (#53647)

* SeriesGroupBy.fillna example added

* Added examples

* Corrected failing test for timedelta.total_seconds

* Corrected fillna example

* CI/TST: Mark test_to_read_gcs as single_cpu (#53677)

* BUG/CoW: is_range_indexer can't handle very large arrays (#53672)

* BUG: is_range_indexer can't handle very large arrays

* fix test on 32-bit

* TST: Use more pytest fixtures

---------

Co-authored-by: Yao Xiao
Co-authored-by: jbrockmendel
Co-authored-by: Luke Manley
Co-authored-by: Thomas Li
Co-authored-by: Dea María Léon
Labels
Copy / view semantics; Indexing (Related to indexing on series/frames, not to indexes themselves)
Development

Successfully merging this pull request may close these issues.

BUG: CoW OverflowError: value too large to convert to int
4 participants