GH-41098: [Python] Add copy keyword in Array.__array__ for numpy 2.0+ compatibility #41071
@@ -525,7 +525,8 @@ cdef class ChunkedArray(_PandasConvertible):

         return values

-    def __array__(self, dtype=None):
+    def __array__(self, dtype=None, copy=None):
+        # copy keyword can be ignored because to_numpy() already returns a copy

Well,

Ah yes, that comment is from before I decided to handle the `copy=False` case for Array. Indeed, we should just raise an error here saying that a no-copy conversion is not possible. Updated.

         values = self.to_numpy()
         if dtype is None:
             return values

@@ -1533,7 +1534,8 @@ cdef class _Tabular(_PandasConvertible):
         raise TypeError(f"Do not call {self.__class__.__name__}'s constructor directly, use "
                         f"one of the `{self.__class__.__name__}.from_*` functions instead.")

-    def __array__(self, dtype=None):
+    def __array__(self, dtype=None, copy=None):
+        # copy keyword can be ignored as this always already returns a copy

Same here?

         column_arrays = [
             np.asarray(self.column(i), dtype=dtype) for i in range(self.num_columns)
         ]
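For reference, the behaviour agreed on in the thread above (raise when `copy=False` is requested on a container that can only produce a copy) could look roughly like the sketch below. This is not the code from the PR: `AlwaysCopies` is a hypothetical stand-in for a chunked container, and the error message wording is an assumption.

```python
import numpy as np


class AlwaysCopies:
    """Hypothetical stand-in for a container whose conversion always copies."""

    def __init__(self, chunks):
        # Each chunk lives in its own buffer, so producing one contiguous
        # ndarray necessarily allocates new memory.
        self._chunks = [np.asarray(c) for c in chunks]

    def to_numpy(self):
        return np.concatenate(self._chunks)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # A zero-copy result is impossible here, so refuse instead of
            # silently returning a copy. (Message text is an assumption.)
            raise ValueError(
                "Unable to avoid a copy while creating a numpy array as requested."
            )
        # copy=None and copy=True are both satisfied by the freshly built array.
        values = self.to_numpy()
        if dtype is None:
            return values
        return values.astype(dtype, copy=False)
```

Calling `obj.__array__()` or `obj.__array__(copy=True)` returns a new array, while `obj.__array__(copy=False)` raises `ValueError`. Whether and when numpy itself forwards `copy` to `__array__` depends on the numpy version, so the sketch only relies on direct calls.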
Can we raise an exception for now? Also, can you open a GH issue for it?
I wouldn't do that, I think, because if numpy starts passing that keyword down in, let's say, numpy 2.1, `np.array(obj)` would start raising errors in cases that never errored before. (In some cases this might incorrectly not return a copy, although the result is still marked as read-only; many cases already copy anyway.)

(Although to be honest, I don't really know what numpy's strategy will be for enabling this keyword. If they simply stopped copying the result of `__array__`, that would cause such changes in many libraries with `__array__`s.)
Though ideally, we would just directly implement `copy=True` as well (it was just less critical for numpy 2.0 as it is not yet used, and also a bit harder to implement).
On second thought, it might be relatively easy to determine whether `to_numpy()` returned a copy or not? I was first thinking we would have to mimic the logic based on the type ("if primitive and no nulls, then it is zero copy"), but we might be able to check if the resulting ndarray has a base pointing to a pyarrow object?

Although the simple logic of numeric + no nulls might be easier in practice.
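To make the two options concrete, here is a minimal numpy-only sketch of detecting after the fact whether a conversion was zero-copy, and using that to honour `copy=True` and `copy=False`. `MaskedValues` is a hypothetical stand-in and not a pyarrow class. It uses `np.shares_memory` against the owning buffer instead of inspecting `.base`, because how pyarrow populates `.base` is not something this sketch can assume.

```python
import numpy as np


class MaskedValues:
    """Hypothetical stand-in for a primitive array with an optional null mask."""

    def __init__(self, data, mask=None):
        self._data = np.asarray(data)
        self._mask = None if mask is None else np.asarray(mask, dtype=bool)

    def to_numpy(self):
        if self._mask is None or not self._mask.any():
            # No nulls: expose the existing buffer as a read-only view.
            view = self._data.view()
            view.flags.writeable = False
            return view
        # Nulls present: build a new float array with NaN in the null slots.
        out = self._data.astype("float64")
        out[self._mask] = np.nan
        return out

    def __array__(self, dtype=None, copy=None):
        values = self.to_numpy()
        # Detect zero-copy by checking for overlap with our own buffer,
        # rather than re-deriving the "primitive and no nulls" rule.
        is_view = np.shares_memory(values, self._data)
        needs_cast = dtype is not None and values.dtype != np.dtype(dtype)
        if copy is False and (not is_view or needs_cast):
            raise ValueError("Unable to avoid a copy for this conversion.")
        if needs_cast:
            # astype allocates a new array, which also satisfies copy=True.
            return values.astype(dtype)
        if copy is True and is_view:
            # The conversion was zero-copy, so copy explicitly.
            return values.copy()
        return values
```

With no nulls, `__array__(copy=False)` returns a read-only view and `__array__(copy=True)` returns an independent, writable array. With nulls, `to_numpy()` already copied, so `copy=False` raises. The memory-overlap check misses degenerate cases such as empty arrays, which is one reason the simpler type-based rule may be easier in practice.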