-
-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEP: add NEP 56 on array API standard support in main namespace #25542
Conversation
For NumPy 2.0. xref numpygh-25076 as the main tracking issue for this topic. [skip actions] [skip azp] [skip cirrus]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! A few small comments; the only one that needs some thought is the one about the meaning of "array scalar"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see, nice work! A few comments but mostly minor editorial bits.
|
||
The ``copy`` keyword in ``astype`` will stick to its current meaning, because | ||
"never copy" when asking for a cast to a different dtype doesn't quite make | ||
sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #25168 (comment)
It would be nice to get a little bit more discussion about how downstream should adapt to this change. Just replace np.array(..., copy=False)
with np.asarray()
? If so, do we communicate that in the ValueError
that gets generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a at least almost a large impact change: It should have its own sub section in the backwards compatibility section. (It is currently completely missing there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're commenting on asarray
, not astype
. I'd say the migration guide should say: do nothing (because that's more likely than not what the author intended - remember we have a semantic ambiguity here right now), and if you get an exception, inspect the code and if you indeed intended "only if needed", change it to copy=None
.
I also don't think it's a large change. The default is do nothing, and in some cases the exception message will then tell you what to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fair, although I still think it would be good to explicitly mention the BC change of having to replace array(..., copy=False)
with asarray(...)
. You can point out that it isn't super common because asarray()
makes more sense and that the it is turns into an easy to fix error.
I am actually not convinced that it's more likely than not what the author intended (unless you found that by code search). Many cases are probably just better served by asarray()
but tried to micro-optimize it away because it had extra overhead until a few years ago.
Also there is the caveat that if we go e.g. via __array__()
, copy=False
will fail even when it shouldn't conceptually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used np.array(..., copy=False, subok=True)
quite a lot, when indeed it was faster than np.asanyarray
; I don't think I have ever had a cases where I'd want an error if the input was not an array...
That said, I agree with the logic of False/True/None
and don't mind to change my code/astropy (though I wish copy=None
was backward compatible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there is the caveat that if we go e.g. via
__array__()
,copy=False
will fail even when it shouldn't conceptually.
This is more problematic, indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comments while I'm sorting through the text and gh-25168:
- there's a bit of a circular thing going on here, where most of the problem seems to be
array
(a non-recommended / don't care function), and the only reason we even want to updatearray
is for consistency withasarray
(the important function here) - I'm not 100% sure I understand the
__array__
comment. I haven't thought about that one much before, but I think you mean something here like "we cannot know if a call to__array__
is going to cause this non-numpy object to copy data, so we must not call__array__
ifcopy=False
"? If so, I'm not sure that's necessarily a long-term problem; a logical solution would be to addcopy=None
keyword to__array__
so that we can know. If numpy cannot know it preferably shouldn't be guessing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some (partially opinionated) comments. Overall looks nice.
I would prefer somewhat different emphasis at points (as you can probably read out of the comments), but that is opinions.
However, the backwards compatibility section needs to be much clearer about the larger changes/harder to grasp changes.
That might be a small note in the end, the main issue is that we need to try it before I can voice a real opinion. For now I am worried by the change.
other key libraries. So those alternatives are not appealing. Given the amount | ||
of interest in this topic, doing nothing also is not appealing. The "hidden | ||
namespace" option would be a smaller change to this proposal. We prefer not to | ||
do that since it leads to duplicate implementations staying around, a more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite agree, but fine this is an opinion. We would only duplicate a tiny amount of divergence, in most cases these would be very thin wrappers.
(We might need such a hidden namespace in any case to support versioning of the Array API! Or any other odd future difficult... For that matter the special function additions definitely need something?!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not impossible we may need it in the future, but it's not at all clear that we would ever need it. Versioning isn't needed as long as the added API surface and semantics in newer versions of the standard are a superset. Which should be true for the next version at least, and we should try hard to keep it that way. special
extension isn't a given yet (it could be rejected, and certainly it isn't close to the finish line yet), and otherwise could probably be handled fairly easily in a module-level __getattr__
.
libraries. Having support in the main namespace resolves this issue. Hence this | ||
NEP supersedes NEP 47. The ``numpy.array_api`` module will be moved to a | ||
standalone package, to facilitate easier updates not tied to a NumPy release | ||
cycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super important, but I think this discussion adds little except reasoning why NEP 47 is superseded. I don't think it is necessary at all. You could even just consider NEP 47 withdrawn and mostly ignore it in this one.
Otherwise, I would suggest to move this discussion to the alternatives section. The separate array-object discussion seems like the most interesting part, but NEP 47 is missing an explanation of how that could work, anyway.
|
||
The ``copy`` keyword in ``astype`` will stick to its current meaning, because | ||
"never copy" when asking for a cast to a different dtype doesn't quite make | ||
sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a at least almost a large impact change: It should have its own sub section in the backwards compatibility section. (It is currently completely missing there).
cc:@kmaehashi for vis |
[skip actions] [skip cirrus] [skip azp]
I reworked for textual comments so far (thanks everyone!) and resolved those. All the ones that look like they might benefit from discussion I'll do in a next update and will leave them unresolved. |
[skip actions] [skip azp] [skip cirrus]
[skip actions] [skip cirrus] [skip azp]
Hi all, I realize that discussion is still ongoing, but since we're running short on time for the NumPy 2.0 RC, I wanted to unblock merging some of the remaining PRs for array API support that did not get any significant discussion so far. Specifically, I'm planning on merging #25233 (adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, looks pretty good. I simply have a different angle (such as seeing removing np.array_api
as doing nothing).
There are still a few things that I think are not highlighted (around BC breaks) as much as I would like, but it's fine.
Other than that, I will just point out that I think the alternative isn't terrible in the mailing list: I don't have a strong enough opinion on new aliases/kwargs to worry, but if anyone really dislikes them, I also think that adding a custom namespace is so cheap that I wouldn't worry if anyone dislikes any of those (or a specific backwards compatibility break).
(I.e. to me most API changes in NumPy are great, but I would be also happy to see them as distinct, because most can be with a namespace. Maybe, that is misreading the NEP a bit though: It wants to argue for not doing that, and that is fine.)
[skip actions] [skip azp] [skip cirrus]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! A few more comments in-line.
|
||
The ``copy`` keyword in ``astype`` will stick to its current meaning, because | ||
"never copy" when asking for a cast to a different dtype doesn't quite make | ||
sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used np.array(..., copy=False, subok=True)
quite a lot, when indeed it was faster than np.asanyarray
; I don't think I have ever had a cases where I'd want an error if the input was not an array...
That said, I agree with the logic of False/True/None
and don't mind to change my code/astropy (though I wish copy=None
was backward compatible).
|
||
The ``copy`` keyword in ``astype`` will stick to its current meaning, because | ||
"never copy" when asking for a cast to a different dtype doesn't quite make | ||
sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there is the caveat that if we go e.g. via
__array__()
,copy=False
will fail even when it shouldn't conceptually.
This is more problematic, indeed.
Also a few smaller textual changes. [skip actions] [skip cirrus] [skip azp]
(continuing a discussion that was started elsewhere) I don't actually see what the issue with scalars is. They duck type as arrays, at least for all the purposes of the standard, so I don't see why they can't be considered as compliant. The text here implies that they aren't but doesn't actually explain why not. In particular here and here. The second note just seems wrong. The only reason that scalars don't work for "an array of the same data type as the indexed array" is if you don't consider scalars to be an array. But this argument is circular, because why isn't a scalar an array, for the purposes of the standard? The only real "not array" thing about scalars is that they are immutable, but mutability is optional in the standard. That and the fact that they aren't called "arrays" in the NumPy docs, but e.g. pytorch doesn't call its arrays "arrays" either. |
And just to be clear, the standard doesn't require all arrays from a given library to have the same type. This is even discussed explicitly https://data-apis.org/array-api/latest/design_topics/static_typing.html
|
Good point. From a numpy user perspective, the main annoyance I have with scalars is that they share most but not all methods with arrays (e.g., they cannot be |
Good points. In principle yes, you are right - since it's not required that the returned 0-D array is of the same type as the input array and there's no
Here's one such paper cut: >>> x_0d.mT
...
ValueError: matrix transpose with ndim < 2 is undefined
>>> np.float32(3.5).mT
...
AttributeError: 'numpy.float32' object has no attribute 'mT' The standard doesn't require exceptions of a particular type, so this is technically still fully compliant. It's just annoyances. For this PR, I plan to rework the text to not state that scalars are non-compliant and an explanatory note saying that while there are minor differences, they duck type well enough to be considered the same as 0-D arrays from the perspective of the standard. |
Addresses review feedback. Array scalars aren't actually incompatible with the design of the standard.
As requested in PR review, better delineate the different types of changes in the backwards compatibility section, and add assessments of what the impact of changes is. In addition, expect the description of the `copy=` keyword semantics changes. [skip cirrus] [skip actions] [skip azp]
I pushed two new commits. The first one changes how array scalars are described. The second one adds more detail to the backwards compatibility section (including impact assessment per category of change) and expands on the |
I'll note that pretty much all PRs implementing this NEP (except for the copy keyword one) have been merged for quite a while now, so the impact assessment is based on at least a month of usage by libraries and end users that test against NumPy main. Over the course of 2.0 development we've reverted several backwards incompatible changes that turned out to be disruptive; nothing that is in the backwards compatibility section now falls in that category I believe. |
[skip actions] [skip cirrus] [skip azp]
There's still one outstanding PR: #25802 and it looks like discussion around the |
[skip azp] [skip cirrus] [skip actions]
I think it's time (or well overdue) to merge this PR with Draft status, so we have a rendered version. Discussion seems to have settled down and all comments are addressed. In case someone wants to return to a comment that they think wasn't addressed well enough yet or if we have a hiccup with one of the recent PRs implementing support for this NEP, we can always open a follow-up PR. |
Thanks Ralf and thanks to all who reviewed and offered comments. |
For NumPy 2.0. xref gh-25076 as the main tracking issue for this topic.