Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP: add NEP 56 on array API standard support in main namespace #25542

Merged
merged 16 commits into from
Feb 28, 2024

Conversation

rgommers
Copy link
Member

@rgommers rgommers commented Jan 5, 2024

For NumPy 2.0. xref gh-25076 as the main tracking issue for this topic.

For NumPy 2.0. xref numpygh-25076 as the main tracking issue for this topic.

[skip actions] [skip azp] [skip cirrus]
@rgommers rgommers added this to the 2.0.0 release milestone Jan 5, 2024
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few small comments; the only one that needs some thought is the one about the meaning of "array scalar"

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see, nice work! A few comments but mostly minor editorial bits.

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved

The ``copy`` keyword in ``astype`` will stick to its current meaning, because
"never copy" when asking for a cast to a different dtype doesn't quite make
sense.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #25168 (comment)

It would be nice to get a little bit more discussion about how downstream should adapt to this change. Just replace np.array(..., copy=False) with np.asarray()? If so, do we communicate that in the ValueError that gets generated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a at least almost a large impact change: It should have its own sub section in the backwards compatibility section. (It is currently completely missing there).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're commenting on asarray, not astype. I'd say the migration guide should say: do nothing (because that's more likely than not what the author intended - remember we have a semantic ambiguity here right now), and if you get an exception, inspect the code and if you indeed intended "only if needed", change it to copy=None.

I also don't think it's a large change. The default is do nothing, and in some cases the exception message will then tell you what to change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair, although I still think it would be good to explicitly mention the BC change of having to replace array(..., copy=False) with asarray(...). You can point out that it isn't super common because asarray() makes more sense and that the it is turns into an easy to fix error.

I am actually not convinced that it's more likely than not what the author intended (unless you found that by code search). Many cases are probably just better served by asarray() but tried to micro-optimize it away because it had extra overhead until a few years ago.
Also there is the caveat that if we go e.g. via __array__(), copy=False will fail even when it shouldn't conceptually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used np.array(..., copy=False, subok=True) quite a lot, when indeed it was faster than np.asanyarray; I don't think I have ever had a cases where I'd want an error if the input was not an array...

That said, I agree with the logic of False/True/None and don't mind to change my code/astropy (though I wish copy=None was backward compatible).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is the caveat that if we go e.g. via __array__(), copy=False will fail even when it shouldn't conceptually.

This is more problematic, indeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick comments while I'm sorting through the text and gh-25168:

  1. there's a bit of a circular thing going on here, where most of the problem seems to be array (a non-recommended / don't care function), and the only reason we even want to update array is for consistency with asarray (the important function here)
  2. I'm not 100% sure I understand the __array__ comment. I haven't thought about that one much before, but I think you mean something here like "we cannot know if a call to __array__ is going to cause this non-numpy object to copy data, so we must not call __array__ if copy=False"? If so, I'm not sure that's necessarily a long-term problem; a logical solution would be to add copy=None keyword to __array__ so that we can know. If numpy cannot know it preferably shouldn't be guessing.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some (partially opinionated) comments. Overall looks nice.

I would prefer somewhat different emphasis at points (as you can probably read out of the comments), but that is opinions.

However, the backwards compatibility section needs to be much clearer about the larger changes/harder to grasp changes.
That might be a small note in the end, the main issue is that we need to try it before I can voice a real opinion. For now I am worried by the change.

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
other key libraries. So those alternatives are not appealing. Given the amount
of interest in this topic, doing nothing also is not appealing. The "hidden
namespace" option would be a smaller change to this proposal. We prefer not to
do that since it leads to duplicate implementations staying around, a more
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite agree, but fine this is an opinion. We would only duplicate a tiny amount of divergence, in most cases these would be very thin wrappers.

(We might need such a hidden namespace in any case to support versioning of the Array API! Or any other odd future difficult... For that matter the special function additions definitely need something?!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not impossible we may need it in the future, but it's not at all clear that we would ever need it. Versioning isn't needed as long as the added API surface and semantics in newer versions of the standard are a superset. Which should be true for the next version at least, and we should try hard to keep it that way. special extension isn't a given yet (it could be rejected, and certainly it isn't close to the finish line yet), and otherwise could probably be handled fairly easily in a module-level __getattr__.

libraries. Having support in the main namespace resolves this issue. Hence this
NEP supersedes NEP 47. The ``numpy.array_api`` module will be moved to a
standalone package, to facilitate easier updates not tied to a NumPy release
cycle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super important, but I think this discussion adds little except reasoning why NEP 47 is superseded. I don't think it is necessary at all. You could even just consider NEP 47 withdrawn and mostly ignore it in this one.

Otherwise, I would suggest to move this discussion to the alternatives section. The separate array-object discussion seems like the most interesting part, but NEP 47 is missing an explanation of how that could work, anyway.

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved

The ``copy`` keyword in ``astype`` will stick to its current meaning, because
"never copy" when asking for a cast to a different dtype doesn't quite make
sense.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a at least almost a large impact change: It should have its own sub section in the backwards compatibility section. (It is currently completely missing there).

@leofang
Copy link
Contributor

leofang commented Jan 10, 2024

cc:@kmaehashi for vis

@rgommers
Copy link
Member Author

I reworked for textual comments so far (thanks everyone!) and resolved those. All the ones that look like they might benefit from discussion I'll do in a next update and will leave them unresolved.

[skip actions] [skip azp] [skip cirrus]
[skip actions] [skip cirrus] [skip azp]
@ngoldbaum
Copy link
Member

Hi all, I realize that discussion is still ongoing, but since we're running short on time for the NumPy 2.0 RC, I wanted to unblock merging some of the remaining PRs for array API support that did not get any significant discussion so far.

Specifically, I'm planning on merging #25233 (adding device keyword arguments in array creation routines and device and to_device attributes to ndarray) and #25169 (adding a correction keyword argument to np.var and np.std) tomorrow. Please let me know if there are any comments relating to those changes for this NEP that haven't been lodged yet and I will hold off.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, looks pretty good. I simply have a different angle (such as seeing removing np.array_api as doing nothing).

There are still a few things that I think are not highlighted (around BC breaks) as much as I would like, but it's fine.
Other than that, I will just point out that I think the alternative isn't terrible in the mailing list: I don't have a strong enough opinion on new aliases/kwargs to worry, but if anyone really dislikes them, I also think that adding a custom namespace is so cheap that I wouldn't worry if anyone dislikes any of those (or a specific backwards compatibility break).

(I.e. to me most API changes in NumPy are great, but I would be also happy to see them as distinct, because most can be with a namespace. Maybe, that is misreading the NEP a bit though: It wants to argue for not doing that, and that is fine.)

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
[skip actions] [skip azp] [skip cirrus]
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! A few more comments in-line.

doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved
doc/neps/nep-0056-array-api-main-namespace.rst Outdated Show resolved Hide resolved

The ``copy`` keyword in ``astype`` will stick to its current meaning, because
"never copy" when asking for a cast to a different dtype doesn't quite make
sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used np.array(..., copy=False, subok=True) quite a lot, when indeed it was faster than np.asanyarray; I don't think I have ever had a cases where I'd want an error if the input was not an array...

That said, I agree with the logic of False/True/None and don't mind to change my code/astropy (though I wish copy=None was backward compatible).


The ``copy`` keyword in ``astype`` will stick to its current meaning, because
"never copy" when asking for a cast to a different dtype doesn't quite make
sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is the caveat that if we go e.g. via __array__(), copy=False will fail even when it shouldn't conceptually.

This is more problematic, indeed.

Also a few smaller textual changes.

[skip actions] [skip cirrus] [skip azp]
@asmeurer
Copy link
Member

asmeurer commented Jan 24, 2024

(continuing a discussion that was started elsewhere)

I don't actually see what the issue with scalars is. They duck type as arrays, at least for all the purposes of the standard, so I don't see why they can't be considered as compliant. The text here implies that they aren't but doesn't actually explain why not. In particular here and here. The second note just seems wrong. The only reason that scalars don't work for "an array of the same data type as the indexed array" is if you don't consider scalars to be an array. But this argument is circular, because why isn't a scalar an array, for the purposes of the standard?

The only real "not array" thing about scalars is that they are immutable, but mutability is optional in the standard. That and the fact that they aren't called "arrays" in the NumPy docs, but e.g. pytorch doesn't call its arrays "arrays" either.

@asmeurer
Copy link
Member

And just to be clear, the standard doesn't require all arrays from a given library to have the same type. This is even discussed explicitly https://data-apis.org/array-api/latest/design_topics/static_typing.html

Also note that this standard does not require that input and output array types are the same (they’re expected to be defined in the same library though).

@mhvk
Copy link
Contributor

mhvk commented Jan 25, 2024

Good point. From a numpy user perspective, the main annoyance I have with scalars is that they share most but not all methods with arrays (e.g., they cannot be .viewd). But that's pretty irrelevant for the discussion here.

@rgommers
Copy link
Member Author

I don't actually see what the issue with scalars is. They duck type as arrays, at least for all the purposes of the standard

Good points. In principle yes, you are right - since it's not required that the returned 0-D array is of the same type as the input array and there's no isinstance check, duck typing should fulfill all the requirements here.

From a numpy user perspective, the main annoyance I have with scalars is that they share most but not all methods with arrays

Here's one such paper cut:

>>> x_0d.mT
...
ValueError: matrix transpose with ndim < 2 is undefined

>>> np.float32(3.5).mT
...
AttributeError: 'numpy.float32' object has no attribute 'mT'

The standard doesn't require exceptions of a particular type, so this is technically still fully compliant. It's just annoyances.

For this PR, I plan to rework the text to not state that scalars are non-compliant and an explanatory note saying that while there are minor differences, they duck type well enough to be considered the same as 0-D arrays from the perspective of the standard.

Addresses review feedback. Array scalars aren't actually incompatible
with the design of the standard.
As requested in PR review, better delineate the different types of
changes in the backwards compatibility section, and add assessments
of what the impact of changes is.

In addition, expect the description of the `copy=` keyword semantics
changes.

[skip cirrus] [skip actions] [skip azp]
@rgommers
Copy link
Member Author

I pushed two new commits. The first one changes how array scalars are described. The second one adds more detail to the backwards compatibility section (including impact assessment per category of change) and expands on the copy= keyword.

@rgommers
Copy link
Member Author

rgommers commented Feb 16, 2024

I'll note that pretty much all PRs implementing this NEP (except for the copy keyword one) have been merged for quite a while now, so the impact assessment is based on at least a month of usage by libraries and end users that test against NumPy main. Over the course of 2.0 development we've reverted several backwards incompatible changes that turned out to be disruptive; nothing that is in the backwards compatibility section now falls in that category I believe.

@ngoldbaum
Copy link
Member

There's still one outstanding PR: #25802 and it looks like discussion around the copy argument for asarray has settled down here. I'd like to merge the PR implementing it tomorrow. Let me know if you'd like me to hold off.

[skip azp] [skip cirrus] [skip actions]
@rgommers
Copy link
Member Author

I think it's time (or well overdue) to merge this PR with Draft status, so we have a rendered version. Discussion seems to have settled down and all comments are addressed. In case someone wants to return to a comment that they think wasn't addressed well enough yet or if we have a hiccup with one of the recent PRs implementing support for this NEP, we can always open a follow-up PR.

@charris charris merged commit 6e9b286 into numpy:main Feb 28, 2024
4 checks passed
@charris
Copy link
Member

charris commented Feb 28, 2024

Thanks Ralf and thanks to all who reviewed and offered comments.

@rgommers rgommers deleted the nep-56-arrayapi branch February 29, 2024 04:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants