Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: require [()] to be a no-op for all ranks? #833

Open
lucascolley opened this issue Aug 2, 2024 · 10 comments
Open

RFC: require [()] to be a no-op for all ranks? #833

lucascolley opened this issue Aug 2, 2024 · 10 comments
Labels
API change Changes to existing functions or objects in the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Indexing Array indexing.

Comments

@lucascolley
Copy link
Contributor

lucascolley commented Aug 2, 2024

The standard says:

Providing an empty tuple or an ellipsis to an array of rank 0 must result in an array of the same rank (i.e., if A has rank 0, A == A[()] and A == A[...])

Why is this restricted to rank 0? Sure, a reason for the inclusion is probably to be explicit about the divergence from NumPy, but am I missing something about rank 0 being special from the standard's perspective? Since [()] must act like [...] in the 0D case, should it not always act like that, for consistency?

@seberg
Copy link
Contributor

seberg commented Aug 3, 2024

The reason is the later bullet point:

Except in the case of providing a single ellipsis (e.g., A[2:10, ...] or A[1:, ..., 2:5]), the number of provided single-axis indexing expressions (excluding None) should equal N.

Where "should" is "must" for users and no recommendation at all for implementors, IMO.

@lucascolley
Copy link
Contributor Author

Right, but then my question is why does it not read:

"Except in the case of providing a single ellipsis or an empty tuple (e.g. ...) ..."

Do any of the array libraries have different behaviour for an empty tuple?

@seberg
Copy link
Contributor

seberg commented Aug 3, 2024

OK, I am not sure I understand the point. I would agree that there is nothing special at all about 0-D. I.e. to me the whole bullet point only serves to clarify the empty tuple case because intuition seems to sometimees lead astray for such cases.
In that sense, it would just as well make sense as a clarification to the tuple of ints bullet.

I suspect you are right, it something could be clearer, but I am not sure what :).

@lucascolley
Copy link
Contributor Author

lucascolley commented Aug 3, 2024

This arises from SciPy where (for BC) we still need to get NumPy scalars in array API compatible code, in both the production code and the tests. The only library that seems to complain about some_nonzero_rank_array[()] is array-api-strict. Of course one can just guard on ndim (e.g. h-vetinari/scipy#2 (comment)), but I wonder whether a bit of extra specification here would be useful to reduce complexity in user code (while NumPy scalars are still a thing).

Unless I am missing a reasonable alternative interpretation of [()] for higher dims (I suppose the argument is that it is nonsense for higher dims).

@seberg
Copy link
Contributor

seberg commented Aug 3, 2024

OK, then the question is if one should abandon disallowing partial indices, for the sake of saving you the idx = *idx, ... code line in this case. I do eexpect you have to abandon it for all cases, though. I doubt doing it only for 0-D is very helpful.
That has the downside of basically saying that the SymPy way of iterating elements becomes odder (you can still distinguish arr[0] and arr[0,] but it is less clear maybe). OTOH, I think in other places we may have shifted towards the narrow definition as well out of pragmatism in practice.

@lucascolley
Copy link
Contributor Author

Yeah, perhaps the practical solution is to add a as_scalar_if_np_0d helper to array-api-compat and be clear that [()] is only defined for 0-D, while [...] is always okay. I was just thinking that if every library already agrees that a no-op is the sensible result for [()], it might be worth putting in the standard.

@asmeurer
Copy link
Member

asmeurer commented Aug 7, 2024

FWIW, I was never a fan of the "all axes should be indexed" rule and feel that it leads to unnecessary verbosity for common indexing operations.

@kgryte
Copy link
Contributor

kgryte commented Oct 31, 2024

I was just thinking that if every library already agrees that a no-op is the sensible result for [()], it might be worth putting in the standard.

Simply because there may be existing alignment across two or more libraries does not mean that such behavior should be standardized. () and ... are not equivalent more generally, even if they happen to be equivalent for the zero-dimensional case.

As stated in the note on multi-axis indexing, x[1,2,3] is syntactic sugar for x[(1,2,3)]. As the standard requires that all axes be indexed in an indexing expression, x[()] for a non-zero dimensional array would violate that principle.

The standard allows wiggle room for libraries to implement NumPy-style indexing where trailing dimensions are omitted (i.e., "... should equal N"), but we should not make that required behavior, as current or future libraries could very reasonably opt to treat the omission of a trailing dimension as a user error (e.g., if x is 4D and a user does x[1,2,3], did the user intend to leave off the fourth dimension such that x[1,2,3,:] or was that a typo?). For portability, except in the case of ..., all axes should be explicitly provided.

@kgryte kgryte added the Needs Discussion Needs further discussion. label Oct 31, 2024
@kgryte kgryte changed the title indexing: require [()] to be a no-op for all ranks? RFC: require [()] to be a no-op for all ranks? Oct 31, 2024
@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. API change Changes to existing functions or objects in the API. labels Oct 31, 2024
@lucascolley
Copy link
Contributor Author

Thanks @kgryte. I feel a lot less strongly about this now, so happy for the issue to be closed. But maybe it should be kept open to discuss Aaron's point above.

@seberg
Copy link
Contributor

seberg commented Oct 31, 2024

FWIW, I like seeing the , ... or , :. I feel it is helpful verbosity for minimal typing in most places (but ... is strangely little used with : often being used instead).
Neither do I feel strongly about not requiring it, from that perspective you can also see it as a style guideline.

One argument for it was so that a library that does (sympy matrices are the only ones I know in Python world):

for element in arr_nd:
    # iterate all elements no matter the dimensions

which I like conceptually; would be able to also do (matlab-style?) arr2d[flat_index] without confusing that arr2d[dim1_index,] could be different.

But, nobody really does so in Python world right now probably. And arguably, if anything one should probably force users to spell out: for element in arr.flat/arr.iter(axis=None) or for vector in arr.iter(axis=0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change Changes to existing functions or objects in the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Indexing Array indexing.
Projects
None yet
Development

No branches or pull requests

5 participants