Being clearer about use of RFC 2119 #796

asmeurer · 2024-04-22T19:01:55Z

We use RFC 2119 for words like "should" and "must" in the standard. But many people are confused by this and think that "should" restrictions are actually "must" restrictions. This is most common for dtype restrictions, but it appears in other places in the standard as well.

We need to be clearer about this somehow. Some suggestions

Capitalize (as is the common convention) or bold these words (Capitalizing RFC 2119 words like SHOULD #397)
Add a section somewhere making it clear that we are following the RFC. There is a link to the RFC at the bottom of https://data-apis.org/array-api/latest/purpose_and_scope.html but that's it. The RFC itself says:

Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
The RFC also lists "required", "shall", "recommended", "may", and "optional". We should review our use of those terms to make sure they match the RFC.
We might be using "should" in places where we should be using "may" or "optional". From the RFC: SHOULD "means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course", but MAY "means that an item is truly optional". Things like data type restrictions are probably correct in using SHOULD, but things like indexing restrictions, like out-of-bounds slices" perhaps ought to use MAY, since the only reason they are limited is so the the standard can support libraries that can't implement them.
We should consider if there are other ways things could be made clearer, for instance, explicitly stating "other behavior is unspecified".

Micky774 · 2024-04-23T13:24:37Z

Add a section somewhere making it clear that we are following the RFC.

IMO this is probably the most impactful change that could be made immediately. I failed to discover the reference at the bottom of the document, and that led to me making a series of not-quite-right interpretations 😅

Other than that, I think clarifying changelog entries a bit would help, e.g. for the v2023.12 API, one of the entries is:

fft.fft: require the input array to have a complex-valued floating-point data type and require that the output array have the same data type as the input array

The term "require" implies less freedom than the term "should" is meant to represent, and the beginning of the entry would perhaps be more precise as "clarify that the input array should have a complex..."

fancidev · 2024-10-22T09:40:56Z

I think systematic usage of the normative words is a great initiative and would make the spec more precise and hence easier to follow and disambiguate.

There is one thing I’m unsure about and would like to ask. If I understand correctly, the normative words, MUST and SHOULD, impose requirement on the (library) implementation. If the implementation doesn’t follow the MUST items, it is deemed non-conformant. (The SHOULD items are recommendations and hence technically irrelevant for conformance.)

However, in the Array API spec, the words are often applied to impose preconditions to an operation. Those preconditions are controlled by the user (calling code) rather than by the library.

For example, for maximum:

x1 (array) – first input array. Should have a real-valued data type.

x2 (array) – second input array. Must be compatible with x1. Should have a real-valued data type.

What is the implementation supposed to do if user calls the function with x1 not a real-valued type? x2 not a real-valued type? x2 not compatible with x1 in shape? These seem out of the scope of RFC 2119.

Therefore, I’d suggest to add a paragraph, possibly following the reference to RFC 2119, to clarify the expected behavior of a conformant implementation to handle precondition violation when the precondition is qualified with MUST, SHOULD, or unqualified (which I interpret to be an implicit MUST). For example, it could be something like the following:

If a precondition marked with SHOULD is violated, the behavior is implementation-defined. In particular, the implementation MUST execute predictably if such a precondition is violated.

If a precondition marked with MUST or not marked with any imperative word is violated, the behavior is undefined. For example, the implementation MAY assume such precondition to hold and proceed execution. The calling code is responsible for guaranteeing the precondition.

Clarifying the handling of precondition violation in one place might also save the spec of a lot of “implementation defined” notes for inputs that violate required/recommended preconditions.

seberg · 2024-10-22T10:42:59Z

For example, the implementation MAY assume such precondition to hold and proceed execution

That sounds ovrely strong. I think one should only ever talk about implementation defined. Should/must could maybe inform users about "this is crazy" (e.g. order of complex numbers) vs. "implementations differ in promotion but it's probably the users problem to know about it".

So, I am happy to add such a thing, but should/must are unclear for users anyway. The best thing might be to just say A real-real valued data type without should/must, making it clear that anything else is implementation defined.

asmeurer · 2024-10-22T18:04:05Z

The idea behind those sentences is that a conformant implementation must implement support for those data types. Additional data types can also be supported, but the "should" implies that the recommendation is for them not to be.

seberg · 2024-10-23T08:12:06Z

That just reinforces that I think it would be clearer to only mention what (must be guaranteed to be) accepted; omitting any must/should.

Otherwise, it is very unclear if the "must/should" is there for the array API user or the array object provider/implementer. That can be spelled out of course, it just seems unnecessary to me.

EDIT: That is OK for "must", if we say that violating a "must" means implementation defined behavior for the user of the API (which makes sense). Because, with that, the "must" actually applies to both ;).

fancidev · 2024-10-23T10:02:51Z

Yeah I was thinking of what’s the difference between “must” and “should” when they are applied to a precondition that a user “must” or “should” guarantee.

Intuitively “must” tells the user don’t trying something else, or the function could bring you any result and it’s probably not what you expect. “Should” tells the user if you don’t do that, you must consult the particular array backend for its behavior — the Array API doesn’t define that.

What does this mean for the (library) implementation? I’d think it means “must” violation needn’t be explicitly handled , and the function would just return what the code leads it to. (This is quite common given Python’s duck typing system.) If “Should” is to be anything different from “must”, then I’d imagine “should” requires the implementation to handle inputs that “shouldn’t” have occurred, e.g. by raising an exception.

I’d also like to try make a distinction between “undefined” and “implementation-defined”. By “undefined” I mean there is no specification of the expected behavior, neither at the Array API level nor at the implementation level. The actual behavior is just where the code happens to lead us; it is not predictable without looking at the code. “Implementation-defined” means there exists a prescribed behavior, just that it’s not prescribed by the Array API but prescribed by the particular array backend. The code is supposed to follow this prescription (specification).

seberg · 2024-10-23T10:19:36Z

I am not convinced that distinguishing "implementation defined" and "undefined" is useful for anyone here, at least in the sens that it would be too subtle.
(A Python library should not really have undefined behavior, it should either error or do something reasonable within it's context.)

You are right, there are things that if violated must lead to an error, etc. Those should be documented in a positive way. Like:

Exceptions
----------
Exception :
  When the shape of A and B does not broadcast.

(of course that is one that shouldn't be repeated everywhere). And I think that is how we have it everywhere.

fancidev · 2024-10-23T10:32:04Z

I agree it could be too subtle to distinguish undefined from implementation-defined in the standard text. After all as long as there’s no segfault (and there isn’t) no one cares :)

And I think documenting exceptions is probably not a good idea — it can never be exhaustive, and to be honest I doubt typical Python code is at the level of rigor to need to catch exceptions other than maybe except: pass.

asmeurer · 2024-10-23T19:07:12Z

I think it's fair that it text in argument specifications could be interpreted as applying to the user and not to the implementer. We could probably just add a note somewhere that all RFC 2119 words in the standard applies to implementers. Or maybe it would also be possible to slightly reword things to something like "x1 should accept arrays with real valued data types" to make it clearer that the text is referring to what the function should accept.

rgommers added the Narrative Content Narrative documentation content. label Apr 23, 2024

asmeurer linked a pull request Apr 30, 2024 that will close this issue

Making RFC 2119 usage clearer #801

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Being clearer about use of RFC 2119 #796

Being clearer about use of RFC 2119 #796

asmeurer commented Apr 22, 2024

Micky774 commented Apr 23, 2024 •

edited

Loading

fancidev commented Oct 22, 2024 •

edited

Loading

seberg commented Oct 22, 2024

asmeurer commented Oct 22, 2024

seberg commented Oct 23, 2024 •

edited

Loading

fancidev commented Oct 23, 2024

seberg commented Oct 23, 2024

fancidev commented Oct 23, 2024 •

edited

Loading

asmeurer commented Oct 23, 2024

Being clearer about use of RFC 2119 #796

Being clearer about use of RFC 2119 #796

Comments

asmeurer commented Apr 22, 2024

Micky774 commented Apr 23, 2024 • edited Loading

fancidev commented Oct 22, 2024 • edited Loading

seberg commented Oct 22, 2024

asmeurer commented Oct 22, 2024

seberg commented Oct 23, 2024 • edited Loading

fancidev commented Oct 23, 2024

seberg commented Oct 23, 2024

fancidev commented Oct 23, 2024 • edited Loading

asmeurer commented Oct 23, 2024

Micky774 commented Apr 23, 2024 •

edited

Loading

fancidev commented Oct 22, 2024 •

edited

Loading

seberg commented Oct 23, 2024 •

edited

Loading

fancidev commented Oct 23, 2024 •

edited

Loading