URI naming scheme for user/one-off/non-standard keywords #1401
Comments
I like this idea. I've suggested before that I'd like to see that all keywords have a URI and that vocabularies are just mappings of plain keyword names to the keyword identifier. Your proposal is compatible with that idea, but adds the concept of being able to use the keyword URI directly in the schema without being required to use the vocabulary system to map to a plain keyword name. |
I thought we had already addressed that we recognize the history of
While I agree with what you're saying, I don't think that our users will agree that it's a problem or that they'd want to use lengthy URI key names. (Not to mention that we've already made a big announcement that |
The "x-" convention I'm referring to here is merely the one seen in mail/HTTP, not related to marking SVAs in JSON Schema. |
I understand that's the one that you're referring to, but you're also suggesting we use URIs for custom SVA keywords, which is counter to the decision to use If that's not what you're proposing, then what you're proposing is not clear to me. |
I don't think that's what's being proposed. At least, that's not how I understood it. I see this as a way to introduce custom validation keywords, not custom SVA keywords. Implementations need to understand URI keyword, but they don't need to understand We've long talked about how the vocabulary system is better designed for organizations creating a custom dialect than for users wanting to use custom keywords. Something like this offers them a simpler alternative. If we think about it the way I mentioned in my previous comment, it's actually a stepping stone toward using the vocabulary system rather than an alternative. |
@gregsdennis Do you have any specific objections to this? Are there any better alternatives, on their merits? If If |
@awwright I don't fully understand what you're proposing. If you're proposing that we use a URI instead of If you're proposing that we use a URI to indicate custom non-SVA (i.e. non-vocab validation) keywords, I don't understand how using a URI enables this. The implementation still needs to know what to do with it. And if you're sharing the schema, then you have to ensure that all implementations that may consume it know what to do with it. This feels like you're trying to subvert vocabularies. |
In my understanding, custom/user keywords and SVA keywords are orthogonal concepts: you can have a custom assertion keyword (you would tell it's custom because it's a URI), or you can have a standardized SVA keyword (you would know it's an annotation because it has a symbol prefix). Or you can have both (e.g. I opened this issue partly because I'm not so sure that we actually agree on what Also, I think committing ourselves to a solution because "that issue is decided" asks the editing process to make commitments that aren't realistic... practically speaking, a specification isn't final until someone has started relying on its interoperability guarantees. If the editors publish a change that doesn't capture the consensus, that's no problem, you can always go back and revise it until it does. And even an adopted specification can still be changed, so long as the change remains interoperable. So if it turns out to be a mistake, admitting " |
The consensus opinion was that This is simple for an implementation to support based on that convention, and it's simple for schema authors to understand.
No you can't. They need to be declared in a vocabulary. If you're proposing that such custom assertion keywords be supported outside of vocabularies, then you'll have to define an entirely new system for implementations to be able to say whether they support those custom assertion keywords. But we already have such a system: vocabularies.
No, standardized SVAs don't need to have, and in fact are forbidden from having, the prefix (which is See this addition which was added after @jdesrosiers' comment that this convention needs to be reserved for ad-hoc SVAs. SVAs in vocabularies will be normal keywords that do not follow the convention.
No. That's what I still don't understand the reasoning behind this proposal. You're saying that you want:
|
@gregsdennis The reasoning is there are several orthogonal problems that ① What if an update to the specification introduces a new annotation keyword? Older implementations ought to be able to ignore it for assertion purposes. The most straightforward way to identify this is with a naming scheme. ② What if the new standard keyword shadows a custom assertion keyword with different behavior? Or suppose you want to use multiple vocabularies with overlapping keyword names. You can't do this. With a URI, you could distinguish these keywords, and you could automatically know they come from a vocabulary, even if not otherwise declared. If you like, you could even map keywords from vocabularies to URIs by some mapping like <vocabulary URI + keyword name>, e.g. The two issues #1399 and (this issue) #1401 together solve these problems, by treating user keyword naming, and identifying annotation keywords, as separate problems. So you can have user keywords that are assertions, and you can have standard keywords that are annotation-only. So describe for me, do these problems exist with |
@awwright I think you are ascribing to The sole purpose of the
The proposal (#1387) disallows vocabularies (either defined in the spec or by a third party) from creating any keywords that begin with Vocabularies may still define annotation keywords, but they must not start with
The possibility of vocabulary keywords having collisions does still exist, but it has nothing to do with the The idea to use URIs for keywords was proposed by @jdesrosiers in #1065 about augmenting the output formats with keyword->vocab mapping information. In that discussion I was opposed to it, but upon reflection, I think this is a decent idea, and I'm happy to revisit that in the proper context. Extending that idea to be able to specify which vocab a keyword refers to in the event of a conflict is a good idea as well. This would enhance the existing vocabulary system (instead of subverting it). I recommend a new core keyword could provide such a mapping. Again, I'm happy to discuss that in its own context. Regardless, the introduction of support for |
That's the problem though, implementations should be able to look at a keyword, even from a vocabulary, and understand that it's an annotation-only keyword. In an update to the specification, how would you indicate this, if not with
I don't mean to imply |
Why? What benefit does this provide?
I think it moves toward solving the problem of vocabulary collisions, but I don't know that it completely solves it. For one, this problem doesn't exist for just annotations, but for all keywords. Secondly, we should consider the case where a vocab keyword doesn't collide. Are you still going to require users to write out the full If keyword collisions are the problem you're describing, then let's talk about a solution that fits within existing systems. We don't have a precedent for keywords having two representations (the plain keyword and a "vocab-qualified" keyword). However, we could introduce a new core keyword (I'll use Suppose a meta-schema/dialect/whatever uses two vocabs that both define
{
  "$schema": "https://example.com/my-meta-schema.json",
  "$id": "https://example.com/my-meta-schema.json",
  "$vocabulary": {
    // all the core vocabs
    "https://example.com/vocabs/custom-foo-1": true,
    "https://example.com/vocabs/custom-foo-2": true
  },
  "$using": {
    "foo1": "https://example.com/vocabs/custom-foo-1#foo",
    "foo2": "https://example.com/vocabs/custom-foo-2#foo"
  }
}
Then, when the implementation can properly link the two We can then also add a requirement of implementations that if a meta-schema doesn't resolve this ambiguity, the meta-schema is invalid or indeterminate (implementation-defined). |
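To make the conflict-resolution idea above concrete, here is a minimal Python sketch of how an implementation might build its keyword table from the declared vocabularies plus a "$using" map. This is only an illustration under assumptions: "$using" is the hypothetical keyword proposed in the comment above, the function name resolve_keywords and the extra keyword "bar" are invented for the example, and raising an error on an unresolved collision is just one of the options mentioned (invalid vs. implementation-defined).

```python
def resolve_keywords(vocab_keywords, using):
    """Return {name usable in a schema: (vocabulary URI, keyword name)}.

    vocab_keywords: {vocabulary URI: set of keyword names it defines}
    using: {alias: "<vocabulary URI>#<keyword name>"}  (hypothetical "$using")
    """
    resolved = {}
    aliased = set()
    for alias, target in using.items():
        vocab_uri, _, name = target.partition("#")
        if name not in vocab_keywords.get(vocab_uri, ()):
            raise ValueError(f"$using target is not defined: {target}")
        resolved[alias] = (vocab_uri, name)
        aliased.add((vocab_uri, name))

    # Find which vocabularies define each plain keyword name.
    owners = {}
    for vocab_uri, names in vocab_keywords.items():
        for name in names:
            owners.setdefault(name, []).append(vocab_uri)

    for name, vocabs in owners.items():
        # Definitions that were given an alias no longer compete for the plain name.
        remaining = [v for v in vocabs if (v, name) not in aliased]
        if len(remaining) > 1:
            raise ValueError(f"ambiguous keyword {name!r}: {sorted(remaining)}")
        if len(remaining) == 1:
            # An alias with the same spelling wins over the plain name (assumption).
            resolved.setdefault(name, (remaining[0], name))
    return resolved


V1 = "https://example.com/vocabs/custom-foo-1"
V2 = "https://example.com/vocabs/custom-foo-2"
vocabs = {V1: {"foo", "bar"}, V2: {"foo"}}  # "bar" is a made-up extra keyword
table = resolve_keywords(vocabs, {"foo1": V1 + "#foo", "foo2": V2 + "#foo"})
```

With both definitions of foo aliased, the plain name foo is simply unavailable in schemas, while the non-colliding bar keeps its plain name. Calling the function with an empty "$using" map raises, which corresponds to the "meta-schema is invalid" option.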
Suppose we want to publish an update to the specification that introduces a new annotation keyword with a standard meaning. Take for example, "JSON-LD context" to associate a JSON-LD/RDF semantics with a value. The keyword won't affect assertion at all, and schema authors should be able to introduce it into their schemas without disrupting validation. So, you name the keyword with a naming scheme that indicates this—perhaps It seems like "optional" vocabularies were supposed to solve this problem—but since the vocabulary won't even be loaded, the implementation won't understand where the In contrast, suppose you want to define a cheap assertion keyword, used by a single party. By "cheap" I mean, nobody else is going to use it, so you want to avoid the overhead of standardization, or even writing a vocabulary and associated meta-schema. This proposal provides that—just pick a URI that's under your control.
Well yes, URI keywords can do anything, unless specified otherwise. Again, you could have an annotation-only URI keyword e.g.
I haven't fully thought out the idea, but I suppose this URI form would be required if the keyword would otherwise be ambiguous (if it's defined in multiple vocabularies). |
You've missed the important part of your supposition:
The specification is being updated, and it defines the keyword. It uses vocabularies to define keywords. Thus, an implementation that understands the specification should understand this keyword and know that it's an annotation. If it doesn't understand the keyword, then the implementation is out of date and needs to be updated. (BTW - This leads back to all of the other discussions that Jason and I have been having around whether to include
I think anything that is not intended to be interoperable (i.e. schema author and consumer are the same party) needs to remain outside of the scope of the specification and something that individual implementations can support, if they want. Our job in writing the specification should remain focused on ensuring interoperability across implementations. The point of the specification is to standardize behavior and processes for common tasks. Something that is to be used by a single party needs no such standardization, and is therefore outside of our scope. Every implementation that supports ad-hoc assertion keywords is going to have their own way of doing it anyway. The spec cannot be expected to add requirements to implementation-specific behavior. (I've raised the question about scope in our charter PR.) |
But most implementations will not update right away. If you are authoring and publishing a schema, and it is being consumed by multiple different implementations not under your control, you want to be able to update the schema in a way that isn't going to cause breakage if you don't have to. |
That's true, but I also don't see schema authors wanting to use new features as soon as the spec updates. There may be a couple of users watching the spec, but if they're savvy enough to watch the spec, they also likely understand that there's always lag for the implementations to catch up. 2020-12 only really started seeing activity in the past year or so, two years after it was released. Also, users' desire for new features is the driving force that gets implementations to provide them. I don't want to create mechanisms that allow implementors to delay adding new features. I want to encourage faster updates. The only way for us to do that is by utilizing the users to apply that pressure. If you want to require a prefix for all vocabulary annotation keywords to support the kind of forward compatibility you're talking about, then we'll have to update the ones we have to match that rule, e.g. That's a pretty large change, and it really only addresses a fairly niche use case as it only applies to annotations, and usually we'll be adding functional keywords (applicators and assertions). |
I think this is a great point. The Using
Agreed. If we go with a prefix for all annotation-only keywords, I think it effectively eliminates the case for optional vocabularies.
This is also a very good point. Renaming all of our annotation keywords is a big change for something that I don't expect will happen often. We could of course grandfather in the existing keywords, but the inconsistency is awkward. Or, we could support both and deprecate the old spelling, which is also awkward. It's a fair question to ask whether this is a problem worth solving. I'm leaning toward solving it, but not strongly. |
It seems we have two topics going simultaneously, and I think they can be resolved separately.
It may be better to split one (or both) of these into separate issues. On the topic of supporting future SVAs, stepping back from this a bit to get more of the larger picture, I noticed that we're now saying that we want a prefix for non-vocab SVAs ( Secondly, since the current state (at least in my PR) is that If we want to support all SVAs in a perfectly future-compatible way, then I think having a single prefix for both vocab SVAs and non-vocab SVAs, is the way to go.
I think this is probably the best approach. We could also encourage implementations to push people toward using the prefixed keyword. (We may need to augment the output format to allow implementations to provide a deprecation warning or something.) |
Let's say I introduce
I agree. This is why I wouldn't suggest |
Future-proofing. 👍
Okay. I think I can get behind this, but I think we need to work it out a bit more...
(We should use IRIs like the rest of the spec as adopted.) IRI keywords are for ad-hoc non-vocab keywords (all kinds). The format must be An implementation does not need to understand an ad-hoc IRI keyword in order to process the schema that contains it. If an implementation does not understand the keyword, it will be treated as an annotation-only keyword. By using an unknown IRI keyword, the user acknowledges that this may produce inconsistent evaluation results. (This is equivalent to the ad-hoc vocab being optional.) As a consequence of the above, vocab-IRI-qualified keywords are now acceptable for all keywords, including those in vocabs defined by the spec. So We need to adopt a keyword-conflict-resolution strategy, such as my proposed
|
I don't think it's necessary to prescribe a URI format for keyword identifiers. Also, I very much don't want keywords to be coupled to a vocabulary. My implementation has always used URIs as keyword identifiers. I originally used a structure similar to what you propose except I used dialect URI as the base instead of vocabulary. I found this coupling to be problematic and changed my approach. The problem with using the vocabulary URI (or dialect URI) is that it gets awkward if you want to compose a new vocabulary from an existing one. For example, the Another example is renamed keywords like This kind of thing was important for implementing the annotation tooling I released recently. It allows me to treat
I was expecting that an implementation that encounters an unknown ad-hoc IRI keyword would produce an error. You've recognized that this breaks compatibility, so what's your reason for wanting this behavior? |
I can go with this. We, as "json-schema.org," wouldn't (hopefully) ever define the same keyword twice. I suppose it scopes a keyword to the org. We'd also be free to reorganize the vocabs if we saw fit to do so. Each vocab could actually list the keywords it brings to the table as well, getting us a bit closer to a machine-readable vocabulary.
Producing an error is fine. I prefer that as well. I took the option of generating an annotation because I thought it was desired that unknown IRI keywords would work as if they were defined in an optional vocab in 2020-12. |
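The two behaviors discussed above for an unknown IRI keyword (treat it as an annotation, or produce an error) can be sketched as a single switch. This is a rough Python illustration, not anything the spec defines: the function names, the strict flag, and the example keyword IRI are all assumptions, and "has a URI scheme" is used as a crude stand-in for "is an IRI keyword". Unknown plain-name keywords are simply collected as annotations here, which is a separate debate.

```python
from urllib.parse import urlsplit


def is_iri_keyword(name):
    # Crude test (assumption): anything with a URI scheme counts as an IRI keyword.
    return bool(urlsplit(name).scheme)


def classify_keywords(schema, known_plain, known_iri, strict=True):
    """Split one schema object into understood keywords and collected annotations."""
    handled, annotations = {}, {}
    for name, value in schema.items():
        if name in known_plain or name in known_iri:
            handled[name] = value
        elif is_iri_keyword(name) and strict:
            # "Produce an error" behavior.
            raise ValueError(f"unknown IRI keyword: {name}")
        else:
            # "Treat as annotation-only" behavior (also used for unknown plain names).
            annotations[name] = value
    return handled, annotations


schema = {"type": "object", "https://example.com/keywords/even": True}
handled, annotations = classify_keywords(schema, {"type"}, set(), strict=False)
```

With strict=False the unknown IRI keyword ends up in annotations and validation can proceed; with strict=True the same schema raises, which is the behavior both commenters above said they prefer.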
Yeah. I can see the motivation to directly map a keyword URI to a vocabulary URI and a keyword name, but I don't see much benefit for schema authoring, and actually it might be too restrictive, for example:
...here, forcing a keyword URI include a vocabulary URI and the keyword name (as the fragment) would preclude the possibility that keywords can be shared between vocabularies (that two vocabularies can define the "same" keyword).
Would this just be a way to map a non-URI keyword to a URI keyword? Or is this useful for other purposes? |
This was just an example of a pattern we could use for assigning URIs to our existing keywords that doesn't couple to vocabularies or dialects. There's nothing special about this pattern. We could assign each keyword a A fun thing you could do with this is to define a dialect that translates keyword names to Spanish. The keyword semantics are the same, the vocabulary just allows you to change the keyword name that's used in the schema. Implementations should be able to trivially handle this dialect. The user just needs to provide the mapping. |
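As a rough sketch of the "translated dialect" idea above: if every keyword has its own identifier, a dialect is little more than a table from plain names to those identifiers, and two dialects can spell the same keyword differently. The identifiers under https://example.com/keyword/ and the Spanish names below are made up for illustration; nothing here is an assigned or proposed URI.

```python
# Hypothetical keyword identifiers; not real or assigned URIs.
TYPE = "https://example.com/keyword/type"
REQUIRED = "https://example.com/keyword/required"

# Two dialects that differ only in the plain names they expose.
ENGLISH = {"type": TYPE, "required": REQUIRED}
SPANISH = {"tipo": TYPE, "requerido": REQUIRED}


def normalize(schema, name_to_uri):
    """Rewrite the plain keyword names of one schema object to keyword URIs.

    Only the top level is rewritten; names that are not in the mapping
    (including keys that already are URIs) pass through unchanged.
    """
    return {name_to_uri.get(name, name): value for name, value in schema.items()}


english_schema = {"type": "object", "required": ["id"]}
spanish_schema = {"tipo": "object", "requerido": ["id"]}
same = normalize(english_schema, ENGLISH) == normalize(spanish_schema, SPANISH)
```

After normalization the two schemas are identical, so an implementation keyed on keyword identifiers would not need to know which spelling the author used. Only the keyword names are mapped; values such as "object" are left as they are.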
I think I may have misread this and a lot of my comments about @awwright correct me if I'm wrong, but are you merely suggesting that using a vocab-qualified keyword URI be allowed in place of creating a custom meta-schema to include the vocab just so I can use that keyword in a schema? I think I'd be okay with that, though it would make deserialization a bit trickier (for me) because I'd have to account for two keys that represent the same keyword. What would happen if the URI and the plain keyword were both used in a single schema object? For example, if I used |
I'm not Austin, but I don't think that's quite right. I think walking through an example would be the easiest way to explain what I'm thinking. Let's say I want to introduce a custom validation keyword (Austin also prefers that custom SVA keywords be URIs as well utilizing the I suggested we take it a step further and have a URI for all keywords including those in a vocabulary. Any plain-name keywords map to keyword URIs via a vocabulary and the keyword URI is the authority on the semantics of the keyword. One of the things this allows is seamless transition to moving the custom keyword to a vocabulary. Let's say I want to move my So, combining both Austin's proposal and my addition, you could use the keyword URI directly instead of using a dialect that includes the vocabulary that includes the keyword. However, the concept as a whole isn't dependent on vocabularies. It's not just a way to use a keyword without using its vocabulary. It's a way to use a custom validating keyword that isn't part of a vocabulary (yet).
|
So the keyword doesn't necessarily need to be in a vocab up front. Nuance. But later if the user wants to create a vocab with that keyword, they'd have to use the same URI, which ideally would be the vocab URI + the keyword... which means if there's the possibility that it could be in a vocab later, they should figure out a vocab URI now and use it. Thus the keyword URI would be a vocab-qualified keyword URI. (This is more best practice than requirement.) So it's basically the same thing. Cool. |
👍
I don't think it's necessary or a good idea to couple keyword URIs to vocabulary URIs. I explained in a previous comment. |
So you don't see a problem with this?
{
  "type": "object",
  "https://json-schema.org/draft/2020-12/meta/validation/type": "array"
}
This seems like a problem to me. |
I don't see it as any more problematic than,
{
  "allOf": [
    { "type": "object" },
    { "type": "array" }
  ]
}
It's just a quirk of the syntax that would make it possible to express the same thing without the |
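A small Python sketch of the equivalence being claimed here, under the assumption that the plain name and the URI spelling are two independent occurrences of the same keyword and that both must pass (which is how the two subschemas under allOf behave). The validate_type helper is hypothetical and only understands the two type names used in the example.

```python
# URI spelling of "type" as written in the example above.
TYPE_URI = "https://json-schema.org/draft/2020-12/meta/validation/type"

# Only the two type names from the example are supported in this sketch.
TYPE_CHECKS = {
    "object": lambda value: isinstance(value, dict),
    "array": lambda value: isinstance(value, list),
}


def validate_type(schema, instance):
    """Apply every spelling of the type keyword found in one schema object."""
    expected = [schema[key] for key in ("type", TYPE_URI) if key in schema]
    # Each occurrence is its own assertion, so all of them must pass.
    return all(TYPE_CHECKS[name](instance) for name in expected)


both = {"type": "object", TYPE_URI: "array"}
results = [validate_type(both, {}), validate_type(both, [])]
```

Under that assumption the combined schema rejects both an object and an array, exactly as the allOf version does: it is unsatisfiable, but not ill-defined.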
In some of the discussion around annotation-only or SVA (simple value annotation) keywords, some commenters seem to suggest the "@" naming scheme would be useful for writing custom keywords. This is backwards from my understanding, see my opinion here.
However, user keywords are important, and I think establishing how to define them may clarify some of the discussion around SVA-only keywords. I would like to propose a convention where user keywords (keywords that are experimental, or are unlikely to see use beyond a single party) can be defined with a URI as a name, without requiring any outside coordination, or even a declared vocabulary.
Is this similar to the "x-" convention? How is it different?
URI names are similar to the "x-" convention in that users can mint names however they see fit. However, the problem with "x-" names is that they were never intended to form a global namespace, only to be used by agreement or for internal/private use; sometimes these uses "leak" out of private use and into the "global" namespace (which is how you get headers like X-Frame-Options). Using URIs solves this because URIs exist in a global namespace (no consideration for keeping them internal is needed), and each user can select a URI that doesn't interfere with anyone else.
The only downside is that user keyword names may be very long. But if a keyword becomes popular, the keyword can be standardized with a more typical "standard" name; since keywords do not overlap, you can define a keyword with two different names, and schemas can use one, or the other, or both, with identical behavior.