URI naming scheme for user/one-off/non-standard keywords #1401

Open
awwright opened this issue Apr 21, 2023 · 31 comments
Labels
proposal Initial discussion of a new idea. A project will be created once a proposal document is created.

Comments

@awwright
Member

awwright commented Apr 21, 2023

In some of the discussion around annotation-only or SVA (simple value annotation) keywords, some commenters seem to suggest the "@" naming scheme would be useful for writing custom keywords. This is backwards from my understanding, see my opinion here.

However, user keywords are important, and I think establishing how to define them may clarify some of the discussion around SVA-only keywords. I would like to propose a convention where user keywords (keywords that are experimental, or are unlikely to see a use beyond a single party) can be defined with a URI as a name, without requiring any outside coordination, or even declaring a vocabulary.

{
"type": "string",
"http://example.com/schema/distance": 40
}

Is this similar to the x- convention? How is it different?

URI names are similar to the "x-" convention in that users can mint names however they see fit. However, the problem with "x-" names is that it was not intended to be a global namespace, but by agreement or internal/private use only; and sometimes these uses "leak" out of private use, and into the "global" namespace (which is how you get headers like X-Frame-Options). Using URIs solves this because URIs exist in a global namespace (no consideration for keeping them internal is needed), and each user can select a URI that doesn't interfere with anyone else.

The only downside is that names for user keywords may be very long. But if a keyword becomes popular, the keyword can be standardized with a more typical "standard" name; since keywords do not overlap, you can define a keyword with two different names, and schemas can use one, or the other, or both, with identical behavior.
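
A rough sketch of how a tool might tell URI-named keywords apart from plain ones. The helper name `is_uri_keyword` and the rule it applies (anything that parses with a URI scheme counts) are assumptions for illustration; the proposal does not specify a detection rule.

```python
from urllib.parse import urlsplit


def is_uri_keyword(name: str) -> bool:
    """Return True if the keyword name parses as a URI with a scheme."""
    parts = urlsplit(name)
    # A plain name such as "type" or "x-distance" has no scheme component.
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


schema = {
    "type": "string",
    "http://example.com/schema/distance": 40,
}

# Collect the keywords that were named with a URI.
uri_keywords = [name for name in schema if is_uri_keyword(name)]
print(uri_keywords)
```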

@jdesrosiers
Member

I like this idea. I've suggested before that I'd like to see that all keywords have a URI and that vocabularies are just mappings of plain keyword names to the keyword identifier. Your proposal is compatible with that idea, but adds the concept of being able to use the keyword URI directly in the schema without being required to use the vocabulary system to map to a plain keyword name.

@gregsdennis
Member

However, the problem with "x-" names is that it was not intended to be a global namespace, but by agreement or internal/private use only

I thought we had already addressed that we recognize the history of x- and were using it because

  • it's commonly recognized and popular
  • we're explicitly dissociating our usage from the HTTP origins
  • it's simple for users
  • it's easily recognizable for implementors

While I agree with what you're saying, I don't think that our users will agree that it's a problem or that they'd want to use lengthy URI key names.

(Not to mention that we've already made a big announcement that x- is the convention we're following. We need to start nailing this stuff down and moving on or else the spec will never have a next release.)

@awwright
Member Author

I thought we had already addressed that we recognize the history of x- and were using it

The "x-" convention I'm referring to here is merely the one seen in mail/HTTP, not related to marking SVAs in JSON Schema.

@gregsdennis
Member

The "x-" convention I'm referring to here is merely the one seen in mail/HTTP, not related to marking SVAs in JSON Schema.

I understand that's the one that you're referring to, but you're also suggesting we use URIs for custom SVA keywords, which is counter to the decision to use x-.

If that's not what you're proposing, then what you're proposing is not clear to me.

@jdesrosiers
Member

you're also suggesting we use URIs for custom SVA keywords, which is counter to the decision to use x-.

I don't think that's what's being proposed. At least, that's not how I understood it. I see this as a way to introduce custom validation keywords, not custom SVA keywords. Implementations need to understand URI keywords, but they don't need to understand x- keywords. URI keywords aren't stepping on the toes of x-, but they do sort of step on the toes of the vocabulary system by providing an alternative way to declare custom keywords.

We've long talked about how the vocabulary system is better designed for organizations creating a custom dialect than for users wanting to use custom keywords. Something like this offers them a simpler alternative. If we think about it the way I mentioned in my previous comment, it's actually a stepping stone toward using the vocabulary system rather than an alternative.

@awwright
Member Author

awwright commented May 1, 2023

@gregsdennis Do you have any specific objections to this? Are there any better alternatives, on their merits?

If x- was supposed to be for one-off/user keywords only, I think a URI name overcomes many of the shortcomings of x-.

If x- was supposed to indicate validation/annotation behavior, I think we should figure out how to notate one-off user keywords first, so that we have a concrete understanding of the problem that SVA keywords are solving (what to do when an unknown, custom keyword is encountered... an SVA keyword can be ignored in more cases than other kinds of keywords, rather than causing an error).

@gregsdennis
Member

@awwright I don't fully understand what you're proposing.

If you're proposing that we use a URI instead of x- for custom SVA keywords, then, yes, I object to that because we've already had an extensive discussion, received user opinion, and made a very official decision to use x-. That matter is closed.

If you're proposing that we use a URI to indicate custom non-SVA (i.e. non-vocab validation) keywords, I don't understand how using a URI enables this. The implementation still needs to know what to do with it. And if you're sharing the schema, then you have to ensure that all implementations that may consume it know what to do with it. This feels like you're trying to subvert vocabularies.

@awwright
Member Author

awwright commented May 1, 2023

In my understanding, custom/user keywords and SVA keywords are orthogonal concepts: you can have a custom assertion keyword (you would tell it's custom because it's a URI), or you can have a standardized SVA keyword (you would know it's an annotation because it has a symbol prefix). Or you can have both (e.g. @http://example.com/kw/status would be a custom annotation keyword).

I opened this issue partly because I'm not so sure that we actually agree on what x- is doing, I can't discern what the consensus opinion actually is.

Also, I think committing ourselves to a solution because "that issue is decided" asks the editing process to make commitments that aren't realistic... practically speaking, a specification isn't final until someone has started relying on its interoperability guarantees. If the editors publish a change that doesn't capture the consensus, that's no problem, you can always go back and revise it until it does. And even an adopted specification can still be changed, so long as the change remains interoperable. So if it turns out to be a mistake, admitting "x- is ambiguous, use these alternate naming schemes instead" remains possible.

@gregsdennis
Member

gregsdennis commented May 1, 2023

I'm not so sure that we actually agree on what x- is doing, I can't discern what the consensus opinion actually is.

The consensus opinion was that x- keywords indicate custom SVAs, that is, ad-hoc annotation-only keywords. They are functionless and their values are reported as annotations in the output.

This is simple for an implementation to support based on that convention, and it's simple for schema authors to understand.

you can have a custom assertion keyword

No you can't. They need to be declared in a vocabulary.

If you're proposing that such custom assertion keywords be supported outside of vocabularies, then you'll have to define an entirely new system for implementations to be able to say whether they support those custom assertion keywords.

But we already have such a system: vocabularies.

you can have a standardized SVA keyword (you would know it's an annotation because it has a symbol prefix)

No, standardized SVAs don't need to have, and in fact are forbidden from having, the prefix (which is x-, not a symbol).

See this addition which was added after @jdesrosiers' comment that this convention needs to be reserved for ad-hoc SVAs.

SVAs in vocabularies will be normal keywords that do not follow the convention.

@http://example.com/kw/status would be a custom annotation keyword

No. That's what x- is for.


I still don't understand the reasoning behind this proposal. You're saying that you want:

  • custom non-vocabulary assertion keywords, which reduces interoperability and is the exact reason the vocabulary system was introduced
  • to change x- to URIs... why? There's no real reason to do this.

@awwright
Member Author

awwright commented May 2, 2023

@gregsdennis The reasoning is there are several orthogonal problems that x- is treating as the same problem, but this isn't always true.

① What if an update to the specification introduces a new annotation keyword? Older implementations ought to be able to ignore it for assertion purposes. The most straightforward way to identify this is with a naming scheme.

② What if the new standard keyword shadows a custom assertion keyword with different behavior? Or suppose you want to use multiple vocabularies with overlapping keyword names. You can't do this. With a URI, you could distinguish these keywords, and you could automatically know they come from a vocabulary, even if not otherwise declared.

If you like, you could even map keywords from vocabularies to URIs by some mapping like <vocabulary URI + keyword name>, e.g. https://json-schema.org/draft/2020-12/vocab/validation#type, that makes sense to me.
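
As a small illustration of that mapping, here is a sketch that joins a vocabulary URI and a keyword name with `#` and splits them apart again. The function names are made up for this example, and the `example.com` vocabulary URIs are placeholders.

```python
from urllib.parse import urldefrag


def qualify(vocab_uri: str, keyword: str) -> str:
    """Build a keyword URI as <vocabulary URI>#<keyword name>."""
    return f"{vocab_uri}#{keyword}"


def unqualify(keyword_uri: str) -> tuple[str, str]:
    """Split a keyword URI back into (vocabulary URI, keyword name)."""
    vocab_uri, keyword = urldefrag(keyword_uri)
    return vocab_uri, keyword


VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
type_uri = qualify(VALIDATION, "type")

# Two placeholder vocabularies that both define "foo" get distinct URIs.
foo_a = qualify("https://example.com/vocabs/a", "foo")
foo_b = qualify("https://example.com/vocabs/b", "foo")
```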

The two issues #1399 and (this issue) #1401 together solve these problems, by treating user keyword naming, and identifying annotation keywords, as separate problems. So you can have user keywords that are assertions, and you can have standard keywords that are annotation-only.

So describe for me, do these problems exist with x-, or are they not important enough to address, or is it something else?

@gregsdennis
Member

@awwright I think you are ascribing to x- purposes that it doesn't have.

The sole purpose of the x- prefix is to indicate ad-hoc, non-vocabulary, annotation-only keywords. This restores to users some of the functionality that was removed when we disallowed unknown keywords, namely the ability to add custom non-validation data to a schema.

① What if an update to the specification introduces a new annotation keyword? Older implementations ought to be able to ignore it for assertion purposes. The most straightforward way to identify this is with a naming scheme.

The proposal (#1387) disallows vocabularies (either defined in the spec or by a third party) from creating any keywords that begin with x-. This eliminates the possibility of collisions between these custom keywords and those in vocabularies.

Vocabularies may still define annotation keywords, but they must not start with x-.

② What if the new standard keyword shadows a custom assertion keyword with different behavior? Or suppose you want to use multiple vocabularies with overlapping keyword names. You can't do this. With a URI, you could distinguish these keywords, and you could automatically know they come from a vocabulary, even if not otherwise declared.

The possibility of vocabulary keywords having collisions does still exist, but it has nothing to do with the x- prefix.

The idea to use URIs for keywords was proposed by @jdesrosiers in #1065 about augmenting the output formats with keyword->vocab mapping information. In that discussion I was opposed to it, but upon reflection, I think this is a decent idea, and I'm happy to revisit that in the proper context.

Extending that idea to be able to specify which vocab a keyword refers to in the event of a conflict is a good idea as well. This would enhance the existing vocabulary system (instead of subverting it). I recommend a new core keyword could provide such a mapping. Again, I'm happy to discuss that in its own context.

Regardless, the introduction of support for x--prefixed keywords as SVAs intersects neither of these cases.

@awwright
Member Author

awwright commented May 2, 2023

The proposal (#1387) disallows vocabularies (either defined in the spec or by a third party) from creating any keywords that begin with x-.

Vocabularies may still define annotation keywords, but they must not start with x-.

That's the problem though, implementations should be able to look at a keyword, even from a vocabulary, and understand that it's an annotation-only keyword. In an update to the specification, how would you indicate this, if not with x-?

The possibility of vocabulary keywords having collisions does still exist

I don't mean to imply x- causes this problem; rather, doesn't this proposal solve that problem?

@gregsdennis
Member

gregsdennis commented May 2, 2023

implementations should be able to look at a keyword, even from a vocabulary, and understand that it's an annotation-only keyword

Why? What benefit does this provide?

  • If it's x-*, then it's an annotation and not from a vocab.
  • If it's from a vocab, then the implementation needs to know the keywords defined in that vocab in order to process the schema, so it would already know if a given keyword is an annotation.
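
The two rules above could be sketched roughly as follows. `KNOWN_KEYWORDS` is a hypothetical stand-in for whatever an implementation knows from the vocabularies it supports; it is not taken from any real implementation.

```python
# Hypothetical table of keywords known from supported vocabularies.
KNOWN_KEYWORDS = {
    "title": "annotation",
    "minimum": "assertion",
    "properties": "applicator",
}


def keyword_kind(name: str) -> str:
    """Classify a keyword name using the two rules described above."""
    if name.startswith("x-"):
        # Ad-hoc keyword: annotation-only, never from a vocabulary.
        return "annotation"
    # Vocabulary keyword: the implementation must already know its kind.
    return KNOWN_KEYWORDS.get(name, "unknown")
```

A keyword the implementation has never seen, such as one added by a newer release of the specification, lands in the "unknown" branch.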

doesn't this proposal solve that problem?

I think it moves toward solving the problem of vocabulary collisions, but I don't know that it completely solves it.

For one, this problem doesn't exist for just annotations, but for all keywords.

Secondly, we should consider the case where a vocab keyword doesn't collide. Are you still going to require users to write out the full {vocabURI}#{keyword}, or is just the keyword sufficient? If just the keyword is sufficient, are you requiring implementations to recognize {vocabURI}#{keyword} and {keyword} to be the same?

If keyword collisions are the problem you're describing, then let's talk about a solution that fits within existing systems. We don't have a precedent for keywords having two representations (the plain keyword and a "vocab-qualified" keyword). However we could introduce a new core keyword (I'll use $using here because I'm used to C#) that can resolve these conflicts simply.

Suppose a meta-schema/dialect/whatever uses two vocabs that both define foo. The meta-schema can be written as:

{
  "$schema": "https://example.com/my-meta-schema.json",
  "$id": "https://example.com/my-meta-schema.json",
  "$vocabulary": {
    // all the core vocabs
    "https://example.com/vocabs/custom-foo-1": true,
    "https://example.com/vocabs/custom-foo-2": true
  },
  "$using": {
    "foo1": "https://example.com/vocabs/custom-foo-1#foo",
    "foo2": "https://example.com/vocabs/custom-foo-2#foo"
  }
}

Then the implementation can properly link the two foo keywords to foo1 and foo2, and the schema author uses foo1 and foo2 as needed in their schema.

We can then also add a requirement of implementations that if a meta-schema doesn't resolve this ambiguity, the meta-schema is invalid or indeterminate (implementation-defined).
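
A minimal sketch of how an implementation might resolve `$using`, assuming it has some table of which keywords each vocabulary defines. That table (`vocab_keywords`) and the function name are hypothetical, and raising an error on an unresolved collision is just one of the options mentioned above.

```python
def resolve_keywords(meta_schema: dict, vocab_keywords: dict) -> dict:
    """Map each usable keyword name to a vocab-qualified identifier."""
    using = meta_schema.get("$using", {})
    aliased = set(using.values())
    resolved = dict(using)  # aliases such as foo1/foo2 are usable as-is
    owners: dict = {}
    for vocab_uri in meta_schema.get("$vocabulary", {}):
        for keyword in vocab_keywords.get(vocab_uri, ()):
            qualified = f"{vocab_uri}#{keyword}"
            if qualified in aliased:
                continue  # reachable only through its $using alias
            owners.setdefault(keyword, []).append(qualified)
    for keyword, candidates in owners.items():
        if len(candidates) > 1:
            # Unresolved collision: this sketch treats the meta-schema as invalid.
            raise ValueError(f"ambiguous keyword {keyword!r}: {candidates}")
        resolved[keyword] = candidates[0]
    return resolved


FOO1 = "https://example.com/vocabs/custom-foo-1"
FOO2 = "https://example.com/vocabs/custom-foo-2"
# Assumed keyword lists; "bar" is invented to show a non-colliding keyword.
vocab_keywords = {FOO1: ["foo"], FOO2: ["foo", "bar"]}

meta_schema = {
    "$vocabulary": {FOO1: True, FOO2: True},
    "$using": {"foo1": f"{FOO1}#foo", "foo2": f"{FOO2}#foo"},
}
resolved = resolve_keywords(meta_schema, vocab_keywords)
```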

@awwright
Member Author

awwright commented May 3, 2023

Why? What benefit does this provide?

Suppose we want to publish an update to the specification that introduces a new annotation keyword with a standard meaning. Take for example, "JSON-LD context" to associate a JSON-LD/RDF semantics with a value. The keyword won't affect assertion at all, and schema authors should be able to introduce it into their schemas without disrupting validation. So, you name the keyword with a naming scheme that indicates this—perhaps @LDContext or @context.

It seems like "optional" vocabularies were supposed to solve this problem—but since the vocabulary won't even be loaded, the implementation won't understand where the @LDContext is defined or that it's safe to ignore... unless there's a naming scheme that says @ keywords are annotation-only.

In contrast, suppose you want to define a cheap assertion keyword, used by a single party. By "cheap" I mean, nobody else is going to use it, so you want to avoid the overhead of standardization, or even writing a vocabulary and associated meta-schema. This proposal provides that—just pick a URI that's under your control.

For one, this problem doesn't exist for just annotations, but for all keywords.

Well yes, URI keywords can do anything, unless specified otherwise. Again, you could have an annotation-only URI keyword e.g. @http://example.com/kw

Secondly, we should consider the case where a vocab keyword doesn't collide. Are you still going to require users to write out the full {vocabURI}#{keyword}, or is just the keyword sufficient? If just the keyword is sufficient, are you requiring implementations to recognize {vocabURI}#{keyword} and {keyword} to be the same?

I haven't fully thought out the idea, but I suppose this URI form would be required if the keyword would otherwise be ambiguous (if it's defined in multiple vocabularies).

@gregsdennis
Member

gregsdennis commented May 3, 2023

You've missed the important part of your supposition:

Suppose we want to publish an update to the specification that introduces a new annotation keyword.

The specification is being updated, and it defines the keyword. It uses vocabularies to define keywords. Thus, an implementation that understands the specification should understand this keyword and know that it's an annotation. If it doesn't understand the keyword, then the implementation is out of date and needs to be updated.

(BTW - This leads back to all of the other discussions that Jason and I have been having around whether to include unevaluatedProperties: false in the meta-schema.)

suppose you want to define a cheap assertion keyword, used by a single party

I think anything that is not intended to be interoperable (i.e. schema author and consumer are the same party) needs to remain outside of the scope of the specification and something that individual implementations can support, if they want. Our job in writing the specification should remain focused on ensuring interoperability across implementations.

The point of the specification is to standardize behavior and processes for common tasks. Something that is to be used by a single party needs no such standardization, and is therefore outside of our scope. Every implementation that supports ad-hoc assertion keywords is going to have their own way of doing it anyway. The spec cannot be expected to add requirements to implementation-specific behavior.

(I've raised the question about scope in our charter PR.)

@awwright
Member Author

awwright commented May 3, 2023

The specification is being updated, and it defines the keyword. It uses vocabularies to define keywords. Thus, an implementation that understands the specification should understand this keyword and know that it's an annotation.

But most implementations will not update right away. If you are authoring and publishing a schema, and it is being consumed by multiple different implementations not under your control, you want to be able to update the schema in a way that isn't going to cause breakage if you don't have to.

@gregsdennis
Member

But most implementations will not update right away.

That's true, but I also don't see schema authors wanting to use new features as soon as the spec updates. There may be a couple of users watching the spec, but if they're savvy enough to watch the spec, they also likely understand that there's always lag for the implementations to catch up. 2020-12 only really started seeing activity in the past year or so, two years after it was released.

Also, users' desire for new features is the driving force that gets implementations to provide them. I don't want to create mechanisms that allow implementors to delay adding new features. I want to encourage faster updates. The only way for us to do that is by utilizing the users to apply that pressure.


If you want to require a prefix for all vocabulary annotation keywords to support the kind of forward compatibility you're talking about, then we'll have to update the ones we have to match that rule, e.g. title -> @title.

That's a pretty large change, and it really only addresses a fairly niche use case as it only applies to annotations, and usually we'll be adding functional keywords (applicators and assertions).

@jdesrosiers
Member

① What if an update to the specification introduces a new annotation keyword? Older implementations ought to be able to ignore it for assertion purposes. The most straightforward way to identify this is with a naming scheme.

I think this is a great point. The x- syntax we've discussed doesn't solve that forward-compatibility case. Assuming we stick with what we've decided so far, we would need two separate prefixes. Perhaps @ for vocabulary-defined SVAs and x- for non-vocabulary SVAs. It's awkward to have two prefixes that describe the same behavior, but it would work.

Using @ for all SVA keywords and URIs for non-vocabulary keywords would be less awkward conceptually, but more cumbersome for users who will be annoyed by extra typing and confused that the URIs don't go anywhere. I think it's possible to get to a point where URIs make more sense than x-, but we need to explore further how URIs fit in with the vocabulary system and introduce ways (something like the $using suggestion) to make using URIs less cumbersome.

It seems like "optional" vocabularies were supposed to solve this problem

Agreed. If we go with a prefix for all annotation-only keywords, I think it effectively eliminates the case for optional vocabularies.

If you want to require a prefix for all vocabulary annotation keywords to support the kind of forward compatibility you're talking about, then we'll have to update the ones we have to match that rule, e.g. title -> @title.

That's a pretty large change, and it really only addresses a fairly niche use case as it only applies to annotations, and usually we'll be adding functional keywords (applicators and assertions).

This is also a very good point. Renaming all of our annotation keywords is a big change for something that I don't expect will happen often. We could of course grandfather in the existing keywords, but the inconsistency is awkward. Or, we could support both and deprecate the old spelling, which is also awkward. It's a fair question to ask whether this is a problem worth solving. I'm leaning toward solving it, but not strongly.

@gregsdennis
Member

It seems we have two topics going simultaneously, and I think they can be resolved separately.

  • providing support for future SVAs
  • resolving conflicts between the same keyword defined by multiple vocabs

It may be better to split one (or both) of these into separate issues.


On the topic of supporting future SVAs, stepping back from this a bit to get more of the larger picture, I noticed that we're now saying that we want a prefix for non-vocab SVAs (x-) and a prefix for vocab SVAs. Why not just require the same prefix for all SVAs? Whether they're in a vocabulary doesn't change an implementation's behavior toward them: the value of the keyword is still reported as an annotation in the output, and the application decides what to do with it.

Secondly, since the current state (at least in my PR) is that x- is reserved for non-vocab (ad-hoc) SVAs, a user/org which wishes to collect their ad-hoc annotations into a vocab would be required to remove the x- from all of their keywords in all of their schemas. This would also be the case if vocab SVAs and non-vocab SVAs had different prefixes. This works against the adoption of vocabularies. However, if all SVAs, including those defined in vocabs, had the same prefix, then collecting ad-hoc SVAs into vocabs would require no code changes. This encourages users/orgs to create vocabs because it's really just documentation of the annotations they're using; they just need to add a URI to identify it.

If we want to support all SVAs in a perfectly future-compatible way, then I think having a single prefix for both vocab SVAs and non-vocab SVAs, is the way to go.

Or, we could support both and deprecate the old spelling, which is also awkward.

I think this is probably the best approach. We could also encourage implementations to push people toward using the prefixed keyword. (We may need to augment the output format to allow implementations to provide a deprecation warning or something.)

@jdesrosiers
Member

Why not just require the same prefix for all SVAs?

Let's say I introduce x-foo in my schema that's a string with certain semantics. Later, JSON Schema adds x-foo as a number with different semantics. My old schemas now have ambiguous semantics and fail meta-schema validation. We can avoid this potential conflict either by using a different prefix or a URI for non-vocabulary SVAs.

a user/org which wishes to collect their ad-hoc annotations into a vocab would be required to remove the x- from all of their keywords in all of their schemas.

I agree. This is why I wouldn't suggest x- keywords (as currently discussed) for something that would potentially make sense in a vocabulary. It's mostly only useful for something that's a glorified comment. The URI solution is better for experimenting with something that could potentially end up as part of a formal vocabulary. If we go with the idea of all keywords having a URI and vocabularies mapping keyword names to keyword URIs, users wouldn't have to change their existing schemas when they incorporate the new vocabulary. The keyword name is just an alias for the keyword URI. Both would work the same. (This is similar to how JSON-LD works)
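
To make the "name is just an alias for the URI" idea concrete, here is a sketch that rewrites aliased keyword names to their URIs so both spellings end up identical. The alias table and the short name `distance` are assumptions for illustration, and only top-level keywords are handled.

```python
def normalize_keywords(schema: dict, aliases: dict) -> dict:
    """Rewrite aliased top-level keyword names to keyword URIs.

    Names that are not in the alias table (including names that are
    already URIs) pass through unchanged.
    """
    return {aliases.get(name, name): value for name, value in schema.items()}


# Hypothetical vocabulary mapping a short name to the experimental URI.
ALIASES = {"distance": "http://example.com/schema/distance"}

with_uri = {"type": "string", "http://example.com/schema/distance": 40}
with_alias = {"type": "string", "distance": 40}

same = normalize_keywords(with_uri, ALIASES) == normalize_keywords(with_alias, ALIASES)
```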

@gregsdennis
Member

gregsdennis commented May 5, 2023

We can avoid this potential conflict either by using a different prefix or a URI for non-vocabulary SVAs.

Future-proofing. 👍

... [the rest of what Jason just said]

Okay. I think I can get behind this, but I think we need to work it out a bit more...

x- keywords are reserved for ad-hoc non-vocab annotation-only keywords. This is the thing we recently publicized. Nothing is changing here. If a user/org wants to collect these into a vocab, they'll have to update them to remove x-.

(We should use IRIs like the rest of the spec as adopted.)

IRI keywords are for ad-hoc non-vocab keywords (all kinds). The format must be [base-iri]#[keyword], where base-iri must be an absolute IRI that indicates the ad-hoc vocab for the keyword. This will allow a user/org to collect IRI keywords that share a common base-iri into a vocab without changing any code.

An implementation does not need to understand an ad-hoc IRI keyword in order to process the schema that contains it. If an implementation does not understand the keyword, it will be treated as an annotation-only keyword. By using an unknown IRI keyword, the user acknowledges that this may produce inconsistent evaluation results. (This is equivalent to the ad-hoc vocab being optional.)

As a consequence of the above, vocab-IRI-qualified keywords are now acceptable for all keywords, including those in vocabs defined by the spec. So https://json-schema.org/draft/2020-12/vocab/validation#minimum and minimum mean the same thing (if the 2020-12 validation vocab is used by the meta-schema).

We need to adopt a keyword-conflict-resolution strategy, such as my proposed $using above. I think we should discuss this particular point in a follow-up issue.

  1. Does this sound acceptable to everyone?
  2. Does this sound implementable?
  3. Do you think that a user that is savvy enough to use a non-annotation IRI keyword in their spec will understand that evaluation may be inconsistent?
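
For discussion, a rough sketch of the behavior described in this comment: an IRI keyword must have the shape [base-iri]#[keyword], and one the implementation doesn't recognize falls back to being annotation-only. The function names, the `handlers` registry, and the `multipleOfTen` keyword are all invented for the example.

```python
from urllib.parse import urldefrag, urlsplit


def split_iri_keyword(name: str) -> tuple[str, str]:
    """Split [base-iri]#[keyword]; reject names that don't fit that shape."""
    base, keyword = urldefrag(name)
    if not keyword or not urlsplit(base).scheme:
        raise ValueError(f"not an absolute-IRI keyword: {name!r}")
    return base, keyword


def evaluate_iri_keyword(name, value, instance, handlers):
    """Run a known IRI keyword, or report an unknown one as an annotation."""
    split_iri_keyword(name)  # only checks the shape of the name
    handler = handlers.get(name)
    if handler is None:
        # Unknown keyword: never fails validation, value kept as annotation.
        return {"valid": True, "annotation": value}
    return {"valid": handler(value, instance), "annotation": None}


# Made-up assertion keyword: instance must be a multiple of ten when true.
KW = "https://example.com/kw#multipleOfTen"
handlers = {KW: lambda value, instance: (instance % 10 == 0) == value}

known_ok = evaluate_iri_keyword(KW, True, 20, handlers)
known_bad = evaluate_iri_keyword(KW, True, 25, handlers)
unknown = evaluate_iri_keyword("https://example.com/kw#other", "note", 25, handlers)
```

The `unknown` case shows where evaluation can become inconsistent: an implementation with a handler for that keyword might reject the same instance.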

@jdesrosiers
Member

The format must be [base-iri]#[keyword], where base-iri must be an absolute IRI that indicates the ad-hoc vocab for the keyword.

I don't think it's necessary to prescribe a URI format for keyword identifiers. Also, I very much don't want keywords to be coupled to a vocabulary. My implementation has always used URIs as keyword identifiers. I originally used a structure similar to what you propose except I used dialect URI as the base instead of vocabulary. I found this coupling to be problematic and changed my approach.

The problem with using the vocabulary URI (or dialect URI) is that it gets awkward if you want to compose a new vocabulary from an existing one. For example, the title keyword in 2019-09 and 2020-12 are exactly the same in every way, but have different identifiers. Instead, I found it much more useful to identify keywords using https://json-schema.org/keyword/{keyword}. Then I could construct the 2019-09 meta-data vocabulary and the 2020-12 meta-data vocabulary by each mapping title to https://json-schema.org/keyword/title.

Another example is renamed keywords like $defs. One vocabulary can define definitions => https://json-schema.org/keyword/definitions while another defines $defs => https://json-schema.org/keyword/definitions. They have the same semantics and they should have the same URI.

This kind of thing was important for implementing the annotation tooling I released recently. It allows me to treat title the same no matter what draft schema it showed up in (because that keyword has never changed), while also treating items as separate keywords depending on whether it appeared in 2019-09 or 2020-12 (because its semantics changed).
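
Roughly, the idea looks like the sketch below. `DIALECT_A` and `DIALECT_B` are simplified stand-ins rather than exact descriptions of any draft, and the two `items` URIs are placeholders invented for this example; only the `title` and `definitions` URIs follow the pattern mentioned above.

```python
KEYWORD_BASE = "https://json-schema.org/keyword/"

# Each vocabulary maps the keyword name used in schemas to a keyword URI.
DIALECT_A = {
    "title": KEYWORD_BASE + "title",
    "definitions": KEYWORD_BASE + "definitions",
    "items": "https://example.com/keyword/items-old",  # placeholder URI
}
DIALECT_B = {
    "title": KEYWORD_BASE + "title",
    "$defs": KEYWORD_BASE + "definitions",  # renamed, same semantics
    "items": "https://example.com/keyword/items-new",  # placeholder URI
}


def same_keyword(name_a: str, name_b: str) -> bool:
    """True if a name in dialect A and a name in dialect B share a URI."""
    return DIALECT_A[name_a] == DIALECT_B[name_b]
```

Tooling keyed on the URI can then treat `title` identically in both dialects while keeping the two `items` keywords apart.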

An implementation does not need to understand an ad-hoc IRI keyword in order to process the schema that contains it. If an implementation does not understand the keyword, it will be treated as an annotation-only keyword. By using an unknown IRI keyword, the user acknowledges that this may produce inconsistent evaluation results. (This is equivalent to the ad-hoc vocab being optional.)

I was expecting that an implementation that encounters an unknown ad-hoc IRI keyword would produce an error. You've recognized that this breaks compatibility, so what's your reason for wanting this behavior?

@gregsdennis
Member

I found it much more useful to identify keywords using https://json-schema.org/keyword/{keyword}

I can go with this. We, as "json-schema.org," wouldn't (hopefully) ever define the same keyword twice. I suppose it scopes a keyword to the org.

We'd also be free to reorganize the vocabs if we saw fit to do so.

Each vocab could actually list the keywords it brings to the table as well, getting us a bit closer to a machine-readable vocabulary.

I was expecting that an implementation that encounters an unknown ad-hoc IRI keyword would produce an error.

Producing an error is fine. I prefer that as well.

I took the option of generating an annotation because I thought it was desired that unknown IRI keywords would work as if they were defined in an optional vocab in 2020-12.

@awwright
Member Author

awwright commented May 9, 2023

I don't think it's necessary to prescribe a URI format for keyword identifiers. Also, I very much don't want keywords to be coupled to a vocabulary.

Yeah. I can see the motivation to directly map a keyword URI to a vocabulary URI and a keyword name, but I don't see much benefit for schema authoring, and actually it might be too restrictive, for example:

For example, the title keyword in 2019-09 and 2020-12 are exactly the same in every way

...here, forcing a keyword URI to include a vocabulary URI and the keyword name (as the fragment) would preclude the possibility that keywords can be shared between vocabularies (that two vocabularies can define the "same" keyword).

Instead, I found it much more useful to identify keywords using https://json-schema.org/keyword/{keyword}.

Would this just be a way to map a non-URI keyword to a URI keyword? Or is this useful for other purposes?

@jdesrosiers
Member

Would this just be a way to map a non-URI keyword to a URI keyword? Or is this useful for other purposes?

This was just an example of a pattern we could use for assigning URIs to our existing keywords that doesn't couple to vocabularies or dialects. There's nothing special about this pattern. We could assign each keyword a urn:uuid: identifier instead if we wanted to be annoying. Including the keyword name in the URI is just a reader convenience. Keyword name to keyword URI mapping is defined by vocabularies.

A fun thing you could do with this is to define a dialect that translates keyword names to Spanish. The keyword semantics are the same, the vocabulary just allows you to change the keyword name that's used in the schema. Implementations should be able to trivially handle this dialect. The user just needs to provide the mapping.
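A minimal sketch of what that might look like, in Python. The Spanish keyword names, the keyword URIs, and the `to_keyword_uris` helper are all made up for illustration, and only the top level of the schema is translated.

```python
# Hypothetical Spanish-language vocabulary: plain name -> keyword URI.
SPANISH_VOCAB = {
    "tipo": "https://json-schema.org/keyword/type",
    "titulo": "https://json-schema.org/keyword/title",
}


def to_keyword_uris(schema, vocab):
    """Rewrite top-level keyword names to keyword URIs; unmapped names pass through."""
    return {vocab.get(name, name): value for name, value in schema.items()}


spanish_schema = {"tipo": "string", "titulo": "Nombre"}
resolved = to_keyword_uris(spanish_schema, SPANISH_VOCAB)
```

Once the names are resolved to URIs, the implementation can dispatch on the URI and never needs to know which language the schema author used.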

@gregsdennis
Member

gregsdennis commented Sep 27, 2023

I think I may have misread this, and a lot of my comments about x-* being for custom non-vocab SVAs are likely completely unrelated.

@awwright correct me if I'm wrong, but are you merely suggesting that using a vocab-qualified keyword URI be allowed in place of creating a custom meta-schema to include the vocab just so I can use that keyword in a schema? I think I'd be okay with that, though it would make deserialization a bit trickier (for me) because I'd have to account for two keys that represent the same keyword.

What would happen if the URI and the plain keyword were both used in a single schema object? For example, if I used type and https://json-schema.org/draft/2020-12/meta/validation/type?

@jdesrosiers
Member

are you merely suggesting that using a vocab-qualified keyword URI be allowed in place of creating a custom meta-schema to include the vocab just so I can use that keyword in a schema?

I'm not Austin, but I don't think that's quite right. I think walking through an example would be the easiest way to explain what I'm thinking. Let's say I want to introduce a custom validation keyword minDate. This is not an SVA keyword, so I can't use x-. I'd have to create a new vocabulary and build a dialect. Austin's suggestion is to use URI keywords instead of needing to create a vocab and dialect in order to use the keyword. The keyword would be https://myorg.com/keyword/minDate. The implementation still needs to support that keyword in order to process the schema, but there's no danger of someone else creating a keyword with different semantics and the same name because theirs would be https://theirorg.com/keyword/minDate. At this point, there are no vocabularies involved.

(Austin also prefers that custom SVA keywords be URIs as well utilizing the @ prefix, but I'm sticking to the case that's consistent with what we've decided about x- keywords.)

I suggested we take it a step further and have a URI for all keywords including those in a vocabulary. Any plain-name keywords map to keyword URIs via a vocabulary and the keyword URI is the authority on the semantics of the keyword. One of the things this allows is a seamless transition when moving the custom keyword to a vocabulary. Let's say I want to move my https://myorg.com/keyword/minDate keyword to a vocab and be able to use minDate going forward. Since the vocab is just a mapping of plain-name to URI, I can introduce that vocabulary without breaking existing schemas. My old schemas that use the URI still work and at the same time I can use minDate. This is why I called this proposal a stepping stone to using the vocabulary system in a previous comment. Without this concept of all keywords having a URI, moving minDate to a vocabulary would break your existing schemas. You would have to update all of your schemas that used the URI to use the plain-name.
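Here is a small Python sketch of that transition, assuming handlers are registered by keyword URI and a vocabulary is only a name-to-URI mapping. The semantics given to minDate (an ISO date string that must not be earlier than the keyword value) and all helper names are assumptions made for this example.

```python
from datetime import date

MIN_DATE = "https://myorg.com/keyword/minDate"


def min_date(value, instance):
    """Assumed semantics: the instance date must be on or after the keyword value."""
    return date.fromisoformat(instance) >= date.fromisoformat(value)


# Handlers are keyed by keyword URI, never by plain name.
HANDLERS = {MIN_DATE: min_date}

# Introduced later: a vocabulary that only maps the plain name to the same URI.
VOCAB = {"minDate": MIN_DATE}


def validate(schema, instance, vocab=None):
    """Resolve each keyword to its URI, then dispatch to the handler for that URI."""
    vocab = vocab or {}
    for name, value in schema.items():
        uri = vocab.get(name, name)
        handler = HANDLERS.get(uri)
        if handler is None:
            raise ValueError(f"unsupported keyword: {name}")
        if not handler(value, instance):
            return False
    return True


old_schema = {MIN_DATE: "2023-01-01"}   # written before the vocab existed
new_schema = {"minDate": "2023-01-01"}  # written after the vocab was introduced
```

Both `old_schema` and `new_schema` reach the same handler when validated with `VOCAB`, so introducing the vocabulary does not invalidate schemas that already use the URI form.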

So, combining both Austin's proposal and my addition, you could use the keyword URI directly instead of using a dialect that includes the vocabulary that includes the keyword. However, the concept as a whole isn't dependent on vocabularies. It's not just a way to use a keyword without using its vocabulary. It's a way to use a custom validating keyword that isn't part of a vocabulary (yet).

What would happen if the URI and the plain keyword were both used in a single schema object? For example, if I used type and https://json-schema.org/draft/2020-12/meta/validation/type?

type is just an alias for https://json-schema.org/draft/2020-12/meta/validation/type so there should be no issue using both in the same schema.

@gregsdennis
Member

using a vocab-qualified keyword URI be allowed in place of creating a custom meta-schema to include the vocab just so I can use that keyword in a schema - @gregsdennis

Austin's suggestion is to use URI keywords instead of needing to create a vocab and dialect in order to use the keyword. - @jdesrosiers

So the keyword doesn't necessarily need to be in a vocab up front. Nuance.

But later if the user wants to create a vocab with that keyword, they'd have to use the same URI, which ideally would be the vocab URI + the keyword... which means if there's the possibility that it could be in a vocab later, they should figure out a vocab URI now and use it. Thus the keyword URI would be a vocab-qualified keyword URI. (This is more best practice than requirement.)

So it's basically the same thing. Cool.

@jdesrosiers
Member

So the keyword doesn't necessarily need to be in a vocab up front. Nuance.

👍

But later if the user wants to create a vocab with that keyword, they'd have to use the same URI, which ideally would be the vocab URI + the keyword

I don't think it's necessary or a good idea to couple keyword URIs to vocabulary URIs. I explained in a previous comment.

@gregsdennis
Member

What would happen if the URI and the plain keyword were both used in a single schema object? For example, if I used type and https://json-schema.org/draft/2020-12/meta/validation/type?

type is just an alias for https://json-schema.org/draft/2020-12/meta/validation/type so there should be no issue using both in the same schema. - @jdesrosiers

So you don't see a problem with this?

{
   "type": "object",
   "https://json-schema.org/draft/2020-12/meta/validation/type": "array"
}

This seems like a problem to me.

@jdesrosiers
Member

I don't see it as any more problematic than,

{
  "allOf": [
    { "type": "object" },
    { "type": "array" }
  ]
}

It's just a quirk of the syntax that would make it possible to express the same thing without the allOf. I'm not worried about it.
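To make the comparison concrete, here is a rough Python sketch in which both spellings resolve to the same keyword URI and each occurrence is applied as its own constraint. The helper names and the tiny type table are assumptions for illustration, and keywords other than type are simply ignored.

```python
TYPE = "https://json-schema.org/draft/2020-12/meta/validation/type"
VOCAB = {"type": TYPE}

# Minimal mapping from JSON Schema type names to Python types (illustrative only).
PY_TYPES = {"object": dict, "array": list, "string": str}


def constraints(schema, vocab):
    """Resolve names to URIs, keeping every occurrence as a separate constraint."""
    return [(vocab.get(name, name), value) for name, value in schema.items()]


def validate(schema, instance, vocab=VOCAB):
    """Every resolved type constraint must hold, exactly as under allOf."""
    for uri, value in constraints(schema, vocab):
        if uri == TYPE and not isinstance(instance, PY_TYPES[value]):
            return False
    return True


both = {"type": "object", TYPE: "array"}
```

Under these assumptions no instance can satisfy `both`, which is the same outcome as the allOf form.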

@gregsdennis gregsdennis added the proposal Initial discussion of a new idea. A project will be created once a proposal document is created. label Jun 18, 2024
4 participants
@awwright @jdesrosiers @gregsdennis and others