Spec changes regarding URIs ($id, $ref, etc.) and location semantics
#460
Replies: 9 comments 83 replies
-
I'm not sure I fully understand the problem being solved, whether all of the above changes are being proposed, or whether the point is to pick one or some of them to solve the problem.
This, though, seems like a significant step backwards to me. The point of this line is "hey implementers, you should give users of your library a way to associate URIs with their schemas even if those schemas aren't hosted anywhere". Many implementations do not make this as easy as they should, I suspect because their authors don't know this line even exists in the spec. Changing it to MAY will, in my opinion, do nothing to help users, and will simply make it even less obvious that this capability should exist in implementations.
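A minimal Python sketch of the capability that line asks for: associating a URI with a schema that is not hosted anywhere. `SchemaRegistry`, `register` and `lookup` are hypothetical names chosen for illustration, not any particular library's API; real implementations expose this in their own ways.

```python
from urllib.parse import urldefrag


class SchemaRegistry:
    """Hypothetical in-memory mapping from URIs to schemas; never touches the network."""

    def __init__(self):
        self._schemas = {}

    @staticmethod
    def _key(uri):
        # Drop any fragment so "https://example.com/person#" and
        # "https://example.com/person" name the same resource.
        return urldefrag(uri).url

    def register(self, uri, schema):
        """Associate an arbitrary URI with a schema, whether or not it is hosted."""
        self._schemas[self._key(uri)] = schema

    def lookup(self, uri):
        """Return the registered schema, or raise LookupError (no fetching)."""
        key = self._key(uri)
        if key not in self._schemas:
            raise LookupError(f"no schema registered for {key!r}")
        return self._schemas[key]


registry = SchemaRegistry()
# Nothing needs to be served at this URL for the association to work.
registry.register("https://example.com/person", {"type": "object"})
print(registry.lookup("https://example.com/person#"))  # {'type': 'object'}
```

Because `lookup` never falls back to the network, an `https` identifier here is purely a name; whether anything is actually served at that URL does not matter.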
I don't follow what positive change could come out of this either. I guess overall I don't understand:
That sounds like a feature, not a bug, to me (one that is, admittedly, under-implemented). Overall, inasmuch as I do understand the problem being addressed ("the concept of a URI vs. a URL is not intuitive to schema authors who don't read or care about reading URI specs"), I don't personally see any of the above improving things. If that is the problem we're trying to solve, I'd instead throw in an additional or replacement recommendation: "schema authors: if you use a URI to identify your schema, and that URI is a URL, then you SHOULD also make the schema retrievable from that URL; if not, then don't use a URL, use some other kind of URI". We should then, of course, follow that ourselves.
-
I was initially opposed to this direction; however, there are some good arguments. Even so, on balance and reflection, I'm not convinced it's the right direction: it feels to me like a weakening, which I think would be bad. I thought that the WHATWG URL specification removed the ability for a URL to be only an identifier, but it does not.
I agree there's general confusion over how references work in relation to URIs, but I think people equally expect references to work with the local file system as with HTTP requests. I agree with @karenetheridge here. I think one of the main reasons people struggle to understand how URI resolution works in JSON Schema is the lack of consistency and documentation across implementations. Mandating that consistency could really help. (I also think that providing a way for implementations to link to specific parts of their documentation, such as how to associate schemas with URIs, would send a strong signal, and that's one of my reasons for wanting to do so.)
Preferably, implementations SHOULD also provide an interface for alternative, user-defined resolution of URIs/URLs to schemas. That might involve making an HTTP request, and that may well be a solution provided by the same implementation. But I do feel strongly that libraries shouldn't make HTTP requests elsewhere without the user's specific request or setup.
I agree that you've correctly identified a common challenge, but I think the proposed solution aims to make things "simpler" at a cost to what users can reasonably expect. With schemas that have many, many references, if implementations default to making HTTP requests, 30 or 40 requests will hit performance hugely, especially if they are made every time validation runs against an instance with no cache. And what about the average user? Even a single HTTP request is going to hit performance.
We should encourage traversal before processing to verify that required referenced schemas are present and, if explicitly enabled, make requests elsewhere to retrieve schemas where possible.
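One way to read the "traverse first, fetch only on opt-in" idea, sketched in Python under simplifying assumptions: `collect_refs` and `check_references` are hypothetical helpers, nested `$id` keywords are ignored when computing base URIs, and every string-valued `$ref` member is treated as a reference. Network access can only happen if the caller passes its own `fetch` callable, and anything fetched is cached in `known` so it is retrieved at most once.

```python
from urllib.parse import urldefrag, urljoin


def collect_refs(schema, base_uri):
    """Yield the absolute, fragment-free URI of every "$ref" found in a schema.

    Simplifications: nested "$id" keywords are ignored (everything resolves
    against base_uri), and any string-valued "$ref" member counts as a reference.
    """
    stack = [schema]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                yield urldefrag(urljoin(base_uri, ref)).url
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)


def check_references(root_uri, known, fetch=None):
    """Walk every schema reachable from root_uri and return unresolvable URIs.

    `known` maps URIs to schemas. `fetch` is an optional user-supplied callable;
    with the default of None, no network access happens at all.
    """
    missing, seen, pending = set(), set(), [root_uri]
    while pending:
        uri = pending.pop()
        if uri in seen:
            continue
        seen.add(uri)
        if uri not in known:
            if fetch is None:
                missing.add(uri)
                continue
            known[uri] = fetch(uri)  # explicit opt-in; cached so it is fetched once
        pending.extend(collect_refs(known[uri], uri))
    return missing


order = {
    "type": "object",
    "properties": {
        "item": {"$ref": "item"},  # relative; resolves to https://example.com/item
        "customer": {"$ref": "https://example.com/customer"},
    },
}
known = {
    "https://example.com/order": order,
    "https://example.com/item": {"type": "object"},
}
print(check_references("https://example.com/order", known))
# {'https://example.com/customer'}
```

Running this check once, before any instance is validated, surfaces missing schemas up front instead of failing (or silently hitting the network) in the middle of validation.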
IMHO this would play havoc with schema registries. I feel there are a number of considerations, like registries, that you may not be taking into account here. I think it comes down to what we believe should be the default and what should be (potentially, if implemented) configurable. I feel that "offline first" is the best practice and the most secure approach, while the counterargument is that an "online first" approach better matches user expectations. I'm not convinced that changing this to meet newer users' expectations is preferable to the performance and security considerations.
-
Some of my own thoughts here, not because I'm against the proposal, but to get feedback on my way of thinking about it:
If the problem is confusion around people using URLs and (rightfully so?) assuming the schema can be located through the URL, maybe recommend URNs here? That sounds to me like the correct way of using URIs to support both cases.
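A sketch of what the URN approach looks like in practice, assuming a plain dict as the registry; the `resolve_pointer` and `resolve_ref` helpers and the `urn:example:person` identifier are hypothetical. Nothing about a `urn:` identifier suggests it can be downloaded, yet `$ref` resolution (base URI plus JSON Pointer fragment) works the same way.

```python
from urllib.parse import unquote, urldefrag


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer string against a JSON document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_ref(ref, registry):
    """Resolve an absolute reference such as "urn:example:person#/$defs/name"."""
    base, fragment = urldefrag(ref)
    # The fragment is percent-decoded before being parsed as a JSON Pointer.
    return resolve_pointer(registry[base], unquote(fragment))


person = {
    "$id": "urn:example:person",  # a name only; there is nothing to download
    "type": "object",
    "properties": {"name": {"$ref": "#/$defs/name"}},
    "$defs": {"name": {"type": "string", "minLength": 1}},
}
registry = {person["$id"]: person}
print(resolve_ref("urn:example:person#/$defs/name", registry))
# {'type': 'string', 'minLength': 1}
```

One caveat for Python users: `urllib.parse.urljoin` only applies RFC 3986 relative resolution to schemes in its `uses_relative` list, which does not include `urn`, so `urljoin("urn:example:person", "#/$defs/name")` returns the fragment reference unchanged. An implementation that supports URN identifiers needs its own handling for fragment-only references.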
-
I’m with @gregsdennis on all this. Here’s a quick summary of the definitions: URIs are opaque IDs with a specific format. URLs essentially assign semantics to the scheme part of a URI and its contents. URIs are a superset of URLs. Just looking at a URI does not tell you whether it’s supposed to be used as a URL or only as an identifier; that is up to a specification and context. This means that if a specification says to use a “URI”, then it doesn’t matter whether “http” is the scheme or whether that’s confusing to a user. URIs are essentially opaque IDs. That’s it. JSON Schema uses URIs, which I think is the correct choice.
I also disagree with catering to people who “don’t read the documentation” (mentioned in one of the comments above). I also disagree with forcing a specific URI scheme, or with saying that users should not use so-called “confusing” URI forms such as “http”. This is an infrastructure-type specification and, in my view, it’s inappropriate to mandate what URIs look like, specifically user-supplied ones. The specification authors get to decide whatever they like for well-defined URIs that aren’t supplied by a user.
[I decided to make this a top-level comment instead of a reply because there’s lots of stuff and I wanted to share my thoughts.]
-
I think adding some guidance about using URIs is worthwhile: URIs are often misunderstood because Web browsers use only a very specific subset of what they're capable of. But not every misunderstanding like this can be addressed in a specification; implementation guidance can only do so much.
The passage in question ("Implementations SHOULD be able to associate arbitrary URIs with an arbitrary schema") is saying there has to be some way to map a URI to a schema. This is required to use $ref to do recursion; the only reason this isn't a MUST is that it would be slightly out of scope, and there might be legitimate reasons for an implementation to do it some other way.
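To illustrate why that mapping matters for recursion, here is a deliberately tiny validator in Python. It is a hypothetical toy, not a conforming implementation: it supports only `type`, `properties`, `items` and absolute `$ref` URIs, and ignores keywords next to `$ref`. The recursive `$ref` is satisfied by a dictionary lookup, so the `https://example.com/tree` identifier (also an assumption) never needs to be reachable.

```python
TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(instance, schema, registry):
    """Toy validator: only "type", "properties", "items" and absolute "$ref".

    Hypothetical and non-conforming; keywords next to "$ref" are ignored.
    """
    if "$ref" in schema:
        # Recursion is just a lookup in the URI -> schema mapping; no network.
        return validate(instance, registry[schema["$ref"]], registry)
    expected = schema.get("type")
    if expected is not None:
        if not isinstance(instance, TYPES[expected]):
            return False
        if expected == "integer" and isinstance(instance, bool):
            return False  # JSON booleans are not integers
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not validate(instance[name], subschema, registry):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(validate(item, schema["items"], registry) for item in instance)
    return True


TREE_URI = "https://example.com/tree"  # an identifier; nothing is hosted here
tree = {
    "$id": TREE_URI,
    "type": "object",
    "properties": {
        "value": {"type": "integer"},
        "children": {"type": "array", "items": {"$ref": TREE_URI}},
    },
}
registry = {TREE_URI: tree}
good = {"value": 1, "children": [{"value": 2, "children": []}]}
bad = {"value": 1, "children": [{"value": "two"}]}
print(validate(good, tree, registry), validate(bad, tree, registry))  # True False
```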
This passage is reinforcing the usage of URIs as identifiers. The only reason this isn't a "MUST" is to support the use of hypermedia and network distribution. If your application is a hypermedia application, then use the network; on the other hand, you shouldn't adopt network usage of JSON Schema if your application doesn't do anything with hypermedia.
The meaning of URIs in network location (i.e. URLs) is already defined in the normative references (RFC 3986), so adding a "SHOULD" would be redundant.
There are only a few situations where this is beneficial; in most cases it could only cause confusion. The point of having the URI in the document is that you can take the document with you anywhere, and it means the same thing. That's easier if the document gives itself a URI, rather than having its base URI supplied out-of-band. Sometimes you do want the base URI to change when you move the document around... that's why $id is optional. It's just not very common.
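A simplified Python sketch of the base-URI rule described here: a root schema's `$id`, resolved against the retrieval URI, wins when present; otherwise the retrieval URI is used. The `base_uri` function and the example URIs are hypothetical, and the sketch ignores embedded resources and the other base-URI sources listed in RFC 3986 section 5.1.

```python
from urllib.parse import urljoin


def base_uri(schema, retrieval_uri):
    """Base URI of a root schema: its "$id" resolved against the retrieval URI,
    or the retrieval URI itself when no "$id" is declared (simplified)."""
    declared = schema.get("$id") if isinstance(schema, dict) else None
    if isinstance(declared, str):
        return urljoin(retrieval_uri, declared)
    return retrieval_uri


pinned = {"$id": "https://example.com/person", "type": "object"}
floating = {"type": "object"}
for location in ("file:///home/alice/person.json", "https://mirror.example.org/person.json"):
    # The pinned schema keeps its identity wherever it is copied;
    # the floating one takes its identity from wherever it was loaded.
    print(base_uri(pinned, location), base_uri(floating, location))
# https://example.com/person file:///home/alice/person.json
# https://example.com/person https://mirror.example.org/person.json
```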
To rephrase the principle: Unless the application is already a network application, the act of validating JSON with a schema should not depend on a network request, or the network being up at all.
Indeed, picking a URN for "non-information resources" is an under-used technique; we could add some implementation guidance to use a …
-
I don't have much to add that hasn't already been said elsewhere. But just to "officially note" what I think my opinion is, having watched this week's Monday call (which I don't think is on YouTube yet, so I can't link to it for those who don't have context, though I'm sure it will be at some point), I'll leave this below. I would personally:
TL;DR: my opinion is (still?) that "everything should stay as-is", though maybe adding another sentence or tweaking the language would make everyone happy. My 0.02 :D
-
I think it's time to wrap this up. Going around arguing our points isn't reaching conclusions, so here's where I think everyone is: It seems that the majority is in favor of the status quo. While they may be open to some tweaks to the language, there should be no changes to the overall requirements or recommendations as these recommendations serve a purpose (as outlined in the discussion above). This majority includes myself, @Julian, @awwright, and @karenetheridge (edit: and @ssilverman). In favor of removing some of the recommendations is primarily @jdesrosiers with a little support from @jviotti (the only participant not part of the core team). As such, I propose the following (specific text TBD):
Overall, the only changes are some wording tweaks in the areas of (2) and (3). Again, no policy changes.
-
Proposal: json-schema-org/json-schema-spec#1435
-
Also related: json-schema-org/json-schema-spec#1231
-
While discussing what to use for the meta-schema URIs, the topic of deriving meaning from URIs and $id arose: namely, that URIs, specifically URLs, convey location semantics, and currently those semantics are not honored.
Core section 9.1.2 states
and Core section 8.2.1 states
With schemas being able to declare their own identifiers, and implementations being recommended to associate arbitrary URIs with schemas, any URI locating semantics are short-circuited. The semantics of the URI are meaningless.
Over the years, this has led to great confusion amongst JSON Schema users (and implementers to some degree), in that the natural expectation when attempting to resolve a $ref to a locatable URI (e.g. https://json-schema.org/some-schema) is that the resource should be available at that location, but it often isn't.
In an effort to alleviate some of this confusion, a few changes to the spec are proposed:
… $id. We want to encourage a schema's identifier to be the location where it is available: implicit identification over explicit identification.
… $ref) be used with location-defining URIs.
… $id is supplied, its URI should be in a non-locating format (e.g. a URN). (This is still quite useful for bundling and potentially other scenarios.)
There are a few implicit impacts of these changes:
… $id becomes somewhat superfluous for most use cases.
… $id with location semantics that differs from its location, it could be referenced by either URI.
There will be significant changes to how the spec defines and uses URIs, but the hope is that we can make these changes without breaking current functionality. Please feel free to list any areas of text that need to be changed to align with the above, but I'm less interested in actual text changes at this point. Actual text changes should be made in a PR (eventually). For now, we're looking for confirmation that we're headed in the right direction.
This discussion will inform the meta-schema URI discussion linked above.