Allow "$ref" to take a schema, to make inlining easier? #779
Comments
To be clear, are you suggesting that we support schema objects as a value for $ref? Or are you just talking about how references are explained, without any functional changes? If so, this is probably a good explanation. |
@gregsdennis Here's a better problem statement: now that $ref can have adjacent keywords, there is no way to inline a referenced schema in place; the recommended workaround restructures the schema by wrapping the target in an allOf.

A related thing that may or may not be considered a problem: the output (error and annotation paths) changes depending on whether a subschema is reached through a reference or has been inlined.

The first/primary problem can be solved by having a Core keyword that is a no-op in-place applicator where you can put a dereferenced schema.

If we agree that the second thing (the output changes based on whether it is reference or inline) is also a problem, then we should overload $ref. If we believe that it is not a problem, then it doesn't matter. If we believe that it would be a feature, with apps wanting to know the structure, then we should add a keyword for that purpose.

My feeling is that dereferencing and inlining schemas is an implementation detail, and the consumer of said schemas should not be affected by it. This is based on my experience with tools that collect large numbers of schemas and pack them into a single file before processing them. The author of the schemas is not necessarily even aware that the tooling takes this step. Up until now, it didn't matter: there was no standardized output format for annotations or errors, and none of this changes the validation result. But now it will matter.

For a tool like Cloudflare's Doca, the expectation is that the schemas are spread across many files, but they are processed as if they were a single dereferenced file (this is obviously a significant limitation, in that circular references are not allowed, but they've been using it to produce their API documentation for years, so it's a real use case). Doca in fact uses json-schema-ref-parser to do this. So if inlining a schema changes the output, Doca would start to produce surprising results. It would be just that much harder to correlate errors with input, because the dereference/inline step is hidden from the schema author. |
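(Illustration added for clarity, not from the original thread: a rough sketch of the output concern. The report shape assumes a basic-output style with keywordLocation/instanceLocation fields along the lines being drafted at the time, and the two top-level keys are just comparison labels, so treat all names here as hypothetical.) The same failure against a referenced { "type": "string" } subschema surfaces with a different keyword path depending on whether the reference was followed or inlined via allOf: {
  "throughReference": {
    "keywordLocation": "/properties/country/$ref/type",
    "instanceLocation": "/country",
    "error": "expected a string"
  },
  "afterInliningViaAllOf": {
    "keywordLocation": "/properties/country/allOf/0/type",
    "instanceLocation": "/country",
    "error": "expected a string"
  }
}
A tool correlating errors back to the authored files sees two different paths for what the author wrote as a single reference.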
I do not think anyone would ever write a $ref with an inlined schema as its value by hand (and of course, people do weird stuff so someone somewhere will no doubt write it by hand, but I don't think that's a significant factor). |
For implementors like me and @gregsdennis, schema inlining is not really on our mind. We must support the generic case, and for that inlining obviously doesn't work. I remember some people sending in feature requests to support schema dereferencing, which I always shot down as it's more a feature of a schema processor than a validation library. I can see the problem the new output format creates here, though. |
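(Illustration added for clarity: why the generic case rules out inlining.) A minimal self-referential schema, here a linked list, has no finite inlined expansion, so a generic validator cannot rely on dereferencing everything up front: {
  "$defs": {
    "node": {
      "type": "object",
      "properties": {
        "value": { "type": "integer" },
        "next": { "$ref": "#/$defs/node" }
      }
    }
  },
  "$ref": "#/$defs/node"
}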
@johandorland schema processors are as much our customers as validators / code generators / whatever. That is kind of the point of having modular vocabularies: a schema processor can just depend on core and ignore everything else. As far as adjacent keywords go, consider this example, which uses a hypothetical refSemantics extension keyword: {
"$ref": "#/$defs/base",
"refSemantics": "baseClass",
"allOf": [
{
"$ref": "#/$defs/commonStuff",
},
{
"$ref": "#/$defs/childSpecificStuff",
}
]
} In this example, the refSemantics annotation tells a code generator to treat the schema referenced by the adjacent $ref as a base class rather than as just another set of constraints. How should this be handled? A generic dereferencer won't know about refSemantics, so following the usual advice it would produce: {
"refSemantics": "baseClass",
"allOf": [
{
"$ref": "#/$defs/commonStuff",
},
{
"$ref": "#/$defs/childSpecificStuff",
},
{ the baseClass schema itself }
]
} which no longer works, because the annotation has been separated from the reference. However, if $ref could take the target schema directly: {
"$ref": { the baseClass schema itself },
"refSemantics": "baseClass",
"allOf": [
{
"$ref": "#/$defs/commonStuff",
},
{
"$ref": "#/$defs/childSpecificStuff",
}
]
} this continues to work just fine. This is an interesting example because it actually depends on the ability to have a keyword adjacent to $ref. |
All of this sounds like an implementation problem, and not one that JSON Schema needs to deal with. If an implementation processes references by mechanically inlining them as a step, then it has to deal with how to model and report that dereference. I think this is a non-issue for us (as JSON Schema). |
This makes a little bit of sense. Why not just jump to the logical conclusion and also say "A string found anywhere a schema is expected is a URI Reference to a schema"? |
@awwright I feel like that's a lot to ask, particularly with extensible vocabularies that may include their own applicators. String-or-schema is more of a challenge in strongly typed languages, and I see some value in keeping this behavior confined to $ref: if you see a $ref, you know a reference is (or was) involved.

Perhaps more importantly, the annotation-adjacent-to-$ref case needs to keep working. It's pretty straightforward to define the behavior of an annotation with respect to an adjacent $ref.

I do see your point with this idea, though. I think a good option would be to try it with $ref first. |
@gregsdennis Validators are not the only type of implementation, and not all implementations need to support all features, as seen by many popular tools that work with only part of the spec, including dereferencing and packaging schemas into single files or single in-memory structures.

Dereferencing is a popular standalone feature. Not only is there the library that I mentioned (and several others), but I've noticed it requested as a feature on several validator repositories. It is often rejected on the argument that stand-alone dereferencing should be done as a separate implementation (@Julian has this come up with your implementation? I confess I'm too lazy to go look right now 😴 ).

This really has nothing to do with how references are handled internal to a validator. This has to do with supporting implementations of the core spec as a base for other features, and implementing a tooling ecosystem around JSON Schema that includes modular utilities. I am aware of multiple $ref-resolving utilities of this sort.

On our Slack workspace, you asserted that our customers are schema authors, not implementors. I disagree, particularly now that we are encouraging the development of further vocabularies, and "implementation" will have a broader and more flexible meaning. Furthermore, schema authors may rely on tooling such as this to bridge the gap between the features that they wish to use in their schemas and the features supported by specific tools. This is precisely what Cloudflare did with Doca. We need to think about a much wider range of uses than simply writing a validation schema and validating instances with it. |
@philsturgeon can probably also speak to the usefulness of utility libraries for things like transforming between OAS Schema Objects and various JSON Schema drafts, writing linters, etc. |
I can see the argument and utility in this. We often do not make changes to the spec where a solution already presents itself; in this case "just use allOf" is that solution.

@handrews, regarding your example which includes refSemantics: what's the primary problem we are trying to solve here? Simply that it's cumbersome? If so, my feeling is there's little expectation that people will read these compiled schemas, and therefore preventing a cumbersome look of the schema is not enough of a reason. If however there's a specific gain to implementations that do stuff like documentation generation, I'd like to hear the overview. (I think you have indicated there would be, but not specifically what. Generally my feeling is supporting evidence isn't required for your assertions like this, but I'd like to see and consider it also.)

We should involve @JamesMessinger at this point, given his implementation's popularity for this purpose. Ultimately, if such a change wouldn't be supported by that implementation, that would give me pause.

My feeling is it would be preferable to my previously axed section which was originally bundled into the "$ref is delegation" change (ugh, I don't have the link offhand, please feel free to add it here).

Can anyone speak to their expectation of the cost of this change to validation implementations? @gregsdennis maybe? I'm personally unconcerned with the issue around typed languages, but I'm also willing to listen. |
Just to throw in a new perspective -- our tooling treats referenced schemas as semantically meaningful, in that these are the schemas (for the most part) that will be turned into classes in our generated client libraries. So we don't start by dereferencing all the $refs. |
General comment on this sort of talk:
Spectral is a validation (well, validation plus linting) tool, which uses a schema processor called json-ref-resolver to figure all this stuff out. We also have a mocking library, which uses a schema processor called json-ref-resolver to figure all this stuff out. Pretty much any tool has the job of "reading JSON Schema from files or URLs", and those schemas probably have $refs in them somewhere.

Catching up on the rest of it. @handrews it would be handy if you could post a refresher on why $ref can have siblings now. I think a lot of the case being made for this relies on that, so it'd be good to get that context. |
Additionally, doing this would not avert the issues around URI shadowing when a subschema sets an $id. (This issue is not the space to discuss and debate URI shadowing.) |
This was mainly because people often expected it to work that way for the purposes of adding specific annotations when generic schemas were used in specific contexts. We also now had the applicability language, which made explaining HOW it works a LOT easier, and made behaviour in line with other applicability keywords. |
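(Illustration added for clarity, using a hypothetical #/$defs/address definition.) This is the pattern people expected to work: a generic definition picks up use-site-specific annotations right next to the reference: {
  "properties": {
    "shipping_address": {
      "$ref": "#/$defs/address",
      "description": "Where the order should be delivered"
    },
    "billing_address": {
      "$ref": "#/$defs/address",
      "description": "Where the invoice should be sent"
    }
  }
}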
Initially it was as @Relequestual says. More importantly, I gave an example above of how code generation functionality will depend on it (the refSemantics example). |
Right! So, if someone took a schema written for use with your tooling and dereferenced it, so that those $refs disappeared, your tooling would lose the information it uses to decide what becomes a class. That is the issue that we have right now: the way to dereference a $ref that has adjacent keywords is to wrap the target in an allOf, which separates it from those adjacent keywords and can end up double-wrapping schemas that already use allOf.

Obviously, people who are writing with your tooling in mind would not do this themselves. It would only happen in some sort of environment where other tools are also involved, and someone combines them in the wrong order or something like that. But there is a key principle here (@Relequestual this is the most concise way to describe the problem so far): Schema semantics should be robust to transclusion.

Secondarily, the transclusion process should be a function of the core vocabulary only, because only the core vocabulary is guaranteed to be supported by all implementations. |
I don't know what you mean by double-wrapping. Could you please use my example from earlier and show what a double-wrapped solution would look like? |
Regarding involving @JamesMessinger, I would be thrilled for him to join the discussion. However, I do not think that any one implementation gets veto power over a feature that has valid use cases. I'm pointing to that project because its existence shows the need for transclusion support in general.

There are now new use cases around transclusion because of the adjacent annotation possibilities. While I hope JSON Schema Ref Parser (JSRP) chooses to support those, it would also be valid for it to say that it only supports schemas that do not rely on keywords adjacent to $ref. If JSRP decides to go that route (with or without an actual meta-schema involved), that does not invalidate the use case. Transclusion should not break schema semantics.

I feel like I have clearly described a key use case around code generation for semantically significant $refs. Furthermore, @mkistler has confirmed that references are semantically significant for his tooling. While he and others making such tools could simply say "don't repackage schemas with transclusion", one problem with that is that the repackaging step is often outside the schema author's control. |
Perhaps it was stated by other comments here, but in re:
Part of the reason is that it makes |
This is a legit point. But @mkistler, does the tooling you mention restrict itself to a subset/profile of JSON Schema? Perhaps the form of $ref usage it relies on could be part of such a profile? |
@ucarion There is currently a Very Big Discussion going on in the OpenAPI steering committee about JSON Schema profiles using the forthcoming draft's vocabulary features (@mkistler's tooling is OpenAPI-related). But the point I was making concerned the scenario where someone who understands the tooling requirement writes their schemas using $ref as the tooling expects, and the schemas are then dereferenced by some other step before reaching the tool.

In the current set-up, there is no way to tell that the schema was dereferenced before sending it to the tool, because the $ref is gone entirely (replaced by an allOf entry or by the inlined schema itself). Under the proposal in this issue, the $ref keyword would still be present, so the semantics would survive the transformation. |
@ucarion Re:
We do currently only handle a subset of JSON Schema. Elsewhere we are discussing how to formalize that concept, as I think it will be a common case for tooling developers. |
Yup. Pretty much why we spent so much of the last year on vocabularies and the keyword taxonomy that lets us reason about keyword behavior more generally. |
@mkistler I failed to notice you work on OpenAPI. Thanks @handrews for cluing me in.

@handrews, I'm not entirely sure I see how the tooling in question would work. @mkistler's requirement, presumably, is that there is a string (a reference like "#/definitions/user") associated with the schemas that become classes, from which a class name can be derived. But in the case of a $ref whose value is an inlined schema, there is no such string to derive a name from. |
Concretely, a schema that starts like this: {
"definitions": {
"user": {
"properties": {
"id": { "type": "string" },
"display_name": { "type": "string" }
}
},
"photo": {
"properties": {
"id": { "type": "string" },
"display_name": { "type": "string" }
}
}
},
"properties": {
"me": { "$ref": "#/definitions/user" },
"favorite_photo": { "$ref": "#/definitions/photo" }
}
} Would be converted into: {
"definitions": {
"user": {
"properties": {
"id": {
"type": "string"
},
"display_name": {
"type": "string"
}
}
},
"photo": {
"properties": {
"id": {
"type": "string"
},
"display_name": {
"type": "string"
}
}
}
},
"properties": {
"me": {
"$ref": {
"properties": {
"id": {
"type": "string"
},
"display_name": {
"type": "string"
}
}
}
},
"favorite_photo": {
"$ref": {
"properties": {
"id": {
"type": "string"
},
"display_name": {
"type": "string"
}
}
}
}
}
} I'm not sure how you can usefully codegen off of the second case? |
@ucarion thanks for the example, that is a good question. There are really two sub-cases of the "remove references" use case: (1) dereferencing/inlining references, including internal ones, as a processing step, and (2) bundling multiple files into a single file without necessarily dereferencing anything.
Your example is related to the first sub-case, which is most likely to happen as some sort of internal processing step. This is what Cloudflare's Doca documentation generator does. It's also the sort of internal implementation detail that @gregsdennis points out is not all that relevant to schema authors. My argument against that was that there are use cases that are relevant to schema authors, and by that I meant the second sub-case.

The second sub-case is more broadly useful because managing and distributing a big tree of files can be a pain. Re-packaging the files into a single file helps with that, and in addition to JSRP you can find tools, including OpenAPI-specific tools, that handle this exact use case. In the second sub-case, you will have $refs that used to point across files and now point within a single file, with each embedded schema keeping its own identity. And technically, even if a separate file does not have an $id of its own, it still needs to remain addressable once it has been pulled into the bundle.

So the more interesting case here is the second one: bundling multiple files into a single file, without necessarily dereferencing internal references. I hadn't really thought of this, but it is a very important distinction and explains the difference in opinion between @gregsdennis and myself. Removing internal references is kind of a weird thing; there are use cases, but they mostly involve implementations that are limited in some way and do the internal de-referencing as a workaround. The bundling use case is far more significant (and has gotten discussed in various ways repeatedly over the years in numerous issues in this repository). I think the bundling use case, because it would involve $id and cross-document references, is the one worth designing for. |
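(Sketch of the bundling sub-case; the URIs and file layout are hypothetical.) Two formerly separate files are packed into one document by embedding them under $defs with their own $id values, so every existing $ref keeps resolving and nothing is actually dereferenced: {
  "$id": "https://example.com/bundle",
  "$defs": {
    "customer": {
      "$id": "https://example.com/customer",
      "type": "object",
      "properties": {
        "address": { "$ref": "https://example.com/address" }
      }
    },
    "address": {
      "$id": "https://example.com/address",
      "type": "object",
      "properties": {
        "country": { "type": "string" }
      }
    }
  },
  "$ref": "https://example.com/customer"
}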
I still maintain that this is a detail of one or more implementations that don't conform to the spec, and that this change is trying to accommodate those implementations. I think it's a bad move that opens the door to other changes that are initiated by implementors who don't want to do the work to conform to an established (almost, but not-quite) standard. Will existing implementations (that do schema referencing correctly) be expected to now accept a schema (as well as a pointer) as a valid value of $ref?

If file bundling is really the problem that you're trying to solve, that's a transport/storage issue, and not one that JSON Schema needs to concern itself with. Implementations that manage references need to do so in accordance with the spec. If they want to "bundle"/dereference the files into a single file as part of their process, then that's an implementation detail and the implementor's prerogative. If they require the schema author to do so (whether manually or through some secondary tooling), then they need to clearly specify that in their documentation as a deviation from the spec (e.g. "External references are not processed by this validator...").

Additionally, my implementation operates in a similar way to @mkistler's in that it tracks each schema as a distinct object, each with its own document path. The references link two documents. |
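(Sketch only, not an agreed change: roughly what accepting a schema as well as a pointer could mean for the meta-schema, where $ref is currently described as a uri-reference string; the inner "$ref": "#" stands in for "any schema".) {
  "$ref": {
    "anyOf": [
      { "type": "string", "format": "uri-reference" },
      { "$ref": "#" }
    ]
  }
}
Implementations would then need to branch on the value's type at each reference site.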
I find this confusing. You have decided that transclusion is not part of the spec, which goes against various discussions where we have talked about specifying the transclusion process, particularly the handling of $id and base URIs. As far as I am concerned, based on wording in past drafts and discussions in issues during the past year, transclusion is part of the spec. Stop trying to invalidate other people's usages.

Bundling is NOT an implementation detail. It is something that schema authors/distributors care about. It is something that I have used in my professional use of JSON Schema. You don't get to dismiss it because you don't personally have a use for it. |
This has already gotten too long and confusing. I am going to re-file and consolidate the various objections and arguments. |
+1 for @awwright's idea about allowing a URL anywhere a schema is allowed. If you do that, there is no need for $ref at all.

The argument that it could be problematic having schemas be represented by more than one type is valid, but that ship has already sailed. A schema can already be represented as either an object or a boolean, so why not a string as well. This also means that the semantics of references don't need to change. You don't need to define behavior with respect to keywords adjacent to $ref, because the string simply stands in for the schema.

Here's an example of what it would look like if a URI reference could appear anywhere a schema is expected: {
"type": "object",
"properties": {
"foo": "#/definitions/foo",
"bar": {
"title": "Bar",
"allOf": ["#/definitions/bar"]
}
},
"definitions": {
"foo": { "type": "string" },
"bar": { "type": "number" }
}
} |
@jdesrosiers the only issue with that is @handrews's point that it's still important to know where a dereference occurred, especially with external references. Hence keeping "$ref" as the wrapper around the inlined schema. |
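(Illustration added for clarity; the property name, URI, and { "type": "string" } target are stand-ins, and the three top-level keys are just comparison labels.) Inlining a bare string reference leaves no trace of the original reference, while the $ref wrapper preserves it: {
  "withBareString": { "foo": "https://example.com/schemas/foo.json" },
  "bareStringInlined": { "foo": { "type": "string" } },
  "refWrapperInlined": { "foo": { "$ref": { "type": "string" } } }
}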
I'm pulling together a more comprehensive proposal. This thread has shown where the vision is missing or poorly articulated, and I need to spend some time laying it out correctly. |
I have included a CREF discussing the problem I'm trying to solve in this issue in #780. If I can't get buy-in, we can ship draft-08 with that CREF rather than fix this. I think it will cause problems, but it's apparently going to take a lot to convince anyone else that this is a problem and I think I'm about at the "give up" stage. And really, when in doubt, get user feedback 😛 |
As a reference for others viewing this, I find @handrews's previous comment quite a compelling argument:
|
(Also for my own reference): I'll have to find and comment on the new associated issue or PR, but I want to hear more about the potential cost to implementers. If it's minimal, it could be good. My inclination is we SHOULD gather more feedback on such a change and not push for draft-8. |
I realized that this issue is more about how the changes to $ref affect inlining than I first thought.

Before, the following two were equivalent: a schema containing nothing but { "$ref": "#/definitions/foo" }, and the schema found at #/definitions/foo. You could always substitute one for the other without changing behavior.

But, with the changes to $ref (adjacent keywords are no longer ignored, and the reference shows up in the evaluation path), that simple substitution is no longer semantics-preserving.

Given the new definition of $ref, some way to inline the target without restructuring the schema around it seems necessary.
|
@jdesrosiers exactly. And (to address a question likely to be asked) the reason to follow through on this change to make inlining make sense again, rather than roll back the change, is that allowing annotations in the same schema object as $ref is something people have long expected and asked for. Preserving that while keeping inlining well-defined is the goal here. |
I am sceptical about inlining dereferenced documents directly as the value of $ref. Where I work we use JSON Schema to describe models built from shared business data types. Business data types are subject to reuse and thus are put into a separate schema. They are referenced from the actual model schemas. One model can have multiple properties referencing the same business data type (see the country example below). Some business data types are enumerations with over a hundred possible values, e.g. there are ~192 (?) countries on earth (and many more cultures). And this is just one business data type among many quite large enumerations.

Due to issues with OpenAPI tooling handling external references, we already have to think about what dereferencing does to file size. If OpenAPI had better support for referencing primitive types, then our first attempt to reduce file size would have been to inline business type definitions just once into the model schema's own definitions. I wonder if there is anything I miss, given that no one proposes a keyword for that kind of one-time inlining of shared definitions.

entities.schema.json {
"$schema": "...",
"type": "object",
"description": "Customer",
"properties": {
"iban": { "$ref": "./business-data-types.schema.json#/definitions/iban" },
"country_of_birth": {"$ref": "./business-data-types.schema.json#/definitions/country"},
"residence": { "$ref": "#/definitions/Residence" },
"...many more": "..."
},
"definitions": {
"Residence": {
"street": { "$ref": "./business-data-types.schema.json#/definitions/streetname"},
"country": {"$ref": "./business-data-types.schema.json#/definitions/country"},
"...many more": "..."
}
}
} business-data-type.schema.json {
"$schema": "...",
"description": "Business Data Type Definitions",
"definitions": {
"iban": {
"type": "string",
"regex": "..."
},
"country": {
"type": "string",
"oneOf": [
{ "enum": ["en_US"], "title": "USA"},
{ "enum": ["en_GB"], "title": "Great Britain"}
{ "enum": ["de_DE"], "title": "Germany"},
{ "enum": ["de_AT"], "title": "Austria"},
{ "enum": ["de_CH"], "title": "Switzerland"},
{ "...hundreds more": "..." }
]
}
}
} Do you really want this? (just dereferenced country for this example) {
"$schema": "...",
"type": "object",
"description": "Customer",
"properties": {
"iban": { "$ref": "./business-data-types.schema.json#/definitions/iban" },
"country_of_birth": {
"$ref": {
"type": "string",
"oneOf": [
{ "enum": ["en_US"], "title": "USA"},
{ "enum": ["en_GB"], "title": "Great Britain"}
{ "enum": ["de_DE"], "title": "Germany"},
{ "enum": ["de_AT"], "title": "Austria"},
{ "enum": ["de_CH"], "title": "Switzerland"},
{ "...hundreds more": "..." }
]
}
},
"residence": { "$ref": "#/definitions/Residence" },
"...many more": "..."
},
"definitions": {
"Residence": {
"street": { "$ref": "./business-data-types.schema.json#/definitions/streetname"},
"country": {
"$ref": {
"type": "string",
"oneOf": [
{ "enum": ["en_US"], "title": "USA"},
{ "enum": ["en_GB"], "title": "Great Britain"}
{ "enum": ["de_DE"], "title": "Germany"},
{ "enum": ["de_AT"], "title": "Austria"},
{ "enum": ["de_CH"], "title": "Switzerland"},
{ "...hundreds more": "..." }
]
}
},
"...many more": "..."
}
}
} |
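(Sketch of the alternative being described, reusing the names and placeholders from the example above.) Each shared business data type is copied once into the local definitions, and every use site keeps a short local reference, instead of the full enumeration being repeated at every $ref: {
  "$schema": "...",
  "type": "object",
  "description": "Customer",
  "properties": {
    "country_of_birth": { "$ref": "#/definitions/country" },
    "residence": { "$ref": "#/definitions/Residence" },
    "...many more": "..."
  },
  "definitions": {
    "country": {
      "type": "string",
      "oneOf": [
        { "enum": ["en_US"], "title": "USA" },
        { "...hundreds more": "..." }
      ]
    },
    "Residence": {
      "country": { "$ref": "#/definitions/country" },
      "...many more": "..."
    }
  }
}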
@about-code please take a look at #788 which I just posted last night. I think it will allay your fears, as I think I found a way to make clear that we are not trying to encourage what you are worried about. Rather, there is a specific type of multi-file vs single-file transformation that is worth supporting. BTW I am still thinking on this and might come up with yet another option that removes the need for a schema-valued $ref at all. |
@about-code OK, I was able to remove the need for a more complex $ref. |
[EDIT: See better problem statement two comments below]

Now that $ref can have adjacent keywords, dereferencing/inlining is more complex. (For the purpose of this issue, we're just talking about situations where inlining is possible, e.g. no cyclic references; inlining such use cases is common, with numerous libraries dedicated to this operation.)

In informal discussions, we've recommended replacing $ref with an allOf containing just the referenced schema, or, if there is already an adjacent allOf, appending the referenced schema to that allOf. This is rather cumbersome.

At the same time, we use runtime JSON Pointer-ish constructs that look like /properties/foo/$ref/properties/bar/$ref, etc., to record the runtime path as we traverse references.

What if we allowed replacing the $ref URI with the target schema? e.g. if {"$ref": "A"} points to {A}, then it can be replaced with {"$ref": {A}}. The $ref here is effectively a no-op; it just allows inlining the target without having to re-arrange the context.

Pros: this draft has already changed how $ref inlining works, so now is the time to sort this out.

Cons: