-
-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify the handling of "contentSchema" #1288
Comments
Why wouldn't you treat it like a normal schema? I don't see why being an annotation or not getting evaluated means it can't be treated as a schema. The problems you bring up only apply if you don't treat it as a schema. It seems much simpler to just treat it like a schema and everything just works like normal. |
@jdesrosiers |
The spec only says that the keyword contains a schema. Without any other guidance than that, I'd expect it to be treated like any other schema. But, I hold the intention of the author very highly, so I accept your interpretation since you were the author. However, we are the spec authors and we can change it if we want, so why not change it to treat it like a normal schema rather than document all the ways it's weird because it's not a normal schema? |
What would it even mean to treat it like a normal schema? How would that solve the problem at hand, which is using it as an annotation? Can you explain to me what "treat it like a normal schema" would accomplish other than allowing it to be used as a |
I realize the above may be phrased in a more demanding way than I really intended. I am honestly curious as to whether you see something I am missing. When we added |
In this case, the keyword value is not useful as the annotation value for this keyword. What's needed is the location of the keyword in the schema. The location references a schema that can be used like any other to validate the content. Here's an example using an annotation utility library I've been working on but haven't published yet. This currently wouldn't work, but would if Schema {
"$id": "https://example.com/schema",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"json": {
"type": "string",
"contentMediaType": "application/json",
"contentSchema": {
"type": "object",
"properties": {
"foo": { "type": "number" }
},
"required": "foo"
}
}
}
} Valid Instance { "json": "{ \"foo\": 42 }" } Invalid Instance { "json": "{ \"foo\": true }" } Implementation const schema = await JsonSchema.get("https://example.com/schema");
const instance = await JsonSchema.annotate(schema, { "json": "{\"foo\": 42}" });
for (const contentInstance of AnnotatedInstance.annotatedWith(instance, "contentMediaType")) {
const [mediaType] = AnnotatedInstance.annotation(contentInstance, "contentMediaType");
if (mediaType === "application/json") {
const value = JSON.parse(AnnotatedInstance.value(contentInstance));
const [contentSchemaUri] = AnnotatedInstance.annotation(contentInstance, "contentSchema"); // https://example.com/schema#/properties/json/contentSchema
if (contentSchemaUri) {
const contentSchema = await JsonSchema.get(contentSchemaUri);
const result = await JsonSchema.validate(contentSchema, value, JsonSchema.BASIC);
if (!result.valid) {
throw new ValidationError(result);
}
}
}
} Hopefully the API here is intuitive enough to understand without documentation and hopefully it's helps show how this could be implemented.
I wasn't offended, but I appreciate the thought. I didn't answer right away because I was working on something else I wasn't willing to tear my attention away from long enough for a sufficient response. |
The problem is that this does not handle the case where the schema containing Annotations in general need to be usable without access to the original schema. The annotation information is more than just the value, as mandated by §7.7.1 (quoted here with formatting tweaks for readability and [notes] for recent terminology changes):
This is enough information to determine which annotations go where, and how they relate to each other (using the evaluation path). But it's not the same as having the original schema, and it's not intended to be. The only piece truly missing here is the dialect information (if the |
I see your point, but I can't imagine a case where someone would have access to the annotations but not the schema. Just because I can't image it doesn't mean it doesn't exist, but if there is some real(-ish) example of when this could possibility be the case, I'd feel a lot better about accepting your point. The example you gave about highly regulated industries is too abstract to be convincing. Assuming you can't have access to the schema, it could also be solved by having the annotation value be a tuple of the location and the schema resource the location is part of. This would keep things self-contained and allow the schema to work as expected. I'd just really like to avoid the weird special cases you identified in this issue if there's any reasonable way to avoid it. Although having an application-facing annotation that is not just the value of the keyword feels like a special case as well since there are no other application-facing that behave like that. |
There's one buried somewhere in the
So you want to return the entire containing schema resource? Why is this simpler than just adding the dialect URI to the list in §7.7.1? That's all we need to do. I don't understand the objection. If the All we need to solve the problem is the dialect URI. Why make it more complex? Particularly when needing the dialect URI for errors has come up before. It's not a solution that is specific to |
Consider this example where a value can accept an object with a given schema or a JSON string with the same schema. As horrible as this is, it's a real-life thing I've seen before. {
"type": "object",
"properties": {
"value": {
"anyOf": [
{ "$ref": "#/defs/value" },
{
"type": "string",
"contentMediaType": "application/json",
"contentSchema": { "$ref": "#/$defs/value" }
}
]
}
},
"$defs": {
"value": {
"type": "object",
"properties": {...}
}
}
} If
I'm still not convinced that anything other than a schema identifier is need, but if you need a schema, then yes I would expect it to include the entire containing schema resource. In most cases, I would expect that the schema should have an
That (or something equivalent) definitely needs to be done for other reasons, but that doesn't effect the simplicity/complexity of the trade-offs in my mind. The issue is treating |
@jdesrosiers I definitely see the points you're making, and I agree that having the complete containing schema resource (and any information needed to fully resolve its base URI if it does not have an absolute-URI for If I understand you correctly, you have offered two solutions
The first one, if you mean using it in-place as a schema, is not an option. The second one involves significant changes to how annotations work that I do not think are worth the cost. Treating it "like a schema"
I'm not sure what you mean here. The So if it's surprising to them, it means that they didn't read the spec or the test suite. Perhaps I'm reading too much into grammar here, but I want to make sure we're clear on something here so we're debating the right thing: That's our baseline for this discussion. We're just talking about how to make it usable. Returning the whole schema resource as the annotation valueThis would be a backwards-incompatible change that makes
That wouldn't be too hard to do , but more importantly it would violate the architectural principle that annotation values come from the location described by the annotation data ( On the one hand, I do want Looking at the original proposal again
Right, which is why I proposed:
Since the annotation location information contains the schema location, which identifies the containing schema resource, a caller that does have access to the containing schema can trivially get that resource anyway, and do whatever they need. And as noted above, a caller that does not have access could point to the spec and explain why the schema owner needs to add a
Which covers what I asked for in my original point 3. So, I would still like to go with my original proposal. What you are suggesting adds complexity to annotation usage by setting a new precedent for what kind of data can be returned. We would then have to expand annotation data to include location information for both the keyword (as we do now) and the source of the data (since it might not be the same, and if others follow this precedent their data may not be a schema resource and therefore wouldn't have an I definitely prefer documenting proper usage over a significant architectural change to annotations. |
I'm not going to respond to most of that, because I think what I'll say next makes most of it irrelevant. Briefly, no, you haven't correctly understood the solution I was suggesting and, yes, I know the
I'm not concerned about backward incompatible changes in this case. It wouldn't surprise me if you were the one person who has has ever tried to do something with this feature. I'm much more concerned about getting it right than maintaining backward compatibility. That's not to say it necessarily needs to change to be "right". I'm just saying we shouldn't limit our options as I'd consider this feature still in it's early stages where it's still reasonable to make changes.
If I understand you correctly, I think this is a reasonable compromise. The schema would be treated as a normal schema within the schema resource, but the annotation would just be the value of the keyword. In cases where you have access to the original schema,
I agree that not setting an
If you drop part 1 (about it not being a schema) and change the way part 2 (should have |
I have never added a keyword purely for my own use case. Please do not use such an assumption to dismiss anything about a keyword or about my position regarding it.
I realize you are not necessarily saying that the change is necessary to "get it right", but please try to avoid implying that I am prioritizing other concerns over "getting it right." I mentioned backwards incompatibility because it was not clear to me that a.) you were actually proposing that, or b.) if you were, that you realized it was a change (because it seemed to me that you had a different impression of how the keyword works).
If you can de-reference a URI to get a thing, by definition that URI is the retrieval URI. That's just what the term "retrieval URI" means, and you can't re-define it away. So any URI with a JSON Pointer fragment is the retrieval URI for that location in the JSON document, again by definition.
The only reason there would be "weird behavior" is if someone fed the non- On the other hand, if you fed it right back into an implementation that had the original resource, and recognized that that retrieval URI was for a location within that resource, it would work perfectly well. It would recognize that any fragments are relative to that original resource, not to the decontextualized schema you sent in. However, all of that requires a lot more understanding of URIs than most folks have, despite none of it being JSON Schema-specific aside from
As I suggested doing earlier, I have filed a separate issue (#1307) to discuss that separately.
That wasn't proposed wording for the spec, so this is a no-op. We can debate the wording when there's a PR, but I have no intention of diving into URI minutia to get the point across.
That would create a new case for So we're back to where we started with point 2. I've filed #1307 for point 1. I can't tell if #1065 takes care of point 3 or if I need to file a new issue since #1065 seems focused on keyword-level identification and we only need dialect-level for this — I don't want this to get hung up on associating keywords and dialects which is a larger topic. |
I find it very frustrating that this is how you interpreted my words. Clearly passions are running too high right now for this to be a productive conversation, so I'm going to leave this alone for a week or two to let us both cool off before we try again. |
I don't think this is the case in the current published spec - this issue is framed as a clarification, but it seems to me like a major change in how I think it is a good change to make. Treating I agree it should just be schema-shaped data, not a subschema. This does have other implications, though. The metaschema currently describes That affects validation: if the metaschema doesn't describe it as a schema, its content won't be validated as a schema if you validate against the metaschema. I think that is fine. It can be validated as a schema if/when its data are interpreted as a schema. As just data, it should become a schema only considered as its own detached document, with the rules that apply to root schemas. This is a bit different than resource subschemas (subschemas with an $id).
|
yeah, I had started commenting here and then saw the new one after. |
As documented in #1307, I have come to agree that the As for other ideas in this issue, I would like to make the minimal clarification at this time, which would be to simply note that the Others are free to take any idea from this issue, such as adding SHOULD or MUST requirements around |
contentSchema
is an annotation. It is never applied as part of normal evaluation (although see #1287), and shouldn't be processed at schema load time like a subschema of an applicator or location ($defs
) would be. In other words, you shouldn't be able to target it with a$ref
even though it looks like a schema. It's just data.$ref
$id
then your schema-from-data will have that retrieval URI as its base URI, which means after the fragment is chopped off it is likely to have URI collisions with the parent schema. If you then try to feed it right back into the implementation to use in the same session, this will probably blow up in your face. We probably don't need to forbid it, but we should be very clear about why it is a Bad Idea™.$schema
, you might not have any idea how to process it. In a large ecosystem, this might come from a different schema resource than you started from. In a sufficiently complex system, the code asking for the validation (possibly over a network API) may not have direct access to that schema resource in order to check its$schema
. We could either warn about this or we could include the dialect URI in the annotation data. Note that we need the dialect and not the specific vocabulary of thecontentSchema
keyword, because even if it matches the version of the core vocabulary, that doesn't tell us what other vocabularies are assumed. Presumably if we added dialect info to the annotation info, we'd also need to add it to the output format somewhere (which probably deserves its own issue).The text was updated successfully, but these errors were encountered: