Use JSON Pointers instead of 'properties' or 'prefixItems' keywords #1489
This is an interesting idea, and it's not the first time we've seen something like it. However, it breaks a fundamental operating model of JSON Schema: constraints are expressed using keywords. In your example, This also breaks (or at least makes more difficult) other use cases for JSON Schema, like form and code generation. Currently Lastly, keywords can technically be any string. To support this, we'd need to restrict keywords to non-pointers since they'd have special meaning. Again, not a deal-breaker, but worth noting. |
I agree with you: constraints are expressed using keywords.
The difference is subtle:
I don't know about this topic (can you explain it?)
Yes, it is the same point as below.
Yes, we'd need to restrict keywords so that the first character cannot be '/', but I'm not convinced that such a keyword would ever be considered. To conclude, I understand that the main obstacle is justifying the change.
I don't know JSON Schema's strategy regarding pointers, but it would also be interesting to position this proposal relative to that strategy. |
Validation succeeds if, for each name that appears in both the instance and as a name within this keyword's value, the child instance for that name successfully validates against the corresponding schema. - Core 10.3.2.1
While the JSON Schema specification is written targeting validation and annotation use cases, people also use it as a sort of data definition (which it isn't, really) in order to generate data entry forms or even generate code (e.g. creating models from schemas found in OpenAPI documents). Moving to support pointers as keys may break these use cases. I'm sure that many of these kinds of users will adjust and find a way to still support their uses, but in the short term, it will break.
User-built vocabularies can define whatever keywords they want. However, we are looking at reserving
I'm not sure what this is solving. Can you elaborate?
These were the previous arguments used. Another thing to consider is pointer ambiguity. The JSON Pointer {
"foo": [
{
"bar": 42
}
]
} and {
"foo": {
"1": {
{
"bar"
}
}
}
} This is one place where the form and code generation can break down. There's no information as to whether I'm not shutting this down. I'm stating the difficulties we've had with this approach before. TBH, I think this could probably be implemented as a new vocab. There's no requirement for vocabs to define discrete keywords, so technically a vocab could define this as a family of keywords. I'd like to see how this would affect some "in the wild" schemas. For example, how would one of the meta-schemas be changed? Also, what guidance would you give for when to use pointers (implicit structure) vs |
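To make the ambiguity concrete, here is a minimal Python sketch of RFC 6901 pointer resolution. `resolve` is a hypothetical helper written for this illustration, not a library API, and because the exact pointer did not survive in the comment above, the example assumes the pointer `/foo/0/bar` and two documents of the shapes shown, each holding 42 at the end of the path.

```python
import re

# RFC 6901 array-index syntax: "0" or digits without a leading zero
ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def unescape(token: str) -> str:
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~"
    return token.replace("~1", "/").replace("~0", "~")


def resolve(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON value."""
    if pointer == "":
        return document  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        token = unescape(raw)
        if isinstance(current, list) and ARRAY_INDEX.fullmatch(token):
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot resolve {token!r} in {pointer!r}")
    return current


as_array = {"foo": [{"bar": 42}]}
as_object = {"foo": {"0": {"bar": 42}}}

# The same pointer reaches 42 in both documents, so the pointer by itself
# cannot tell a form or code generator whether "foo" is an array or an object.
print(resolve(as_array, "/foo/0/bar"), resolve(as_object, "/foo/0/bar"))  # 42 42
```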
I think it is not realistic to abandon the
For example, if you have a code composed of ten numbers where the last is 999 (e.g. [10, 25, 574, 65, 89, 5, 8, 56, 8, 999]), the schema could be: { "type": "array",
"items": {"type": "integer"},
"/9": {"const": 999}} "/0": { "type": "number" },
I agree about this pointer ambiguity, but I think it is not a problem if the schema specifies the type of the instance: { "type": "array",
"/0": {"const": 42}}
{ "type": "object",
"/1": {"const": "bar"}}
I don't know! I think feedback from other users is needed to better identify the benefit of this approach. |
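A minimal sketch of how a validator might apply single-level pointer keys such as `"/9"`, with `"type"` settling whether a token is an array index or a property name. It is a toy: `check` and `is_type` are hypothetical names, only `type`, `const`, `items`, and pointer keys are understood, and `const` uses Python equality (so, unlike JSON Schema, it does not distinguish `true` from `1`).

```python
import re

ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901 array-index syntax
SIMPLE_TYPES = {"array": list, "object": dict, "string": str, "null": type(None)}


def is_type(value, name: str) -> bool:
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return name == "boolean"
    if name == "integer":
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if name == "number":
        return isinstance(value, (int, float))
    return name in SIMPLE_TYPES and isinstance(value, SIMPLE_TYPES[name])


def check(value, schema: dict) -> bool:
    """Toy evaluator for "type", "const", "items" and single-level pointer keys."""
    if "type" in schema and not is_type(value, schema["type"]):
        return False
    if "const" in schema and value != schema["const"]:
        return False
    if isinstance(value, list) and "items" in schema:
        if not all(check(item, schema["items"]) for item in value):
            return False
    for key, subschema in schema.items():
        if not key.startswith("/"):
            continue  # ordinary keyword, handled above
        if "/" in key[1:]:
            raise ValueError(f"only single-level pointers are handled: {key!r}")
        token = key[1:].replace("~1", "/").replace("~0", "~")
        if isinstance(value, list) and ARRAY_INDEX.fullmatch(token) and int(token) < len(value):
            child = value[int(token)]
        elif isinstance(value, dict) and token in value:
            child = value[token]
        else:
            continue  # like "properties", absent locations are not checked
        if not check(child, subschema):
            return False
    return True


code = [10, 25, 574, 65, 89, 5, 8, 56, 8, 999]
schema = {"type": "array", "items": {"type": "integer"}, "/9": {"const": 999}}
print(check(code, schema))            # True
print(check(code[:9] + [0], schema))  # False: the tenth number is not 999
# Because "type" is "object", "/1" names the property "1", not an index
print(check({"1": "bar"}, {"type": "object", "/1": {"const": "bar"}}))  # True
```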
I'm not in favor of this, really; it seems to solve a problem that is adequately handled by existing keywords. However, I am not deeply attached to that position, and maybe I would come to love the convenience of this approach (I've considered implementing something similar for my own purposes before). My thoughts/suggestions, supposing something like this were to go in: Given this is a pointer, and per "This principle can also be extended to pointers of rank greater than 1":
And, I know this is half the point of your suggestion, but I think the idea of having pointers as properties of JSON Schemas is unlikely to actually get merged (again, just my opinion, which has no bearing on any actual outcome). It's possible, since the spec doesn't define any existing keywords that start with Given the above, I envision something like this (adapted/cut down from your example): {
"$schema": "https://json-schema.org/spec-with-descendents",
"type": "object",
"pointerDescendents": {
"/name": {
"type": "string"
},
"/address": {
"type": "object"
},
"/address/street": {
"type": "string"
}
},
"requiredPointerDescendents": [
"/name",
"/name/address"
]
} It's a major addition, more than just the keywords: existing keywords all apply in place to the instance itself or to a child of the instance, so introducing ones that apply anywhere below the instance is a whole new domain. For that reason I avoided the keyword There is more to work out, but I'll stop there because, as I started with, I'm not really of the opinion that this is a good thing to add (even with what I believe to be improvements suggested above). The complexity of applying to any descendant would be a significant challenge to implement in my own tooling, and I don't see enough benefit from being a bit less verbose to introduce such complexity. |
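A rough sketch of the extra machinery these two keywords need: the validator must walk an arbitrary path below the instance rather than look at one child. `pointerDescendents` and `requiredPointerDescendents` are the hypothetical keywords proposed above; `lookup` and `check_descendants` are illustrative names, subschemas are limited to `type`, and `requiredPointerDescendents` is shortened to `["/name", "/address/street"]` for the demo.

```python
import re

ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")
TYPES = {"object": dict, "array": list, "string": str}  # toy subset of "type"


def lookup(instance, pointer: str):
    """Return (True, value) if the RFC 6901 pointer resolves, else (False, None)."""
    current = instance
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and ARRAY_INDEX.fullmatch(token) and int(token) < len(current):
            current = current[int(token)]
        else:
            return False, None
    return True, current


def check_descendants(instance, schema: dict) -> list:
    """Collect error messages for the hypothetical descendant keywords."""
    errors = []
    for pointer, subschema in schema.get("pointerDescendents", {}).items():
        found, value = lookup(instance, pointer)
        # Like "properties", a subschema only applies where the location exists
        if found and "type" in subschema and not isinstance(value, TYPES[subschema["type"]]):
            errors.append(f"{pointer}: expected {subschema['type']}")
    for pointer in schema.get("requiredPointerDescendents", []):
        if not lookup(instance, pointer)[0]:
            errors.append(f"{pointer}: required location is missing")
    return errors


schema = {
    "type": "object",
    "pointerDescendents": {
        "/name": {"type": "string"},
        "/address": {"type": "object"},
        "/address/street": {"type": "string"},
    },
    "requiredPointerDescendents": ["/name", "/address/street"],
}
print(check_descendants({"name": "Ada", "address": {"street": "Main St"}}, schema))  # []
print(check_descendants({"name": "Ada", "address": {"street": 12}}, schema))
# ['/address/street: expected string']
```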
Thank you @notEthan for this comment. Here are my remarks:
Maybe we can have a multi-step strategy:
{
"type": "array",
"pointerChild": {
"/5": {
"const": 42
}
}
(or with another keyword, or without a keyword)
|
I personally prefer the new keyword option. I'd like to propose Considering what this looks like in a subschema, we get {
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
}
}
} This case would need to be implemented like this since JSON Pointer doesn't have a wildcard segment, which you'd need to indicate "all items". So it becomes apparent that having this work in a subschema is a necessity. However, it should also be apparent (although we'll probably have to explicitly state it in the spec) that these pointers are relative to the instance location at this point in the evaluation, even though they're not Relative JSON Pointers. Further, this should probably work like To that end, we'd need a new keyword to indicate specific locations are required, {
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
},
"requiredLocations": [
"/foo/bar"
]
}
} A fallout of {
"type": "array",
"items": {
"type": "object",
"properties": {
"foo": {
"type": "object",
"properties": {
"bar": { "const": 42 }
},
"required": [ "bar" ]
}
},
"required": [
"foo"
]
}
} I don't think the former is any less readable, and I think it should be reasonably easy to implement this. |
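A toy evaluator can show what "relative to the instance location" means in practice: `locationSchemas` pointers are resolved against whatever instance the enclosing subschema is evaluating, so under `items` they are resolved against each array element. `locationSchemas` and `requiredLocations` are the keywords proposed above; `lookup` and `valid` are hypothetical helpers, and only a handful of standard keywords are supported.

```python
import re

ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")
TYPES = {"array": list, "object": dict, "string": str}  # toy subset


def lookup(instance, pointer: str):
    """Return (True, value) if the RFC 6901 pointer resolves, else (False, None)."""
    current = instance
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and ARRAY_INDEX.fullmatch(token) and int(token) < len(current):
            current = current[int(token)]
        else:
            return False, None
    return True, current


def valid(instance, schema: dict) -> bool:
    if "type" in schema and not isinstance(instance, TYPES[schema["type"]]):
        return False
    if "const" in schema and instance != schema["const"]:
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not valid(instance[name], sub):
                return False
    if isinstance(instance, list) and "items" in schema:
        if not all(valid(item, schema["items"]) for item in instance):
            return False
    # Pointers start at the instance currently being evaluated
    for pointer, sub in schema.get("locationSchemas", {}).items():
        found, value = lookup(instance, pointer)
        if found and not valid(value, sub):
            return False
    return all(lookup(instance, p)[0] for p in schema.get("requiredLocations", []))


pointer_form = {
    "type": "array",
    "items": {
        "locationSchemas": {"/foo/bar": {"const": 42}},
        "requiredLocations": ["/foo/bar"],
    },
}
nested_form = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "foo": {
                "type": "object",
                "properties": {"bar": {"const": 42}},
                "required": ["bar"],
            }
        },
        "required": ["foo"],
    },
}
samples = [[{"foo": {"bar": 42}}], [{"foo": {"bar": 41}}], [{"foo": {}}], [{"foo": {"bar": 42}}, {}]]
for sample in samples:
    print(valid(sample, pointer_form), valid(sample, nested_form))
# True True / False False / False False / False False
```

On these samples the two schemas agree; this is a spot check of the comment's claim, not a proof that the two forms are equivalent.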
How would |
If it were implemented as an external vocab, I expect that it wouldn't interact with them. If we implemented it as part of the Core spec (which currently defines all applicators), we'd have the option to try and figure that out. At best, I expect it could define properties and items for each segment in the pointer. This is all stream of consciousness... The part that makes this keyword moot, though, is that you couldn't define Using the previous example, if the {
"type": "array",
"items": {
"locationSchemas": {
"/foo": { "additionalProperties": false },
"/foo/bar": { "const": 42 }
},
"requiredLocations": [
"/foo/bar"
]
}
} The problem here is that the So you try it outside of {
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
},
"requiredLocations": [
"/foo/bar"
],
"properties": {
"foo": { "additionalProperties": false }
}
}
} but you have the same problem. You still need to define {
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
},
"requiredLocations": [
"/foo/bar"
],
"properties": {
"foo": {
"properties": { "bar": true },
"additionalProperties": false
}
}
}
} which... why are we using The other option is that {
"type": "array",
"items": {
"locationSchemas": {
"/foo": { "additionalProperties": false },
"/foo/bar": { "const": 42 }
},
"requiredLocations": [
"/foo/bar"
],
"additionalLocations": false
}
} This would disallow (or provide a schema for) any locations not specified by the pointers in an adjacent These keywords, together, could be implemented in a separate vocab, too. |
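Checking `additionalLocations` means enumerating every location in the instance and subtracting those the pointers account for. The sketch below assumes (the comment does not say) that a location counts as specified when it equals a listed pointer or is an ancestor of one; `locations` and `additional_locations` are hypothetical helpers.

```python
def escape(token: str) -> str:
    # RFC 6901: "~" must be escaped before "/"
    return token.replace("~", "~0").replace("/", "~1")


def locations(instance, prefix: str = ""):
    """Yield the JSON Pointer of every location below `instance`, depth first."""
    if isinstance(instance, dict):
        children = [(escape(key), value) for key, value in instance.items()]
    elif isinstance(instance, list):
        children = [(str(index), value) for index, value in enumerate(instance)]
    else:
        return  # leaves have no children
    for token, value in children:
        pointer = f"{prefix}/{token}"
        yield pointer
        yield from locations(value, pointer)


def additional_locations(instance, listed_pointers):
    """Locations that are neither listed nor an ancestor of a listed pointer."""
    covered = set()
    for pointer in listed_pointers:
        tokens = pointer.split("/")  # "/foo/bar" -> ["", "foo", "bar"]
        for end in range(2, len(tokens) + 1):
            covered.add("/".join(tokens[:end]))  # "/foo", then "/foo/bar"
    return [p for p in locations(instance) if p not in covered]


item = {"foo": {"bar": 42, "baz": 1}, "qux": 2}
print(additional_locations(item, ["/foo", "/foo/bar"]))  # ['/foo/baz', '/qux']
```

With `"additionalLocations": false`, any non-empty result would make the item invalid. The check has to visit the whole subtree, not just the direct children, which is part of the implementation cost raised earlier in the thread.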
We need to consider JSON pointers in two cases:
Single-level:
Multi-level: The multi-level approach is more comprehensive if we consider an instance as a tree of JSON pointers: Example (from “Getting Started”): {
"order": {
"orderId": "ORD123",
"items": [
{
"name": "Product A",
"price": 50
},
{
"name": "Product B",
"price": 30
}
]
}
} This example is equivalent to the JSON pointer tree below: {" ": ["/order"],
"/order": ["/order/orderId", "/order/items"],
"/order/orderId": "ORD123",
"/order/items": ["/order/items/0", "/order/items/1"],
"/order/items/0": ["/order/items/0/name", "/order/items/0/price"],
"/order/items/0/name": "Product A",
"/order/items/0/price": 50,
"/order/items/1": ["/order/items/1/name", "/order/items/1/price"],
"/order/items/1/name": "Product B",
"/order/items/1/price": 30
} In this representation, an instance becomes an object whose keys are JSON Pointers to the instance's nodes and whose values are the nodes' contents: an array of child JSON Pointers for inner nodes, or a string/number/boolean/null for leaves. The type of each inner node is deduced from its value (array if the child JSON Pointers end with a number, object otherwise). With this dual representation, the use of JSON Pointers is explicit. Returning to @gregsdennis's example, I have a few comments:
{
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
}
}
}
{
"type": "array",
"locationSchemas": {
"1/foo": { "additionalProperties": false },
"1/foo/bar": { "const": 42 }
},
"requiredLocations": [ "1/foo/bar" ]
}
Summary
To conclude, my opinion is the following:
|
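The pointer-tree representation from the previous comment, and its rule for deducing node types, can be sketched in Python as follows. `pointer_tree` and `from_tree` are hypothetical names; the root is keyed by the empty string, which is the RFC 6901 pointer for the whole document.

```python
import re

DIGITS = re.compile(r"[0-9]+")


def escape(token: str) -> str:
    # RFC 6901: "~" must be escaped before "/"
    return token.replace("~", "~0").replace("/", "~1")


def unescape(token: str) -> str:
    return token.replace("~1", "/").replace("~0", "~")


def pointer_tree(instance) -> dict:
    """Map each pointer to its child pointers (nodes) or its value (leaves)."""
    tree = {}

    def walk(value, pointer):
        if isinstance(value, dict):
            children = [(f"{pointer}/{escape(key)}", child) for key, child in value.items()]
        elif isinstance(value, list):
            children = [(f"{pointer}/{index}", child) for index, child in enumerate(value)]
        else:
            tree[pointer] = value  # string, number, boolean, or null
            return
        tree[pointer] = [child_pointer for child_pointer, _ in children]
        for child_pointer, child in children:
            walk(child, child_pointer)

    walk(instance, "")
    return tree


def from_tree(tree: dict, pointer: str = ""):
    """Rebuild a value: a node is an array if all child tokens are numbers."""
    node = tree[pointer]
    if not isinstance(node, list):
        return node
    tokens = [child.rsplit("/", 1)[1] for child in node]
    if tokens and all(DIGITS.fullmatch(token) for token in tokens):
        return [from_tree(tree, child) for child in node]
    return {unescape(token): from_tree(tree, child) for token, child in zip(tokens, node)}


order = {"order": {"orderId": "ORD123", "items": [
    {"name": "Product A", "price": 50}, {"name": "Product B", "price": 30}]}}
tree = pointer_tree(order)
print(tree[""], tree["/order/items"])  # ['/order'] ['/order/items/0', '/order/items/1']
print(from_tree(tree) == order)        # True
print(from_tree(pointer_tree({"0": "x"})))  # ['x']: the object came back as an array
```

The last line shows the limit of the deduction rule: an object whose keys are all digits comes back as an array, and an empty object and an empty array both become an empty node, which is the same pointer ambiguity discussed earlier.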
Output is going to be gross. Going back to the simple example: {
"$id": "http://example.com/schema",
"type": "array",
"items": {
"locationSchemas": {
"/foo/bar": { "const": 42 }
}
}
} Output contains a couple of properties that include JSON Pointers indicating locations in the schema: However, now, since some of the segments are themselves JSON Pointers, those pointers need to be encoded before appending to the evaluation path. So the output unit resulting from evaluating {
"valid": true,
"schemaLocation": "http://example.com/schema#/items/locationSchemas/~1foo~1bar",
"evaluationPath": "/items/locationSchemas/~1foo~1bar",
"instanceLocation": "/1/foo/bar"
} The |
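The double encoding can be shown in a few lines of Python. `escape` and `append_token` are hypothetical helpers implementing RFC 6901 reference-token escaping; the point is that the `locationSchemas` key becomes a single escaped token of the evaluation path, while the same pointer is appended verbatim to the item's instance location.

```python
def escape(token: str) -> str:
    """Encode one RFC 6901 reference token ("~" first, then "/")."""
    return token.replace("~", "~0").replace("/", "~1")


def append_token(pointer: str, token: str) -> str:
    return f"{pointer}/{escape(token)}"


# Schema side: each keyword name and each locationSchemas key is ONE token
evaluation_path = ""
for token in ["items", "locationSchemas", "/foo/bar"]:
    evaluation_path = append_token(evaluation_path, token)
print(evaluation_path)  # /items/locationSchemas/~1foo~1bar

# Instance side: the pointer is already encoded, so it is concatenated as-is
item_location = "/1"
instance_location = item_location + "/foo/bar"
print(instance_location)  # /1/foo/bar
```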
This last point is indicative of what I think is a larger problem (and it is at the root of the challenge for code generation and some classes of optimization). Another quite fundamental, but as yet unspoken, feature of JSON Schema is its locality: it is not possible to "reach out" into other parts of the document. This might just be "implicit philosophy" rather than deliberate intent, but it makes schemas a lot simpler to reason about and avoids most of the ordering problems you get with other constraint languages. As for code gen, I would need to create "pseudo-schema" types at the target locations to inject those properties, and I fear it would quickly become a mess, especially when dynamic references are in play: I would need to walk the dynamic scope to find out whether anything was injecting properties anywhere. |
I agree that a multi-level approach is not a continuation of the current single-level approach, and that a broader rethink is necessary if we want to generalize JSON Pointers. This is why it seems simpler to me to deal initially with single-level JSON Pointers only (which meets the need to access an element of an array). |
If it's not targeting arbitrary depth, why use pointers? Just indicating the array index seems much simpler, and would look and function very similar to {
"type": "array",
"indexItems": {
"0": {
"title": "the first element"
},
"4": {
"title": "the fifth element"
}
}
} Reading through the considerations people explore above, my opinion that targeting arbitrary-depth descendents with pointers is more problematic than beneficial is stronger than when I first came to this issue. |
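For validation, the proposed `indexItems` maps directly onto `prefixItems` by padding unlisted positions with the always-valid schema `true`. `indexItems` is the hypothetical keyword from this comment and `to_prefix_items` is an illustrative helper. The equivalence holds only when the keyword is used on its own: a sibling `items` or `unevaluatedItems` behaves differently depending on which positions `prefixItems` covers, and annotations may differ.

```python
import json
import re

ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901 array-index syntax


def to_prefix_items(index_items: dict) -> list:
    """Rewrite an "indexItems" object as a "prefixItems" array."""
    by_index = {}
    for key, subschema in index_items.items():
        if not ARRAY_INDEX.fullmatch(key):
            raise ValueError(f"not an array index: {key!r}")
        by_index[int(key)] = subschema
    if not by_index:
        return []
    # Positions nobody mentioned get the always-valid schema `true`
    return [by_index.get(i, True) for i in range(max(by_index) + 1)]


index_items = {"0": {"title": "the first element"}, "4": {"title": "the fifth element"}}
print(json.dumps({"type": "array", "prefixItems": to_prefix_items(index_items)}))
# {"type": "array", "prefixItems": [{"title": "the first element"}, true, true, true, {"title": "the fifth element"}]}
```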
However, single-level pointers don't really get you much. You have a bunch of It also still doesn't solve the output problem of having to include pointers inside of another pointer. |
The 'properties' keyword is defined in section 10.3.2.1 of the Core specification:
Validation succeeds if, for each name that appears in both the instance and as a name within this keyword's value, the child instance for that name successfully validates against the corresponding schema.
The keyword 'properties' is therefore used to identify child instances to associate them with a subschema. This association is made by matching names.
However, there is a dedicated way to identify a child instance: the JSON Pointer.
We can therefore make this association more simply by specifying in the schema only the JSON Pointer of the corresponding child instance.
This way of identifying a child instance is more understandable because it clearly separates a name in an instance and a pointer in a schema.
Example:
Example with JSON Pointer:
Furthermore, JSON Pointers are not limited to JSON objects: they generalize to JSON arrays, providing an alternative to 'prefixItems'. They also let you associate a subschema with only the targeted JSON element.
Example:
Example with JSON Pointer:
Note 1: This principle can also be extended to pointers of rank greater than 1
Note 2: JSON Pointers and the 'properties' keyword are compatible if 'required' accepts either a name or a JSON Pointer.
Note 3: The Python functions below also show the conversion between a schema with keyword properties and one without.
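A hypothetical sketch of such a conversion (not the author's original functions): `to_pointer_keys` rewrites `properties` and `prefixItems` into single-level pointer keys, and `from_pointer_keys` reverses it, using `"type": "array"` to decide whether the keys are array indexes. It ignores other applicators and leaves `required` as plain names.

```python
def escape(token: str) -> str:
    return token.replace("~", "~0").replace("/", "~1")


def unescape(token: str) -> str:
    return token.replace("~1", "/").replace("~0", "~")


def to_pointer_keys(schema):
    """Replace "properties"/"prefixItems" with "/name" or "/index" keys."""
    if not isinstance(schema, dict):
        return schema  # boolean schemas pass through
    result = {}
    for key, value in schema.items():
        if key == "properties":
            for name, sub in value.items():
                result["/" + escape(name)] = to_pointer_keys(sub)
        elif key == "prefixItems":
            for index, sub in enumerate(value):
                result[f"/{index}"] = to_pointer_keys(sub)
        else:
            result[key] = value
    return result


def from_pointer_keys(schema):
    """Inverse of to_pointer_keys; "type": "array" selects prefixItems."""
    if not isinstance(schema, dict):
        return schema
    result, children = {}, {}
    for key, value in schema.items():
        if key.startswith("/"):
            if "/" in key[1:]:
                raise ValueError(f"multi-level pointer not supported: {key!r}")
            children[unescape(key[1:])] = from_pointer_keys(value)
        else:
            result[key] = value
    if children and schema.get("type") == "array":
        by_index = {int(token): sub for token, sub in children.items()}
        # Unlisted positions get the always-valid schema `true`
        result["prefixItems"] = [by_index.get(i, True) for i in range(max(by_index) + 1)]
    elif children:
        result["properties"] = children
    return result


person = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
flat = to_pointer_keys(person)
print(flat)
# {'type': 'object', '/name': {'type': 'string'}, '/age': {'type': 'integer'}, 'required': ['name']}
print(from_pointer_keys(flat) == person)  # True
```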