-
-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hierarchical output structure needs clarification around which nodes are kept #1319
Comments
@gregsdennis I've always been a little unclear about how
The spec mandates that annotations for failed validation scopes are deleted, so I don't understand what's happening here. On the other hand, I can see the value of such annotations being preserved for debugging, but technically doing so would require modifying the spec (unless those annotations are re-categorized as errors somehow). |
I think its primary use is debugging. It can be helpful knowing why, e.g., But I think that use case is probably less common, and we should default to a more common use case, which is people want what affects final outcome. |
yes, I think most people do output them, if they support it at all. And I definitely see the debugging use. Are you including annotations from failed subschemas, though? That is the part that I'm a bit confused about. Because the spec requires deleting them, but I can see them being useful for debugging as well so I wasn't sure what happened with |
I output them where they were produced, yes, but it should be apparent from the validation results that they don't "flow down" (or up, depending on your tree orientation) or factor into the outcome. |
I don't really know what this means. Does this mean that you save them for output but not for use in parent schemas? Does this mean that sometimes there are both annotations and errors in the output unit? |
Yeah, it's really confusing and definitely needs clarification. The thing is, I'm not sure what it should be, so this discussion is useful. (This will use the new output structures because they're simpler.) The way I currently have my implementation set up, I only output annotations on nodes that pass, including for Example: // schema
{
"anyOf": [
{
"title": "bar is required, foo is specified",
"type": "object",
"properties": {
"foo": {
"title": "found foo"
}
},
"required": [
"bar"
]
},
{
"title": "foo is required, bar is specified",
"type": "object",
"properties": {
"bar": {
"title": "found bar"
}
},
"required": [
"foo"
]
}
]
}
// instance
{ "foo": "value" }
// output with nested passing nodes (my current setup)
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"evaluationPath": "/anyOf/0",
"schemaLocation": "https://json-everything/base#/anyOf/0",
"instanceLocation": "",
"errors": {
"required": "Required properties [\"bar\"] were not present"
},
"nested": [
{
"valid": true,
"evaluationPath": "/anyOf/0/properties/foo",
"schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "found foo"
}
}
]
},
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
}
// output with all annotations, including those in failed nodes
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"evaluationPath": "/anyOf/0",
"schemaLocation": "https://json-everything/base#/anyOf/0",
"instanceLocation": "",
"annotations": {
"title": "bar is required, foo is specified",
"properties": [
"foo"
]
},
"errors": {
"required": "Required properties [\"bar\"] were not present"
},
"nested": [
{
"valid": true,
"evaluationPath": "/anyOf/0/properties/foo",
"schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "found foo"
}
}
]
},
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
} I think for debugging purposes, showing all of the annotations (the second output example) is arguably more useful, but it's also really confusing if you don't understand how annotations work or how they're used internally. I'm not really sure that hiding annotations the annotations at failed nodes (the first output example) is terribly useful for debugging, which kinda makes me wonder if even the nested nodes of a failure need to be shown. Note that this probably also impacts what's included in Just for fun (and probably completeness, etc, whatever), here are a couple more options: // output omitting children of failed nodes
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"evaluationPath": "/anyOf/0",
"schemaLocation": "https://json-everything/base#/anyOf/0",
"instanceLocation": "",
"errors": {
"required": "Required properties [\"bar\"] were not present"
}
},
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
}
// output omitting children that don't impact validation result
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
} It could be argued that something like a |
Looking at the "include all annotations" one again, I think it could be made clearer by having another property ( {
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"evaluationPath": "/anyOf/0",
"schemaLocation": "https://json-everything/base#/anyOf/0",
"instanceLocation": "",
"annotationsDropped": true,
"annotations": {
"title": "bar is required, foo is specified",
"properties": [
"foo"
]
},
"errors": {
"required": "Required properties [\"bar\"] were not present"
},
"nested": [
{
"valid": true,
"evaluationPath": "/anyOf/0/properties/foo",
"schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "found foo"
}
}
]
},
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
} That way there's an explicit indication (as opposed to the implication of |
A subtle alternative would be to list the annotations in a {
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything/base#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"evaluationPath": "/anyOf/0",
"schemaLocation": "https://json-everything/base#/anyOf/0",
"instanceLocation": "",
"droppedAnnotations": {
"title": "bar is required, foo is specified",
"properties": [
"foo"
]
},
"errors": {
"required": "Required properties [\"bar\"] were not present"
},
"nested": [
{
"valid": true,
"evaluationPath": "/anyOf/0/properties/foo",
"schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "found foo"
}
}
]
},
{
"valid": true,
"evaluationPath": "/anyOf/1",
"schemaLocation": "https://json-everything/base#/anyOf/1",
"instanceLocation": "",
"annotations": {
"title": "foo is required, bar is specified",
"properties": []
}
}
]
} I think I like this better. This actually affects That would mean that you could have nodes that have:
|
@gregsdennis I like the |
I think the primary question I have for the default behavior (don't show everything) is how to specify what's considered to "affect validation output" for keywords that have multiple subschemas.
It seems that I'm not sure what to do with that. I'd prefer that the keywords themselves not specify what the output is, however they already do that for annotations. I'm open to ideas. |
|
I think it's worth keeping in mind the possible interaction of this issue and discussion json-schema-org/community#236, particularly since @marksparkza is taking a crack at implementing something based on it. That proposal has to do with enabling/disabling annotation collection on a more fine-grained level, and while I intend for it to be separate from the output system, both annotations and the more general concept of a node of data are things that are kept or not, which (if kept) show up in output. I feel like having a general filtering layer would a.) allow for a wider variety of implementations and use cases, such as a more interactive approach with callbacks at various points of validation (don't ask for details, I literally just made that up right now) and b.) would allow the output part of the specification to stay simple. It outputs whatever it gets. It doesn't care whether there was filtering, or on what criteria, etc. Decisions about what types of filters should be easy or on by default (e.g. drop all nodes that fail validation) can be part of the filtering system. |
I think that conversation is somewhat orthogonal to this one since annotations are merely a property in the output unit. This conversation is about the structure, i.e. "Which nested results are included?" Edit: this just sank in...
|
Glad to see the edit (and yeah why do points always click right after we hit "send"? It's like they hide until you've already commented). The other side of this is that both annotation collection and output nodes in general take up space, and potentially a lot of it. There are performance reasons to filter both. With something on the scale of the FHIR mega-schema, it might only be feasible to store output notes for a portion of the schema, so you might have to run with different filters in order to narrow the errors down and get the information you need. I'm thinking of this b/c I know someone (maybe you?) talked about error collection running out of memory. |
Just an update. I'm implementing some basic pruning logic on a few keywords in JsonSchema.Net, and it looks promising. From the test suite: // schema
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"items": [
{}
],
"additionalItems": {
"type": "integer"
}
}
// instance
[null,2,3,"foo"]
// without pruning
{
"valid": false,
"evaluationPath": "",
"schemaLocation": "https://json-everything.net/fc30dcb71c",
"instanceLocation": "",
"details": [
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-everything.net/fc30dcb71c",
"instanceLocation": ""
},
{
"valid": true,
"evaluationPath": "/additionalItems",
"schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
"instanceLocation": "/1"
},
{
"valid": true,
"evaluationPath": "/additionalItems",
"schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
"instanceLocation": "/2"
},
{
"valid": false,
"evaluationPath": "/additionalItems",
"schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
"instanceLocation": "/3",
"errors": {
"type": "Value is \"string\" but should be \"integer\""
}
}
]
}
// with pruning
{
"valid": false,
"evaluationPath": "",
"schemaLocation": "https://json-everything.net/263f35c2f9",
"instanceLocation": "",
"details": [
{
"valid": true,
"evaluationPath": "/0",
"schemaLocation": "https://json-everything.net/263f35c2f9#/0",
"instanceLocation": "/0"
},
{
"valid": false,
"evaluationPath": "/additionalItems",
"schemaLocation": "https://json-everything.net/263f35c2f9#/additionalItems",
"instanceLocation": "/3",
"errors": {
"type": "Value is \"string\" but should be \"integer\""
}
}
]
} With pruning enabled, it's much easier to see what the problem is. |
Hey all. I'm wrapping up a validation and annotation collection library I started from scratch a couple of months back, and just found this issue plus the new doc regarding the new output format. I based my output on the 2020-12 format and have some questions/concerns about the new one. What I'm not sure is what is the right gh issue or discussion to bring it up...? I found some discussions but they're 2+ years old now. Anyway, please let me know where makes sense to comment a bit and ask questions on that :) |
Hey @jam01. Feel free to open a new issue or a discussion on the community repo. Happy to answer questions. In short, 2020-12 is the latest published. It's recommended, not required, by the specification. In my lib, I use the format that's in the new doc. (It's in a new doc because we've decided to extract it to allow others to devise other formats for non-machine audiences.) |
Follow-up to #939
Currently, the specification says that all nodes are to be kept for the
hierarchical
structure. This needs to be clarified to state more precisely which nodes are intended.This structure was meant to be a replacement for the 2019-09/2020-12
verbose
structure. Therefore all nodes, including those which do not impact validation would be retained.However, I realize this makes for a significantly larger structure in many cases. As such, I would like to propose that we default to only including those output units which affect the final outcome, but allowing for implementations to include all of them upon request (or configuration)
I'm open to opposing arguments and alternatives.
The text was updated successfully, but these errors were encountered: