resolve ambiguity regarding annotation inclusion in failed validation results #939
In my validator, I separate the concept of "collected annotations" from "error collection". I think the "annotations" referred to by section 10.4.4 are in the second, "error collection" set, where there are both "error" and "annotation" annotations representing "false" and "true" results, respectively. At least that's how I'm reading this part. I agree the language could use work. |
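(For illustration, a hypothetical sketch of that two-collection split; the field names here are invented and are not from the spec or any implementation. The "error collection" set records every keyword result, true or false, while the collected annotations hold only values attached to instance locations.)
{
  "collectedAnnotations": [
    { "instanceLocation": "", "keywordLocation": "/title", "value": "The Title" }
  ],
  "errorCollection": [
    { "instanceLocation": "", "keywordLocation": "/type", "valid": true },
    { "instanceLocation": "", "keywordLocation": "/required", "valid": false, "error": "missing property: \"a\"" }
  ]
}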
My intention was that failed validation always drops annotations, which means you can abandon processing once a schema object fails validation, and ignore its subschemas. I did not write section 10, so you'll have to ask @gregsdennis about it (if I was involved in that point, I have since forgotten about it). It's not feasible to fully specify annotations that might-have-been because you get into very complex questions about conditional schema application and exactly how much needs to be faked in order to find might-have-been annotations. That's one reason I didn't require it, in addition to the performance issues. |
Section 7.7.2.1 paragraph one is correct as written; any change to the behavior it requires would need substantial discussion beyond just harmonizing the wording. |
completely agreed! I think this means that the "verbose" output format might need to be reworked, or moved into the "not-required-but-some-implementations-may-support-it" category. edit: #679 (comment) shows that at one point the core team did grasp what was going on here and did not see any contradictions in it :) Perhaps there is no issue here that needs resolving. I hope to determine that for myself soon when I get to implementing the "detailed" and "verbose" result formats. |
Looking at 10.4.4, I don't see any examples of annotations being passed through to output. In terms of what @handrews said...
Yeah, I believe the verbose error reporting can ONLY produce error results for schema parts which were actually APPLIED to the instance. @ssilverman Can I see some verbose output examples from you when you have failing subschemas with annotations, please? Also, I haven't poked at @jdesrosiers' hyperjump, but the web-based version fails fast, so you only get the first failure. |
@Relequestual I'm not completely clear what the request is, so I created the following example to start. Let's start here and modify until it's what you'd like to see.
Schema:
{
"$id": "http://github.com/json-schema-org/json-schema-spec/issues/939",
"allOf": [
{
"not": true,
"title": "The Title"
}
]
}
Instance:
{
}
Basic errors:
{
"valid": false,
"errors": [
{
"keywordLocation": "",
"absoluteKeywordLocation": "https://github.com/json-schema-org/json-schema-spec/issues/939",
"instanceLocation": "",
"error": "schema didn't validate"
},
{
"keywordLocation": "/allOf",
"absoluteKeywordLocation": "https://github.com/json-schema-org/json-schema-spec/issues/939#/allOf",
"instanceLocation": "",
"error": "invalid subschemas: 0"
},
{
"keywordLocation": "/allOf/0/not",
"absoluteKeywordLocation": "https://github.com/json-schema-org/json-schema-spec/issues/939#/allOf/0/not",
"instanceLocation": "",
"error": "\"not\" didn't validate"
}
]
}
Annotations when not collecting annotations for failed schemas:
{
"annotations": []
}
Annotations when collecting annotations for failed schemas (note the "valid=false"):
{
"annotations": [
{
"instanceLocation": "",
"keywordLocation": "/allOf/0/title",
"absoluteKeywordLocation": "https://github.com/json-schema-org/json-schema-spec/issues/939#/allOf/0/title",
"annotation": {
"name": "title",
"valid": false,
"value": "The Title"
}
}
]
} |
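(For illustration only: a rough sketch of what a hierarchical verbose result for the schema and instance above might look like, assuming, as discussed later in this thread, that nested results are kept under "errors" and that the "title" annotation from the failed /allOf/0 subschema is retained for verbose output. This is not normative output from the spec.)
{
  "valid": false,
  "keywordLocation": "",
  "instanceLocation": "",
  "errors": [
    {
      "valid": false,
      "keywordLocation": "/allOf",
      "instanceLocation": "",
      "errors": [
        {
          "valid": false,
          "keywordLocation": "/allOf/0",
          "instanceLocation": "",
          "errors": [
            {
              "valid": false,
              "keywordLocation": "/allOf/0/not",
              "instanceLocation": "",
              "error": "the instance is valid against the subschema, so \"not\" fails"
            },
            {
              "valid": true,
              "keywordLocation": "/allOf/0/title",
              "instanceLocation": "",
              "annotation": "The Title"
            }
          ]
        }
      ]
    }
  ]
}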
Hyperjump JSV doesn't collect annotations, so I'm not going to be any help. |
Thanks @ssilverman. Regardless, I've considered... @karenetheridge Do you think there's any merit in collecting non-annotations (I'm calling them non-annotations because they are never attached to an instance location) for failed validation processes? Given the failures provide a schema location, anyone wanting such non-annotations could look them up in the schema to see what they are (based on the schema location attribute). If you agree there's no point, then I'm going to suggest we remove the part where it talks about "annotations for failed validations", unless @gregsdennis pipes up to explain what this meant... |
That verbiage only pertains to the Verbose output format. The intent was that, if the client is requesting verbose, they probably want as much info as possible, so we give them everything, even annotations that would otherwise be removed. Maybe this can be an implementation prerogative. If so, we should update the text as such. |
From an implementor's point of view, collecting annotations on a node you're already evaluating is trivial, unless you have shortcutting logic to quickly fail. But if they're requesting verbose output, performance is an unlikely concern, as the output content takes priority. |
OK. I think we're getting there. @ssilverman I think in your example, the Addressing...
I wonder @handrews if you're thinking about subschemas which are not actually applied as part of processing? Obviously that would be insane. It's impractical to suggest that the verbose output somehow magically works out how to apply all subschemas (which might not even be possible with the instance provided). I think, as @gregsdennis says here, if the user wants verbose output, annotation collection would be expanded: annotations would be collected regardless of the validation result, but STILL only based on applicability, and without short-circuiting validation (because verbose). @gregsdennis @handrews Are we on the same page here? I do have a follow-up though... It's actually a bit of a sidetrack... but I'd like to keep it here anyway. Given a schema and instance such as...
and instance...
How does the verbose result look? The subschema |
@Relequestual The "annotations" output I have there is just my own thing and I chose to use "valid" to indicate whether the annotation was the result of a valid collection or the result of a failed validation (there's an option in my API to collect them). That "valid" property you mention is for the various "errors" outputs. (Mostly because my understanding is that all the output formats only show "errors" and not "annotations". Please correct me if I'm wrong; there's no examples to the contrary in the spec.) |
@ssilverman The verbose example is at https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.4. I hadn't considered that you'd collect annotations separately from the verbose output structure. I'd have expected them as part of the verbose output structure, as with successful validation, as allowed per json-schema-spec/output/schema.json line 31 in 964c2f8
|
@Relequestual Here's what I get for the basic output:
{
"valid": true,
"errors": []
}
@gregsdennis Something I just realized I've been doing for basic output is excluding all failing errors if the schema passes. Is this correct? There need to be lots more examples in the schema spec for all possible cases.
Output with everything included, including "pruned" and non-failing errors (I'm just dumping all the errors, I didn't feel like creating the tree structure):
{
"valid": true,
"errors": [
{
"keywordLocation": "",
"absoluteKeywordLocation": "file:schema-ann2.json",
"instanceLocation": "",
"valid": true
},
{
"keywordLocation": "/$schema",
"absoluteKeywordLocation": "file:schema-ann2.json#/$schema",
"instanceLocation": "",
"valid": true
},
{
"keywordLocation": "/not",
"absoluteKeywordLocation": "file:schema-ann2.json#/not",
"instanceLocation": "",
"valid": true
},
{
"keywordLocation": "/not/allOf",
"absoluteKeywordLocation": "file:schema-ann2.json#/not/allOf",
"instanceLocation": "",
"error": "invalid subschemas: 0",
"valid": false,
"pruned": true
},
{
"keywordLocation": "/not/allOf/0/required",
"absoluteKeywordLocation": "file:schema-ann2.json#/not/allOf/0/required",
"instanceLocation": "",
"error": "required properties not found: \"a\", \"b\"",
"valid": false,
"pruned": true
},
{
"keywordLocation": "/not/allOf/0/title",
"absoluteKeywordLocation": "file:schema-ann2.json#/not/allOf/0/title",
"instanceLocation": "",
"valid": true
},
{
"keywordLocation": "/not/allOf/1/required",
"absoluteKeywordLocation": "file:schema-ann2.json#/not/allOf/1/required",
"instanceLocation": "",
"valid": true
},
{
"keywordLocation": "/not/allOf/1/title",
"absoluteKeywordLocation": "file:schema-ann2.json#/not/allOf/1/title",
"instanceLocation": "",
"valid": true
}
]
} |
@Relequestual there are no annotations in that link you mentioned (https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.4). (I'm not referring to valid/invalid as "annotations". When I think "annotations", I think "collected annotations", for example 'format' or 'title', or even 'properties'.) Update: Oh, I see what you mean. Does the spec even mention "annotations" with an example outside of that .json file? (Yes, I know it calls "passing validation" an "annotation", but there are no examples, so I just presumed to make my own format.) |
I didn't include an annotations example in the spec. I assumed it would be trivial given the errors example. Probably a wrong assumption... @ssilverman what do you mean by "non-failing errors?" Is this like an
@Relequestual for your example, I would expect errors for the missing |
@gregsdennis Since "errors" is used in all the examples, even for "valid=true" errors, my understanding that an "error" is the result of a schema (or keyword) validation. It's a little confusing because "annotation" is supposed to be the name of a "passing schema" "error annotation". I've always wondered about this redundancy/apparent contradiction. TLDR:
Through some experimentation with real-world examples, I sought to include relevant errors and exclude errors that weren't relevant. With annotations, I sought to do the same thing, but following the spec’s suggestions that subschema annotation results needn't be included. Here's my logic of how I collect annotations:
To summarize: I think of an "annotation" as extra information and an "error" as a schema or keyword validation result. Internally, I implement an "error" as an "annotation" having a "validation result" object. Splitting errors and annotations up this way made more sense to me and helped me implement things. I really believe "annotation" and "error result" need to be thought of as separate, even though they may be implemented the same way or with the same structure. I believe my implementation came together much more quickly once I saw this difference. I'm curious if other implementers feel the same way or have the same thought process. |
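(A hypothetical sketch of that internal shape; the field names are invented for illustration and are not from the spec or from any particular implementation. An "error" would be the same record with "valid": false and an error string instead of an annotation value.)
{
  "instanceLocation": "/foo",
  "keywordLocation": "/properties/foo/title",
  "value": "Foo title",
  "validationResult": {
    "valid": true,
    "error": null
  }
}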
As I understand it, you would need both, because |
@ssilverman I'm reading the rest of your comment, but I was trying to say, "validity" is NOT an annotation. It's a possible property of the node. |
Annotations are ONLY ever attached to an instance. If it's not attached to the instance, it's not an annotation (yet). Any "annotations" collected for nodes (of the output) where validation failed are effectively "here are the annotations you would have gotten if this node had validated successfully". @ssilverman you can do anything you want internally, but the output formats are defined. They were designed so there would be uniformity across implementations. There will be tests (which you will currently fail) if you don't include the annotations in the verbose output, and if you include annotations which shouldn't be there. I'd really like to stay on topic here. |
I'm going to summarise. The spec says...
It does not say you MUST NOT collect the annotations, just that you must not produce them (in the output). I would further argue that the
We COULD change the key for this to make it clearer. Thoughts, @karenetheridge? |
@Relequestual yes, that was long-winded in my post. I thought it would be useful to see the thought process around how it's possible to think about errors, annotations, and their collection. I've added "TLDR" after the first paragraph. I hope the point of the first paragraph wasn't lost, though. :) Another question I had was about the output for non-simple annotations. Since, for example, the annotation for "properties" (and related keywords) is a set, how should that look? Right now, I'm just adding a JSON array. Alternatively, it could be mandated that the output is the string form of the annotation contents, but then it's harder to keep implementations uniform. Or other values that are some implementation-defined non-simple type. Bottom line is that I believe that more examples covering all the output cases would be helpful for implementers. That might help resolve these ambiguities. (To state the obvious.) But maybe this is all about what needs to go in those examples and I have some catching up to do... :) |
If they're not associated with an instance location, then what are they? All we have is a path to the annotation keyword in the schema itself. Sure, we could collect them as we traverse the schema during evaluation, and not collect the same one twice if we happen to traverse the same schema path multiple times with different instance locations... however, even if we don't short-circuit (e.g. don't traverse the other allOf branches once we hit the first failing branch) we can't be sure to collect them all -- as some subschemas are only traversed for certain data types (consider subschemas under "properties" when the instance is an array type), or even conditionals under an "if", "not" etc. But even then... I could maybe see an application having a use for such a thing... shrug? As an application developer I would probably just traverse the schema myself and extract all the annotation keywords I cared about.
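(A concrete, hypothetical illustration of that applicability point: with the schema below and an array instance, the subschema under "properties" is never applied, so its "title" never becomes an annotation, and no amount of annotation collection during evaluation will surface it.)
Schema:
{
  "type": ["object", "array"],
  "properties": {
    "foo": { "title": "Only applied when the instance is an object" }
  }
}
Instance:
[1, 2, 3]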
I'd be in favour of this, as holding on to annotations after a subschema has been evaluated as false would make implementations much more complicated. It is difficult enough properly implementing a configurable short-circuit mode. I haven't yet implemented the 'verbose' output format because it would require restructuring the entire logic of error and annotation collection (keeping everything no matter the result, and marking whether errors were part of a successful evaluation path, or annotations from a failed path). The 'detailed' format, by contrast, can be constructed after the fact, as it is just a matter of walking the flat list of errors or annotations and creating nesting using the path components in keywordLocation as a guide. I structured my API such that the output format could be requested after evaluation had taken place (although if output_format="flag" is specified up front, short-circuiting might be enabled which will limit the number of errors in the result). tldr I'd be happy to see the verbose format sent to the bitbucket in the sky.
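(A sketch of that flat-to-detailed reshaping, using invented error messages and glossing over the spec's rules for collapsing single-child nodes. A flat 'basic' list like the first structure can be re-nested into a 'detailed'-style tree like the second purely by grouping on keywordLocation prefixes after evaluation has finished.)
Flat list:
[
  { "valid": false, "keywordLocation": "/allOf/0/required", "instanceLocation": "", "error": "missing property: \"a\"" },
  { "valid": false, "keywordLocation": "/allOf/1/type", "instanceLocation": "/b", "error": "wrong type" }
]
Detailed-style tree:
{
  "valid": false,
  "keywordLocation": "",
  "instanceLocation": "",
  "errors": [
    { "valid": false, "keywordLocation": "/allOf/0/required", "instanceLocation": "", "error": "missing property: \"a\"" },
    { "valid": false, "keywordLocation": "/allOf/1/type", "instanceLocation": "/b", "error": "wrong type" }
  ]
}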
I agree with @gregsdennis that the similarity between annotations and errors is fairly clear. The accompanying output schema backs this up. It would certainly help if the test suite had some output examples to work from, but we can add a few more to the spec as well. My take: Once a subschema, or even a particular keyword within a subschema, has a failing result, all annotations collected below that point are moot and thrown away. Errors are easier, because once a failing result has been collected at this keyword (or below) it can't become successful again, with the exception of 'contains'+'minContains' or 'oneOf', where even when we have a failing result we need to hold out for the possibility that the overall result may be true and the collected errors at this point aren't relevant. I implement my 'error' and 'annotation' classes very similarly - they have instance and schema locations attached to them, and the actual error/annotation value themselves (where with error, it's always a string, and with annotations the value could be anything). There may be additional data attached to the error/annotation internally (e.g. I also collect the keyword itself, even though it can be derived from keywordLocation, because the caller can use it when searching for annotations by instance location + keyword later on when provided with the results in object format rather than serialized to JSON). Using the allOf+not+title example provided earlier, this is what my implementation does:
and, flipping that 'not' condition so it results in a true result:
A set is an unordered list. So for this evaluation:
|
I never anticipated anyone splitting that particular hair! "collecting" to me is stuff that happens from a given schema object down (or kind of the whole process, I never defined it precisely), and "producing" is that schema handing the collection up to its parent. Of course you do some collection even on a schema object that fails, as you might not be able to tell if it fails until after evaluating some subschemas. Once a schema fails, you "drop" the collected annotations in the sense of not passing them to the parent alongside the |
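(A sketch of that collecting-versus-producing distinction, with hypothetical keyword locations. While a schema object is being evaluated it may hold both kinds of results locally; once it fails, only the errors and the overall "valid": false are produced up to its parent, and the locally collected annotation is dropped.)
Collected locally during evaluation:
{
  "annotations": [
    { "keywordLocation": "/items/title", "instanceLocation": "/0", "annotation": "Item title" }
  ],
  "errors": [
    { "keywordLocation": "/items/required", "instanceLocation": "/0", "error": "missing property: \"id\"" }
  ]
}
Produced to the parent after the schema object fails:
{
  "valid": false,
  "errors": [
    { "keywordLocation": "/items/required", "instanceLocation": "/0", "error": "missing property: \"id\"" }
  ]
}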
Is any part of this just a clarification that can be put in, or does it require an actual conformance-impacting change? |
A bit more clarification here: the inclusion of annotations for failed validations was not so that they could be processed later by subsequent validations, but rather so that any human reader (which is the assumption of the
Contrast this with https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.7.7.1.2:
which is specifically about the validation process itself. These two are not really in conflict as they target different uses. However, yes, this does need to be clarified. With the latest proposed changes to the output, this language will likely come out anyway. |
Is this specific change being tracked somewhere? I've just found myself sorting through this long conversation for about the 4th or 5th time only to realize that there doesn't seem to be anything else that needs doing. If it's not being tracked, can we file it separately and close this? |
@gregsdennis meant to ping you on that last comment |
It appears that this text was replaced with simply
I've opened ☝️ to address this with the new output format. In specific regard to annotations, any output unit that's included should retain any errors/annotations that were created as a part of that subschema's evaluation. |
The spec says (https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.7.7.1.2):
However, this is contradicted by other parts of the spec
(https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.4):
So, it would appear there are cases where annotations should be included in a result even if the overall validation result is false. (E.g. if one branch of an allOf was successful and produced annotations, but another branch failed, this means that the subschema at allOf fails, so it should discard all annotations collected below that point and simply return its valid: "failed" and errors: [..] results back up to the next-higher node. But in the end, annotations are still not returned in the final result.)
This (section 10.4.4) also implies that an implementation cannot simply discard all collected annotations at a schema position where validation fails: it must continue to collect them in case the "verbose" output method is requested.
Now, another use case: I have a desire to include annotations in a failed validation result, so as to provide more information as to the nature of the failure (or perhaps offer extra guidance as to how to make validation succeed).
(cross-ref: https://github.com/karenetheridge/JSON-Schema-Draft201909/issues/22) I can achieve this with a simple annotation-only keyword (e.g. "description", or a new one that acts like "description" by returning data at the position it is evaluated), except for this item of the spec that says that annotations are discarded on failed results.
Is there a way through this? At the very least, it seems as if there is some contradictory language in the spec that could be clarified.
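(A hypothetical sketch of that use case, not output that 2019-09 currently requires: a "description" under a failing property subschema, and the kind of failed-validation result, with the annotation retained as guidance, that this issue is asking about. The schema fragment, instance location, and messages are invented for illustration.)
Schema fragment:
{
  "properties": {
    "shoe_size": {
      "type": "number",
      "description": "Shoe size using the EU numbering system"
    }
  }
}
Desired output when /shoe_size fails:
{
  "valid": false,
  "errors": [
    { "keywordLocation": "/properties/shoe_size/type", "instanceLocation": "/shoe_size", "error": "expected a number" }
  ],
  "annotations": [
    { "keywordLocation": "/properties/shoe_size/description", "instanceLocation": "/shoe_size", "annotation": "Shoe size using the EU numbering system" }
  ]
}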