Disallow even optional "content*" processing #1296

handrews · 2022-09-19T02:11:00Z

This feature added substantial complexity and security exposure for essentially no benefit. The functionality could be trivially implemented as a library on top of a JSON Schema implementation that supports annotation collection.

Fixes #1287. Intentionally avoids topics still under discussion in #1288.

This added substantial complexity for essentially no benefit. The functionality could be trivially implemented as a library on top of a JSON Schema implementation that supports annotation collection.

jsonschema-validation.xml

Co-authored-by: Jason Desrosiers <[email protected]>

based on review feedback.

karenetheridge · 2022-10-06T05:34:34Z

I'm against this change as it is written; I've found it useful in OpenAPI validation. I understand that in some situations it might be a security risk, so it's reasonable to have its support be optional (as it already is), and to default to NOT performing validation. I've been thinking of splitting it into two vocabularies: Annotation and Validation. Like with format, it can be enabled when desired, and left disabled most of the time.

To clarify: I'm fully onboard with changing the spec to clarify that an author of a schema should never be unsure whether content* will be validated or not -- implementations should NOT validate by default, so that leaves either something in $vocabulary to indicate that validation is desired, or an out-of-band configuration variable that defaults to false (which is what I've done in my implementation, so it can be used in certain situations where appropriate decoding algorithms have been provided to the implementation).

handrews · 2022-10-10T00:57:16Z

@karenetheridge since the content* keywords in the https://json-schema.org/draft/next/content vocabulary are strictly annotations, there are two ways to validate string contents while remaining in compliance with the spec:

take the output of the completed initial evaluation, and for each instance location to which content* annotations are attached
- decode the string according to contentEncoding (the spec gives no guidance on handling any errors)
- parse the string according to contentType (the spec gives no guidance on handling any errors)
- evaluate the parsed string against contentSchema as a new evaluation process
- recurse if any content* annotations are present in this new output
- store the output to return alongside but separately from the initial evaluation and any other secondary evaluations
create a content-assertion extension vocabulary analogous the the format-assertion vocabulary, which would perform the encoding/parsing/additional validation as part of the regular (single) evaluation process

Option 1 can be done in two ways: Shoved into the main JSON Schema implementation, creating a completely different set of output requirements from everything else (as currently suggested by the spec). Or it can be done as a separate library, without burdening the JSON Schema implementation's output interface that works for literally everything else in JSON Schema.

There is no functional difference whatsoever between doing it within the implementation or as a separate library. Please let me know if you do see a difference, because I would like to understand that. I am proposing this change because fundamentally, it's not a change. It just moves non-essential functionality to a library, which could even have its own little specification to ensure interoperability of this area. That would be an improvement over the current situation.

Option 2 is not impacted by this change at all. Supporting a contentSchema assertion-applicator is tricky as it is not clear how to represent instance locations within a string, so for now any such feature would not be interoperable, but we could figure that out if we wanted to make it interoperable.

karenetheridge · 2022-10-16T04:34:23Z

I don't understand your references to output requirements in Option 1. What does output have to do with anything here?

The issue I have with moving this functionality out of the spec is, if we only talk about annotations in the spec, we lose the information needed to implement these keywords as validations consistently.

But, more importantly, if we "..take the output of the completed initial evaluation..", then we can't use the results of evaluating the content* keyword(s) and use them as inputs to applicator keywords one level up in the document. So I see Option 2 as much more preferrable.

it is not clear how to represent instance locations within a string.. but we could figure that out if we wanted to make it interoperable

The string is decoded (using the specified media type) into something that matches the JSON document model, so I presumed (when implementing this) that the instance location just carries on using json pointers inside the decoded data structure. That's under-specified though, so it would be good to clarify that (assuming the use of validation stays in the spec at all).

handrews · 2022-10-16T15:40:49Z

But, more importantly, if we "..take the output of the completed initial evaluation..", then we can't use the results of evaluating the content* keyword(s) and use them as inputs to applicator keywords one level up in the document

The spec has never allowed this since contentSchema was introduced in 2019-09. It's expressly forbidden:

Implementations MAY offer the ability to decode, parse, and/or
validate the string contents automatically. However, it MUST NOT
perform these operations by default, and MUST provide the validation
result of each string-encoded document separately from the enclosing
document.

You're currently not allowed to use that result within the validation of the enclosing document. It MUST be produced as separate output results, which puts content* assertion behavior into a totally unique processed divorced from normal implementation.

This PR removes that uniqueness. If anything, it makes what you want to do easier by removing this constraint.

So I see Option 2 as much more preferrable.

That's great! This PR actually makes that easier by removing the part that more-or-less forbids that.

I don't hear any actual objection from you regarding what this PR does. If anything, this sounds like support. You're saying "no, don't merge this", but also "my preferred solution is the thing that is easier if we merge this."

so I presumed (when implementing this) that the instance location just carries on using json pointers inside the decoded data structure. That's under-specified though, so it would be good to clarify that (assuming the use of validation stays in the spec at all).

You should file an issue for that, as it is not relevant to this PR. It was underspecified before, and it will be the same level of under-specified afterwards.

karenetheridge · 2022-10-17T00:08:10Z

It was underspecified before, and it will be the same level of under-specified afterwards.

Well no, not really - if you take out the descriptions of how the content* keywords would validate, then that leaves the reader to make up their own (inconsistent) rules. ..But on looking again, I see you're not removing those paragraphs.

I don't hear any actual objection from you regarding what this PR does. If anything, this sounds like support. You're saying "no, don't merge this", but also "my preferred solution is the thing that is easier if we merge this."

I guess? I don't see any path forward to allowing in-place evaluation as assertions, other than creating a new vocabulary (akin to the format-annotation and format-validation vocabulary split in 2020-12), but I'd been pondering that myself and it seems reasonable.

So I guess my vote is "not-no" - I'm not terribly happy with the spec staying in the state after merging this, but the parts you are removing are parts that I definitely disagreed with. If that makes sense :)

handrews · 2022-10-17T01:29:54Z

@karenetheridge

the parts you are removing are parts that I definitely disagreed with. If that makes sense :)

Yes, exactly! This PR is about removing a bad solution to the problem you want to solve. It takes no position on other possible solutions, and you are quite welcome to file proposals for other solutions. I would actually suggest a slightly different approach to validation, using more specific keywords. A major problem with these a validation keywords is that's they're so open-ended. But if you specifically want to validate JSON-in-JSON, or or parse HTML-in JSON, etc., you can write a more clear spec for that behavior.

But that's a discussion for another issue (or discussion).

I'll go ahead and merge this, then. Two "yes"es and a "not-no" seem sufficient :-)

Disallow even optional "content*" processing

e59dc7a

This added substantial complexity for essentially no benefit. The functionality could be trivially implemented as a library on top of a JSON Schema implementation that supports annotation collection.

handrews added Type: Enhancement annotation validation labels Sep 19, 2022

handrews added this to the draft-next milestone Sep 19, 2022

jdesrosiers requested changes Sep 21, 2022

View reviewed changes

jsonschema-validation.xml Outdated Show resolved Hide resolved

jsonschema-validation.xml Outdated Show resolved Hide resolved

handrews and others added 2 commits September 21, 2022 20:06

Fix duplicated word

2a2f983

Co-authored-by: Jason Desrosiers <[email protected]>

Remove confusing parenthetical aside

e7d36b2

based on review feedback.

jdesrosiers approved these changes Sep 27, 2022

View reviewed changes

This comment was marked as outdated.

Sign in to view

handrews merged commit 27cc759 into json-schema-org:main Oct 17, 2022

handrews deleted the no-auto-content branch October 17, 2022 05:55

handrews mentioned this pull request Nov 22, 2022

contentSchema tests inadequate json-schema-org/JSON-Schema-Test-Suite#609

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disallow even optional "content*" processing #1296

Disallow even optional "content*" processing #1296

handrews commented Sep 19, 2022

karenetheridge commented Oct 6, 2022 •

edited

Loading

This comment was marked as outdated.

This comment was marked as outdated.

handrews commented Oct 10, 2022 •

edited

Loading

karenetheridge commented Oct 16, 2022

handrews commented Oct 16, 2022

karenetheridge commented Oct 17, 2022

handrews commented Oct 17, 2022

Disallow even optional "content*" processing #1296

Disallow even optional "content*" processing #1296

Conversation

handrews commented Sep 19, 2022

karenetheridge commented Oct 6, 2022 • edited Loading

This comment was marked as outdated.

This comment was marked as outdated.

handrews commented Oct 10, 2022 • edited Loading

karenetheridge commented Oct 16, 2022

handrews commented Oct 16, 2022

karenetheridge commented Oct 17, 2022

handrews commented Oct 17, 2022

karenetheridge commented Oct 6, 2022 •

edited

Loading

handrews commented Oct 10, 2022 •

edited

Loading