Proposal: remove optional automatic processing of "content*" keywords #1287

handrews · 2022-09-14T22:02:07Z

In §8.2 Implementation Requirements (for the content* keywords), we have:

Implementations MAY offer the ability to decode, parse, and/or validate the string contents automatically. However, it MUST NOT perform these operations by default, and MUST provide the validation result of each string-encoded document separately from the enclosing document. This process SHOULD be equivalent to fully evaluating the instance against the original schema, followed by using the annotations to decode, parse, and/or validate each string-encoded document. <CREF>For now, the exact mechanism of performing and returning parsed data and/or validation results from such an automatic decoding, parsing, and validating feature is left unspecified. Should such a feature prove popular, it may be specified more thoroughly in a future draft. </CREF>

See also the Security Considerations (Section 10) sections for possible vulnerabilities introduced by automatically processing the instance string according to these keywords.

This is a lot of potential complexity and security risk for a minimal increase in convenience:

- It creates an entirely new code path, where additional activity takes place between the main validation and returning the result.
- It requires a different approach to results in order to separately indicate the results of the (potentially many) sets of content* keywords.
- Those additional results include not just any result from contentSchema, but also the output of decoding and/or parsing based on contentEncoding and/or contentMediaType.
- It involves a mandatory runtime configuration option, which is extra work and makes evaluation behavior less predictable relative to the schema author's intent.
- It is a giant security hole (JavaScript, anyone?).
- It creates another extension point in the JSON Schema architecture in order to handle more media types (and maybe encodings; they're not said to be extensible, but people will ignore that). Unlike other extension points, this one adds nothing to the capabilities of JSON Schema, only needless complexity.

If we remove this, it just means that people need to look at the annotations, hand the instance value off to the right decoder and/or parser, and (if contentSchema is present) call the JSON Schema implementation again, providing the schema and the decoded/parsed instance value.

All of the information needed to do that is present in the annotations, and there's nothing JSON Schema-specific about it. Calling back into the implementation doesn't count: since this came from a call into a JSON Schema implementation in the first place, the application can clearly call it again just fine.
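As a rough illustration of how little is involved, here is a minimal Python sketch of that application-side flow. Everything in it is an assumption for illustration: the function name `process_content`, the shape of the `annotations` dict (the content* annotation values collected for one string location), and the `validate` callback standing in for whatever JSON Schema implementation the application already uses. It only handles base64 and application/json; a real application would plug in whatever decoders and parsers it trusts.

```python
import base64
import json
from typing import Any, Callable, Optional, Tuple


def process_content(
    value: str,
    annotations: dict,
    validate: Callable[[dict, Any], bool],
) -> Tuple[Any, Optional[bool]]:
    """Decode, parse, and optionally validate one string-encoded document.

    `annotations` is a hypothetical dict of the content* annotation values
    for this string; `validate` is a hypothetical callback into the
    application's JSON Schema implementation.
    Returns (decoded/parsed data, validation result or None).
    """
    data: Any = value

    # Step 1: decode according to contentEncoding, if present.
    encoding = annotations.get("contentEncoding")
    if encoding == "base64":
        data = base64.b64decode(value, validate=True)
    elif encoding is not None:
        raise ValueError(f"unsupported contentEncoding: {encoding}")

    # Step 2: parse according to contentMediaType, if present.
    media_type = annotations.get("contentMediaType")
    if media_type == "application/json":
        data = json.loads(data)
    elif media_type is not None:
        raise ValueError(f"unsupported contentMediaType: {media_type}")

    # Step 3: call back into the JSON Schema implementation, but only when
    # there is both a contentSchema and a media type to interpret it against.
    schema = annotations.get("contentSchema")
    if schema is None or media_type is None:
        return data, None
    return data, validate(schema, data)
```

The application decides which encodings and media types it is willing to process, so nothing is decoded or parsed unless the application explicitly asks for it.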

Let's simplify our architecture by jettisoning this potentially complex and insecure, and entirely unnecessary, feature. People who want to automatically do this can trivially implement it as a separate library (and we can note that in the spec, maybe in a CREF).


handrews added the Type: Enhancement, Priority: High, Type: Security, and validation labels Sep 14, 2022

handrews added this to the draft-next milestone Sep 14, 2022

This was referenced Sep 14, 2022

Clarify the handling of "contentSchema" #1288

Closed

Disallow even optional "content*" processing #1296

Merged

handrews self-assigned this Sep 22, 2022

handrews closed this as completed in #1296 Oct 17, 2022
