Proposal: remove optional automatic processing of "content*" keywords #1287

Closed
handrews opened this issue Sep 14, 2022 · 0 comments · Fixed by #1296

In §8.2 Implementation Requirements (for the content* keywords), we have:

Implementations MAY offer the ability to decode, parse, and/or validate the string contents automatically. However, it MUST NOT perform these operations by default, and MUST provide the validation result of each string-encoded document separately from the enclosing document. This process SHOULD be equivalent to fully evaluating the instance against the original schema, followed by using the annotations to decode, parse, and/or validate each string-encoded document. <CREF>For now, the exact mechanism of performing and returning parsed data and/or validation results from such an automatic decoding, parsing, and validating feature is left unspecified. Should such a feature prove popular, it may be specified more thoroughly in a future draft. </CREF>

See also the Security Considerations (Section 10) sections for possible vulnerabilities introduced by automatically processing the instance string according to these keywords.

This is a lot of potential complexity and security risk for a minimal increase in convenience:

  • it creates an entirely new code path, where additional activity takes place between the main validation and returning the result
  • it requires a different approach to results in order to separately indicate the results of the (potentially many) sets of content* keywords
  • those additional results include not just any result from contentSchema, but the output of decoding and/or parsing based on contentEncoding and/or contentMediaType
  • it involves a mandatory runtime configuration option, which is extra work and reduces the predictability of evaluation behavior compared to schema author intent
  • it is a giant security hole (consider automatically processing a string whose contentMediaType says it is JavaScript)
  • it creates another extension point in the JSON Schema architecture in order to handle more media types (and maybe encodings; they're not said to be extensible, but people will ignore that), but this one does not add anything to the capabilities of JSON Schema at all (other than needless complexity)

If we remove this, it just means that people need to look at the annotations, hand the instance value off to the right decoder and/or parser, and, if contentSchema is present, call the JSON Schema implementation again, providing the schema and the decoded/parsed instance value.

All of the information needed to do that is present in the annotations, and there's nothing JSON Schema-specific about it. Calling back into the implementation doesn't count: since this came from a call into a JSON Schema implementation in the first place, clearly the application can call it again just fine.
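
To illustrate how little is involved, here is a minimal Python sketch of that application-side flow. Everything in it is an assumption made for illustration: the `process_content` name, the `DECODERS` and `PARSERS` registries, the shape of the `annotations` dict, and the `validate(schema, instance)` callable, which stands in for a second call into whatever JSON Schema implementation produced the annotations. It is not taken from any existing implementation.

```python
import base64
import json
from typing import Any, Callable, Optional

# Hypothetical registries: the application, not the validator, decides which
# encodings and media types it is willing to process.
DECODERS: dict[str, Callable[[str], bytes]] = {
    "base64": lambda s: base64.b64decode(s, validate=True),
}
PARSERS: dict[str, Callable[[bytes], Any]] = {
    "application/json": lambda b: json.loads(b.decode("utf-8")),
}


def process_content(
    value: str,
    annotations: dict[str, Any],
    validate: Callable[[Any, Any], bool],
) -> tuple[Any, Optional[bool]]:
    """Decode, parse, and optionally validate one string-encoded document.

    `annotations` holds the content* annotation values collected for `value`
    (assumed shape: keyword name -> annotation value).
    Returns (document, validation_result); the result is None when there is
    no contentSchema to apply.
    """
    data: Any = value
    encoding = annotations.get("contentEncoding")
    if encoding is not None:
        data = DECODERS[encoding](value)  # KeyError for unsupported encodings

    media_type = annotations.get("contentMediaType")
    if media_type is None:
        # Without a media type there is nothing to parse or validate.
        return data, None

    raw = data if isinstance(data, bytes) else data.encode("utf-8")
    document = PARSERS[media_type](raw)  # KeyError for unsupported media types

    schema = annotations.get("contentSchema")
    if schema is None:
        return document, None
    # Second, ordinary call into the JSON Schema implementation.
    return document, validate(schema, document)
```

Because the registries are explicit allow-lists owned by the application, an unexpected media type fails with a `KeyError` instead of being processed, which keeps the security decision outside the validator.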

Let's simplify our architecture by jettisoning this potentially complex, insecure, and entirely unnecessary feature. People who want to automatically do this can trivially implement it as a separate library (and we can note that in the spec, maybe in a CREF).
