contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

Julian · 2023-02-08T08:39:01Z

The text of 2020-12's contentSchema says:

The value of this property MUST be a valid JSON schema. It SHOULD be ignored if "contentMediaType" is not present.

In #1288 which I've found, there was some back and forth about what the first sentence there means (which I agree with -- i.e. that the current spec seems to treat this schema as any other subschema, by virtue of omission). But the second sentence still has undefined behavior, because "SHOULD" there isn't clear.

Specifically, this schema (or others like it) when validating the instance 12:

{
  "$ref": "#foo",
  "contentSchema": {
      "$anchor": "foo",
      "type": "string"
  }
}

either blows up if one follows the SHOULD or else returns invalid if one ignores the SHOULD about contentMediaType being missing. (FWIW of the implementations Bowtie supports, there appear to be at least 1 implementation with both behaviors, though unclear whether they explicitly intend to do so).

It seems unnecessary to have undefined behavior in this case (one reminiscent of old $ref), so my personal opinion would be to remove that second sentence (and not intimate that it's ok to ignore it), or to just make it invalid to have contentSchema present without contentMediaType (and add a dependentRequired to the metaschema).

(n.b. #1352 already tweaked the language here coming out of #1288, but the second sentence is still present.)

The text was updated successfully, but these errors were encountered:

Note we avoid some side question about whether this is true even when contentMediaType is not present, as that appears undefined in the spec. See json-schema-org/json-schema-spec#1381 which may fix this for future spec versions.

gregsdennis · 2023-02-08T10:14:22Z

blows up if one follows the SHOULD

I don't follow your logic here. Following the SHOULD means that contentSchema is ignored in your example. I see nothing wrong with that.

It doesn't mean that identifiers aren't still registered, just that the keyword isn't evaluated.

Julian · 2023-02-08T10:34:26Z

Here's the language from draft 7's $ref definition:

All other properties in a "$ref" object MUST
be ignored.

where it was essentially universally understood that this means identifiers are not registered. As far as a quick skim, no other language is present there, that's how we chose to communicate that behavior.

So I think you're asserting something (that there's only one interpretation for the current language) which isn't consistent with historical wording.

But at worst even if you're right there I'd still think it's unclear rather than really undefined.

jdesrosiers · 2023-02-08T20:59:14Z

I think the intention was that that quote is in the context of using the contentSchema keyword as an assertion (which is optional, off-by-default behavior). It doesn't mean that the contentSchema sub-schema should be ignored, it means that it shouldn't apply contentSchema as an assertion on the parsed string data. I think it's a "SHOULD" in order to allow for but discourage heuristic-based parsing of the string value.

So, I think it would be wrong for your example to produce an error. The text is certainly ambiguous and should be clarified in the next release, but I weight intention pretty high when there is ambiguity in the text and I'm pretty confident that was the intention.

Julian · 2023-02-09T08:23:08Z

Fair enough maybe it's not needed for us to argue or agree on how to interpret the current draft (I suspect we probably all agree it's quite unlikely anyone has implemented anything related to this at all, let alone anything depending on the small nonsense point I'm raising) so perhaps consider this just a "it seems like this line needs tweaking for the next version" ticket.

gregsdennis · 2024-10-20T05:53:15Z

@Julian @jdesrosiers is there an action here? Do we need to clarify anything in the current text?

Julian · 2024-10-20T14:07:02Z

If it were me my recommendation was to remove that second sentence given that we no longer allow contentSchema to be used as an assertion, and/or to make it invalid to have contentSchema without contentMediaType in the text and metaschema, but it's fairly minor given I think nearly no one does anything with these keywords so if others don't agree I'm fine if you close.

gregsdennis · 2024-10-20T18:48:46Z

We did receive a question about this in Slack in the past week, so I'll look at how it can be made more explicit, but I don't think any requirements need to change.

jdesrosiers · 2024-10-23T22:31:39Z

I think whether or not this needs action depends on the resolution of #1288.

According to Henry, the example given wouldn't be valid. The schema in contentSchema is not a subschema. It's an annotation that happens to be in the shape of a schema. So, the $ref in that example is invalid because it's referencing a location that is known to not be a schema. That's true regardless of whether contentMediaType is present.

However, if it's decided that contentSchema is a subschema of the schema it appears in, then this becomes an issue we should probably address in the next release.

gregsdennis · 2024-10-23T22:40:10Z

I agree that it's worth revisiting that conversation. Primarily, I'd like to identify (good) use cases for considering it a subschema.

I can't think of any good reasons someone would want to $ref into that subschema (i.e. creating an $id in that subschema that is then referenced from somewhere else), however it might be handy to be able to $ref out of it, e.g. into $defs.

If it's not a subschema, then even something as simple as

{
  // ...
  "contentSchema": { "$ref": "#/$defs/my-content" },
  "$defs": {
    "my-content": { /* ... */ }
  }
}

wouldn't be possible.

jdesrosiers · 2024-10-24T17:39:20Z

There are other considerations as well. For instance, contentSchema not being a subschema violates the principle of least surprise. People will assume this is a normal subschema and be confused when they find it's not, which will lead to a lot of questions we have to answer. Another consideration is ease of implementation. It will vary given different architectures, but I know one option would be problematic for me to implement while the other would require nothing at all.

karenetheridge · 2024-11-21T22:05:52Z

I suspect we probably all agree it's quite unlikely anyone has implemented anything related to this at all

FWIW, I have fully implemented contentMediaType, contentEncoding and contentSchema evaluation (behaviour defaulting to annotation-only, but when enabled, the data instance is decoded and then evaluated against the schema in the context of the surrounding schema), along with a pluggable interface for adding decoders for media types and encodings, and I made use of it at $lastjob in validating APIs which included a uuencoded string as part of a message payload.

Additionally, the code has a comment saying "since contentSchema should never be assumed to be evaluated in the context of the containing schema, it is not appropriate to gather identifiers found therein". So my interpretation was that identifiers included within should not be available from outside this subschema. However, the way I am doing this also makes it impossible to set the context properly when actually evaluating the schema (as I describe it in the next paragraph), and I think the change proposed in this issue and related PR is better, where we do allow the identifiers to work as normal, so I am making this change now.

FWIW, I also (when contentSchema-evaluation mode is enabled) process the schema in place, rather than in isolation, in order for errors and annotations to appear in the right place in the overall result object. I have been pondering for some time making a proposal that this change be made officially in the next draft, as well as splitting the Content vocabulary into two just as we did for format (content-annotation and content-assertion), but since we're abandoning the vocabulary concept I'm not sure how that proposal would look anymore.

(much of the content of this post has changed since I originally wrote it, so please ignore the email for it if you first read it there.)

We treat the contentSchema subschema like any other, both traversing it for contained identifiers (always) and evaluating a decoded data instance against it (when the validate_content_schemas option is enabled). See discussion at json-schema-org/json-schema-spec#1381

handrews mentioned this issue Feb 28, 2023

Detect embedded "$id"/"$anchor"/"$dynamicAnchor" as valid "$ref"/"$dynamicRef" targets marksparkza/jschon#58

Closed

gregsdennis added this to Stable Release Development Jun 18, 2024

gregsdennis moved this to Awaiting PR in Stable Release Development Jun 18, 2024

gregsdennis added this to the stable-release milestone Jun 18, 2024

gregsdennis linked a pull request Nov 21, 2024 that will close this issue

clarify that contentSchema holds a subschema and when/how it applies #1564

Open

gregsdennis self-assigned this Nov 21, 2024

gregsdennis moved this from Awaiting PR to In Progress in Stable Release Development Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

Julian commented Feb 8, 2023 •

edited

Loading

gregsdennis commented Feb 8, 2023 •

edited

Loading

Julian commented Feb 8, 2023

jdesrosiers commented Feb 8, 2023

Julian commented Feb 9, 2023

gregsdennis commented Oct 20, 2024

Julian commented Oct 20, 2024

gregsdennis commented Oct 20, 2024

jdesrosiers commented Oct 23, 2024

gregsdennis commented Oct 23, 2024

jdesrosiers commented Oct 24, 2024 •

edited

Loading

karenetheridge commented Nov 21, 2024 •

edited

Loading

contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

Comments

Julian commented Feb 8, 2023 • edited Loading

gregsdennis commented Feb 8, 2023 • edited Loading

Julian commented Feb 8, 2023

jdesrosiers commented Feb 8, 2023

Julian commented Feb 9, 2023

gregsdennis commented Oct 20, 2024

Julian commented Oct 20, 2024

gregsdennis commented Oct 20, 2024

jdesrosiers commented Oct 23, 2024

gregsdennis commented Oct 23, 2024

jdesrosiers commented Oct 24, 2024 • edited Loading

karenetheridge commented Nov 21, 2024 • edited Loading

Julian commented Feb 8, 2023 •

edited

Loading

gregsdennis commented Feb 8, 2023 •

edited

Loading

jdesrosiers commented Oct 24, 2024 •

edited

Loading

karenetheridge commented Nov 21, 2024 •

edited

Loading