Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contentSchema tests inadequate #609

Closed
gregsdennis opened this issue Nov 18, 2022 · 11 comments · Fixed by #618
Closed

contentSchema tests inadequate #609

gregsdennis opened this issue Nov 18, 2022 · 11 comments · Fixed by #618
Labels
bug A test is wrong, or tooling is broken or buggy.

Comments

@gregsdennis
Copy link
Member

I'm not sure that the contentSchema are fully testing what they intend to.

For example:

  • file: content.json (any draft)
  • case: validation of binary-encoded media type documents with schema
  • test: a valid base64-encoded JSON document
  • expected: valid

Schema:

 {
  "$schema": "https://json-schema.org/draft/next/schema",
  "contentMediaType": "application/json",
  "contentEncoding": "base64",
  "contentSchema": {
    "required": [
      "foo"
    ],
    "properties": {
      "foo": {
        "type": "string"
      }
    }
  }
}

Instance

"eyJmb28iOiAiYmFyIn0K"

(instance decodes to {"foo": "bar"})

The subschema at /contentSchema (based on the keywords present) seems to be expecting an object. However, if an implementation ignores the encoding and media type and just validates this as a string, the test incorrectly passes. (Mine does this.)

I suggest we need to add a "type": "object" restriction to the subschema to ensure that it requires an object and doesn't just validate the base64 string.

@Julian
Copy link
Member

Julian commented Nov 22, 2022

Sounds reasonable to me.

@Julian Julian added the bug A test is wrong, or tooling is broken or buggy. label Nov 22, 2022
@gregsdennis
Copy link
Member Author

Actually I'm starting to question whether this needs to be in the requires test set (could be in optional).

The spec (Section 8.2) says:

Implementations MAY offer the ability to decode, parse, and/or validate the string contents automatically. However, it MUST NOT perform these operations by default, and MUST provide the validation result of each string-encoded document separately from the enclosing document. This process SHOULD be equivalent to fully evaluating the instance against the original schema, followed by using the annotations to decode, parse, and/or validate each string-encoded document.

This effectively means that contentSchema just produces the schema as an annotation, and leaves decoding the data and evaluating it against the contentSchema schema to the consumer.

@Julian
Copy link
Member

Julian commented Nov 22, 2022

Right, I thought you knew that part :) and that the point was that if an implementation for some reason does do something with contentSchema (incorrectly) that it would still pass this test because it might consider the thing to be a string.

@handrews
Copy link
Contributor

In draft 2019-09 onwards the content* keywords are strictly annotations. There is an annoyingly confusing option to automatically run a second validation using those annotations, which few people understood and has since been removed.

In draft-07, content* (which I think did not yet include contentSchema) could optionally function as assertions, like format. This was removed because it's an absurdly enormous security problem.

The test schema and instance are correct, although a "type": "object" in the contentSchema would prevent the unexpected pass that @gregsdennis noticed.

@gregsdennis
Copy link
Member Author

gregsdennis commented Nov 22, 2022

the point was that if an implementation for some reason does do something with contentSchema (incorrectly) that it would still pass this test - @Julian

But given that the requirement is "MUST NOT perform these operations by default," doesn't that mean that we need to test that the implementation isn't evaluating the schema? I'm not sure that this test does that.

I think we'd need content that doesn't pass the contentSchema schema, but the overall schema passes anyway. (For the above example, the content could decode to {"foo": 42} which doesn't pass the schema.)

@Julian
Copy link
Member

Julian commented Nov 22, 2022

I'm not sure who's responding to who with what from the last two comments even though you quoted me :D

I was trying to say the same thing Henry said though, which is there's no test we can write with a result other than valid: true (because content* does nothing in drafts other than 7) but that if you were suggesting we strengthen the test such that an implementation which incorrectly does assert using content* fails this test and not unexpectedly passes it (by adding "type": object or however you suggest doing it) that that's ok with me (though I think it's very unlikely it matters, since implementations should really have trivial implementations that do absolutely nothing, so it's hard for them to have bugs here, but I'm fine with it if you care to).

Is that different than your understanding, sounds like no now right? Basically yeah if you want to strengthen the test sure go for it, as you say, the way to do so is to have content that doesn't pass the contentSchema, and then assert that it passes anyhow..

@gregsdennis
Copy link
Member Author

gregsdennis commented Nov 22, 2022

When I created the issue, I had initially misinterpreted the requirement thinking that I had to decode and validate the content (because the test has content that is valid), and the contentSchema schema is insufficient to do that as it allows non-objects.

Reading the spec more recently, I realize that I am explicitly NOT to decode and validate the content. But the content is valid against the contentSchema schema, so it's not really testing that I'm not decoding and validating the content. It's not really doing anything.

If we change the content to encode {"foo": 42} (or any other JSON data that would fail the contentSchema schema), then we can be sure that the implementation is NOT decoding and validating the content, as per the requirement.

(Sorry the direction of this issue changed after I educated myself and thought about it a bit.)

@Julian
Copy link
Member

Julian commented Nov 22, 2022

Right I think we're all saying the same thing now? Agree, change the schema, change the instance, whichever.

@karenetheridge
Copy link
Member

I was trying to say the same thing Henry said though, which is there's no test we can write with a result other than valid: true (because content* does nothing in drafts other than 7)

Yes!

When I created the issue, I had initially misinterpreted the requirement thinking that I had to decode and validate the content (because the test has content that is valid), and the contentSchema schema is insufficient to do that as it allows non-objects.

Such a test would be great in optional/, where the keywords were treated as assertions as well as annotations.

@gregsdennis
Copy link
Member Author

I noted in the PR that there's already a test that does this what I'm looking for. I added the type to be a bit more explicit.

@handrews
Copy link
Contributor

Such a test would be great in optional/, where the keywords were treated as assertions as well as annotations.

Unfortunately the assertion behavior is only optional in draft-07 and earlier, which do not include contentSchema. In 2019-09 and later, assertion behavior is non-compliant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A test is wrong, or tooling is broken or buggy.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants