Define the abstract instance validation function #5

Closed
awwright opened this issue Jul 19, 2015 · 27 comments

Comments

@awwright
Member

It may be useful to define, in somewhat mathematical terms, what it means to validate an instance, and which inputs are used.

I imagine the validation function being defined as such:

Validate[collection, schema, version, iriBase, instance] → Boolean ∪ Indeterminate

Where:

  • collection ∈ set of all Map[ IRI → valid JSON Schema instance ]
  • schema ∈ set of all IRIs
  • version ∈ set of all IRIs
  • iriBase ∈ set of all IRIs
  • instance ∈ set of all JSON documents (i.e. with a media type application/json)
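A minimal sketch of that signature, to make the inputs and the three-valued result concrete. The `Outcome` enum, the `_TYPES` table, and the decision to support only a toy subset of the `type` keyword are assumptions made for illustration; none of them comes from a draft of the spec.

```python
from enum import Enum
from typing import Any, Mapping


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"
    INDETERMINATE = "indeterminate"


# Toy subset of the "type" keyword; "number" and "integer" are left out for brevity.
_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool, "null": type(None)}


def validate(
    collection: Mapping[str, Any],  # IRI -> schema document
    schema: str,                    # IRI of the schema to apply
    version: str,                   # IRI of the meta-schema (unused in this toy)
    iri_base: str,                  # base IRI for relative references (unused here)
    instance: Any,                  # already-parsed JSON value
) -> Outcome:
    """Pure function: it reads its arguments and mutates nothing."""
    if schema not in collection:
        return Outcome.INDETERMINATE  # nothing to validate against
    expected = collection[schema].get("type")
    if expected is None:
        return Outcome.VALID  # no constraint in this toy subset
    if not isinstance(expected, str) or expected not in _TYPES:
        return Outcome.INDETERMINATE  # the schema itself is not usable
    return Outcome.VALID if isinstance(instance, _TYPES[expected]) else Outcome.INVALID
```

Because the function only reads its arguments, calling it twice with the same inputs gives the same answer, which is the "no side-effects" property discussed here.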

This may also help to resolve issue #4. If the validation function is defined to have no side-effects, then we can just reiterate that point within the "default" keyword. We can also say the keyword is "not used for validation, but may be used for other purposes not defined here."

This is not to say that JSON Schema libraries can't implement other functions; they might want to implement a "coerce" function that turns an arbitrary JSON instance into a validating one (casting strings to numbers, filling in missing required values using the default, etc.).
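A rough sketch of what such a separate "coerce" function might look like. The function name, its signature, and the choice to fill in any missing property that has a `default` (not only required ones) are assumptions for illustration; no draft defines this behavior.

```python
import copy
from typing import Any


def coerce(schema: dict, instance: Any) -> Any:
    """Return a coerced copy of `instance`; the input is never mutated."""
    expected = schema.get("type")
    if expected == "number" and isinstance(instance, str):
        try:
            return float(instance)  # cast numeric strings to numbers
        except ValueError:
            return instance  # not numeric: leave it for validation to reject
    if expected == "object" and isinstance(instance, dict):
        result = dict(instance)  # shallow copy so the caller's dict is untouched
        for name, subschema in schema.get("properties", {}).items():
            if name in result:
                result[name] = coerce(subschema, result[name])
            elif "default" in subschema:
                # Copy the default so later edits cannot leak into the schema.
                result[name] = copy.deepcopy(subschema["default"])
        return result
    return instance
```

Keeping this apart from validation means the validation function itself can stay free of side effects, and `default` stays an annotation as far as validation is concerned.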

Aside: Defining a "coerce" might be something useful for v6 (or, the next version with feature additions).

@epoberezkin
Member

I agree that validation should not have side effects. I would keep "coerce" out of the standard.

@yoshuawuyts

Agreed. -1 for side effects, -1 for "coerce" in v5.

@awwright
Member Author

Perhaps more something like:

Validate[collection, schema, version, iriBase, instance] → Boolean

Where:

  • collection ∈ set of all Map[ IRI → valid JSON Schema instance ]
  • schema ∈ set of all IRIs found in collection
  • version ∈ set of all IRIs identifying meta-schemas
  • iriBase ∈ set of all IRIs
  • instance ∈ set of all JSON documents (i.e. with a media type application/json)

By this definition, invalid schemas, and schemas linking to non-existent schemas, are outside of the domain, and the function can always return valid/invalid.
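A sketch of this narrower reading, assuming the out-of-domain cases surface as an exception instead of a third return value. `OutsideDomainError` and the toy `type` subset are hypothetical names and simplifications, not anything taken from the spec.

```python
from typing import Any, Mapping


class OutsideDomainError(ValueError):
    """The inputs are not in the domain, so no valid/invalid answer exists."""


# Toy subset of the "type" keyword, for illustration only.
_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool, "null": type(None)}


def validate(collection: Mapping[str, Any], schema: str, version: str,
             iri_base: str, instance: Any) -> bool:
    # Precondition checks: anything failing here is outside the domain.
    if schema not in collection:
        raise OutsideDomainError(f"schema IRI not in collection: {schema}")
    expected = collection[schema].get("type")
    if expected is not None and (not isinstance(expected, str) or expected not in _TYPES):
        raise OutsideDomainError(f"unsupported or malformed 'type': {expected!r}")
    # Inside the domain the answer is always a plain Boolean.
    return expected is None or isinstance(instance, _TYPES[expected])
```

The return type is now a plain `bool`; a caller that wants the three-valued behavior can catch the exception and map it to "indeterminate" itself.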

@awwright
Member Author

There are two questions we have to figure out here:

  1. Is it possible to change schema versions, intra-document, with a $schema keyword? How can we indicate where a schema version change may take place?

  2. Should the function include an indeterminate return value; or should invalid schemas be considered outside the domain of the function? (I.e. the value of this function is only defined for valid schemas). This may have a few ramifications, for example, it's possible to construct a schema that can produce all three invalid, valid, and error conditions depending on the instance that you feed it.
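One hypothetical way the three-outcome case could arise, assuming an implementation that resolves `$ref` lazily (only when evaluation actually reaches it). The `check` function, the `UnresolvedReference` exception, and the example schema are illustrative assumptions; the thread does not confirm that this is the construction that was meant.

```python
from typing import Any, Mapping


class UnresolvedReference(LookupError):
    """A "$ref" pointed at an IRI that is not in the collection."""


def check(collection: Mapping[str, Any], schema: dict, instance: Any) -> bool:
    if "$ref" in schema:
        target = schema["$ref"]
        if target not in collection:
            raise UnresolvedReference(target)  # the "error" condition
        return check(collection, collection[target], instance)
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            # Subschemas are only evaluated for properties that are present.
            if name in instance and not check(collection, subschema, instance[name]):
                return False
    return True


# The reference target is deliberately missing from the (empty) collection.
SCHEMA = {"type": "object", "properties": {"a": {"$ref": "urn:example:missing"}}}
```

Under these assumptions, `check({}, SCHEMA, 5)` is invalid (the type check fails first), `check({}, SCHEMA, {})` is valid (property `a` is absent, so the reference is never followed), and `check({}, SCHEMA, {"a": 1})` raises `UnresolvedReference`. An implementation that resolves every reference up front would instead reject the schema before looking at any instance.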

@awwright modified the milestones: draft-future, draft-next (Nov 30, 2016)
@Relequestual
Member

An invalid schema should be out of scope. Similarly, if you try to parse invalid JSON, you get an error from your JSON parser; you don't get a parsed result up to the point of the error.

@Relequestual
Member

@yoshuawuyts You have given two -1's but without reason. A -1 without reason is invalid.

@seagreen
Collaborator

One thing I think we can say for sure is that invalid schemas should not return validation: false. Either they shouldn't be part of the domain of the function at all, or they should get their own result (like the indeterminate suggested by @awwright). If they did return false, then invalid schemas would be indistinguishable from valid schemas such as {"not":{}}, which reject every instance. No one has actually suggested we do this; I'm just writing out the argument against it for completeness.
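To make the argument concrete, here is a small sketch of the conflation. `naive_validate`, `is_well_formed`, and the malformed schema `{"not": 42}` are hypothetical, and only the `not` keyword is modeled.

```python
from typing import Any


def is_well_formed(schema: Any) -> bool:
    # Toy rule: a schema is an object, and "not" must itself hold a schema.
    return isinstance(schema, dict) and (
        "not" not in schema or is_well_formed(schema["not"])
    )


def naive_validate(schema: Any, instance: Any) -> bool:
    if not is_well_formed(schema):
        return False  # the conflation being argued against
    if "not" in schema:
        return not naive_validate(schema["not"], instance)
    return True  # the empty schema accepts everything


SAMPLES = [None, 0, "x", [], {}]
REJECTS_ALL = {"not": {}}  # valid schema that rejects every instance
MALFORMED = {"not": 42}    # not a valid schema at all

valid_results = [naive_validate(REJECTS_ALL, s) for s in SAMPLES]
malformed_results = [naive_validate(MALFORMED, s) for s in SAMPLES]
```

Both result lists contain only `False`, so a caller looking at the return value alone cannot tell a deliberately unsatisfiable schema from a broken one.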

Also, I think the questions brought up by @awwright are super important and I'd like to hear more people's thoughts on this.

@seagreen
Collaborator

seagreen commented Feb 6, 2017

This should absolutely be a blocker for new drafts of the spec until it's resolved.

How can we be releasing specifications for something that we can't even formally describe?

@handrews
Contributor

handrews commented Feb 6, 2017

@seagreen from a practical perspective, it's obviously been working out OK. We are very close to Draft 06 and I am reluctant to postpone it for an issue that will no doubt involve a lot of debate.

What about this would prevent someone from successfully implementing Draft 06?

@seagreen
Collaborator

seagreen commented Feb 6, 2017

@handrews: I'll try to explain my thinking.

For one thing, @awwright asked a good question above:

Is it possible to change schema versions, intra-document, with a $schema keyword? How can we indicate where a schema version change may take place?

And I see that the current spec takes a side on this:

JSON Schema implementations SHOULD implement support for current and previous published drafts of JSON Schema vocabularies as deemed reasonable.

But holy smoke, have we thought about the implications of this? It means that JSON Schema is only as clear as the messiest specification draft. This is really bad. At the very least I think this should say that implementations MAY support earlier drafts.

And more importantly, he asks a second question:

should invalid schemas be considered outside the domain of the function?

This has been an issue since the Foundations of JSON Schema paper. Has it been resolved in the current draft?

Here I should say my apologies if it's in the current draft and I missed it -- the question above is a genuine question because I haven't read the draft as closely as I'm sure you have.

My full opinion on this is actually stronger than just resolving this issue -- I think that the specification should be based on a formal model such as the one in Foundations of JSON Schema, and that English specs for things like this are fundamentally inadequate.

@handrews
Contributor

handrews commented Feb 6, 2017

@seagreen could you see if PR #248 sufficiently addresses your "$schema" concerns, at least enough to get draft 06 out?

@seagreen
Collaborator

seagreen commented Feb 6, 2017

@handrews: I shouldn't have underestimated you guys, glad to see others were concerned about it as well.

To answer your question: it looks like I have a different impression of what a "draft" is than you all (which is good!). With language like "implementation behavior is subject to be revised or liberalized in future drafts", the draft is obviously pretty casual and nothing needs to be a blocker.

I just think that it makes more sense to do a formal spec first -- it's likely to save time and confusion.

Also: thanks for being so nice even when I'm clearly a little annoyed. You're awesome 😃

Also also: note that I still have two more concerns: the domain of the validation function and the idea that the spec should be based on a formal language instead of English.

@handrews
Contributor

handrews commented Feb 6, 2017

@seagreen credit to @epoberezkin on this one- I confess I was totally fine with being able to switch schemas :-)

These sorts of drafts are just checkpoints for gathering feedback. It's unusual (even pathological) that draft 04 became a de-facto "standard" for years.

As for the "more formal language instead of English", is there some way you think this should be treated differently than other RFCs? ABNF isn't really useful here, I don't think, and generally RFCs do not use formal language (beyond the MUST/SHOULD/MAY/etc. from RFC 2119).

(also, you clearly haven't seen the issues where I got annoyed- trust me, you're fine, and I'm in no position to throw stones anyway :-)

@seagreen
Collaborator

seagreen commented Mar 28, 2017

As for the "more formal language instead of English", is there some way you think this should be treated differently than other RFCs? ABNF isn't really useful here, I don't think, and generally RFCs do not use formal language (beyond the MUST/SHOULD/MAY/etc. from RFC 2119).

This is a great question. The Foundations of JSON Schema paper uses mathematical notation. I'm not convinced that would be a huge gain for us, but I don't know much about the subject.

One thing that we could consider would be to have a canonical reference implementation of JSON Schema that we try to keep exactly correct. Then if parts of the spec are unclear we'll be forced to think about them immediately instead of later on when someone brings them up in the test suite.

@handrews
Contributor

@awwright @Relequestual @seagreen how do we resolve this issue?

Is this really something that needs to go in the specification, or is it better handled as a paper or something hosted on the json-schema.org web site?

@seagreen
Collaborator

seagreen commented Sep 14, 2017

The quasi-mathematical language isn't the important part; the important part is that the specification is extremely precise about what an implementation is allowed to do when it hits an edge case (e.g. if it's 90% through validating an instance and an invalid schema is referenced, is the result "MUST be invalid", "MAY be invalid", "MUST be indeterminate", etc.).

Another example would be if the implementation is partway through validating and a schema is referenced for a really old draft of JSON Schema -- what is the implementation allowed to do?

As long as those kinds of things are exhaustively addressed, the notation doesn't matter too much.

That said, I'm bowing out of JSON Schema stuff in general, so I'll leave the rest of this to you and @awwright. But those are my thoughts in case they're helpful.

@handrews
Contributor

@seagreen thanks, that is helpful. I think the useful thing to do here would be to file some specific issues around particular gaps in the spec. Those are things that I think will get nailed down as we get into the working group phase- we're still trying to just get the feature scope right. Anyway, since you are not active with the project anymore, I'll look into filing these.

I want to give @awwright a chance to weigh in as this is his issue, and he's been busy in recent weeks, but my inclination is to move the general "let's define the abstract function" part of this over to the web site repo as supplemental information.

BTW if there's anything to your moving on from JSON Schema other than just not needing it for your current work, I'd be grateful for any parting feedback on any aspect of the project, technical or otherwise. My email address is on the last version of the spec if you'd rather send feedback that way.

@seagreen
Collaborator

Sounds good @handrews. I actually can answer you here instead of emailing because some of my reasons are on-topic: my personal preference is extremely small, well-defined specs that start with a theoretical foundation and build based off of that. So from this perspective a well-defined validation function would be reassuring to me.

(There's also an off-topic reason: for my projects I realized that human-readability doesn't matter much compared to the simplicity of the spec. Obviously JSON Schema can't toss human-readability out just for one person though!)

I do still like the JSON Schema community and think it's a cool project, so I look forward to seeing what you all come up with.

@Relequestual
Member

I have no opinion on this. Goes beyond my maths theory understanding =/

@handrews
Contributor

@awwright @seagreen are there any examples of how such a function is defined in existing RFCs? I'm really just looking for the right sort of language, syntax, notation, etc. to use for such a thing.

@seagreen
Collaborator

Great question. Unfortunately I don't know of a good example, but I'll keep an eye out.

@Relequestual
Member

@handrews How does this link to output?

@handrews
Contributor

handrews commented Oct 1, 2018

@Relequestual I think it was based on @awwright's comment above, when he listed two points, one of which is still unresolved:

Should the function include an indeterminate return value; or should invalid schemas be considered outside the domain of the function? (I.e. the value of this function is only defined for valid schemas). This may have a few ramifications, for example, it's possible to construct a schema that can produce all three invalid, valid, and error conditions depending on the instance that you feed it.

I admit I really don't know what to do about this issue. I've tried a few times to get it into something that feels actionable to me, or to close it, but I've not been able to accomplish either thing.

@Relequestual
Member

eep...

@Relequestual
Member

I feel it's overkill to define what the implementer should do if they get invalid JSON or an invalid schema. It seems kind of obvious that you should check the instance JSON and the Schema are valid before processing, but it MIGHT be more complex than that...

@awwright what do you mean by...

This may have a few ramifications, for example, it's possible to construct a schema that can produce all three invalid, valid, and error conditions depending on the instance that you feed it.

Can you give an example where this could be true? (I'll be honest, I have very little understanding of this issue. I do not understand your initial post.)

@awwright
Member Author

awwright commented Oct 6, 2018

That's a good question. I may have forgotten some of my line of thinking since then.

Maybe I should try to identify a problem this is actually supposed to be solving, first.

@awwright
Member Author

Since I can't really come up with a solid example of what there is to improve, I'll close this out. If I can come up with something, and a better way to phrase the issue, I'll open a new issue.
