Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider forcing end-users into explicitly selecting a format behavior once Draft 8 is out #407

Closed
Julian opened this issue May 12, 2018 · 9 comments
Labels
Enhancement Some new desired functionality

Comments

@Julian
Copy link
Member

Julian commented May 12, 2018

See the discussion in #403 -- it might be worthwhile (i.e. it might remove some footgunning) to force even users who do not want format to do anything to have to opt into that by passing in an empty FormatChecker.

Let's wait here though until Draft 8 is out, since one of its focuses is to further classify things in a way that makes these things generally applicable.

@Julian
Copy link
Member Author

Julian commented May 29, 2018

So one issue I have here is that I feel somewhat strongly that most users are satisfied by the simple validate(instance, schema) function -- would love to back that up with some data, but I don't have any unfortunately -- given that assumption, there needs to be some default behavior there, so we might be back at square one.

@handrews
Copy link

handrews commented May 29, 2018

I've been thinking about this sort of thing recently.

Right now, format, contentEncoding, and contentMediaType are classified as both annotations and optional assertions. Which I think is really confusing and unsatisfying for everyone. WTF even is an "optional assertion"? Not to mention that as shown here in #403, it's not even a yes-its-validated/no-its-not question, as many of these things are difficult to impossible to truly fully validate.

So I'm considering changing the model to say:

  • These are purely annotations
  • Annotations can be defined to produce an assertion result at the application** level
  • JSON Schema implementations MAY or SHOULD provide an extension point to call back to the application level to incorporate the assertion during the validation process
  • Users must opt-in to such callbacks (probably with some convenience mechanism for saying things like "opt into everything, best effort", not sure)

**application here includes anything not completely nailed down by the JSON Schema spec. So format validators, while likely packaged with implementations rather than literal applications, are considered application-level things in this sense. And some applications would provide their own format validators

This is super hand-wavey but since you just mentioned something that lines up with this vague idea really well, I thought I'd float it.

@Julian
Copy link
Member Author

Julian commented May 30, 2018

@handrews so... forgive me if I missed it, but I'm not sure I see anything but a semantic change there (one that I'm even behind on, since this library hasn't yet adopted the language of "annotations" and "assertions") -- did I miss the point?

I think fundamentally the problem I see here is people don't realize format is different (and the other two but just gonna talk format for a sec), despite the fact that the spec already somewhat clearly says it is. End users (people who are using JSON Schema here I mean) don't really read specs, and I'm not sure that's the kind of thing you can change by writing more words.

The kinds of things that I think might actually help the problem here to me are things like considering a syntax change that causes pause. E.g., if optional-whosemacallits were syntactically distinct, someone might go "huh why do I spell it :format instead of what I do for properties" and actually get hit by the distinction a bit more overtly.

@handrews
Copy link

@Julian I'm reluctant to add breaking changes in terms of syntax, although I'm not entirely against it and the :format type of approach is an interesting one that I had not considered.

What I'm trying to do with the assertion/annotation/applicator keyword classifications and related things is establish a clear and predictable framework for keyword behavior. Most users won't read up on it at the spec level, but they will notice the effects. Or rather, hopefully they won't notice, because things just behave in a predictable way so they aren't surprised and don't need to care.

My audience for that is not so much people who have been following along from draft to draft, but people who will be coming in new, or without having paid much attention before.

Historically, if you glanced over the spec, you found format in a document labeled "Validation", and you probably did not read the paragraph about how it is an optional keyword. Unless you happened to read that, there's nothing in the spec anywhere to hint that it might be weirdly optional. The section is even called Semantic Validation With "format", so of course people expect the keyword to actually validate things.

But what if the spec looked like:

  • Validation assertions
    • For all types
    • For integers
    • For strings
    • For objects
    • For arrays
  • Annotations
    • Basic meta-data
    • Semantic annotations with format
    • String-encoded media

This gives anyone who even glances at the spec the correct impression- format provides information, and does not necessarily validate.

So instead of format looking like a validation keyword at first glance, but actually burying counter-intuitive behavior in the wording, the wording will just be about how format informs the schema consumer what kind of data is in the string, and how implementations MAY offer some level of validation.

And we can make it a little more standardized and predictable how implementations should handle not just format, but the content* keywords and anything else that anyone (the standard or extensions) want to do that fits the same general pattern: flag a bit of JSON data as intending to conform to something complicated, and needing some sort of extension module to handle it properly.

Even better, that's just a special case of how all annotations work: they add information to the instance, and something else has to decide how to make use of that information. That's the right mental model for format.

Yes, people who already know format will keep making understandable but wrong assumptions about how it works. But I think we can shift some assumptions over time.

@Julian
Copy link
Member Author

Julian commented May 30, 2018

Ah, got it, ok, so the idea is to just completely split it (and others like it) into a completely separate section. Might work.

@handrews
Copy link

Yup! The section would also include (under the "basic meta-data" heading or something vaguely like that) things like title, description, default, readOnly, etc. which are generally understood to not affect validation.

There's also going to be more in the core spec about assertions and annotations, and what those things mean and what you can do with them, so there should be some context around that as well. Again, most end-users won't read all of that, but hopefully we'll get people writing some new tutorials and guides, and they are more likely to read it and be influenced by the new organization of functionality.

@Julian
Copy link
Member Author

Julian commented May 30, 2018

but hopefully we'll get people writing some new tutorials and guide

Yeah agree.

@Julian Julian added the Enhancement Some new desired functionality label Aug 13, 2018
@LukeMarlin
Copy link

Hey, end-user here :)

As you both rightfully said, not everyone goes deep in the spec. I read http://json-schema.org/understanding-json-schema/reference/string.html#format and nothing seemed fishy. After reading the thread, I now noticed that there is a small friendly blue note saying "that's not required". Honestly, I completely skipped it and went straight to what the spec offers. I was really please to see ipv4 format and tried to use it with your lib.

When I noticed that "bob" was considered OK, I went through some syntax checking, spell checking, checks on other tools (that DO implement it by default: https://www.jsonschemavalidator.net/) and was really confused. That's when/why I ended up here, checking if someone reported the issue.

Things I would have liked:

  • In the "understanding jsonschema", this should be a Warning, not a note. If one doesn't try a wrong value, one might have relayed on an ignored check on production (that's not on your side)
  • Maybe an info about this on your readme. It is quite small and I would have seen it without doubt (except if it's just a small note in blue :D)
  • A warn that would hint me: "I've noticed your schema has format, be aware that I don't support this"
  • Being forced to be explicit about the behaviour I need.

There's many ways one can avoid to be in this case (thorough read of docs, unit tests with failing cases), but people make mistakes. A crash is problematic, a silent validation of a wrong data is potentially catastrophic.

Julian added a commit that referenced this issue Jun 20, 2020
fc05651cc Merge pull request #409 from Stranger6667/dd/add-jsonschema-rs
5f1575a93 Add Rust `jsonschema` crate
2bf95beec Merge pull request #407 from fisxoj/master
9ae956b21 Add common lisp implementation to the list
d4ffd569b Merge pull request #401 from json-schema-org/ether/format-uuid
2d6c45711 tests for the "uuid" format
08f6cdaff Merge pull request #400 from json-schema-org/ether/more-format-ipv6
d3064eb3a some more tests for the "ipv6" format
1f34d3321 Merge pull request #399 from json-schema-org/ether/more-format-idn-email
22adda78c also test the "email" inputs against "idn-email"
25598a3b4 Merge pull request #392 from rjmill/rjmill/test-prop-named-ref-containing-a-ref
8dfa8adc9 Merge pull request #380 from ChALkeR/fix-ecmascript-regex
d595dbf9d backport $ref cases, changing "$defs" to "definitions"
ca8319c9e Fix \W test description
452b5f8f4 Test property named $ref, containing an actual $ref
a01ae5404 Fix ECMA 262 regex whitespace tests.

git-subtree-dir: json
git-subtree-split: fc05651cce3889975f8dbcca38c203d6a396694b
@Julian
Copy link
Member Author

Julian commented Jul 26, 2022

There's a note about this in the FAQ which I've updated (quite awhile ago now).

@LukeMarlin and others I agree with most of the feedback here -- in newer drafts the way to be more explicit is $vocabulary, though support there is still in-progress (I do hope it'll fully land soon, working on it alongside a few other things).

I think the feedback about Understanding JSON Schema is likely good too, do take it upstream.

For now though I don't see much more to do here locally in the library, so going to close for now beyond getting $vocabulary fully working. Of course additional feedback is always welcome.

@Julian Julian closed this as completed Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality
Projects
None yet
Development

No branches or pull requests

3 participants