Could my "Son" project be useful to JSON Schema? #274
Comments
Interesting! I think that such a thing is useful, but perhaps orthogonal to JSON Schema? Meaning that I think that requiring JSON Schema to use or operate on a restricted format of JSON (for any restriction) would reduce its applicability and therefore adoption. But enabling such things is very desirable (not entirely unlike enabling use with a media type such as CBOR that can map to JSON). Also, have you seen I-JSON? I'm only vaguely aware of it so not sure how similar the projects are.
I hadn't seen I-JSON! Very glad to know about it, I added it to my collection of JSON subsets: https://housejeffries.com/page/7. Let me know if you find more. I actually agree that we don't want JSON Schema to only operate on serialized Son. We want it to operate on all JSON, not just a subset! What I'm actually proposing is a little subtler. The argument goes like this:
JSON Schema's data model treats numbers as arbitrary-precision base-10 values, which means that we don't distinguish between exponential and non-exponential notation (this makes sense, as letting someone say "you have to use exponential notation here" is probably outside the scope of JSON Schema). But we're still trying to decide whether JSON Schema should distinguish between `1` and `1.0` (that's #152; see the snippet after this comment for an illustration).
Son is such a subset of JSON, designed to remove these ambiguities, though of course the particular decisions it makes might not be suitable for JSON Schema; we could always make another.
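As a concrete illustration of the distinctions being discussed, using Python's standard `json` module (the snippet is mine, not from the thread):

```python
import json

# The exponent's case is lost at parse time: both texts become the float 10.0.
print(json.loads("1e1") == json.loads("1E1"))  # True

# Python happens to preserve the int/float split that #152 is about,
# but many other parsers map both spellings to a single number type.
print(type(json.loads("1")))    # <class 'int'>
print(type(json.loads("1.0")))  # <class 'float'>
```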
(One thing I really need to do is start working on the Son Parser Specification. Right now the only thing I've written is the data format. This may take a while; I want to get it exactly right. It's going to be something like "There must be a bijection between Son JSON and the parsed representation of the Son JSON", but I might be able to improve on that.)
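A sketch of what that bijection requirement could look like as a testable property; the `round_trips` helper and the sample inputs are illustrative assumptions, not part of any Son spec:

```python
import json

def round_trips(loads, dumps, value, text):
    """True when the parser/serializer pair behaves as a bijection on this
    sample: loads(dumps(value)) must give back the value, and
    dumps(loads(text)) must give back the exact original text."""
    return loads(dumps(value)) == value and dumps(loads(text)) == text

# Plain JSON fails the text direction, because many texts encode one value:
print(round_trips(json.loads, json.dumps, {"n": 10}, '{ "n" : 1e1 }'))  # False
```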
That's a fundamental issue with JSON, and not one that I think can/should be addressed by JSON Schema. JSON Schema inherits JSON's ambiguities and must deal with them.
Because JSON treats them ambiguously, JSON Schema has to support the ambiguity. In the case of #152, we're deciding whether to support the common model from many languages of 1 being an integer while 1.0 is a float, or perhaps to allow making that distinction through validation.

I may still be misunderstanding your point, but I just don't see that as a problem with JSON Schema. Your Son project sounds very interesting, and useful, but I think JSON Schema needs to describe regular JSON instances.
I'll try to explain what I'm saying better. It's relevant to the part of the spec that defines the data model.
That part is saying that JSON Schema can't actually distinguish all of JSON. In the case of insignificant whitespace it's said explicitly, but consider numbers as well: once `1e1` and `1E1` are parsed, they're identical. Let's forget Son for now (at the moment I don't actually think Son is right for JSON Schema, except as a thought experiment). Are we sure we want JSON Schema only concerning itself with part of the details of JSON? This is the way we're doing it now, if I read the spec correctly. And if so, are we happy about how we've defined this subset? I'm not sure about this; there are things in the current language that seem ambiguous.
@seagreen I think I see what you're getting at, but I'm not sure I follow to the same conclusion. As far as I can tell JSON Schema accepts all possible JSON values (conforming implementations do not need to handle repeated object keys in any particular way, so saying that the effect is undefined is just acknowledging that JSON parsing libraries are inconsistent).

The data model is just how a validator is supposed to process things. Implementing "minimum" in terms of JSON representation strings would be needlessly complicated, so the data model says to treat numbers as numbers. The "arbitrary-precision base 10" part is how numbers are described in the most recent JSON RFC. There is no difference in how the validation keywords would handle `1e1` vs `1E1`, so this data model has sufficient precision to make all validation functions work.

What are we losing by not being able to distinguish `1e1` and `1E1`? JSON Schema isn't responsible for re-encoding the data into JSON, so we don't actually need to know how it was originally represented.

As for part of the details... there are many limits to what you can detect or enforce with validation. I don't see this limit as any more or less significant. If we wanted to offer validation for that, then we might need to update the data model. But is there a compelling use case?
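To make the "treat numbers as numbers" point concrete, here is a minimal sketch (mine, not from any real implementation) of a `minimum` check that operates on parsed data-model values rather than JSON text:

```python
import json

def check_minimum(instance, minimum):
    """Sketch of a "minimum" check: it sees only the data-model value."""
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True  # "minimum" ignores non-numeric instances
    return instance >= minimum

# Five spellings, one data-model number, one validation outcome:
for text in ["10", "10.0", "1e1", "1E1", "1.0e1"]:
    print(text, check_minimum(json.loads(text), 5))  # all True
```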
Absolutely! The fact that you had to say this means I've been explaining myself badly. JSON Schema should definitely be able to validate all JSON values. The question is what it can distinguish, and I think the current description of what it can distinguish can be improved. Take strings, for instance (which are nice and simple): the current definition just points at the JSON "string" production and describes the result as Unicode code points.
eh, to be fair, I was super-tired when responding last night and probably should have just left it until the morning :-P
I'm still not sure how this matters. This is just identifying the part of the JSON spec (the "string" production) that fits into the validation data model (Unicode code points, which map to abstract characters). That has nothing to do with how those code points are or are not escaped in the JSON document representation. In fact, it is specifically there to avoid that problem: the JSON spec and the Unicode spec determine how the representation is parsed into code points, and the implementation language determines how that is represented in memory.

If we could distinguish between representations before and after escaping, what would we do with that information? Just like the distinction between the two representations `1e1` and `1E1`, I understand that you are talking about those distinctions, but I cannot come up with a single way in which we would want to use that information. The entire point of the data model is to make it absolutely clear which distinctions are meaningful and which are not. I can't think of any distinctions that the data model excludes as meaningless that would be of any use to us.
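A small demonstration of that point, assuming Python's `json` module as the parser (the example is mine, not from the thread):

```python
import json

# Escaped and unescaped spellings of the same code point are identical
# once parsed; the data model sees code points, never the escapes.
print(json.loads('"\\u0041"') == json.loads('"A"'))     # True (both are "A")
print(json.loads('"\\u007f"') == json.loads('"\x7f"'))  # True (both are DEL)
```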
You could write a schema saying U+007f (the DELETE character) isn't allowed in this document. I could see a use for that. But! I'm not saying we should do that. I personally like that JSON Schema doesn't know the difference between the single U+007f Unicode character and the escape sequence `\u007f`.

My concrete suggestions are twofold: add "unescaped" to the definition of JSON Schema string, and think hard about #152, because almost no JSON parsers in the wild preserve the number of significant digits, so we might not want JSON Schema to require that.
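The significant-digits claim is easy to check with a typical parser (Python's `json` shown here; most mainstream parsers behave similarly):

```python
import json

# Trailing zeros vanish the moment the text is parsed:
print(json.loads("1.00"))                         # 1.0
print(json.dumps(json.loads("1.00")))             # 1.0, not 1.00
print(json.loads("100") == json.loads("1.00e2"))  # True (100 == 100.0)
```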
Can't you do that already? The escaping doesn't matter: both the schema and the instance are parsed, and only then do JSON Schema rules become relevant. I think we're going to have to wait for other folks to chime in, because I'm just not getting this :-/

As far as #152 goes, I have no opinion on it specifically, but I would oppose validating anything that is lost during RFC-conforming parsing, so that would seem to mean opposing #152. If nothing else, this discussion has clarified that for me, so thanks for persisting!
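The "can't you do that already" point can be made concrete with the third-party Python `jsonschema` package (the schema below is an illustrative assumption, not one proposed in the thread): because validation runs after parsing, a `pattern` rejects the DEL code point however the instance text spelled it.

```python
import json
import jsonschema  # third-party: pip install jsonschema

schema = {"type": "string", "pattern": "^[^\\x7f]*$"}  # no DEL code points allowed

for text in ['"\\u007f"', '"\x7f"']:  # escaped and raw spellings of DEL
    instance = json.loads(text)       # both parse to the same one-character string
    try:
        jsonschema.validate(instance, schema)
        print(repr(text), "accepted")
    except jsonschema.ValidationError:
        print(repr(text), "rejected")  # both are rejected identically
```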
Sounds good, we can wait for other commenters. If it turns out no one else is interested we can just close this.
Did you take a look at the reddit discussion I linked to? (https://www.reddit.com/r/programming/comments/59htn7/parsing_json_is_a_minefield/d98qxtj/) I really don't think there's such a thing as RFC-compliant parsing. JSON is a specification for certain sequences of codepoints, not for parsers or generators.
@seagreen could you put a more descriptive title on this? I keep having to read it again to figure out why it's still open :-P I would change the title myself, but I still really do not understand what you're trying to do here.

With respect to "RFC-conforming parsing", I just mean any parser that is considered to correctly map RFC-conforming codepoint sequences into a given language's data model. If "correctly" is impossible to define precisely, I still do not think it is JSON Schema's responsibility to "fix" that. JSON Schema works with the resulting data model, not the encoding.
Let's close this, I don't think there's any interest in this from the JSON Schema side. I do appreciate your patience while I babbled away here, though. ❤️ In case anyone who comes along later is wondering what I was saying (because re-reading the thread I don't think I explained myself well), the short version is in my "concrete suggestions" comment above.
@seagreen thanks! and you're welcome :-)
For reference, the original issue text:

One of the issues with JSON is that it places basically no restrictions on parsers and generators. I first learned about this from a Reddit comment, of all things: https://www.reddit.com/r/programming/comments/59htn7/parsing_json_is_a_minefield/d98qxtj/
This puts projects like JSON Schema in an awkward position where they have to decide which details of JSON are insignificant and which aren't.
Clearly, whitespace outside of JSON strings is insignificant. Clearly, the difference between `1` and `2` is significant. But in between lies a gray area: should JSON Schema be able to specify that certain control characters must be escaped? For instance, the spec doesn't require `0x7f` (DEL) to be escaped, but that might cause a problem in some circumstances. What about numbers? The difference between `10e2` and `10E2` is insignificant, but what about `100` and `1.00e2`?

I'm working on a subset of JSON called Son that I hope can answer these problems. The goal is to eliminate redundancies in JSON so that actual restrictions can then be placed on parsers, such as "there should be a bijection from serialized Son to the parsed representation of Son". Then projects like JSON Schema could concern themselves only with the subset of JSON represented by Son, instead of each project trying to figure out what part of JSON it wants to cover on its own.
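A small demonstration of the redundancy Son targets, using Python's `json` module (illustrative only, not part of the Son spec):

```python
import json

# Several distinct JSON texts, one parsed value: serialization is many-to-one,
# so no bijection between texts and values is possible for full JSON.
texts = ['{"a": 10.0}', '{ "a" : 1e1 }', '{"a":1E1}']
print({json.dumps(json.loads(t)) for t in texts})  # {'{"a": 10.0}'}

# Repeated keys are another divergence point; Python silently keeps the last:
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}
```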
If this doesn't seem like a helpful thing to base JSON Schema on, feel free to just hit the close button; I don't want to clog up the issues promoting my own project. If this does seem interesting but you're not happy with the specific decisions Son made, please let me know either here or in Son's issues so I can look into it.