-
Let's try to answer these points as factually and precisely as possible:
This is an academic paper, an empirical study in the field of software engineering. Its scientific goals are to:
The last part of the paper is a new data structure description language based on JSON, which was initially a separate paper but was unfortunately merged into this one at the request of the program chair. The goal of the paper is not to promote JSON Schema; it is to study it, and to provide evidence-based feedback about the language design.
About the rationale and the controversies/disagreements so far, on each part:
You do not need to adopt all the suggested spec changes to get some of the benefit. There is a continuum of spec changes: some things can be forbidden in the syntax, others in the semantics, and some errors can be kept as-is if you think they do not need to be addressed because fixing them would harm some use cases and thus is not worth the benefit.
Dunno!
It seems that currently the rules depend on who you are: advocating breaking changes is okay for the spec team, but not so for outsiders with a differing agenda (in our case, an academic study: we are not serving the JSON Schema Community, we are not paid by a company which has a vested interest in promoting this technology because it sells services based on it; we are independent, which is a requirement for academics).
-
The paper has been peer reviewed by researchers. Indeed, it has not been reviewed by the JSON Schema people, but this is not a requirement from our perspective, although such a review would be fine. We read in detail most research papers about JSON Schema.
Please elaborate your claims, otherwise this is empty:
We tried to do a short blog post presenting the main results of the paper. The (interesting) discussion/debate you propose would be a longer one. Why not.
The paper has been accepted at a conference after being peer-reviewed. We are fine/would be happy to update the research report based on a detailed review by other people, if we find it appropriate. We may plan to submit an extended version to a Journal.
Hm... No, not really, IMHO. It is rather a less ambitious data description language.
Yes.
Mostly yes, but JSON Schema is not a programming language as such. Probably most data languages may allow some nonsense stuff at the syntax level, but much less so than a programming language because their power of expression/description is usually reduced compared to a full-fledged programming language, so there is less leeway for errors.
Sure. Our evidence shows unintentional errors in a majority of public schemas, without ambiguity in most cases.
Ok, this is your position. Too bad, no such linter is available for JSON Schema after 14 years and 10 drafts. There is a gradation of checks that can be performed on a DSL, from the syntax to the semantics to external tools: the current design puts very few constraints at the syntax/semantics level and leaves everything to a hypothetical linter. We indirectly prototyped one for our research. This research demonstrates that reasonable constraints on the syntax would detect most defects found by our tool in public schemas, a majority of which were proven defective. Our opinion is that the language design should be updated significantly in light of these findings. We are fine if you want to keep your language as is.
Ok. Our opinion is that you could/should tackle the issue from the spec instead of from the tooling, and this opinion is backed by evidence that this approach works for most purposes: we tested the reduced language, it detects most defects, and it is practical. Now you may argue that these constraints are unbearable/impractical/bad/whatever, but we have yet to be provided with concrete examples and use cases that can be investigated and discussed.
Ok... so what? The recommendations do work for avoiding defects, at the price of changing the spec.
Good! The paper basically describes the defects we investigated, and all our tooling is available online in the public domain; feel free to look at, use, and improve it.
Please do proceed, we are actually really interested in that point. Up to now we have not found any actual concrete example of a use case which would be made impractical by the restrictions we have suggested. If a few examples are found, then the cost/benefit of covering them with more or less convenience could be debated, but for now all we have had is "believe me", which is not factual enough.
Hm. Maybe. It may just be that a language which silently ignores typos makes it likely that typos are not found, but then we are back to the syntax and our suggestion that it should be much more rigid. ISTM that there is a cluster of mistaken keywords around min/max related keywords, though, which is why we proposed this explanation. As a professor, I have experienced that the more students have to memorize, the harder it is for them. As an engineer who has practiced dozens of programming languages in the last 40 years, memory is an issue. Ok, maybe age as well :-) Research shows that humans have a limited memory capacity (e.g. working memory, Miller 1956...). On the other hand, it is good to be able to give a word to a concept so that it can be named and discussed. So it is complicated. My opinion is that 60 keywords for describing JSON data structures looks like a lot anyway.
Yes, and no. We think that these typos happen because THEY ARE NOT DETECTED by tools (validators, ...). In a normal development process, when you have a typo in a programming language, the program does not even compile or cannot run, so usually it would not be committed, and if it is committed it would be fixed at the first occasion and the dev would get an earful from the lead, so they would strive to do clean commits. This self-correcting behavior does not happen with JSON Schema because typos are fine wrt the spec. This is basically what we are suggesting to change.
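To make this concrete, here is a minimal hypothetical schema (not one from our dataset) with a misspelled keyword; under the current drafts the unknown keyword is simply ignored, so the intended length constraint is silently dropped and any string is accepted:

```json
{
  "type": "string",
  "maxLenght": 10
}
```

A conforming validator reports no error here, whereas rejecting or at least flagging unknown keywords would have caught the typo immediately.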
Please pinpoint where our reasoning is wrong, so we can improve/correct the arguments and/or conclusions.
Interesting point. ISTM that we do not have enough data for that type of analysis.
We suggested removing keywords which are seldom used and present implementation challenges that do not seem justified for the purpose of describing data structures. Now if typos are not ignored and types are checked in the syntax, the issue of typo errors would be fixed, so the argument for reducing these is much weaker. Anyway, this is not the main point of our recommendations.
We partially disagree: it is not obvious that more expressive is better, it should be a case-by-case discussion: this feature allows writing this, which is simpler than that (maybe; define simpler), but it adds these constraints and this complexity to the implementation, and so on. Probably you did that with the 60 keywords, but the resulting spec is quite complex and people usually write schemas with defects (fact), and 1/3 of keywords are nearly never used in the public schemas we found (fact). Note that we also suggested adding a keyword to combine object properties, which would be a significant help in handling inheritance, a use case which does not seem well served by the current proposal.
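To illustrate the inheritance point (our own sketch with made-up names, not an excerpt from the paper): today, combining the properties of two object schemas is usually done with `allOf`, which composes assertions but does not merge the property sets into a single object description:

```json
{
  "allOf": [
    { "$ref": "#/$defs/base" },
    {
      "properties": {
        "extra": { "type": "integer" }
      },
      "required": ["extra"]
    }
  ],
  "$defs": {
    "base": {
      "properties": {
        "id": { "type": "string" }
      },
      "required": ["id"]
    }
  }
}
```

In particular, closing such a schema with `additionalProperties: false` at the top level would reject both `id` and `extra`, because `additionalProperties` only considers the adjacent `properties` keyword, not the ones reached through `allOf` or `$ref`; this is the kind of friction a dedicated property-combination keyword could remove.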
Yes, that is one example, there may be other posts as well.
Good.
This is debatable: the research process involves feedback from academic colleagues and anonymous reviewers who are experts in their field of research.
Well, it is what we are doing now by proposing a blog post and discussing the paper on the side.
Please do not presume about the level of diligence in the academic community!
Sure. We are doing that. We have tried to present and list the controversies in our first post above in this discussion.
This is noted. We'll see.
In research, the fact that you are paid by someone who has an interest in your results requires an explicit declaration because it taints the results. This cannot be helped.
-
Hi, this is Ben, Community Manager serving the JSON Schema Community. I am not an expert in JSON Schema, so I am not going to join the technical discussion going on in this issue; instead I'd like to start a parallel discussion regarding the benefits of this blog post for the JSON Schema Community: The first time I read the message, it appeared to me that your goal was just to present a brief highlight of your study, your findings, and your conclusions and then leave the forum, like you really don't care about how the broader community will perceive those findings and whether someone will be able to understand them to take some or all of the proposals forward to evolve the spec and/or the tooling. I am sure that that wasn't your intention, but it is what I personally perceived. I see this as an opportunity to improve JSON Schema, and this is why I'd like to collaborate with you on some elements of the blog:
I hope this makes sense. 🙏 🙏🏾
-
Ok. You may have gathered that I'm not a soft-spoken diplomat, so I must work on that. My colleague is much better at that, so with some care and active proofreading the situation should improve.
Fine with me.
The usual acceptance in computer science is that a programming language is a language which is somehow Turing-complete, whether it is general-purpose (GP) or domain-specific (DS) is irrelevant. A programming language usually includes a part about describing/declaring data structures, and a part about control flow. JSON Schema only focuses on the data structure side. There is no control flow as such (you may argue about if/then/else, but this is rather a means to express predicates about the data structure). A DSL may or may not be a programming language depending on what it does; there is no inclusion one way or the other.
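As a small illustration (a hypothetical schema with made-up property names): `if`/`then`/`else` does not execute anything, it just states a conditional constraint on the data, i.e. a predicate of the form "if the instance looks like X, then it must also satisfy Y":

```json
{
  "if": {
    "properties": { "country": { "const": "US" } },
    "required": ["country"]
  },
  "then": { "required": ["zipCode"] },
  "else": { "required": ["postalCode"] }
}
```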
Thanks for this contextual information.
As we have written a kind of linter (free and open source) for our paper, we can provide an opinion, and this objective seems rather elusive:
So even if a linter is available, it may not be widely used. A benefit is that the spec does not need to be changed explicitly, which we understand is a primary concern. The syntax-level language restriction strategy that we have put forward in our proposal would ensure that conforming tools detect many defects, and the implementation is easy to do and check (with the standard test suite), compared to constraints in the semantics (which means that all conforming tools would have to implement some kind of analysis) or through external tools (the linter option above: they must exist and you have to use them). There have been a few counter-arguments. Against our data and analyses:
About the proposed changes:
About the general logic: we claim that the changes we suggest fix the identified issues.
Thanks for the history. From the output, it was pretty clear that the road had been bumpy, so your explanations say why.
Ok, preserving backward compatibility is a usual and useful objective for such a project. This is somewhat of a change of direction as, up to now, the design seems mainly to have focused on allowing upward compatibility (e.g. accepting unknown keywords), and past versions have often broken backward compatibility, as the next one will. As academics, we are not committed to keeping fundamental concepts established a long time ago, especially if these concepts lead to error-prone schemas. We looked at how to try to improve the situation at the spec level. It is very unclear whether it is possible to preserve both worlds, but at least we have checked that a significant part of existing schemas already conform to our proposed changes.
If you introduce significant breaking changes, ISTM that the pain will be the same whether you go all the way or half way.
We understand that you mean to focus on backward compatibility, i.e. old schemas are accepted by newer versions, and that this will be true after the next version, so this version would be the last to include breaking changes. This suggests doing all the appropriate changes now. We are looking forward to a benevolent, constructive and productive discussion.
-
The first thing I want to cover is foundational concepts. You have classified JSON Schema as a data description language. While we are aware that many people try to use it in that capacity, that's not what it's designed for. When it is used for that purpose, people do encounter many of the difficulties the paper highlights. JSON Schema is actually designed to be a "data validation language". More specifically, it's a "JSON validation language" because it aims to validate JSON, not necessarily any data. I know this distinction may seem trivial, but when you view things from that perspective, I think it should help explain why some things work the way they do. (JSON Schema is also designed to be a "JSON annotation language", but that's not relevant to the paper, so we can ignore that aspect for the purposes of this discussion.)

As a validation language, each keyword represents a specific assertion about the data. That means that a JSON Schema is a collection of assertions to be applied to a JSON instance. A JSON instance is valid if it passes every assertion in the schema. An empty schema is making no assertions. In general, keywords are designed to assert one thing and be combined as necessary to make more complex assertions.

This is why schemas are "loose" by default. Being "strict" implies an assertion that properties that aren't mentioned in the schema aren't allowed. We use the terms "open" and "closed" generally in the same way you use "loose" and "strict". It's "open" because it's open for extension using composition.

The above are the foundational concepts of JSON Schema. If you want to study problems inherent in these foundational concepts, that's fine, but it's not constructive to ask us to change those concepts. If we did, it wouldn't be JSON Schema anymore. It would be something else. This foundation is going to lead to some good properties and some bad. It will work very well in some applications and be problematic in others. (I want to be clear I'm not saying all of your suggestions violate these fundamental concepts, just that any that do are not something we would consider changing.)

Here's an example to illustrate where these properties work to our advantage.

```json
{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": {}
  },
  "additionalProperties": false,
  "if": {
    "properties": {
      "foo": { "const": "bar" }
    },
    "required": ["foo"]
  },
  "then": { "required": ["bar"] }
}
```
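For concreteness (these instances are mine, not part of the original example), here is how a draft 2019-09 or 2020-12 validator treats a few instances against the schema above. This one is invalid, because the `if` subschema matches and `then` then requires `bar`:

```json
{ "foo": "bar" }
```

These two are valid; in the second, the `if` subschema does not match (`foo` is not `"bar"`), so `then` is not applied, while `additionalProperties: false` still restricts the object to `foo` and `bar`:

```json
{ "foo": "bar", "bar": 42 }
```

```json
{ "foo": "baz" }
```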
I know it seems odd for additional properties to be ignored, but I wanted to point out that an important reason for that to be allowed is to support polymorphism. Let's say I have a schema that represents a vehicle and another that extends that schema to represent a car.
```json
{
  "type": "object",
  "properties": {
    "isHumanPowered": { "type": "boolean" }
    ...
  },
  "required": ["isHumanPowered", ...]
}
```

```json
{
  "$ref": "./vehicle",
  "properties": {
    "make": { "type": "string" },
    "model": { "type": "string" }
  },
  "required": ["make", "model"]
}
```

Let's say my application has an instance like this:
```json
{
  "isHumanPowered": false,
  "make": "Toyota",
  "model": "Tacoma"
}
```

The desired behavior is that the "make" and "model" properties of this JSON instance are ignored when validating against the vehicle schema. Any kind of vehicle should pass. We could also have a bicycle schema and that should be allowed as well. This is an example of a good reason to use an open schema, but your argument is well taken that this isn't always what people want and it's easy to forget to close the schema when you do want it closed.
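For what it's worth, one way to close such a composed schema in draft 2019-09 and later (a sketch, not part of the original example) is `unevaluatedProperties`, which, unlike `additionalProperties`, also takes into account the properties evaluated through the referenced vehicle schema:

```json
{
  "$ref": "./vehicle",
  "properties": {
    "make": { "type": "string" },
    "model": { "type": "string" }
  },
  "required": ["make", "model"],
  "unevaluatedProperties": false
}
```

With this, "isHumanPowered", "make", and "model" are allowed and anything else is rejected, while the vehicle schema itself stays open for other kinds of vehicles.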
The paper correctly identifies that the reason the meta-schema is loose is to allow for extensions. We agree 100% that this causes more problems than it needs to in order to support an extension feature. We have a plan to address this in the next release. We consider it important enough to make a breaking change to address this problem. People will have two ways to define custom keywords; one of them is to use the vocabulary system to define their keywords.

I'll have more next week. I hope this was helpful.
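For reference, the vocabulary system already exists in draft 2020-12: a custom meta-schema declares which vocabularies it uses via `$vocabulary`, roughly like this (the dialect and custom vocabulary URIs below are illustrative, not real ones):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/my-dialect",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://example.com/vocab/custom-keywords": false
  }
}
```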
-
@benjagm: It seems that the discussion about the paper contents has stalled. Too bad, shame on us. We have updated the blog entry to highlight the disagreements as we understood them and to add context and caveats, so as to possibly make it acceptable… Nevertheless, the blog entry is not about supporting JSON Schema as it is, thus it is somewhat controversial. Please consider deciding to: (1) accept the blog entry with its controversies, (2) ask for more changes… or (3) ditch the entry because you do not want dissenting opinions shown on the community web site.
-
Hi @zx80. As you know, we had a lengthy discussion showing everyone's efforts to learn from each other, so first of all we'd like to thank everyone for the effort in engaging in this conversation. After long consideration, in the last OCWM, we have decided not to publish the blog post and encourage you to use other channels of the JSON Schema Community, like GitHub Discussions, to continue the exchange of ideas.

The JSON Schema Blog's main goal is to promote JSON Schema adoption, and this is why the content needs to fulfill specific criteria as per the blog guidelines, and as said before, this was not the case. The JSON Schema Community is proud to be a diverse and safe space, and we encourage everyone to respectfully share their opinions, keeping in mind the JSON Schema Code of Conduct; here we celebrate diversity. We are just providing the appropriate channel for this type of content/discussion in this case. We are mindful that this can be frustrating at this point, but trust us, this was a difficult decision, and we took a lot of care in every step of the process. We hope you stay in the Community so we can continue learning from each other.
-
Hi everyone!
We recently received this blog proposal PR by researchers external to the JSON Schema Community sharing defects found in public schemas. Even though we are excited to receive new contributions in any possible way, the content differs from the initial purpose of the blog, and we are unclear about the goals of the publication.
We are respectful of everyone's work, and we recognize the effort behind the study backing the article; however, we'd like to start a conversation between the authors and the Community to make sure we publish content that supports the JSON Schema Community in the best way.
How to start?
Who can participate?
We are expecting primarily the blog authors and JSON Schema maintainers; however, everyone is invited and we'd love to have more opinions about this.