Why should the schema be evaluated? #304
-
I have been trying to understand the logic of this project for some time, so forgive me if I ask questions that are too trivial...
Someone said:
Maybe I'm wrong, but it seems to me that a lot of things can be extracted from these specs... From there comes the main question: why does the schema have to be evaluated? Or why can't the schema be permissive by default and restricted on demand only where it is needed, using something like

NB: I'm aware that older drafts aren't permissive by default, but this could easily be solved by translating them to a new permissive draft, adding a requirement for dependent projects to consider older schemas as having
Replies: 2 comments 13 replies
-
Hey there @SorinGFS. Thanks for the questions. I'll have a stab at answering them, and then others will probably chime in with their thoughts. Before I get into the questions, it would be good to understand that JSON Schema is over a decade old at this point and in its 9th iteration. What exists today is the culmination of many authors' efforts and many times more users' feedback.
The meta-schema is "a schema for the schema." It's a special schema that validates a schema. Currently the meta-schema's identifier is also used to describe how an implementation should process the schema, including which keywords the implementation should recognize.

Technically, the meta-schema doesn't need to be evaluated. There are usually language-specific techniques that a lot of implementations use to ensure a schema is correct without having to perform an explicit meta-schema evaluation. For example, my implementation, JsonSchema.Net, is written in C#, so I get the benefit of a typing system. I literally can't accept anything except a number in

Secondly, if a meta-schema is evaluated, it only needs to be evaluated when the schema is loaded, not every time the schema itself is evaluated. Once loaded and checked against the meta-schema, the user can be certain that the schema is valid for the duration of the application.

Historically, an implementation is supposed to ignore unknown keywords. Starting with draft 2019-09, when annotations were introduced, it was decided that implementations which support annotations should collect the values of unknown keywords as annotations. I think this aligns with what you're expecting. However, moving forward, we will likely forbid unknown keywords in an effort to make forward-compatibility guarantees. You can read more about that in these discussions:
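The load-time-versus-evaluation-time distinction above can be sketched in a few lines. This is a toy illustration, not a real JSON Schema implementation: it checks only a handful of numeric keywords against a hand-rolled subset of what the meta-schema would express, once, when the schema is loaded.

```python
# Toy sketch: validate a schema once at load time, rather than on every
# instance evaluation. The keyword names are real JSON Schema keywords,
# but this check is a tiny illustrative subset of the actual meta-schema.

NUMERIC_KEYWORDS = {"minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum"}

def check_schema(schema):
    """Raise ValueError if an obviously malformed keyword value is found."""
    if not isinstance(schema, dict):
        return
    for key, value in schema.items():
        if key in NUMERIC_KEYWORDS and not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be a number, got {value!r}")
        if isinstance(value, dict):
            check_schema(value)  # recurse into subschemas

class LoadedSchema:
    """Checked once at load; later evaluations can trust the schema is valid."""
    def __init__(self, schema):
        check_schema(schema)  # load-time check, performed exactly once
        self.schema = schema

LoadedSchema({"type": "integer", "minimum": 6})        # loads fine
try:
    LoadedSchema({"type": "integer", "minimum": "six"})  # fails fast, at load
except ValueError as e:
    print(e)
```

A statically typed implementation like the C# one described above gets the same guarantee from its type system instead of an explicit check.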
I assume you're talking about the meta-schemas for vocabularies like

These were separated because they target a niche audience, they can be difficult to implement fully, and separating them helps modularity. The
Those two keywords evaluate differently for non-integers. Consider these schemas:

{ "type": "integer", "minimum": 6 }
{ "type": "integer", "exclusiveMinimum": 5 }

These two schemas are effectively equivalent. That is, they'll validate the same sets of data. But let's change the type:

{ "type": "number", "minimum": 6 }
{ "type": "number", "exclusiveMinimum": 5 }

Now they validate different sets of data. The second will validate 5.5, which the first rejects. Often, this nuance is necessary. The validation meta-schema actually uses this to define

As to why this isn't left to the consuming application: the application expects the schema to do the validation for it. That's its purpose. Since this is a validation that a schema can adequately describe, we support it.
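The difference between the two number schemas can be demonstrated with a minimal checker. The `validates` function below is my own simplified sketch of just these three keywords, not a conforming implementation (for instance, it uses Python's type checks rather than the JSON data model's notion of "integer"):

```python
def validates(instance, schema):
    """Minimal check of type/minimum/exclusiveMinimum (illustration only)."""
    if schema.get("type") == "integer" and not isinstance(instance, int):
        return False
    if schema.get("type") == "number" and not isinstance(instance, (int, float)):
        return False
    if "minimum" in schema and not (instance >= schema["minimum"]):
        return False
    if "exclusiveMinimum" in schema and not (instance > schema["exclusiveMinimum"]):
        return False
    return True

a = {"type": "number", "minimum": 6}
b = {"type": "number", "exclusiveMinimum": 5}

print(validates(5.5, a))                  # False: 5.5 < 6
print(validates(5.5, b))                  # True: 5.5 > 5
print(validates(6, a), validates(6, b))   # True True: both accept 6
```

With `"type": "integer"` the gap between 5 and 6 contains no valid values, which is why the two integer schemas accept identical data.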
Supporting
In the case of recursive data descriptions, completely resolving those references is impossible. There is an effort to define JSON referencing for non-schema consumers as well, which you may be interested in. This repo has a couple of proposals that you can read up on.
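To see why complete resolution is impossible, consider a recursive schema such as a linked list, where a subschema refers back to the root. The sketch below (my own illustration, using the standard `"$ref": "#"` self-reference) shows that naive inlining never bottoms out and must be cut off at some depth:

```python
# A recursive schema: a linked list whose "next" refers back to the root.
linked_list = {
    "type": "object",
    "properties": {
        "value": {"type": "integer"},
        "next": {"$ref": "#"},  # points back at the whole schema
    },
}

def inline(schema, root, depth):
    """Naive $ref expansion; without a depth cap it would never terminate."""
    if depth == 0:
        return {"$ref": "#"}  # give up: the recursion has no bottom
    if isinstance(schema, dict):
        if schema.get("$ref") == "#":
            return inline(root, root, depth - 1)
        return {k: inline(v, root, depth) for k, v in schema.items()}
    return schema

expanded = inline(linked_list, linked_list, depth=3)
# However deep we expand, an unresolved "$ref" always remains:
print(expanded["properties"]["next"]["properties"]["next"]["properties"]["next"])
```

This is why implementations resolve references lazily during evaluation instead of dereferencing the schema up front.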
First, JSON Schema isn't JavaScript. It's language-agnostic. We can't assume features and conventions of a particular language. The only thing we assume is the JSON data model (we actually support other syntaxes that align with this data model). Secondly, these keywords operate on different data types, and so they do slightly different things.
These keywords do slightly different things. They seem related, but JSON Schema decided that since they target different kinds of data, it's worth separating them. That said, if you wanted, you could create a vocabulary that defines a

One of our philosophies is that keywords should be targeted and simple. This is why, over the years, we've
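As an illustration of keywords that seem related but target different kinds of data, consider the real keywords `maxLength` (strings), `maxItems` (arrays), and `maxProperties` (objects) — my choice of example here, not necessarily the keywords discussed above. Each constrains only its own type and is ignored for everything else:

```python
def check_max(instance, schema):
    """Each 'max*' keyword constrains only its own data type (toy sketch)."""
    if isinstance(instance, str) and "maxLength" in schema:
        return len(instance) <= schema["maxLength"]
    if isinstance(instance, list) and "maxItems" in schema:
        return len(instance) <= schema["maxItems"]
    if isinstance(instance, dict) and "maxProperties" in schema:
        return len(instance) <= schema["maxProperties"]
    return True  # no applicable keyword for this type

schema = {"maxLength": 2, "maxItems": 2}
print(check_max("abc", schema))      # False: string longer than maxLength
print(check_max([1, 2, 3], schema))  # False: array longer than maxItems
print(check_max(123, schema))        # True: neither keyword targets numbers
```

A hypothetical merged "max" keyword would have to redefine its meaning per type, which is exactly the kind of overloading the "targeted and simple" philosophy avoids.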
"Strict" isn't something that's defined by JSON Schema. Some implementations may define it (most notably AJV), but it's not a concept that comes from here. I'm not sure if that's the perspective you're coming from, but that implementation specifically has made some decisions that do not align with the JSON Schema spec.
We have already decided to move away from IETF and their processes. I'm not sure what is being recommended here.
I'm not sure I understand what you mean by "permissive by default." It seems you're wanting unknown keyword support, but as I mentioned before, that's already supported in the current draft. If you mean something else, can you elaborate?
-
JSON Schema 2020-12 and prior is permissive by default. I expect future versions to remain permissive by default too.
Any implementation which needs a configuration to "enable" "permissive by default" is non-compliant.
I know about AJV's strict mode, which is enabled by default. It makes some valid schemas throw an error before even running validation.
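"Permissive by default" can be made concrete with a small sketch. This toy evaluator (my own illustration, with an artificially tiny keyword set) follows the 2019-09 behavior described earlier in the thread: unknown keywords never cause a validation failure; their values are simply collected as annotations.

```python
KNOWN = {"type", "minimum", "maximum"}  # toy keyword set for the sketch

def evaluate(instance, schema):
    """Permissive by default: unknown keywords don't fail validation;
    their values are collected as annotations instead (2019-09 style)."""
    annotations = {}
    valid = True
    for key, value in schema.items():
        if key == "type":
            valid &= {"integer": isinstance(instance, int),
                      "number": isinstance(instance, (int, float))}.get(value, True)
        elif key == "minimum":
            valid &= instance >= value
        elif key not in KNOWN:
            annotations[key] = value  # ignored for validation, kept as annotation
    return valid, annotations

ok, notes = evaluate(7, {"type": "integer", "minimum": 5, "x-custom": "hello"})
print(ok)     # True: the unknown "x-custom" keyword doesn't cause failure
print(notes)  # {'x-custom': 'hello'}
```

A strict-mode implementation would instead reject the schema itself for containing "x-custom" — which is the behavior the spec considers non-compliant when it can't be turned off.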