Split Overview into the two specific use cases #1370

Open
wants to merge 1 commit into
base: main
from
108 changes: 69 additions & 39 deletions jsonschema-core.xml
@@ -125,57 +125,51 @@
</t>
</section>

<section title="Overview">
<!-- JSON Schema accomplishes two objectives, which each get their own section. -->
<section title="Validation">
<t>
This document proposes a new media type "application/schema+json" to identify a JSON
Schema for describing JSON data.
It also proposes a further optional media type, "application/schema-instance+json",
to provide additional integration features.
JSON Schemas are themselves JSON documents.
This, and related specifications, define keywords allowing authors to describe JSON
data in several ways.
A JSON Schema document describes a validator (also known as a "recognizer" or "acceptor") which classifies a provided JSON document as "accepted" or "rejected."
Member
The "accept"/"reject" terminology is new. I see you use it later in the PR as well, but it's not used throughout the document.

Member Author
It's new to this spec, but it is used widely outside JSON Schema and may help new readers understand what is going on. I'm going to suggest we should use accept/reject more often (it greatly simplifies the phrasing of many sentences), but that'll be an issue for later.

Member
Can you remove that language from this PR and open an issue for that change, please?

I'm not opposed to it, but I think vernacular should be an agreed-upon change, not something that's just snuck in.

Member Author
Well my point is there's a certain segment who may see our language as new, and "accepts" is the existing term they're familiar with. I think we should use a variety of language to introduce and define the concepts, and then we can use our choice of term for the rest of the document. Is there a problem with this line of thinking?

Member
I don't have a problem with introducing them, but this PR doesn't seem the place for it. I'd like to get the opinions of the other maintainers.

Member Author
awwright, Apr 3, 2023
> I know what a finite state machine is, I still don't find the references you're adding helpful

> significantly fewer people have a real understanding of them or how a JSON Schema can be mapped into one

Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

The two biggest audiences, I think, will be application developers ("I want a DSL for checking JSON, instead of doing it in code") and formal grammars ("I know what ABNF and DTDs are, I want this for JSON").

I think you'll find that other similar technology uses technical terms much more heavily than I'm suggesting we do.

I looked at the introduction for ABNF, which I found far too technical for most people to understand. It talks in technical terms that it's a formal syntax, but doesn't really describe why you'd want to use it at all, or use it over other languages.

XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology; but it too is somewhat technical and it's not immediately obvious to me who the target audience is.

So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction? (Since ABNF and DTDs both seem to be written exclusively for this audience, I would suggest this is important.)

And (2) if we should accommodate the formal grammar audience, is there a better way to write it so that it's more helpful for them, and less confusing to others?

Member
> Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

This is a code review, where saying "I don't find it helpful" is to say "I believe you should not add this, it isn't helpful to a wider audience", not simply offering my own anecdote about my personal reading.

> XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology;

Section 2.8 of a document is wildly different from being literally the first paragraph of the actual content of the document. I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

> So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction?

You already have my own opinion, now three times: no, we should not.

Member Author
> not simply offering my own anecdote about my personal reading

Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

I'm going to have to think about what else to say, if it's not immediately obvious that formal grammars are related here, as that's the formal study of what JSON Schema is fundamentally doing.

> I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

XML does not use the term "validates" (in the third person singular) to refer to an outcome (and actually it doesn't use it in that form at all). It uses "validate" to describe a process, "accept"/"matches", and "reject" to describe outcomes of that process, and "valid" to describe documents that have been accepted by the process, but nothing like "validates successfully" as we do.

Member
> Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

At the risk of quoting myself, the comment I left before that was quite clear about which I intended; please don't ignore it:

> All in all I find the first few paragraphs here to be a step back

> I don't see them as adding understanding to someone reading the spec

> what's here in this whole PR does too much

Member
I'm going to bow out of this PR as well, as I believe I've communicated that I'm -1 on the changes in their current form. There might be smaller changes I'm more positive on, but they're sufficiently far from this PR in its current state that it's not a matter of rewording a small bit here and there. It bears repeating, I suppose, that that's just my vote, and others may of course disagree, though I've landed on this PR after Greg, who sounded like he was expressing similar doubts.

Member
Does the schema describe a validator? I would expect people think of the "validator" as the implementation, not the document.

Member Author
Yeah, that makes sense... There's a sense in which these two uses are actually the same: the "validator implementation" is just a generic, configurable form of validator. If I have a schema, it makes no difference whether the program is written or compiled to work only with that schema, or is generic and configured at runtime.

Is there a better name for "the program that tests an input against some specific schema"?

Member
gregsdennis, Mar 30, 2023
I don't think you understand my point. Colloquially, the "validator" is the implementation, not the schema. I think we need to stick with this.

Saying the schema itself is the validator will be confusing. A validator evaluates JSON against a schema. The schema is no more than configuration.

Member Author
> Colloquially, the "validator" is the implementation, not the schema.

I believe I see the point you're making, but I'd add that this is similar to how we discuss compilers and interpreters. You're pointing out a definition of "validator" that functions like an interpreter: there's a library that reads the schema (the source code), then uses this interpretation to validate JSON.

But you can also compile source code to a program, and run the program directly. In this paradigm, there is no interpreter (what is usually called the validator), but the compiled program is still a "validator" (a thing that performs validation). It just has no concept of a schema (any more than a compiled C program can parse C).

So with JSON Schema, the schema is not the validator (as such), but I think you can say it describes a validator.
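The compiler/interpreter distinction in this comment can be sketched in Python. This is a hypothetical toy, not the JSON Schema evaluation algorithm: it understands only `type` (for four JSON types) and `minimum`, and the names `interpret`, `compile_schema`, and `TYPE_CHECKS` are invented for illustration. Both functions classify instances the same way; the difference is that the closure returned by `compile_schema` no longer needs the schema at call time.

```python
# Hypothetical toy: only "type" (for a few JSON types) and "minimum".
# Unsupported "type" values raise KeyError; real implementations differ.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def interpret(schema, instance):
    """Interpreter style: consult the schema on every call."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    # "minimum" only constrains numbers; other types pass it.
    if "minimum" in schema and TYPE_CHECKS["number"](instance):
        if schema["minimum"] > instance:
            return False
    return True


def compile_schema(schema):
    """Compiler style: inspect the schema once, then return a closure
    that takes only an instance and never looks at the schema again."""
    checks = []
    if "type" in schema:
        checks.append(TYPE_CHECKS[schema["type"]])
    if "minimum" in schema:
        minimum = schema["minimum"]
        checks.append(lambda v: not TYPE_CHECKS["number"](v) or v >= minimum)
    return lambda instance: all(check(instance) for check in checks)


schema = {"type": "number", "minimum": 0}
validate = compile_schema(schema)
print(validate(5), validate(-1), interpret(schema, "5"))  # True False False
```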

Member
gregsdennis, Mar 30, 2023
I see where you're coming from, but never in my experience with this project have we used "validator" that way. It has always been used to mean the implementation.

At best, this reads weird.

Member Author
@Julian If I compile a schema or curry away the schema argument, leaving an executable that only reads an instance, what terminology should we use for the compiler, and the program/function it outputs?

I ask because I think the function that accepts the instance would be the "validator", not the compiler. And I'd argue this usage is entirely consistent with most "validator" libraries, which are more like interpreters (they both parse the schema and validate instances in a single package).

Member
I don't believe we need terminology for such a concept in the spec at all (and certainly not at this point in time). What we use today is fine, "implementation", which refers to the executable program capable of doing things with schemas.

Member
> the function that accepts the instance would be the "validator"

This I agree with, but it doesn't follow from this that the schema is a validator. The schema is still just "configuration" (if you want to call it that). It still goes through a library/application, and you get an output. It's just that your example also produces an intermediate output: an executable function that represents a specific schema. The system takes in the JSON Schema (most likely as JSON or YAML text) and an instance, and reports whether the instance is valid according to that schema. That "compile" step is an intermediate implementation detail that doesn't need to be covered in the spec.

The spec needs to concern itself with one thing:

  • inputs: a schema and an instance
  • output: validation results and/or annotations

Anything an implementation does to get from input to output is necessarily beyond the scope of the spec.

Member Author
> but it doesn't follow from this that the schema is a validator

I see, this isn't what I intended to convey. By saying "the schema describes a validator", I think that disconnects the schema (the description) from the validator (the actual process). Is a different word in order here, or some additional explanation ("the schema describes the behavior of a validator")?

Member
I don't think it's necessary to say that at all.

A schema describes a set of constraints and annotations that can be applied to an instance. That's it. There's no need to bring in implementations of any form.

It supports "structural validation" (context-free grammars), and certain more complicated conditions.
Validation follows JSON semantics, so two documents that are value-equal, but vary only by character escapes, property ordering, or whitespace, will validate with the same result.
</t>
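The "JSON semantics" sentence can be illustrated with Python's standard `json` module. Two texts that differ only in a character escape, property order, and whitespace parse to equal values; an evaluator that works on parsed values (an assumption about the implementation) would therefore give both the same result.

```python
import json

# Same data written two ways: an escaped vs. a literal "é",
# a different property order, and different whitespace.
text_a = '{"name": "Ren\\u00e9", "age": 30}'
text_b = '{ "age" : 30,\n  "name" : "René" }'

value_a = json.loads(text_a)
value_b = json.loads(text_b)

print(text_a == text_b)    # False: the raw texts differ
print(value_a == value_b)  # True: the JSON values are equal
```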
<t>
JSON Schema uses keywords to assert constraints on JSON instances or annotate those
instances with additional information. Additional keywords are used to apply
assertions and annotations to more complex JSON data structures, or based on
some sort of condition.
With respect to a given schema, an input document accepted by that schema is called an "instance."
A JSON Schema may be used to specify sets of JSON documents, by referring to the set of all possible instances of that schema.
</t>
<t>
To facilitate re-use, keywords can be organized into vocabularies. A vocabulary
consists of a list of keywords, together with their syntax and semantics.
A dialect is defined as a set of vocabularies and their required support
identified in a meta-schema.
A condition for accepting a document is called an "assertion".
Assertions impose constraints that instances must conform to.
Given a schema and an instance, the schema "accepts" an input whenever all the assertions are met,
and the schema "rejects" when any of the assertions fail.
Member
"rejects" needs an object, i.e. what is being rejected?

Member Author
The input JSON document, as was mentioned in 'the schema "accepts" an input whenever...'

Member
Yes, but grammatically, you need to repeat the object.

Schemas without any assertions accept all JSON documents.
</t>
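The accept/reject rule in this paragraph is a conjunction over assertions. In this hypothetical sketch each assertion is a plain Python predicate, which is not how any particular implementation represents keywords. Because `all()` over an empty iterable returns `True`, a schema with no assertions accepts every document, matching the final sentence of the paragraph.

```python
# Hypothetical: a "schema" reduced to a list of assertion predicates.
def accepts(assertions, instance):
    """Accept the instance only if every assertion holds; all() stops
    at the first failing assertion, which rejects the instance."""
    return all(assertion(instance) for assertion in assertions)


def is_string(value):
    return isinstance(value, str)


def max_length_5(value):
    # Like "maxLength", this only constrains strings; other types pass.
    return not isinstance(value, str) or 5 >= len(value)


schema = [is_string, max_length_5]
print(accepts(schema, "hello"))         # True
print(accepts(schema, "too long"))      # False
print(accepts(schema, 42))              # False
print(accepts([], {"any": ["thing"]}))  # True: no assertions to fail
```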
<t>
JSON Schema can be extended either by defining additional vocabularies,
or less formally by defining additional keywords outside of any vocabulary.
Unrecognized individual keywords simply have their values collected as annotations,
while the behavior with respect to an unrecognized vocabulary can be controlled
when declaring which vocabularies are in use.
Assertions are encoded into a JSON Schema using "keywords," described below.
</t>
</section>

<section title="Annotation">
<t>
This document defines a core vocabulary that MUST be supported by any
implementation, and cannot be disabled. Its keywords are each prefixed
with a "$" character to emphasize their required nature. This vocabulary
is essential to the functioning of the "application/schema+json" media
type, and is used to bootstrap the loading of other vocabularies.
A schema may also describe an "annotator," a way to read an instance and output a set of "annotations."
Member
Is the schema describing an annotator? (same as "validator" above)

Member Author
Yeah, similar situation: I have a schema, and I want to use it to compile a program that takes a JSON input and returns output in some format. It's not otherwise configurable; maybe this is an HTTP service. What do I call that program?

Annotations can be any output metadata about that instance.
</t>
<t>
Additionally, this document defines a RECOMMENDED vocabulary of keywords
for applying subschemas conditionally, and for applying subschemas to
the contents of objects and arrays. Either this vocabulary or one very
much like it is required to write schemas for non-trivial JSON instances,
whether those schemas are intended for assertion validation, annotation,
or both. While not part of the required core vocabulary, for maximum
interoperability this additional vocabulary is included in this document
and its use is strongly encouraged.
For example, you can document the meaning of a property,
suggest a default value for new instances,
generate a list of hyperlinks from the instance,
or declare relationships between data.
Applications may make use of annotations to query for arbitrary information;
for example, to extract a list of names from a document with a known structure.
Annotations may also describe values within the instance in a standard way;
for example, extracting a common type of hyperlink from many different types of documents, using a different schema for each type.
</t>
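A minimal, hypothetical picture of the annotation uses listed above: walk the schema alongside the instance and record annotation keywords at each instance location. The sketch only follows `properties`, treats `title`, `description`, and `default` as the annotation keywords, and reports locations as JSON Pointers; the function name and output shape are assumptions, not a defined output format.

```python
# Hypothetical: only these keywords are treated as annotations here.
ANNOTATION_KEYWORDS = ("title", "description", "default")


def collect_annotations(schema, instance, pointer=""):
    """Return (instance location, keyword, value) triples, following
    "properties" into object members that are present in the instance."""
    found = [(pointer, kw, schema[kw]) for kw in ANNOTATION_KEYWORDS if kw in schema]
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                # Escape the member name as a JSON Pointer token (RFC 6901).
                token = name.replace("~", "~0").replace("/", "~1")
                found += collect_annotations(subschema, instance[name], pointer + "/" + token)
    return found


schema = {
    "title": "Person",
    "properties": {
        "name": {"description": "Full legal name"},
        "country": {"default": "US"},
    },
}
for row in collect_annotations(schema, {"name": "Ada"}):
    print(row)
```

In this run `country` produces nothing, because the instance has no such member and `properties` only applies subschemas to members that are present.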
<t>
Further vocabularies for purposes such as structural validation or
hypermedia annotation are defined in other documents. These other
documents each define a dialect collecting the standard sets of
vocabularies needed to write schemas for that document's purpose.
Like assertions, the instructions for producing annotations are encoded in a schema using keywords.
Output is only defined over valid instances,
so annotations are not returned until the input has been validated.
However, not all valid input is meaningful or true to a given application.
That is, if you process an arbitrary instance with nonsense data,
the resulting annotations may not necessarily be true, even though the input is valid.
Comment on lines +170 to +172
Member
The use of "true" here is odd. What does it mean for an input to be "true" to an application?

Member Author
Yeah, I struggled a bit with how to phrase this. I'm trying to explain the phenomenon of "garbage in garbage out" and that the assertions don't have to be 100% completely defined.

Member
I think dropping "true" and sticking with "meaningful" is the right way here.

</t>
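The ordering described here (validate first, annotate only if valid) and the "garbage in, garbage out" caveat can be sketched as follows. The `evaluate` function and its result dictionary are hypothetical; the only assertion is an integer `type` check and the only annotation is `title`.

```python
def evaluate(schema, instance):
    """Hypothetical evaluator: validate first, then annotate."""
    is_integer = isinstance(instance, int) and not isinstance(instance, bool)
    valid = schema.get("type") != "integer" or is_integer
    if not valid:
        # Output is only defined over valid instances: no annotations.
        return {"valid": False}
    annotations = {"title": schema["title"]} if "title" in schema else {}
    return {"valid": True, "annotations": annotations}


schema = {"type": "integer", "title": "Age in years"}
print(evaluate(schema, "forty"))
# {'valid': False}
print(evaluate(schema, -7))
# {'valid': True, 'annotations': {'title': 'Age in years'}}
# -7 is a valid integer, so it is annotated as an age even though it is
# not a meaningful one: the schema never asserted that ages are positive.
```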
</section>

@@ -394,6 +388,42 @@
</t>
</section>
<section title="Schema Vocabularies">
<t>
To facilitate re-use, keywords can be organized into vocabularies. A vocabulary
consists of a list of keywords, together with their syntax and semantics.
A dialect is defined as a set of vocabularies and their required support
identified in a meta-schema.
</t>
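A hypothetical model of the vocabulary/dialect relationship: each vocabulary is a set of keywords keyed by URI, and a dialect is a `$vocabulary`-style mapping from vocabulary URIs to a "required" flag. The URIs follow the draft 2020-12 naming, but the keyword sets are deliberately abbreviated, and `KNOWN_VOCABULARIES` and `dialect_keywords` are invented names. An unrecognized vocabulary marked required stops processing, while an unrecognized optional one is skipped.

```python
# Hypothetical registry: abbreviated keyword sets, not the full vocabularies.
KNOWN_VOCABULARIES = {
    "https://json-schema.org/draft/2020-12/vocab/core": {"$id", "$schema", "$ref", "$defs"},
    "https://json-schema.org/draft/2020-12/vocab/applicator": {"properties", "items", "allOf"},
    "https://json-schema.org/draft/2020-12/vocab/validation": {"type", "minimum", "maxLength"},
}


def dialect_keywords(vocabulary_decl):
    """Union the keywords of each declared vocabulary we recognize.

    vocabulary_decl maps vocabulary URI -> required flag, like "$vocabulary".
    An unrecognized required vocabulary stops processing; an unrecognized
    optional one is skipped.
    """
    keywords = set()
    for uri, required in vocabulary_decl.items():
        if uri in KNOWN_VOCABULARIES:
            keywords.update(KNOWN_VOCABULARIES[uri])
        elif required:
            raise ValueError("unsupported required vocabulary: " + uri)
    return keywords


dialect = {
    "https://json-schema.org/draft/2020-12/vocab/core": True,
    "https://json-schema.org/draft/2020-12/vocab/validation": True,
    "https://example.com/vocab/hypothetical": False,  # unknown, optional
}
print(sorted(dialect_keywords(dialect)))
```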
<t>
JSON Schema can be extended either by defining additional vocabularies,
or less formally by defining additional keywords outside of any vocabulary.
Unrecognized individual keywords simply have their values collected as annotations,
while the behavior with respect to an unrecognized vocabulary can be controlled
when declaring which vocabularies are in use.
</t>
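Collecting unrecognized keywords as annotations might look like this hypothetical sketch, where `recognized` stands in for the keyword set of the active dialect and the only assertion implemented is `"type": "string"`.

```python
def evaluate_keywords(schema, instance, recognized):
    """Hypothetical: apply recognized keywords, and pass unrecognized
    ones through as annotations. Only "type": "string" is implemented."""
    valid = True
    annotations = {}
    for keyword, value in schema.items():
        if keyword not in recognized:
            # Unknown keyword: asserts nothing, its value becomes an annotation.
            annotations[keyword] = value
        elif keyword == "type" and value == "string":
            valid = valid and isinstance(instance, str)
    return valid, annotations


print(evaluate_keywords({"type": "string", "x-unit": "cm"}, "12", {"type"}))
# (True, {'x-unit': 'cm'})
```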
<t>
This document defines a core vocabulary that MUST be supported by any
implementation, and cannot be disabled. Its keywords are each prefixed
with a "$" character to emphasize their required nature. This vocabulary
is essential to the functioning of the "application/schema+json" media
type, and is used to bootstrap the loading of other vocabularies.
</t>
<t>
Additionally, this document defines a RECOMMENDED vocabulary of keywords
for applying subschemas conditionally, and for applying subschemas to
the contents of objects and arrays. Either this vocabulary or one very
much like it is required to write schemas for non-trivial JSON instances,
whether those schemas are intended for assertion validation, annotation,
or both. While not part of the required core vocabulary, for maximum
interoperability this additional vocabulary is included in this document
and its use is strongly encouraged.
</t>
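Applying subschemas to the contents of objects and arrays is naturally recursive. This hypothetical toy gives `properties` and `items` their usual meaning (with no `prefixItems`, `items` applies to every element), but the only assertion it knows is `type` for three types, and `is_valid` and `TYPES` are invented names.

```python
# Hypothetical: the only assertion supported is "type" for these types.
TYPES = {"object": dict, "array": list, "string": str}


def is_valid(schema, instance):
    """Recursively apply subschemas to object members and array items."""
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, TYPES[expected]):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name]):
                return False
    if isinstance(instance, list) and "items" in schema:
        if not all(is_valid(schema["items"], item) for item in instance):
            return False
    return True


schema = {
    "type": "object",
    "properties": {"tags": {"type": "array", "items": {"type": "string"}}},
}
print(is_valid(schema, {"tags": ["a", "b"]}))  # True
print(is_valid(schema, {"tags": ["a", 1]}))    # False: 1 is not a string
```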
<t>
Further vocabularies for purposes such as structural validation or
hypermedia annotation are defined in other documents. These other
documents each define a dialect collecting the standard sets of
vocabularies needed to write schemas for that document's purpose.
</t>
<t>
A schema vocabulary, or simply a vocabulary, is a set of keywords,
their syntax, and their semantics. A vocabulary is generally organized
@@ -1357,7 +1387,7 @@
specification and the companion Validation specification.
</t>
</section>
<section title="Non-inheritability of vocabularies ">
<section title="Non-inheritability of vocabularies">
<t>
Note that the processing restrictions on "$vocabulary" mean that
meta-schemas that reference other meta-schemas using "$ref" or