Improve JSON schema #64

mwoehlke · 2024-05-10T23:09:06Z

Add a link to the JSON schema from the main documentation. Pretty-print said schema and add format and default, with the latter also moving to the "machine readable" block in the primary documentation. Add custom Pygments style to reflect the coloring used elsewhere in the documentation, and "fix" the language of the sample CPS file. Factor out some duplicated code. Fix some flake8 warnings.

Refactor some of the logic to handle attribute types to reduce code duplication between the reST parsing and creating the JSON schema.

Add ability to specify indentation to the schema builder. Add an option (currently set to '2') to specify the indentation via the Sphinx configuration file. This makes the schema easier for humans to read at the expense of increased file size.

Add a link to the generated JSON schema from the main documentation. This required a bit of fiddling to be able to link to a file that is generated, but not by Sphinx (proper), as there does not seem to be any built-in way to give a relative link to something that Sphinx doesn't know about. On the plus side, having to concoct such a mechanism allowed providing a role that obtains the proper file name from the site configuration. This is a very odd role, in that the input is taken as the link text, and the link reference is controlled by the site configuration, but it works for now. A better solution might be to provide a built-in target definition, but I'm not sure how to do that, or if it would permit "undefined" local links. (The approach here sort of "tricks" Sphinx into thinking the link is external.)

Add and use a custom Pygments style that more closely reflects the style used elsewhere in the documentation. Change the "language" of the sample CPS file from JavaScript to JSON.

Remove unused imports and fix "comment style" in conf.py. The latter is actually a line of code that's commented out, but adding a space after the '#' is harmless and makes flake8 happier.

mwoehlke · 2024-05-10T23:10:42Z

FYI @autoantwort, this should fix a couple of the issues you raised. I'd appreciate if you can verify whether I did it right; thanks!

autoantwort · 2024-05-23T15:44:46Z

schema-supplement.rst

@@ -90,6 +90,7 @@ By definition, none of the following attributes are required.
 .. cps:attribute:: website
  :type: string
  :context: package
+  :format: uri


It seems that this does not work. I have not found format in the resulting schema

Oops, that's because of a dumb typo; the Python code uses the variable name typeformat because format is a built-in, but accidentally used that also as the name of the reST option to inspect. Should be fixed; thanks for the catch!

LecrisUT · 2024-07-02T18:04:22Z

A suggestion, why not invert the generator to have rst be generated from json-schema (with sphinx-jsonschema). Reasoning is that the json schema can be much more powerful, e.g. checking sem-version, conditional dependencies, etc. which are less intuitive to define in the rst format. JSON format is of course rather cumbersome to navigate, so maybe having it in a yaml file would be better

Make default values part of the "machine readable" portion of an attribute specification. This makes them stand out more in the documentation, and also allows them to be exported to the JSON schema. Also add value format, which is exported to the JSON schema but does not appear in the main documentation.

mwoehlke · 2024-07-19T18:49:40Z

why not invert the generator to have rst be generated from json-schema?

@LecrisUT, #25 did something similar (albeit without reST generation). The major reason is that reST is significantly more friendly to edit than a JSON schema. It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (even at best, editing is going to lost editor syntax highlighting), and generating reST also complicates building the documentation. The reality is that the reST is the source of truth and the JSON schema is supplemental and is intended primarily for use in machine validation.

LecrisUT · 2024-07-19T19:22:02Z

The major reason is that reST is significantly more friendly to edit than a JSON schema.

A few pointers:

JSON schema does not have to be written in json, you can do it in yaml
json schema has a json schema that will help you write it
you can split a schema into components and either reference within the document or across documents ($ref, $definitions, stores)
a simple pytest module can be implemented (or a pre-commit, but that's more fragile) to regression test the schemas
cmake's schema has a quite neat design to handle version-specific schemas

It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (even at best, editing is going to lost editor syntax highlighting)

The schema objects that sphinx-jsonschema produces are refereanceable at least via :ref:. Syntax highlighting I haven't looked into it, it might require integration with sphinx to resolve dynamic .rst files. You do have tools like sphinx-autobuild that could help fill in the gaps.

and generating reST also complicates building the documentation

Initially, making the transition might take a lot of effort, but generating the rst and documentation is actually trivial. I did not dig into the current rst generation though but a recommended pattern is to expose components of the json schema one at a time, e.g.

.. jsonschema:: schema.json#/definitions/package

.. jsonschema:: schema.json#/definitions/components

PS: this would also make it easier to publish in https://github.com/SchemaStore/schemastore

mwoehlke · 2024-07-19T19:47:23Z

@LecrisUT, let's consider an example:

.. cps:attribute:: clr_vendor
  :type: string
  :context: platform

  Specifies that the package's CLR (.NET) components
  require the specified `Common Language Runtime`_ vendor.
  Typical (case-insensitive) values include
  :string:`"microsoft"` and
  :string:`"mono"`.

How would that be written, without loss of expressiveness, in YAML?

It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also require global definitions; how does jsonschema handle that?
What does markup look like in the schema .json? Does that just get the raw markup? (The existing system turns the reST into plain text.)
If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
This example is a trivial type; how is a complex type like map(map(string)) specified? (The existing system generates compound type specifications as needed.)
How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.

LecrisUT · 2024-07-19T20:28:30Z

@LecrisUT, let's consider an example:

There are 2 options, either use $$description which is specific to sphinx-jsonschema

#schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...
    $$description: |
      Specifies that the package's CLR (.NET) components
      require the specified `Common Language Runtime`_ vendor.
      Typical (case-insensitive) values include
      :string:`"microsoft"` and
      :string:`"mono"`.

or my preferred approach is to integrate the text in the rst document:

#schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...

Schemas.rst
===========

.. schema.json#/definitions/clr_vendor

  Specifies that the package's CLR (.NET) components
  require the specified `Common Language Runtime`_ vendor.
  Typical (case-insensitive) values include
  :string:`"microsoft"` and
  :string:`"mono"`.

Note: I am not sure of the indentation necessary here, definitely works without the indentation for the text. This one works by making each .. directive as a specific subsection so the text block is the same as writing it in other sections/admonations.

In this example I used definitions.clr_vendor, but it can as well delete those 2 levels and make it a single json-schema object.

It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)

With $ref, effectively it copies the content wherever the key $ref appears when the json-schema is parsed. But the original object is defined in only 1 place, and sphinx-jsonschema uses the $ref to use a link instead of copying the contents (there is an annoying limitation that $$target needs to be defined sometimes)

It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also require global definitions; how does jsonschema handle that?

In the first case, indeed it would not work nice, but the second option should work smoothly. Only some :ref: would be harder to redirect.

What does markup look like in the schema .json? Does that just get the raw markup? (The existing system turns the reST into plain text.)

You could use plain rst markups, but ymmv when it comes to the IDE displaying it. My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed. CMake adopts a similar approach.

If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?

Referencing an external json schema inside a json schema is technically possible, but I did not investigate it (it's the same $ref attribute). As for documentation, the targets generated by sphinx-jsonschema are available through cross-sphinx linkage.

This example is a trivial type; how is a complex type like map(map(string)) specified? (The existing system generates compound type specifications as needed.)

type: object
# If you want specific keys to be allowed only
properties:
  key:
    type: string
# Or a regex pattern
patternProperties:
  ".*":
    type: string
# Or take anything
additionalProperties:
  type: string

Reference: https://json-schema.org/understanding-json-schema/reference/object

How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.

Depends, you can either have 1 yaml file per object/rule or combine them and use definitions if you want to share definitions inside/across a file. Probably this part is the most relevant reference for this.

mwoehlke · 2024-08-07T21:20:52Z

My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed.

...but that's not unlike what I'm already doing; schema.rst is the "online documentation". The schema is a simplified version of that (which, frankly, is intended for machine-verification). The current system is easy to edit/maintain and is generally expressive enough to generate a schema that is intended to be used as a validation mechanism.

LecrisUT · 2024-08-07T21:51:22Z

The current system is easy to edit/maintain and is generally expressive enough

I want to challenge this claim. The issue is that the generation goes from rst to json-schema in a custom generator. This means that new contributors wanting to dip their toes and start contributing will have to learn yet another markup, e.g. they would need to decipher what something like this means

cps/schema.rst

Lines 297 to 298 in 2bfdc4b

    
             :type: list(string)|map(list(string)) 
        
             :context: component configuration

So the quality of contributors can only go as far as the documentation made for contributors and the willingness for them to overcome.

On the other hand, there would always be people familiar with sphinx and json-schema who would be able to jump in and out to do random fixes as long as the structure is familiar to them.

...but that's not unlike what I'm already doing; schema.rst is the "online documentation"

There is a difference here. The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup. sphinx-json-schema will do the heavy lifting of populating the metadata

There are various benefits of such an approach:

reduced maintenance for custom implementations
familiarity for external contributors
integration with schemastore.org and immediate support to IDEs and other tooling
full support of json schema features:
- breaking down the schema into components
- conditionals, e.g. different schemas depending on cps_version
- regex matching, e.g. semver validation
- deprecation support
- and much more

mwoehlke · 2024-08-07T21:54:25Z

This is now in the way of other changes that need to be made. Since it's been sitting a long while with only one relevant comment (which has been addressed weeks ago), I'm going to go ahead and merge it. Any further fixes needed will have to wait for their own PRs.

mwoehlke · 2024-08-08T18:57:51Z

The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup.

What's "static"? "Unchanging"? Clearly that isn't the case, as some of your claimed advantages make clear. Nor am I convinced that splitting the schema into multiple formats is an improvement. I also am skeptical of your assertion that the current system is difficult to comprehend, especially if someone employs the simple step of comparing the source to the output.

That said, nothing is stopping someone from making a pull request. I definitely do not have time right now to make such sweeping changes.

LecrisUT · 2024-08-08T19:25:23Z

Static as in static file. A file that is not generated.

Nor am I convinced that splitting the schema into multiple formats is an improvement.

I am not proposing that. There would only be 1 json schema in either a yaml or json format. Unfortunately that does mean the way to migrate is to fully migrate.

Regarding PR contribution, what I can write is the skeleton for this, but I would need to do a bunch more changes:

adopting RTD in order to run sphinx in PRs
reorganizing the project to move documentation part in docs and schemas in it's own folder
adding pre-commit
migrating the build to a proper PEP517 backend
add various builders like linkcheck and html -W and eliminating check-build-logs.py
discouraging the use of github pages

It is quite cumbersome to make each of these changes as individual PRs as they can be strongly dependent on one another, at most I could make them as individual commits. And this is all balancing on whether or not you are open for these changes and review so that I should know beforehand if I should go forward with prototyping these or not.

mwoehlke added 5 commits May 10, 2024 14:42

Reduce duplication building JSON schema

fcd62b8

Refactor some of the logic to handle attribute types to reduce code duplication between the reST parsing and creating the JSON schema.

Pretty-print schema

3baebc8

Add ability to specify indentation to the schema builder. Add an option (currently set to '2') to specify the indentation via the Sphinx configuration file. This makes the schema easier for humans to read at the expense of increased file size.

Use custom syntax highlighting

0603aaa

Add and use a custom Pygments style that more closely reflects the style used elsewhere in the documentation. Change the "language" of the sample CPS file from JavaScript to JSON.

Fix style in conf.py

d6f9c8e

Remove unused imports and fix "comment style" in conf.py. The latter is actually a line of code that's commented out, but adding a space after the '#' is harmless and makes flake8 happier.

mwoehlke assigned bretbrownjr May 10, 2024

autoantwort reviewed May 23, 2024

View reviewed changes

mwoehlke force-pushed the improve-schema branch from d3e9bac to 05e3c3e Compare July 19, 2024 18:47

mwoehlke merged commit f493d15 into master Aug 7, 2024
3 checks passed

mwoehlke deleted the improve-schema branch August 7, 2024 21:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve JSON schema #64

Improve JSON schema #64

mwoehlke commented May 10, 2024

mwoehlke commented May 10, 2024

autoantwort May 23, 2024

mwoehlke Jul 19, 2024

LecrisUT commented Jul 2, 2024

mwoehlke commented Jul 19, 2024

LecrisUT commented Jul 19, 2024

mwoehlke commented Jul 19, 2024

LecrisUT commented Jul 19, 2024 •

edited

Loading

mwoehlke commented Aug 7, 2024

LecrisUT commented Aug 7, 2024

mwoehlke commented Aug 7, 2024

mwoehlke commented Aug 8, 2024

LecrisUT commented Aug 8, 2024

Improve JSON schema #64

Improve JSON schema #64

Conversation

mwoehlke commented May 10, 2024

mwoehlke commented May 10, 2024

autoantwort May 23, 2024

Choose a reason for hiding this comment

mwoehlke Jul 19, 2024

Choose a reason for hiding this comment

LecrisUT commented Jul 2, 2024

mwoehlke commented Jul 19, 2024

LecrisUT commented Jul 19, 2024

mwoehlke commented Jul 19, 2024

LecrisUT commented Jul 19, 2024 • edited Loading

mwoehlke commented Aug 7, 2024

LecrisUT commented Aug 7, 2024

mwoehlke commented Aug 7, 2024

mwoehlke commented Aug 8, 2024

LecrisUT commented Aug 8, 2024

LecrisUT commented Jul 19, 2024 •

edited

Loading