Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve JSON schema #64

Merged
merged 6 commits into from
Aug 7, 2024
Merged

Improve JSON schema #64

merged 6 commits into from
Aug 7, 2024

Conversation

mwoehlke
Copy link
Member

Add a link to the JSON schema from the main documentation. Pretty-print said schema and add format and default, with the latter also moving to the "machine readable" block in the primary documentation. Add custom Pygments style to reflect the coloring used elsewhere in the documentation, and "fix" the language of the sample CPS file. Factor out some duplicated code. Fix some flake8 warnings.

Refactor some of the logic to handle attribute types to reduce code
duplication between the reST parsing and creating the JSON schema.
Add ability to specify indentation to the schema builder. Add an option
(currently set to '2') to specify the indentation via the Sphinx
configuration file. This makes the schema easier for humans to read at
the expense of increased file size.
Add a link to the generated JSON schema from the main documentation.

This required a bit of fiddling to be able to link to a file that is
generated, but not by Sphinx (proper), as there does not seem to be any
built-in way to give a relative link to something that Sphinx doesn't
know about. On the plus side, having to concoct such a mechanism allowed
providing a role that obtains the proper file name from the site
configuration.

This is a very odd role, in that the input is taken as the link text,
and the link reference is controlled by the site configuration, but it
works for now. A better solution might be to provide a built-in target
definition, but I'm not sure how to do that, or if it would permit
"undefined" local links. (The approach here sort of "tricks" Sphinx into
thinking the link is external.)
Add and use a custom Pygments style that more closely reflects the style
used elsewhere in the documentation. Change the "language" of the sample
CPS file from JavaScript to JSON.
Remove unused imports and fix "comment style" in conf.py. The latter is
actually a line of code that's commented out, but adding a space after
the '#' is harmless and makes flake8 happier.
@mwoehlke
Copy link
Member Author

FYI @autoantwort, this should fix a couple of the issues you raised. I'd appreciate if you can verify whether I did it right; thanks!

@@ -90,6 +90,7 @@ By definition, none of the following attributes are required.
.. cps:attribute:: website
:type: string
:context: package
:format: uri
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this does not work. I have not found format in the resulting schema

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that's because of a dumb typo; the Python code uses the variable name typeformat because format is a built-in, but accidentally used that also as the name of the reST option to inspect. Should be fixed; thanks for the catch!

@LecrisUT
Copy link

LecrisUT commented Jul 2, 2024

A suggestion, why not invert the generator to have rst be generated from json-schema (with sphinx-jsonschema). Reasoning is that the json schema can be much more powerful, e.g. checking sem-version, conditional dependencies, etc. which are less intuitive to define in the rst format. JSON format is of course rather cumbersome to navigate, so maybe having it in a yaml file would be better

Make default values part of the "machine readable" portion of an
attribute specification. This makes them stand out more in the
documentation, and also allows them to be exported to the JSON schema.
Also add value format, which is exported to the JSON schema but does not
appear in the main documentation.
@mwoehlke
Copy link
Member Author

why not invert the generator to have rst be generated from json-schema?

@LecrisUT, #25 did something similar (albeit without reST generation). The major reason is that reST is significantly more friendly to edit than a JSON schema. It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (even at best, editing is going to lost editor syntax highlighting), and generating reST also complicates building the documentation. The reality is that the reST is the source of truth and the JSON schema is supplemental and is intended primarily for use in machine validation.

@LecrisUT
Copy link

The major reason is that reST is significantly more friendly to edit than a JSON schema.

A few pointers:

  • JSON schema does not have to be written in json, you can do it in yaml
  • json schema has a json schema that will help you write it
  • you can split a schema into components and either reference within the document or across documents ($ref, $definitions, stores)
  • a simple pytest module can be implemented (or a pre-commit, but that's more fragile) to regression test the schemas
  • cmake's schema has a quite neat design to handle version-specific schemas

It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (even at best, editing is going to lost editor syntax highlighting)

The schema objects that sphinx-jsonschema produces are refereanceable at least via :ref:. Syntax highlighting I haven't looked into it, it might require integration with sphinx to resolve dynamic .rst files. You do have tools like sphinx-autobuild that could help fill in the gaps.

and generating reST also complicates building the documentation

Initially, making the transition might take a lot of effort, but generating the rst and documentation is actually trivial. I did not dig into the current rst generation though but a recommended pattern is to expose components of the json schema one at a time, e.g.

.. jsonschema:: schema.json#/definitions/package

.. jsonschema:: schema.json#/definitions/components

PS: this would also make it easier to publish in https://github.com/SchemaStore/schemastore

@mwoehlke
Copy link
Member Author

@LecrisUT, let's consider an example:

.. cps:attribute:: clr_vendor
  :type: string
  :context: platform

  Specifies that the package's CLR (.NET) components
  require the specified `Common Language Runtime`_ vendor.
  Typical (case-insensitive) values include
  :string:`"microsoft"` and
  :string:`"mono"`.

How would that be written, without loss of expressiveness, in YAML?

  • It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
  • It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also require global definitions; how does jsonschema handle that?
  • What does markup look like in the schema .json? Does that just get the raw markup? (The existing system turns the reST into plain text.)
  • If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
  • This example is a trivial type; how is a complex type like map(map(string)) specified? (The existing system generates compound type specifications as needed.)
  • How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.

@LecrisUT
Copy link

LecrisUT commented Jul 19, 2024

@LecrisUT, let's consider an example:

There are 2 options, either use $$description which is specific to sphinx-jsonschema

#schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...
    $$description: |
      Specifies that the package's CLR (.NET) components
      require the specified `Common Language Runtime`_ vendor.
      Typical (case-insensitive) values include
      :string:`"microsoft"` and
      :string:`"mono"`.

or my preferred approach is to integrate the text in the rst document:

#schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...
Schemas.rst
===========

.. schema.json#/definitions/clr_vendor

  Specifies that the package's CLR (.NET) components
  require the specified `Common Language Runtime`_ vendor.
  Typical (case-insensitive) values include
  :string:`"microsoft"` and
  :string:`"mono"`.

Note: I am not sure of the indentation necessary here, definitely works without the indentation for the text. This one works by making each .. directive as a specific subsection so the text block is the same as writing it in other sections/admonations.

In this example I used definitions.clr_vendor, but it can as well delete those 2 levels and make it a single json-schema object.

  • It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)

With $ref, effectively it copies the content wherever the key $ref appears when the json-schema is parsed. But the original object is defined in only 1 place, and sphinx-jsonschema uses the $ref to use a link instead of copying the contents (there is an annoying limitation that $$target needs to be defined sometimes)

  • It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also require global definitions; how does jsonschema handle that?

In the first case, indeed it would not work nice, but the second option should work smoothly. Only some :ref: would be harder to redirect.

  • What does markup look like in the schema .json? Does that just get the raw markup? (The existing system turns the reST into plain text.)

You could use plain rst markups, but ymmv when it comes to the IDE displaying it. My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed. CMake adopts a similar approach.

  • If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?

Referencing an external json schema inside a json schema is technically possible, but I did not investigate it (it's the same $ref attribute). As for documentation, the targets generated by sphinx-jsonschema are available through cross-sphinx linkage.

  • This example is a trivial type; how is a complex type like map(map(string)) specified? (The existing system generates compound type specifications as needed.)
type: object
# If you want specific keys to be allowed only
properties:
  key:
    type: string
# Or a regex pattern
patternProperties:
  ".*":
    type: string
# Or take anything
additionalProperties:
  type: string

Reference: https://json-schema.org/understanding-json-schema/reference/object

  • How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.

Depends, you can either have 1 yaml file per object/rule or combine them and use definitions if you want to share definitions inside/across a file. Probably this part is the most relevant reference for this.

@mwoehlke
Copy link
Member Author

mwoehlke commented Aug 7, 2024

My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed.

...but that's not unlike what I'm already doing; schema.rst is the "online documentation". The schema is a simplified version of that (which, frankly, is intended for machine-verification). The current system is easy to edit/maintain and is generally expressive enough to generate a schema that is intended to be used as a validation mechanism.

@LecrisUT
Copy link

LecrisUT commented Aug 7, 2024

The current system is easy to edit/maintain and is generally expressive enough

I want to challenge this claim. The issue is that the generation goes from rst to json-schema in a custom generator. This means that new contributors wanting to dip their toes and start contributing will have to learn yet another markup, e.g. they would need to decipher what something like this means

cps/schema.rst

Lines 297 to 298 in 2bfdc4b

:type: list(string)|map(list(string))
:context: component configuration

So the quality of contributors can only go as far as the documentation made for contributors and the willingness for them to overcome.

On the other hand, there would always be people familiar with sphinx and json-schema who would be able to jump in and out to do random fixes as long as the structure is familiar to them.

...but that's not unlike what I'm already doing; schema.rst is the "online documentation"

There is a difference here. The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup. sphinx-json-schema will do the heavy lifting of populating the metadata


There are various benefits of such an approach:

  • reduced maintenance for custom implementations
  • familiarity for external contributors
  • integration with schemastore.org and immediate support to IDEs and other tooling
  • full support of json schema features:
    • breaking down the schema into components
    • conditionals, e.g. different schemas depending on cps_version
    • regex matching, e.g. semver validation
    • deprecation support
    • and much more

@mwoehlke
Copy link
Member Author

mwoehlke commented Aug 7, 2024

This is now in the way of other changes that need to be made. Since it's been sitting a long while with only one relevant comment (which has been addressed weeks ago), I'm going to go ahead and merge it. Any further fixes needed will have to wait for their own PRs.

@mwoehlke mwoehlke merged commit f493d15 into master Aug 7, 2024
3 checks passed
@mwoehlke mwoehlke deleted the improve-schema branch August 7, 2024 21:54
@mwoehlke
Copy link
Member Author

mwoehlke commented Aug 8, 2024

The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup.

What's "static"? "Unchanging"? Clearly that isn't the case, as some of your claimed advantages make clear. Nor am I convinced that splitting the schema into multiple formats is an improvement. I also am skeptical of your assertion that the current system is difficult to comprehend, especially if someone employs the simple step of comparing the source to the output.

That said, nothing is stopping someone from making a pull request. I definitely do not have time right now to make such sweeping changes.

@LecrisUT
Copy link

LecrisUT commented Aug 8, 2024

Static as in static file. A file that is not generated.

Nor am I convinced that splitting the schema into multiple formats is an improvement.

I am not proposing that. There would only be 1 json schema in either a yaml or json format. Unfortunately that does mean the way to migrate is to fully migrate.

Regarding PR contribution, what I can write is the skeleton for this, but I would need to do a bunch more changes:

  • adopting RTD in order to run sphinx in PRs
  • reorganizing the project to move documentation part in docs and schemas in it's own folder
  • adding pre-commit
  • migrating the build to a proper PEP517 backend
  • add various builders like linkcheck and html -W and eliminating check-build-logs.py
  • discouraging the use of github pages

It is quite cumbersome to make each of these changes as individual PRs as they can be strongly dependent on one another, at most I could make them as individual commits. And this is all balancing on whether or not you are open for these changes and review so that I should know beforehand if I should go forward with prototyping these or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants