Documentation and schemas should be generated from a common source #282

magnusbaeck · 2021-10-07T20:59:09Z

Description

Today the schemas and the Markdown documentation files are maintained separately by hand, thus spreading out information about events. We should collect that information in a single place and generate both schemas and documentation files.

I don't have a particular file format in mind, but one could imagine a YAML format that's more or less a superset of the JSON schema specification with additional keys for documentation and other metadata used when generating documentation but stripped when generating a schema.

Motivation

The schemas can't express all rules in the spec, like what event types are valid targets for particular link types. Right now the schemas don't even express which link types are valid for an event type, but at least that's fixable (see "JSON schema doesn't have link objects validations for Eiffel events", #148).
Changing parts of the documentation shared between all events requires modifying and reviewing 23 files.
The lack of machine-readable documentation of the event members makes it hard to generate SDKs with documentation (Javadoc, docstrings, godoc, ...).
In "Propose to have a diagram which can visualize all the connection between different Eiffel events types" (#281) there's an excellent suggestion to visualize the relationships between event types, but that would currently require manual labor.
When e.g. considering protocol changes it's often useful to understand how events can relate, like "what kind of events can link to CD?". Right now the best way of answering such a question is probably to grep the Markdown files and that's not a fantastic method.

Exemplification

See above.

Benefits

The current representation of event schemas and documentation is an obstacle for protocol maintainers and others who want to process the information with a program. Making this change would greatly improve the situation.

Possible Drawbacks

Complexity? Hand-written files are labor intensive but they're at least simple to grasp.

m-linner-ericsson · 2021-10-11T05:44:13Z

Sounds good. We could take a look at https://github.com/spdx. They have their relationships in a model and can then transform it to other formats. Spec: https://github.com/spdx/spdx-spec, Java tool: https://github.com/spdx/tools-java

e-backmark-ericsson · 2021-10-11T06:59:04Z

I like this idea as well. I agree that maintaining/reviewing YAML files adds complexity compared to Markdown files, but I'd say the benefits of this suggestion are greater than that drawback.

sselberg · 2021-11-11T13:58:27Z

Would protocol-buffers be a good fit?
https://developers.google.com/protocol-buffers

m-linner-ericsson · 2021-11-16T08:59:12Z

From what I understand protocol-buffers forces you to use their binary encoding for transport. Thus both the sender and the receiver need to use protocol buffers for it to work. Please correct me here if I am wrong.

magnusbaeck · 2021-11-16T09:44:45Z

Yeah, protobufs primarily use a binary format (but there's an ASCII format too). Not sure how that would fit in here. I suppose we could formalize the event specifications in ASCII protobuf files but I don't really see a benefit in that. Protobufs provide an efficient wire format for well-defined data structures as well as good support for generating code for working with the data structures, but those don't seem like things we're looking for here.

sselberg · 2021-11-16T12:25:06Z

FWICT you can define objects with what I perceive to be a fairly powerful definition language, and from that definition you can generate objects in a multitude of languages (including JSON schema).
You don't have to use protobuf for RPCs; you can just compile the definition when creating representations in other languages. If my assumptions are correct you should be able to represent the events with a protobuf definition (!?), with comments, and generate libs, documentation, and JSON schema from that definition.
I haven't tried it out myself, but in Gerrit we have a similar problem of the same event being described by an array of different DTOs, and protobuf was one of the proposed solutions (from engineers with more experience with protobuf).

m-linner-ericsson · 2021-11-30T09:35:14Z

Not sure if this is what we are looking for, but it does connect with the topic: https://www.asyncapi.com/docs/getting-started

sselberg · 2021-12-06T15:27:33Z

Cue might also be worth looking at: https://cuelang.org/docs/usecases/

sselberg · 2022-01-20T16:36:29Z

Played around with protobuf and everything looked promising until I realized that marking fields as required was removed in proto3.
Another thing is that the generated API depends on the protocol-buffer API (i.e. the generated SDK would have a dependency on protocol-buffer itself), as the APIs are mainly intended not as a general SDK but as a way to interact with protocol-buffers.

So it doesn't look like protocol-buffers is an option.

m-linner-ericsson · 2022-01-26T09:26:13Z

Thanks for taking the time to do an investigation of this option @sselberg

magnusbaeck · 2022-01-27T11:49:41Z

I think we have the following main options:

Introducing a custom JSON schema-like format (think superset of JSON schema but expressed as YAML). Proper JSON schemas, documentation, and SDKs can be generated from that representation.
Introducing a custom DSL for describing these data structures.
AsyncAPI.

I'll start with a description of the first option.

JSON schema superset format

This idea aims to solve the problem without complicating matters or introducing larger frameworks that might not be a good fit for our goals. Starting from today's JSON schemas, we convert them to YAML (to allow for comments and make them more convenient to work with) and add extra keys that e.g. allow documenting fields and events. To get JSON schemas we can simply strip the extra keys and output as JSON. Generating documentation wouldn't be hard either.

We could support extraction of e.g. the meta field to a separate schema file to reduce duplication. In that case the script that generates the schemas would be responsible for inlining those references so that the end result is a flat schema file without external references.

The following (partial) example, based on schemas/EiffelCompositionDefinedEvent/3.2.0.json, shows what it could look like. Note that it doesn't exhibit the aforementioned meta extraction that I think we should do.

---
"$schema": http://json-schema.org/draft-04/schema#
_abbrev: CD
_docs: |
  The EiffelCompositionDefinedEvent declares a composition of items (artifacts, sources and
  other compositions) has been defined, typically with the purpose of enabling further downstream
  artifacts to be generated.
type: object
properties:
  meta:
    type: object
    properties:
      id:
        _docs: |
          The unique identity of the event, generated at event creation.
        _typehint: UUID
        type: string
        pattern: "^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
      type:
        _docs: |
          The type of event. This field is required by the recipient of the event, as each event type
          has a specific meaning and a specific set of members in the __data__ and __links__ objects.
        type: string
        enum:
          - EiffelCompositionDefinedEvent
# (lots of stuff omitted)
_links:
  ELEMENT:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent
      - EiffelSourceChangeCreatedEvent
      - EiffelSourceChangeSubmittedEvent
  PREVIOUS_VERSION:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent

So here we introduce a few new keys, _abbrev, _docs, _links, and _typehint, that provide additional information that can't be encoded in the JSON schema (unless we can use description instead of _docs but that's perhaps only available in newer JSON schema drafts?). Should we also encode the event version history here? The _typehint key could either be strictly defined so that SDK generators could use it to pick more specific data types than what the JSON schema itself can describe, or it could be a humanly readable description that's included verbatim in the documentation.
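To make the "strip the extra keys" step concrete, here's a minimal Python sketch. It assumes the YAML definition has already been loaded into a plain dict (the loader is left out) and that every key starting with an underscore is one of our extra keys; that naming rule is only inferred from the example above, and it would break if an event ever had a real property whose name starts with an underscore. The function name and the abbreviated definition are made up for illustration.

```python
import json


def strip_extra_keys(node):
    """Return a copy of node without any dict keys that start with "_"."""
    if isinstance(node, dict):
        return {
            key: strip_extra_keys(value)
            for key, value in node.items()
            if not (isinstance(key, str) and key.startswith("_"))
        }
    if isinstance(node, list):
        return [strip_extra_keys(item) for item in node]
    return node


# Abbreviated stand-in for the CD definition above, as a YAML loader
# would hand it to us. The _docs texts are shortened on purpose.
definition = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "_abbrev": "CD",
    "_docs": "(event description)",
    "type": "object",
    "properties": {
        "meta": {
            "type": "object",
            "properties": {
                "id": {
                    "_docs": "(field description)",
                    "_typehint": "UUID",
                    "type": "string",
                },
            },
        },
    },
    "_links": {
        "PREVIOUS_VERSION": {
            "required": False,
            "multiple": True,
            "targets": ["EiffelCompositionDefinedEvent"],
        },
    },
}

schema = strip_extra_keys(definition)
print(json.dumps(schema, indent=2))
```

The input dict is left untouched, so the same loaded definition can be fed to a documentation generator afterwards.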

In addition to producing schemas and documentation we could consider producing a supplemental event information file for each event that would basically contain only these extra keys. That way an SDK generator wouldn't have to consume and interpret the source files that are almost JSON schemas and make sense of them, but could use the actual JSON schema files (for which there could be parsers; the Go SDK uses github.com/lestrrat-go/jsschema) and pick up the Eiffel-specific stuff from a file that could look something like this:

name: EiffelCompositionDefinedEvent
abbrev: CD
fields:
  data.name:
    docs: ...
  data.version:
    docs: ...
  ...
links:
  PREVIOUS_VERSION:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent
...

Such a file would make it easy for SDK generators etc to look up additional information about fields without having to recursively walk the full schema. Hmm, actually, this file could contain more or less exactly what the documentation would contain but without the markup. Hence, the documentation generator(s) could use this instead of the source files. It would also decouple consumers from schema details, i.e. a switch from JSON schema draft 04 to a more recent draft might be easier.
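As a rough sketch of how such a file could be derived, the Python below walks the properties of a loaded definition and collects each _docs text under a dotted path like data.name. It's only an illustration under a few assumptions: the function names are invented, the definition is already a dict, array items aren't descended into, and the docs strings are placeholders rather than the real field documentation.

```python
import json


def collect_field_docs(node, path=""):
    """Map dotted field paths to their _docs text by walking "properties"."""
    fields = {}
    for name, child in node.get("properties", {}).items():
        child_path = f"{path}.{name}" if path else name
        if "_docs" in child:
            fields[child_path] = {"docs": child["_docs"].strip()}
        # Recurse so nested objects contribute paths like "data.name".
        fields.update(collect_field_docs(child, child_path))
    return fields


def build_event_info(name, definition):
    """Assemble the supplemental information for a single event."""
    return {
        "name": name,
        "abbrev": definition.get("_abbrev"),
        "fields": collect_field_docs(definition),
        "links": definition.get("_links", {}),
    }


# Hypothetical, heavily abbreviated definition with placeholder docs.
definition = {
    "_abbrev": "CD",
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "name": {"_docs": "(placeholder docs for data.name)\n", "type": "string"},
                "version": {"_docs": "(placeholder docs for data.version)\n", "type": "string"},
            },
        },
    },
    "_links": {
        "PREVIOUS_VERSION": {
            "required": False,
            "multiple": True,
            "targets": ["EiffelCompositionDefinedEvent"],
        },
    },
}

info = build_event_info("EiffelCompositionDefinedEvent", definition)
print(json.dumps(info, indent=2))
```

With one such structure per event, a question like "what kind of events can link to CD?" becomes a loop over the links.*.targets lists instead of a grep through Markdown.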

sselberg · 2022-01-27T12:06:00Z

Could we add complex types to this model so that you could reuse f.i. "GitIdentifier", meta etc. in the definition?

magnusbaeck · 2022-02-23T14:38:26Z

Sorry, I totally forgot about this question.

Yes, the idea is to be able to use references just like in #257, i.e. in practice we'd have a separate file for the meta object and other common pieces and do this:

---
"$schema": http://json-schema.org/draft-04/schema#
_abbrev: CD
_docs: |
  The EiffelCompositionDefinedEvent declares a composition of items (artifacts, sources and
  other compositions) has been defined, typically with the purpose of enabling further downstream
  artifacts to be generated.
type: object
properties:
  meta:
    "$ref": "../EiffelMetaProperty/1.0.0.json"
  links:
    "$ref": "../EiffelLinksArrayProperty/1.0.0.json"
  data:
    ...

Those references would be flattened when generating the schema files actually used for validation.
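The flattening could be as simple as the recursive replacement sketched below in Python. This is not the code on any branch, just an illustration: inline_refs and load_ref are hypothetical names, the referenced files are faked with an in-memory dict instead of being read from disk, relative paths aren't resolved, and there is no protection against circular references.

```python
import json


def inline_refs(node, load_ref):
    """Return a copy of node where each {"$ref": ...} is replaced by its target.

    load_ref is a caller-supplied function that maps a reference string
    to the already-parsed content it points to.
    """
    if isinstance(node, dict):
        if "$ref" in node:
            # The target may itself contain references, so resolve those too.
            return inline_refs(load_ref(node["$ref"]), load_ref)
        return {key: inline_refs(value, load_ref) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, load_ref) for item in node]
    return node


# Stand-in for the shared files; the content is a made-up fragment.
shared = {
    "../EiffelMetaProperty/1.0.0.json": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
    },
}

definition = {
    "type": "object",
    "properties": {
        "meta": {"$ref": "../EiffelMetaProperty/1.0.0.json"},
    },
}

flat = inline_refs(definition, shared.__getitem__)
print(json.dumps(flat, indent=2))
```

Passing the lookup in as a function keeps the traversal independent of where the shared pieces live, so a real generator could swap in something that reads and parses the referenced files.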

magnusbaeck · 2022-03-02T21:46:01Z

I've pushed a working but incomplete example of the proposal to the new-schema-def branch of my fork. Let's use that as a basis for the discussions at tomorrow's community meeting.

z-sztrom · 2022-08-09T09:27:22Z

I like this approach, especially the ability to define common parts in separate files. I had a quick look at your implementation; it looks promising.

sselberg · 2022-10-07T06:54:42Z

Awesome work @magnusbaeck!

magnusbaeck mentioned this issue Oct 7, 2021

Propose to have a diagram which can visualize all the connection between different Eiffel events types #281

Open

m-linner-ericsson added the enhancement label Dec 8, 2021

magnusbaeck mentioned this issue Jan 4, 2022

Add linters and formatters for Python code #291

Closed

sselberg mentioned this issue Jan 21, 2022

Extract "meta" as a type #256

Closed

magnusbaeck self-assigned this Mar 16, 2022

This was referenced Aug 31, 2022

Add schema URL to the meta object #313

Merged

Generate docs and schemas from common schema definition format #315

Merged

magnusbaeck closed this as completed in #315 Oct 6, 2022

magnusbaeck mentioned this issue Nov 18, 2022

Adapt to new YAML-based schema definitions eiffel-community/eiffelevents-sdk-go#40

Merged
