Documentation and schemas should be generated from a common source #282

Closed
magnusbaeck opened this issue Oct 7, 2021 · 16 comments · Fixed by #315

@magnusbaeck
Member

Description

Today the schemas and the Markdown documentation files are maintained separately by hand, thus spreading out information about events. We should collect that information in a single place and generate both schemas and documentation files.

I don't have a particular file format in mind, but one could imagine a YAML format that's more or less a superset of the JSON schema specification with additional keys for documentation and other metadata used when generating documentation but stripped when generating a schema.

Motivation

  • The schemas can't express all rules in the spec, like what event types are valid targets for particular link types. Right now the schemas don't even express which link types are valid for an event type, but at least that's fixable (see "JSON schema doesn't have link objects validations for Eiffel events", #148).
  • Changing parts of the documentation shared between all events requires modifying and reviewing 23 files.
  • The lack of machine-readable documentation of the event members makes it hard to generate SDKs with documentation (Javadoc, docstrings, godoc, ...).
  • In "Propose to have a diagram which can visualize all the connection between different Eiffel events types" (#281) there's an excellent suggestion to visualize the relationships between event types, but that would currently require manual labor.
  • When e.g. considering protocol changes it's often useful to understand how events can relate, like "what kind of events can link to CD?". Right now the best way of answering such a question is probably to grep the Markdown files and that's not a fantastic method.

Exemplification

See above.

Benefits

The current representation of event schemas and documentation is an obstacle for protocol maintainers and others who want to process the information with a program. Making this change would greatly improve the situation.

Possible Drawbacks

Complexity? Hand-written files are labor intensive but they're at least simple to grasp.

@m-linner-ericsson
Member

Sounds good. We could take a look at https://github.com/spdx. They have their relationships in a model and can then transform it to other formats. Spec: https://github.com/spdx/spdx-spec, Java tool: https://github.com/spdx/tools-java

@e-backmark-ericsson
Member

I like this idea as well. I agree about the added complexity of maintaining/reviewing YAML files compared to Markdown files, but I'd say that the benefits provided through this suggestion are greater than that drawback.

@sselberg

Would protocol-buffers be a good fit?
https://developers.google.com/protocol-buffers

@m-linner-ericsson
Member

From what I understand protocol-buffers forces you to use their binary encoding for transport. Thus both the sender and the receiver need to use protocol buffers for it to work. Please correct me here if I am wrong.

@magnusbaeck
Member Author

Yeah, protobufs primarily use a binary format (but there's an ASCII format too). Not sure how that would fit in here. I suppose we could formalize the event specifications in ASCII protobuf files but I don't really see a benefit in that. Protobufs provide an efficient wire format for well-defined data structures as well as good support for generating code for working with those data structures, but neither seems like something we're looking for here.

@sselberg

FWICT you can define objects with what I perceive to be a fairly powerful definition language, and from that definition you can generate objects in a multitude of languages (including JSON schema).
You don't have to use protobuf for RPCs; you can just compile the definition when creating representations in other languages. If my assumptions are correct you should be able to represent the events with a protobuf definition (!?), with comments, and generate libs, documentation, and JSON schema from that definition.
I haven't tried it out myself, but in Gerrit we have a similar problem of the same event being described by an array of different DTOs, and protobuf was one of the proposed solutions (from engineers with more experience with protobuf).

@m-linner-ericsson
Member

Not sure if this is what we are looking for, but it does connect with the topic: https://www.asyncapi.com/docs/getting-started

@sselberg

sselberg commented Dec 6, 2021

Cue might also be worth looking at: https://cuelang.org/docs/usecases/

@sselberg

sselberg commented Jan 20, 2022

Played around with protobuf and everything looked promising until I realized that marking fields as required was removed in proto3.
Another thing is that the generated API had dependencies on the protocol-buffer API (i.e. the generated SDK would have a dependency on protocol-buffer itself), as the APIs are mainly intended not as a general SDK but as a way to interact with protocol-buffers.

So it doesn't look like protocol-buffers is an option.

@m-linner-ericsson
Member

> Played around with protobuf and everything looked promising until I realized that marking fields as required was removed in proto3. Another thing is that the generated API had dependencies on the protocol-buffer API (i.e. the generated SDK would have a dependency on protocol-buffer itself), as the APIs are mainly intended not as a general SDK but as a way to interact with protocol-buffers.
>
> So it doesn't look like protocol-buffers is an option.

Thanks for taking the time to do an investigation of this option @sselberg

@magnusbaeck
Member Author

I think we have the following main options:

  • Introducing a custom JSON schema-like format (think superset of JSON schema but expressed as YAML). Proper JSON schemas, documentation, and SDKs can be generated from that representation.
  • Introducing a custom DSL for describing these data structures.
  • AsyncAPI.

I'll start with a description of the first option.

JSON schema superset format

This idea aims to solve the problem without complicating matters or introducing larger frameworks that might not be a good fit for our goals. Starting from today's JSON schemas, we convert them to YAML, which allows comments and is more convenient to work with, and add extra keys that e.g. document fields and events. To get JSON schemas we can simply strip the extra keys and output the result as JSON. Generating documentation wouldn't be hard either.
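As a rough illustration of the stripping step, here is a minimal Python sketch. It assumes that every extra key starts with an underscore (the convention used in the example further down in this comment) and that the YAML source has already been loaded into plain dicts and lists. The function name `strip_extension_keys` and the sample data are made up for illustration and aren't taken from any existing script.

```python
import json


def strip_extension_keys(node):
    """Return a copy of node without any dict keys that start with an underscore."""
    if isinstance(node, dict):
        return {
            key: strip_extension_keys(value)
            for key, value in node.items()
            if not key.startswith("_")
        }
    if isinstance(node, list):
        return [strip_extension_keys(item) for item in node]
    return node


# A source definition as it might look after loading the YAML file (abridged, illustrative).
source = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "_abbrev": "CD",
    "_docs": "Declares that a composition of items has been defined.",
    "type": "object",
    "properties": {
        "meta": {
            "type": "object",
            "properties": {
                "id": {
                    "_docs": "The unique identity of the event.",
                    "_typehint": "UUID",
                    "type": "string",
                },
            },
        },
    },
    "_links": {"PREVIOUS_VERSION": {"required": False, "multiple": True}},
}

schema = strip_extension_keys(source)
print(json.dumps(schema, indent=2))
```

One caveat with this naive approach: an actual event member whose name starts with an underscore would be stripped as well, so a real generator would probably want an explicit list of extension keys.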

We could support extraction of e.g. the meta field to a separate schema file to reduce duplication. In that case the script that generates the schemas would be responsible for inlining those references so that the end result is a flat schema file without external references.

The following (partial) example, based on schemas/EiffelCompositionDefinedEvent/3.2.0.json, shows what it could look like. Note that it doesn't exhibit the aforementioned meta extraction that I think we should do.

---
"$schema": http://json-schema.org/draft-04/schema#
_abbrev: CD
_docs: |
  The EiffelCompositionDefinedEvent declares a composition of items (artifacts, sources and
  other compositions) has been defined, typically with the purpose of enabling further downstream
  artifacts to be generated.
type: object
properties:
  meta:
    type: object
    properties:
      id:
        _docs: |
          The unique identity of the event, generated at event creation.
        _typehint: UUID
        type: string
        pattern: "^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
      type:
        _docs: |
          The type of event. This field is required by the recipient of the event, as each event type
          has a specific meaning and a specific set of members in the __data__ and __links__ objects.
        type: string
        enum:
          - EiffelCompositionDefinedEvent
# (lots of stuff omitted)
_links:
  ELEMENT:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent
      - EiffelSourceChangeCreatedEvent
      - EiffelSourceChangeSubmittedEvent
  PREVIOUS_VERSION:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent

So here we introduce a few new keys, _abbrev, _docs, _links, and _typehint, that provide additional information that can't be encoded in the JSON schema (unless we can use description instead of _docs but that's perhaps only available in newer JSON schema drafts?). Should we also encode the event version history here? The _typehint key could either be strictly defined so that SDK generators could use it to pick more specific data types than what the JSON schema itself can describe, or it could be a human-readable description that's included verbatim in the documentation.
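With the link rules in a machine-readable _links key, a question like "what kind of events can link to CD?" becomes a short lookup instead of a grep through the Markdown files. The sketch below assumes the definitions have been loaded into a dict keyed by event name; the function name `events_linking_to` is hypothetical, and the data is just the _links section from the example above.

```python
def events_linking_to(definitions, target):
    """Return sorted (source event, link type) pairs whose link may point at target."""
    result = []
    for event_name, definition in definitions.items():
        for link_type, rule in definition.get("_links", {}).items():
            if target in rule.get("targets", []):
                result.append((event_name, link_type))
    return sorted(result)


# Only the CD definition from the example; a real run would load all event types.
definitions = {
    "EiffelCompositionDefinedEvent": {
        "_abbrev": "CD",
        "_links": {
            "ELEMENT": {
                "required": False,
                "multiple": True,
                "targets": [
                    "EiffelCompositionDefinedEvent",
                    "EiffelSourceChangeCreatedEvent",
                    "EiffelSourceChangeSubmittedEvent",
                ],
            },
            "PREVIOUS_VERSION": {
                "required": False,
                "multiple": True,
                "targets": ["EiffelCompositionDefinedEvent"],
            },
        },
    },
}

to_cd = events_linking_to(definitions, "EiffelCompositionDefinedEvent")
print(to_cd)
# [('EiffelCompositionDefinedEvent', 'ELEMENT'), ('EiffelCompositionDefinedEvent', 'PREVIOUS_VERSION')]
```

The same pairs could be fed to a graph tool to draw the kind of relationship diagram requested in #281.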

In addition to producing schemas and documentation we could consider producing a supplemental event information file for each event that would basically contain only these extra keys. That way an SDK generator wouldn't have to consume and interpret the source files that are almost JSON schemas and make sense of them but could use the actual JSON schema files (for which there could be parsers; the Go SDK uses github.com/lestrrat-go/jsschema) and pick up the Eiffel-specific stuff from a file that could look something like this:

name: EiffelCompositionDefinedEvent
abbrev: CD
fields:
  data.name:
    docs: ...
  data.version:
    docs: ...
  ...
links:
  PREVIOUS_VERSION:
    required: false
    multiple: true
    targets:
      - EiffelCompositionDefinedEvent
...

Such a file would make it easy for SDK generators etc to look up additional information about fields without having to recursively walk the full schema. Hmm, actually, this file could contain more or less exactly what the documentation would contain but without the markup. Hence, the documentation generator(s) could use this instead of the source files. It would also decouple consumers from schema details, i.e. a switch from JSON schema draft 04 to a more recent draft might be easier.
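A sketch of how such a supplemental file could be derived from a source definition follows. It assumes the underscore-prefixed keys from the earlier example, only descends through nested properties objects (array items are ignored), and takes the event name as an argument. The function names and the docs strings are placeholders, not real Eiffel documentation.

```python
import json


def collect_field_docs(node, prefix=""):
    """Map dotted field paths to their documentation by walking nested 'properties'."""
    fields = {}
    for name, child in node.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if "_docs" in child:
            fields[path] = {"docs": child["_docs"].strip()}
        fields.update(collect_field_docs(child, path))
    return fields


def build_event_info(name, source):
    """Collect only the Eiffel-specific extras of a source definition."""
    return {
        "name": name,
        "abbrev": source.get("_abbrev"),
        "fields": collect_field_docs(source),
        "links": source.get("_links", {}),
    }


# Abridged source definition; the docs texts are placeholders.
source = {
    "_abbrev": "CD",
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "name": {"_docs": "Placeholder docs for name.\n", "type": "string"},
                "version": {"_docs": "Placeholder docs for version.\n", "type": "string"},
            },
        },
    },
    "_links": {
        "PREVIOUS_VERSION": {
            "required": False,
            "multiple": True,
            "targets": ["EiffelCompositionDefinedEvent"],
        },
    },
}

info = build_event_info("EiffelCompositionDefinedEvent", source)
print(json.dumps(info, indent=2))
```

Because the paths are flattened to keys like data.name, a consumer can look up a field's documentation directly without walking the schema.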

@sselberg

Could we add complex types to this model so that you could reuse f.i. "GitIdentifier", meta etc. in the definition?

@magnusbaeck
Member Author

Sorry, I totally forgot about this question.

Yes, the idea is to be able to use references just like in #257, i.e. in practice we'd have a separate file for the meta object and other common pieces and do this:

---
"$schema": http://json-schema.org/draft-04/schema#
_abbrev: CD
_docs: |
  The EiffelCompositionDefinedEvent declares a composition of items (artifacts, sources and
  other compositions) has been defined, typically with the purpose of enabling further downstream
  artifacts to be generated.
type: object
properties:
  meta:
    "$ref": "../EiffelMetaProperty/1.0.0.json"
  links:
    "$ref": "../EiffeLinksArrayProperty/1.0.0.json"
  data:
    ...

Those references would be flattened when generating the schema files actually used for validation.
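The flattening could work roughly like the Python sketch below. It makes several simplifying assumptions: each reference points at a whole file (no #/... fragments), there are no reference cycles, and keys next to a $ref are dropped. The files dict stands in for the file system, and the content of the meta file is invented for the example; `inline_refs` is a hypothetical name.

```python
import json
import posixpath


def inline_refs(node, base_dir, files):
    """Return a copy of node where each {"$ref": path} is replaced by the referenced document."""
    if isinstance(node, dict):
        if "$ref" in node:
            # Resolve the reference relative to the directory of the referring file.
            path = posixpath.normpath(posixpath.join(base_dir, node["$ref"]))
            return inline_refs(files[path], posixpath.dirname(path), files)
        return {key: inline_refs(value, base_dir, files) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, base_dir, files) for item in node]
    return node


# Stand-in for the files on disk, keyed by path relative to the definitions root.
files = {
    "EiffelMetaProperty/1.0.0.json": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
    },
}

source = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "meta": {"$ref": "../EiffelMetaProperty/1.0.0.json"},
    },
}

flat = inline_refs(source, "EiffelCompositionDefinedEvent", files)
print(json.dumps(flat, indent=2))
```

The result contains no external references, so validators never have to resolve anything outside the generated schema file.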

@magnusbaeck
Member Author

I've pushed a working but incomplete example of the proposal to the new-schema-def branch of my fork. Let's use that as a basis for the discussions at tomorrow's community meeting.

@magnusbaeck magnusbaeck self-assigned this Mar 16, 2022
@z-sztrom
Contributor

z-sztrom commented Aug 9, 2022

I like this approach, especially the ability to define common parts in separate files. I had a quick look at your implementation; it looks promising.

@sselberg

sselberg commented Oct 7, 2022

Awesome work @magnusbaeck!
