Kernelspec JSON schema #105

JohanMabille · 2023-04-19T13:51:07Z

Kernelspec JSON schema

This proposes to specify the kernelspec through a JSON schema, and have the specification and documentation live in a dedicated repo.

The JSON schema is joined as a standalone file.

Voting from @jupyter/software-steering-council

Like #106 , let's move to the voting phase. As said in this comment, future schema should not be JEPs, but PR open to the Jupyter Schema repo

@gabalafou
- Yes
- No
- Abstain
@ivanov
- Yes
- No
- Abstain
@JohanMabille
- Yes
- No
- Abstain
@jtpio
- Yes
- No
- Abstain
@marthacryan
- Yes
- No
- Abstain
@martinRenou
- Yes
- No
- Abstain
@minrk
- Yes
- No
- Abstain
@rpwagner
- Yes
- No
- Abstain
@SylvainCorlay
- Yes
- No
- Abstain
@vidartf
- Yes
- No
- Abstain

kernelspec-spec/kernelspec-spec.md

bollwyvl · 2023-04-19T14:24:18Z

kernelspec-spec/kernelspec-spec.md

+to [the current description of the kernelspec](https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs),
+and adds an optional `protocol_version` field.
+
+[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of


I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

I agree that a repo-per-spec might be overkill. However, I would like to have some separation of concerns. The repo for the kernel protocol will also hold the roadmap and additional documentation about the "release" process of the protocol. Also the kernel protocol and the widget protocol are somehow orthogonal, I would find it confusing to have them in the same repo.

I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

I would agree with this. I think it would be more sustainable and easier for folks to follow if we had a single "Jupyter specs" repo.

What if we gather specs by topics or "thematics"? Specifications for the kernel protocol, the kernelspec and the connection file could live together in the kernel_protocol repo. Everything related to the widgets specification (for instance) could live in another one. This way we avoid the multiplication of specs repos, while keeping some separation of concerns. Also this would be consistent with the current situation: a kernel has to implement the kernel protocol, provide a kernel spec and handle a conection file, while supporting the widget protocol is optional.

Sure, widgets are maybe an exception... but the knowledge of how comm works is not. At the end of the chain, something like the Jupyter Server API references almost everything each other... kernelspec just happens to be very low in the order, with only perhaps a known-mimetypes.json below it, as that is not a schema-level construct like uri.

When changes occur, often these things will reference each other, and documentation and generated type packages should reflect the compatibility of them.

How comm works is part of the Kernel protocol, thus I agree the Widgets protocol references the Kernel protocol, but not vice versa. And I took the Widget protocol as an example, but we can consider other things, like the LSP, which is totally unrelated. I see this as a software distribution, we have low layers upon which are built upper layers, and there should not be circular dependencies. The Jupyter Server API you mentioned is a super high layer that references everything, however lower layers should not reference it.

bollwyvl · 2023-04-19T14:25:44Z

kernelspec-spec/kernelspec.schema.json

+    "title": "Kernelspec",
+    "description": "Specification of the kernelspec file",
+    "type": "object",
+    "properties": {


recommend some top-level definitions to make individual pieces easier to reference in other schema.

I don't find a way of gathering pieces, they all appear independent to me. Having a definition per field would just add complexity and not ease their reference in other schemaa, right? Sorry for the naive question, but I'm not used to JSON schemas.

The kernelspec, and specific fields within it, need to be referenced by other schema.

Have a look at, e.g. nbformat. If the kernelspec is formalized, then conventionally specs would like to point to specific meanings without pointing at a concrete locations:

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/definitions/display-name"

vs

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/properties/display_name"

This becomes more relevant, once more complex patterns are used such as additionalProperties or patternProperties (useful for e.g. mimetypes).

Thanks for the detailed explanation, I will try to make the appropriate changes.

bollwyvl · 2023-04-19T14:26:36Z

kernelspec-spec/kernelspec.schema.json

+        },
+        "metadata": {
+            "description": "A dictionary of additional attributes about this kernel; used by clients to aid in kernel selection. Metadata added here should be namespaced for the tool reading and writing that metadata.",
+            "type": "object"


As this is not widely-used yet, could this already be further constrained to enforce namespacing:

[definitions.metadata] type = "object" [definitions.metadata.additionalProperties] type = "object"

kernelspec-spec/kernelspec.schema.json

krassowski · 2023-04-19T14:31:13Z

kernelspec-spec/kernelspec-spec.md

+and adds an optional `protocol_version` field.
+
+[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of
+the kernelspec has been created.


In the Notebook format, precisely nbformat v4, in metadata there is a kernelspec property, with schema defined here. Previously (in nbformat v3) this was named as kernel_info (side note: format description is outdated as it still mentions kernel_info).

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat v4 language is not required.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)?

I don't know actually, would it make sense to store the arguments used to launch the kernel in the nbformat for instance? If all the mandatory fields of the kernel spec make sense for the notebook format, then we should probably update it so that it refers this new kernelspec. Otherwise, maybe the name could be reverted to kernel_info or something else to avoid confusion? But that would be out of the scope of this JEP.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.

I think the "kernelspec" entry in nbformat.v4 is a misnomer. It is not a kernelspec—sure, it shares some of the same fields, but there is a subtle difference here.

The kernelspec file does not have a "name" field. The (locally) unique name for a kernelspec comes from the directory on disk where the kernel.json file is located—it is not stored in the kernelspec file.

In the nbformat, this "name" field is necessary to locate the kernelspec on disk (this has always been brittle, because it means this information becomes useless the moment the notebook leaves the current context).

It really should be called something like "kernelspec_info".

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat v4 language is not required.

This would be great for notebooks that always run in the same Python/conda/virtual environment (this field isn't helpful for notebooks that move around, but that's a separate issue). I would be in favor of changing this field, but I think this is out-of-scope for this JEP. We should make a follow-up JEP to update nbformat if we'd like to pursue that.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.

I believe this is out-of-scope for the current JEP.

That said, they aren't entirely orthogonal. Yes, the kernelspec shouldn't depend on the nbformat, but the nbformat loosely depends on the kernelspec today. I think that's what @krassowski is alluding to... if we can drop nbformat's current conflicting definition of the kernelspec and point at this definition, we gain some standardization.

kernelspec-spec/kernelspec.schema.json

bollwyvl · 2023-04-19T14:36:59Z

kernelspec-spec/kernelspec.schema.json

+        },
+        "kernel_protocol_version": {
+            "description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request.",
+            "type": "string"


if this stays, should have some pattern. but as mentioned, hoisting this to and $id might be more robust over time.

I don't think this should be included in the first version of the kernelspec.

I believe the first published version should only include the spec currently documented in the protocol documentation.

Then, we can quickly publish a second version with kernel_protocol_version added as a key. That way, we don't immediately invalidate everyone kernelspecs.

Also, creates a circular dependency on JEP #66.

That way, we don't immediately invalidate everyone kernelspecs.

I realize this actually isn't true, since "kernel_protocol_version" wouldn't be a required field. That said, we still would need to settle the naming in #66 to merge this. I think "kernel_protocol_version" is a fine name for the field, so maybe this isn't any issue.

Actually let's say that #66 depends on this one and will be updated accordingly (if needed) when this one is accepted.

Zsailer · 2023-04-19T15:06:26Z

Thank you for doing this @JohanMabille! You beat me to it. I had spent some time yesterday writing the exact same JEP as an action item from the SSC meeting. 😅

JohanMabille · 2023-04-19T15:10:05Z

@Zsailer arf sorry for that! This came out when discussing the Handshake pattern, we need an additional field in the kernelspec and I thought it could be the opportunity to add a schema for it. I'll be more careful next time to avoid doing the work twice!

Zsailer · 2023-04-19T15:12:18Z

@Zsailer arf sorry for that! This came out when discussing the Handshake pattern, we need an additional field in the kernelspec and I thought it could be the opportunity to add a schema for it. I'll be more careful next time to avoid doing the work twice!

No no no—this is fantastic! I really appreciate you taking this over! No need to apologize 😃

Zsailer · 2023-04-19T16:04:57Z

I think it's worth saying for voting purposes—I'm 👍 for this JEP getting accepted.

We can continue to copy-edit, but wanted to explicitly submit my support. 😃

JohanMabille · 2023-04-19T16:12:25Z

More changes to come, but I need to educate myself more to JSON schemas 😅

kernelspec-spec/kernelspec.schema.json

westurner · 2023-12-05T11:28:18Z

- [ ] BLD,SCH: CI task to run e.g. js2shacl to generate W3C SHACL from the kernelspec JSON schema https://github.com/ThiagoCComelli/JS2SHACL-JSON-Schema-to-SHACL-conversor/blob/master/conversor.js IDK why SHACL also yet, but there's a converter

…

On Mon, Nov 20, 2023, 10:03 AM Frédéric Collonval ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In kernelspec-spec/kernelspec.schema.json <#105 (comment)> : > + "required": ["language"], + "properties": { + "language": { + "description": "The name of the language of the kernel. When loading notebooks, if no matching kernelspec key (may differ across machines) is found, a kernel with a matching language will be used. This allows a notebook written on any Python or Julia kernel to be properly associated with the user’s Python or Julia kernel, even if they aren’t listed under the same name as the author’s.", + "type": "string" + } + } + }, + "kernel_protocol_version": { + "type": "object", + "required": ["kernel_protocol_version"], + "properties": { + "kernel_protocol_version": { + "description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request. The kernel protocol uses semantic versioning (SemVer).", + "type": "string", + "pattern": "^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$" ⬇️ Suggested change - "pattern": "^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$" + "pattern": "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$" — Reply to this email directly, view it on GitHub <#105 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAMNS7LNV3B4W35OUUIJZDYFNWNVAVCNFSM6AAAAAAXEDPL7CVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTONBQGAZTINJWGE> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

kernelspec-spec/kernelspec-spec.md

Co-authored-by: Jason Grout <[email protected]>

Co-authored-by: Frédéric Collonval <[email protected]>

JohanMabille · 2024-02-26T20:47:47Z

Kind ping on this, I think this is ready for a final review:

The Jupyter Schema repo has been created, it will hold all the schemas for Project Jupyter and the JEP has been updated accodingly (not more specific repo per spec)
While the first schema specifications have been proposed through JEPs, the recent works and meetings of the Jupyter Contributors involved in the schema specifications led to the conclusion that we could directly work on the Jupyter Schema repo: implement first versions of different schema and then have fast iterations to make them more robust and extract common parts into dedicated schema. This would streamline the process.

minrk

Made one small suggestion on minItems, but 👍

kernelspec-spec/kernelspec.schema.json

JohanMabille · 2024-06-10T15:19:25Z

The JEP has been approved.

Yes: 9
No: 0
Abstention: 0
No vote: 1

Merging

JohanMabille force-pushed the kernelspec-spec branch from 86ef612 to 1765886 Compare April 19, 2023 13:53

JohanMabille mentioned this pull request Apr 19, 2023

Kernel handshaking pattern proposal #66

Merged

33 tasks

Kernelspec JSON schema

1765886

JohanMabille added the under discussion (RFC) label Apr 19, 2023