Add JEP for sub-shells #91
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,144 @@ | ||
--- | ||
title: Jupyter kernel sub-shells | ||
authors: David Brochart (@davidbrochart), Sylvain Corlay (@SylvainCorlay), Johan Mabille (@JohanMabille) | ||
issue-number: XX | ||
pr-number: XX | ||
date-started: 2022-12-15 | ||
--- | ||
|
||
# Summary | ||
|
||
This JEP introduces kernel sub-shells to allow for concurrent shell requests. This is made possible | ||
by defining new control channel messages, as well as a new shell ID field in shell messages. | ||
|
||
# Motivation | ||
|
||
Users have been asking for ways to interact with a kernel while it is busy executing CPU-bound code, | ||
for the following reasons: | ||
- inspect the kernel's state to check the progress or debug a long-running computation (e.g. | ||
through a variable explorer). | ||
- visualize intermediary results before the final result is computed. | ||
- request [completion](https://jupyter-client.readthedocs.io/en/stable/messaging.html#completion) or | ||
[introspection](https://jupyter-client.readthedocs.io/en/stable/messaging.html#introspection). | ||
- process | ||
[Comm messages](https://jupyter-client.readthedocs.io/en/stable/messaging.html#custom-messages) | ||
immediately (e.g. for widgets). | ||
|
||
Unfortunately, it is currently not possible to do so because the kernel cannot process other | ||
[shell requests](https://jupyter-client.readthedocs.io/en/stable/messaging.html#messages-on-the-shell-router-dealer-channel) | ||
until it is idle. The goal of this JEP is to offer a way to process shell requests concurrently. | ||
|
||
# Proposed Enhancement | ||
|
||
The [kernel protocol](https://jupyter-client.readthedocs.io/en/stable/messaging.html) only allows | ||
for one | ||
[shell channel](https://jupyter-client.readthedocs.io/en/stable/messaging.html#messages-on-the-shell-router-dealer-channel) | ||
where execution requests are queued. Accepting other shells would allow users to connect to a kernel | ||
and submit execution requests that would be processed in parallel. | ||
|
||
We propose to allow the creation of optional "sub-shells", in addition to the current "main shell". | ||
This will be made possible by adding new message types to the | ||
[control channel](https://jupyter-client.readthedocs.io/en/stable/messaging.html#messages-on-the-control-router-dealer-channel) | ||
for: | ||
- creating a sub-shell, | ||
- deleting a sub-shell, | ||
- listing existing sub-shells. | ||
|
||
A sub-shell should be identified with a shell ID, either provided by the client in the sub-shell | ||
creation request, or given by the kernel in the sub-shell creation reply. The shell ID of the | ||
targeted sub-shell must then be sent along with any shell message. This allows any other client | ||
(console, notebook, etc.) to use this sub-shell. If no shell ID is sent, the message targets the | ||
main shell. Sub-shells are thus multiplexed on the shell channel through the shell ID, and it is the | ||
responsibility of the kernel to route the messages to the target sub-shell according to the shell | ||
ID. | ||
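The routing rule above can be sketched as follows. This is only an illustration under stated assumptions: the `route` helper and the `MAIN_SHELL` sentinel are hypothetical names, the shell ID is assumed to travel in the message header (the text only says it is sent along with the message), and raising on an unknown ID is one possible choice that the proposal does not specify.

```python
import queue

MAIN_SHELL = None  # hypothetical sentinel: "no shell ID was sent"


def route(msg, queues):
    """Return the request queue of the shell targeted by `msg`.

    Assumption: the shell ID is carried as header["shell_id"].
    A message without a shell ID targets the main shell.
    """
    shell_id = msg.get("header", {}).get("shell_id", MAIN_SHELL)
    if shell_id not in queues:
        # Behavior for unknown IDs is not defined in the text; we raise here.
        raise KeyError(f"unknown sub-shell: {shell_id!r}")
    return queues[shell_id]


# One queue per shell; all messages arrive on the single shell channel.
queues = {MAIN_SHELL: queue.Queue(), "subshell-a": queue.Queue()}

to_main = {"header": {"msg_type": "execute_request"}, "content": {"code": "1 + 1"}}
to_sub = {
    "header": {"msg_type": "complete_request", "shell_id": "subshell-a"},
    "content": {},
}

for msg in (to_main, to_sub):
    route(msg, queues).put(msg)
```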
|
||
Essentially, a client connecting through a sub-shell should see no difference from a connection | ||
through the main shell, and it does not need to be aware of it. However, a front-end should provide | ||
some visual indication that the kernel execution mode offered by the sub-shell is to be used at the | ||
user's own risk. In particular, because sub-shells may be implemented with threads, it is the | ||
responsibility of users not to corrupt the kernel state with non-thread-safe instructions. | ||
|
||
# New control channel messages | ||
|
||
## Create sub-shell | ||
|
||
Message type: `create_subshell_request`: | ||
|
||
```py | ||
content = { | ||
    # Optional, the ID of the sub-shell if specified by the client. | ||
    'shell_id': str | ||
} | ||
``` | ||
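As an illustration, a client could assemble such a request as a plain dictionary following the general Jupyter message layout (header, parent header, metadata, content). The `make_control_message` helper is hypothetical, the protocol version string is an assumption, and serialization and signing for the wire are left out.

```python
import uuid
from datetime import datetime, timezone


def make_control_message(msg_type, content, session_id, username="user"):
    """Build an unsigned, unserialized message dict (hypothetical helper)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            # Assumed value; the version that would introduce sub-shells
            # is not stated in the text.
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


session_id = uuid.uuid4().hex

# Let the kernel choose the shell ID: 'shell_id' is simply omitted.
request = make_control_message("create_subshell_request", {}, session_id)

# Or propose a shell ID from the client side.
named = make_control_message(
    "create_subshell_request", {"shell_id": "my-subshell"}, session_id
)
```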
|
||
Message type: `create_subshell_reply`: | ||
|
||
```py | ||
content = { | ||
    # 'ok' if the request succeeded or 'error', with error information as in all other replies. | ||
    'status': 'ok', | ||
|
||
    # The ID of the sub-shell, same as in the request if specified by the client, given by the | ||
    # kernel otherwise. | ||
    'shell_id': str | ||
} | ||
``` | ||
Do you envision sub-shells having any properties or inputs? Or are they all by definition identical for a given kernel (at least to start)?
The only setting I can think of, since the difference between sub-shells and the main shell is that they run concurrently, would be to specify which concurrency "backend" should be used: a thread, a process, or asynchronous programming. |
|
||
## Delete sub-shell | ||
|
||
Message type: `delete_subshell_request`: | ||
|
||
```py | ||
content = { | ||
    # The ID of the sub-shell. | ||
    'shell_id': str | ||
} | ||
``` | ||
|
||
Message type: `delete_subshell_reply`: | ||
|
||
```py | ||
content = { | ||
    # 'ok' if the request succeeded or 'error', with error information as in all other replies. | ||
    'status': 'ok', | ||
} | ||
``` | ||
|
||
## List sub-shells | ||
|
||
Message type: `list_subshell_request`: no content. | ||
|
||
Message type: `list_subshell_reply`: | ||
|
||
```py | ||
content = { | ||
    # A list of sub-shell IDs. | ||
    'shell_id': [str] | ||
} | ||
``` | ||
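To make the three message pairs concrete, here is a minimal in-memory sketch of how a kernel might answer them. The `SubshellRegistry` class and the error names are hypothetical. Rejecting a duplicate client-supplied ID is an assumption, since the text does not say what should happen in that case.

```python
import uuid


def _error(ename, evalue):
    # Same shape as error replies elsewhere in the kernel protocol.
    return {"status": "error", "ename": ename, "evalue": evalue, "traceback": []}


class SubshellRegistry:
    """Tracks sub-shell IDs and builds reply contents (hypothetical)."""

    def __init__(self):
        self._shell_ids = set()

    def handle(self, msg_type, content):
        if msg_type == "create_subshell_request":
            # Use the client's ID if given, otherwise generate one.
            shell_id = content.get("shell_id") or uuid.uuid4().hex
            if shell_id in self._shell_ids:
                # Assumption: duplicates are an error.
                return _error("SubshellExists", shell_id)
            self._shell_ids.add(shell_id)
            return {"status": "ok", "shell_id": shell_id}
        if msg_type == "delete_subshell_request":
            shell_id = content["shell_id"]
            if shell_id not in self._shell_ids:
                return _error("UnknownSubshell", shell_id)
            self._shell_ids.remove(shell_id)
            return {"status": "ok"}
        if msg_type == "list_subshell_request":
            return {"shell_id": sorted(self._shell_ids)}
        raise ValueError(f"unsupported message type: {msg_type}")


registry = SubshellRegistry()
created = registry.handle("create_subshell_request", {"shell_id": "a"})
listed = registry.handle("list_subshell_request", {})
```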
|
||
# Behavior | ||
|
||
## Kernels not supporting sub-shells | ||
|
||
The following requests should be ignored: `create_subshell_request`, `delete_subshell_request` and | ||
`list_subshell_request`. A `shell_id` passed in any shell message should be ignored. This ensures | ||
that existing kernels don't need any change to be compatible with the kernel protocol changes | ||
required by this JEP. | ||
|
||
This means that all shell messages are processed in the main shell, i.e. sequentially. | ||
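If "ignored" is read as "no reply is sent", a client cannot tell an unsupporting kernel from a slow one except by waiting. The sketch below shows one possible client-side fallback. The `request_subshell` helper, the simplified message dicts, and the timeout value are all hypothetical and not part of the proposal.

```python
import queue


def request_subshell(send, replies, timeout=1.0):
    """Return a shell ID, or None if no usable reply arrives in time.

    `send` is any callable that delivers a control message and `replies`
    is a queue on which control replies show up (both are assumptions).
    """
    send({"msg_type": "create_subshell_request", "content": {}})
    try:
        reply = replies.get(timeout=timeout)
    except queue.Empty:
        # No answer: assume the kernel does not support sub-shells and
        # keep using the main shell.
        return None
    if reply["content"].get("status") != "ok":
        return None
    return reply["content"]["shell_id"]


# A kernel without sub-shell support silently drops the request.
legacy_replies = queue.Queue()
shell_id = request_subshell(lambda msg: None, legacy_replies, timeout=0.05)
```

With a `None` result the client simply omits the shell ID from its shell messages, so everything runs sequentially in the main shell.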
|
||
Since sub-shells are basically a "no-op", the behavior around | ||
[kernel restart](https://jupyter-client.readthedocs.io/en/stable/messaging.html#kernel-shutdown) and | ||
[kernel interrupt](https://jupyter-client.readthedocs.io/en/stable/messaging.html#kernel-interrupt) | ||
is unchanged. | ||
|
||
## Kernels supporting sub-shells | ||
|
||
Requests sent to a sub-shell may be processed concurrently with requests in other shells. Within a | ||
sub-shell, requests are processed sequentially. | ||
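One way to get these semantics, assuming a thread-based implementation, is one worker thread and one FIFO queue per shell. The sketch below is illustrative only: `shell_worker` and the shell names are hypothetical, and a real kernel would execute the request instead of logging it.

```python
import queue
import threading


def shell_worker(shell_id, requests, log, lock):
    """Process one shell's requests in arrival order until a None sentinel."""
    while True:
        request = requests.get()
        if request is None:
            break
        with lock:  # protect the shared log from concurrent appends
            log.append((shell_id, request))


shells = {"main": queue.Queue(), "subshell-a": queue.Queue()}
log, lock = [], threading.Lock()

workers = [
    threading.Thread(target=shell_worker, args=(shell_id, requests, log, lock))
    for shell_id, requests in shells.items()
]
for worker in workers:
    worker.start()

for i in range(3):
    shells["main"].put(f"main-{i}")
    shells["subshell-a"].put(f"sub-{i}")

for requests in shells.values():
    requests.put(None)  # ask each worker to stop
for worker in workers:
    worker.join()

# Order is guaranteed within a shell, not across shells.
main_order = [request for shell_id, request in log if shell_id == "main"]
sub_order = [request for shell_id, request in log if shell_id == "subshell-a"]
```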
|
||
A [kernel restart](https://jupyter-client.readthedocs.io/en/stable/messaging.html#kernel-shutdown) | ||
should delete all sub-shells. A | ||
[kernel interrupt](https://jupyter-client.readthedocs.io/en/stable/messaging.html#kernel-interrupt) | ||
should interrupt the main shell and all sub-shells. | ||
I was thinking that perhaps an interrupt request could give a subshell id to interrupt only that subshell. However, if we want to be backwards compatible, we have to interrupt all shells: if all subshell requests are processed in the main shell, then interrupting the kernel will currently interrupt all shells.
True. Maybe we could also say that a kernel should do its best at interrupting only the requested sub-shell, but that it may interrupt all shells?
That sounds too unpredictable to me. I think if we want subshell-specific interrupt, we need another message so we can be backwards compatible and predictable. |
I would say the kernel always generates the shell id and we don't support the client providing an id. Once you have clients providing ids, then it's always a guessing game if there is contention between clients, or you have clients generate UUIDs, at which point you might as well have the kernel generate a truly unique id.
I was thinking that if in the future we allow per-cell sub-shells (through e.g. cell metadata), it could open up possibilities such that a cell creates a sub-shell, and other cells run in this sub-shell, so they would need the shell ID. We could build complex asynchronous systems like that.
akernel can do something similar but programmatically: `__task__()` returns a handle to the previous cell task, so the next cell can do whatever it wants with it (await it, etc.).
If the client specifies a subshell id, it will need to wait until it is confirmed in the reply to be sure it has reserved that name. In that case, why not just get the subshell id from the reply message, and be guaranteed it didn't fail because of a name conflict? What does having the client give the subshell id do for us?
I thought that it allowed us to reuse it later, at least in the case of a self-contained notebook where we know there is no shell ID conflict.
A notebook might be opened with two copies, in which case each copy would want to start up a subshell with the same name? For example, either a notebook in real-time collaboration, or a notebook opened twice side by side in JLab?
Or perhaps if you try to create a subshell with an existing id, it just acknowledges that the subshell is already created, with no error? Multiple clients might send computations to the same subshell?
What if we treat it like we do kernel sessions right now, with a user-supplied name as a key? In other words, a client subshell creation request optionally gives a name (not an id). If a subshell with that name already exists, its id is returned. If it doesn't exist, a new subshell with that name is created and returned. And if a name is not given by the client, an unnamed subshell is created and returned. Thoughts? This gives you the ability to share subshells between clients addressable with some client-supplied string, but gives me always unique ids created by the resource manager.
I like that. It seems that there is no distinction between a sub-shell name and a sub-shell ID in this case.
In that case there seems to be an unnecessary mapping between sub-shell name and sub-shell ID, or am I missing something?
There is a technical difference: since client names and shell ids live in different namespaces, the kernel does not have to check a generated shell id for conflicts with names already given, and a client can request a shell that is guaranteed never to be requested by anyone else (i.e., a shell specific to itself, guaranteed to not collide with any other requested name).
For example, suppose the shell ids are generated by starting at 0 and incrementing for each new subshell. If the client asks for shell name 5, and the client name and shell id namespaces are conflated, the client won't know if it's getting some random subshell someone else created (i.e., shell id 5), or if it's getting a shell with specific meaning "5" (i.e., client name 5). Likewise, any time a new shell id is generated, the kernel would have to check to see if someone else had already claimed that number.
I think it's a much better design to keep the client-specific names in a separate namespace from the unique shell ids. With this design, any client asking for a shell named "autocomplete" gets the same autocomplete shell shared with every other client requesting the same subshell name. However, if you want to get your own subshell separate from any other client, you just request a subshell without a name.
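The naming scheme suggested in this thread could be sketched roughly as follows. This is only an illustration of the get-or-create idea with separate namespaces: the `NamedSubshells` class is hypothetical, and the incrementing counter mirrors the example above rather than any agreed ID format.

```python
import itertools


class NamedSubshells:
    """Kernel-generated IDs, plus an optional client-supplied name as a key."""

    def __init__(self):
        self._next_id = itertools.count()
        self._id_by_name = {}
        self._shell_ids = set()

    def create(self, name=None):
        # A known name returns the existing sub-shell instead of failing.
        if name is not None and name in self._id_by_name:
            return self._id_by_name[name]
        shell_id = str(next(self._next_id))  # always unique, never client-chosen
        self._shell_ids.add(shell_id)
        if name is not None:
            self._id_by_name[name] = shell_id
        return shell_id


subshells = NamedSubshells()
first = subshells.create("autocomplete")
second = subshells.create("autocomplete")  # shared with the first client
private = subshells.create()  # unnamed, so no other client can ask for it
```

Because names and IDs never share a namespace, a client asking for the name "5" can never be handed an unrelated sub-shell whose generated ID happens to be 5.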