-
Notifications
You must be signed in to change notification settings - Fork 29.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notebook execution model API (+ clear outputs side effects) #103713
Comments
For Jupyter we don't want it to interrupt. To my knowledge changes to output do not make a document dirty today, hence extensions should handle these and accordingly trigger a dirty change. Also, when a jupyter notebook is not trusted we clear the output today, once trusted we rrstore all output, and the document is not marked dirty even though we change the output. Though that's something that can change if vsc were to mark docs as dirty of output is changed |
My own current thinking is that we would clear the outputs and the run state and status message, except if the run state is "running", in which case we clear the outputs but leave those metadata properties alone and assume that the extension will update them soon when execution is finished. Does that sound ok to you? Some alternatives
|
Alternatives sound great. |
@jrieken, @roblourens and me discussed about above challenges and one outcome is we should try to make the metadata minimal and see if it's possible to move execution status related things out of the metadata and merge with outputs as they are all coming from execution. For example, firstly we move the execution related ones out export interface NotebookCellMetadata {
contentEditable?: boolean;
runnable?: boolean;
hasExecutionOrder?: boolean;
breakpointMargin?: boolean;
inputCollapsed?: boolean;
outputCollapsed?: boolean;
custom?: { [key: string]: any };
}
interface IExecutionState {
executionOrder?: number;
statusMessage?: string;
runState?: NotebookCellRunState;
runStartTime?: number;
lastRunDuration?: number;
} and then have a new property for execution results export interface NotebookCell {
readonly notebook: NotebookDocument;
readonly uri: Uri;
readonly cellKind: CellKind;
readonly document: TextDocument;
language: string;
executionResult: {
state: IExecutionState;
outputs: Output[];
} | undefined;
metadata: NotebookCellMetadata;
} The core action "Clear Outputs" will clear the whole executionResult. No matter which approach we take, the "Clear Outputs" action is opinionated , we may want to explore approaches that allow extensions to control the behavior. |
Also related: #100388 More thoughts: There are three cases for working with the execution state, reading (saving), writing (execution), initialization. We should make executionResult entirely readonly and only editable through some limited API. For initialization, I don't know, it probably has to be a free for all in letting the content provider initialize the execution result. We can take inspiration from the github issue notebooks' NotebookCellExecution and give the extension something with an API like this. class NotebookCellExecution {
// Need to support sending multiple cancel signals until it works. Or use a normal token and handle jupyter's case some other way
cancellationToken: todo;
// Allow updating outputs during execution
pushOutputs(outputs: vscode.CellOutput[]) { }
resolve(executionOrder?: number, message?: string): void { }
reject(err: any): void { }
}
interface NotebookKernel {
// [...]
// executeCell passes a NotebookCellExecution object that the kernel will resolve or reject
executeCell(document: NotebookDocument, cell: NotebookCell, cellExecution: NotebookCellExecution): void;
// But the kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
// We would want the kernel to be able to trigger this, but not other extensions.
// And it bypasses the runnable metadata.
// So the kernel gets this event that will trigger an executeCell call when fired.
onDidCellBeginExecuting: Event<{ cell: NotebookCell; runStartTime?: number; }>;
} This is a minor thing but for liveshare you would probably want the extension to determine the duration, not vscode. It would be confusing to see different durations on different machines. Also if we really want to get rid of invalid states, we should do more to split up Code-type and Markdown-type cells. |
I like @roblourens suggestion of having an optional clear function that an extension can implement. The presence of the function enables the UI (which we own) but the implementation/behaviour is entirely up to the extension. I think that's a good alternative in case we cannot agree on a generic "clear output" behaviour |
Dragging @roblourens' proposal a little further. Let the event emit an execution object, make the execution object an managed object // managed object, call `createNotebookCellExecution `
interface NotebookCellExecution {
//...
cancellationToken: CancelationToken;
progress: Progess<NotebookCellExecutionProgress>
// Allow updating outputs during execution
pushOutputs(outputs: vscode.CellOutput[]);
resolve(executionOrder?: number, message?: string): void;
reject(err: any): void;
}
interface NotebookKernel {
// [...]
// executeCell passes a NotebookCellExecution object that the kernel will resolve or reject
executeCell(document: NotebookDocument, cell: NotebookCell): NotebookCellExecution;
// But the kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
// We would want the kernel to be able to trigger this, but not other extensions.
// And it bypasses the runnable metadata.
// So the kernel gets this event that will trigger an executeCell call when fired.
onDidStartExecution: Event<NotebookCellExecution>;
}
namespace notebooks {
export declare function createNotebookCellExecution(cell: NotebookCell, ...more: NotebookCell[]): NotebookCellExecution;
} |
The ugly part (which we have avoided in other APIs) is that implementors of |
I don't like that anybody can call |
I guess if the execution object doesn't do anything outside of the kernel methods, it's ok. Thanks for writing this out though, that helps clear it up a lot. |
A slight variant would be to allow to assign the const exec = vscode.notebook.createNotebookCellExecution(...);
cell.execution = exec;
exec.pushOutput(...)
exec.done(new Error('didn't know what to to')) That would be closer to todays approach but be more strictly typed (moves this from metadata to a "topic type"). Tho, it doesn't address your concern that basically any extension can "execute" a cell. If we want to enforce execution to be strictly kernel only then we need to find a way to pass some kind of execution factory to the kernel, maybe a function on the kernel that signals that it is the selected kernel for a notebook and that includes a mechanism to set execution. |
A few comments (context = Jupyter/Python extension):
To me the
Based on this observation I'd propose the following: interface NotebookCellExecution {
cell: NotebookCell;
cancel(): void;
done: Promise<void>;
}
interface NotebookExecution {
document: NotebookDocument;
cancel(): void;
done: Promise<void>;
}
interface NotebookKernel {
executeCell(document: NotebookDocument, cell: NotebookCell): void;
// Kernel must always fire this event when running a cell.
// This also allows VSCode to cancel an executing cell.
onDidStartCellExecution: Event<NotebookCellExecution>;
// Triggered for notebook (all cells), VSC can use this to cancel notebook execution.
onDidStartNotebookExecution: Event<NotebookCellExecution>;
// The `cancel` method on the above two interfaces `possibly` removes the need for `NotebookKernel.cancelExecution` & `NotebookKernel.cancelAllCellsExecution`
}
|
Thanks, we (Python extension) will need this. |
I like where this going. A few followup questions:
|
Hmm, good issues.
interface NotebookKernel {
// Probably rename as well, then its obvious that it will not start it.
// Obvious to extension authors.
// Triggering of the event `onDidStartCellExecution` is when the actual execution takes place.
queueCell(document: NotebookDocument, cell: NotebookCell): Promise<void>;
}
Interesting, I guess this rules out the ability for other extensions to update a notebook. Related issue with regareds to Untrusted Notebooks Today in the Python extension when opening a notebook, we do not display the output, as it could contain malicious JS. We display a prompt to the user and they click a button to trust the notebook.
What I'm getting at it, if we're not able to update cell output without executing cells, we'd need a way to hide cells and then restore the cells (either updating the cells or reloading the notebook - even though no changes have been made to the document). |
Lets try to address that separately, just thought you should know how we are using the existing API.
Hmm, In this case your original proposal works better, where VSC provides an instance of |
Can you say more about this? Are you talking about the sort of case like with ipywidgets where an output has code that can send a message to code in some other output? If so, that's fine and this API won't block that. Or, are you suggesting that there is a case where a totally new output will be added to a cell that is not executing? If so, I don't understand that case.
This is for triggering executions, not monitoring execution state. That info should still be readable to other things outside of the kernel. Do you want to get an event when a cell starts executing?
Would also like to hear more about what you mean here
Most "metadata" that we have right now is related to execution and all of that will be restricted to the kernel, which makes sense as it's the only thing that knows how to execute a cell. Whatever is left, I guess things like Let's not worry too much about untrusted notebooks right now. I think VS Code should handle that in a nice coherent way in general, and we shouldn't open the API just for that case. |
No this isn't the case, we have scenarios where existing output in other cells are updated.
Yes.
I agree, realized that i'd be derailing this conversation hence my subsequent suggestion to ignore |
@roblourens I'm confused on this part here: // When the play button is clicked, the cell is immediately running. Don't wait for a roundtrip to show the spinner.
// vscode needs to create the execution object for that to make sense.
executeCell(document: NotebookDocument, cell: NotebookCell, execution: NotebookCellExecution): void; If there's a |
Yes, at that point, VS Code will call |
If So it would look more like this: interface NotebookKernel {
// When the play button is clicked, the cell is immediately running. Don't wait for a roundtrip to show the spinner.
// vscode needs to create the execution object for that to make sense.
executeCell(document: NotebookDocument, cell: NotebookCell, execution: NotebookCellExecution): void;
// The kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
// We would want the kernel to be able to trigger this, but not other extensions.
// And it bypasses the 'runnable' metadata.
// So the kernel gets this event that will trigger an executeCell call when fired.
onDidStartExecution: Event<NotebookCell>;
} |
❤️ for the workflow and it feels natural, here is the way I interpret the API
interface NotebookCellExecutionData {
outputs: Readonly<CellOutput[]>;
result: Readonly<{
executionOrder?: number;
message?: string;
resultKind: NotebookCellExecutionResultKind;
}>;
runStartTime?: number;
lastRunDuration?: number;
} I was confused of the grouping of the properties in
Maybe they can have better names to reflect how they are generated. |
I still think you could eliminate the need for the |
I think I know the answer to this question already, but would the python extension use 'onDidStartExecution' to modify the output of another cell in the document? Seems kinda weird. The user is executing a cell and we fire an execution for another cell. This message here is the specific use case for why this is necessary: Jupyter sends that to update a previous cell's output. |
This is confusing because we keep coming back to this topic of when outputs can actually be updated. When this came up just above in the thread, I thought we established that scenario was only for other creative usages of the notebook API. But looking at this doc and talking to Peng, it seems clear that a basic capability of Jupyter is updating the output of cell B while executing cell A (but only during an execution). Would really appreciate someone sharing a notebook that does this just so I can see it in action. But, that doesn't mean that scenario has to be explicitly supported in our API. The API design question that is hard to answer is, are we modeling Jupyter/the superset of all notebook functionality in the world, or are we modeling a simple ideal notebook that mostly matches real notebooks but may not be a perfect fit. In the first case, let's go with something like the API above where any cell's output can be updated during an execution. In the second, we can just say that in the case of |
Here's example code (although it doesn't work with our extension at the moment) I've fixed the bug in a branch though. This is what it looks like in action (it's actually updating two cells at once): |
It should be noted that IPywidgets do this sort of thing constantly. |
Thanks. I understand the way that ipywidgets run code in the same context and communicate directly, this is different from the updates coming from the kernel (which maybe ipywidgets do that too I guess) |
Yes IPyWidgets can do that too. They'll send update_display_data messages to other cells. |
Can they do that at any time or only during an "execution"? |
I think there might be a case where you have an ipywidget slider, which can send out |
Yes @rebornix is correct. At any point messages can come in. We have to listen to |
Peng and I talked for awhile about what this means. And per #103713 (comment) we can design the model for this or we can let Python work around it with a "fake" execution. But I don't like such a major hack becoming a recommended way to do something this essential. Here is one basic concept interface CellExecutionDataAccessor {
updateCellOutputs(cell: NotebookCell, outputs: CellOutput[]);
createNotebookCellExecution(cell: NotebookCell, executionStartTime?: number): NotebookCellExecution;
}
interface NotebookKernelProvider<T extends NotebookKernel = NotebookKernel> {
resolveKernel?(kernel: T, document: NotebookDocument, webview: NotebookCommunication, cellExecutionDataAccessor: CellExecutionDataAccessor, token: CancellationToken): ProviderResult<void>;
} This uses the resolveKernel call that already exists to give the extension an accessor for the kernel's private mutation methods. Note this is also moving We flattened execution data // --- READ API
interface NotebookCellExecutionData {
readonly outputs: Readonly<CellOutput[]>;
readonly resultKind: NotebookCellExecutionResultKind;
readonly executionOrder?: number;
readonly message?: string;
readonly runStartTime?: number;
readonly lastRunDuration?: number;
}
interface NotebookCell {
readonly executionData: Readonly<NotebookCellExecutionData> | undefined;
} Answering the questions above about |
Thanks @roblourens. The With regards to the |
Note: Based on the API, the However this doesn't work for us
This is misleading.
This makes a lot of sense when we start running all cells in a notebook, we might want to mark all cells as buys, however only the first cell is executing and others are marked as buys to let the user know that we are processing them (whether we run all in parallel or in sequence is an implementation detail - i.e. upto extensions) |
This API lets you set the total duration if you want to run a stopwatch yourself. You can use that to at least end up with the correct duration. It does not give you a way to determine when the timer shows up and starts counting. We talked about having some "queued" state in the past, I thought that we concluded it wasn't really necessary. But if we want to add that, we can figure out how to take an execution through queued/running states. This is sort of in conflict with the idea we had that the cell UI should change immediately on clicking the run button (like without a round trip to the extension host). |
Will discuss offline and update this |
My opinion is to display the execution time for the cell as we saw it. The magic commands do this but without the network overhead but I don't think anybody is that picky. Basically this option:
|
thanks @rchiodo
@roblourens |
Yeah, we will have to give some mechanism for starting the timer. After more discussion today, the conclusion is that we should table this discussion for now and revisit it later. I think the main priority right now is to continue making progress towards a nice stable notebook experience with Jupyter and our other partners. We have a flexible API right now that we have implemented and you have implemented, and it would be best to keep that and determine what the ideal API is later on when things settle down and stabilize and we have an even better understanding of everyone's scenarios. For future reference, my code samples above represent an up to date proposal, with the exception of what we just said about the timer. And also, we discussed loosening it up a bit more and making the output updates and even the execution updates just part of the general edit API. Thanks for your feedback in talking through this with us @DonJayamanne and @rchiodo, I think we learned a lot through this exploration. |
Ok, taking this issue way back to its roots for a moment. I pushed a change for "clear outputs" so that github issues and, I think, python, can remove their hacks to clear execution state when outputs.length goes to 0. The logic is to wipe runState, runStartTime, lastRunDuration, and statusMessage in the "clear outputs" command UNLESS the cell's runState is Running. So then it should work to clear the previous outputs while a cell is running. |
Let's close this |
We implemented Clear Outputs command/action in the core and have it displayed in the cell toolbar. However the side effects of this command is not clear
It's not clear to extension authors who should handle what: should the core modify the status when outputs are cleared? should the core make the document dirty? If the extensions are responsible for that, what events should they listen to?
cc @roblourens @DonJayamanne
The text was updated successfully, but these errors were encountered: