Notebook execution model API (+ clear outputs side effects) #103713

Closed
rebornix opened this issue Jul 31, 2020 · 52 comments
Assignees
Labels
api-proposal notebook under-discussion Issue is under discussion for relevance, priority, approach
Milestone

Comments

@rebornix
Member

We implemented the Clear Outputs command/action in the core and have it displayed in the cell toolbar. However, the side effects of this command are not clear:

  • ✅ it clears the outputs of the active cell and emits the output change event
  • ❓ it does not clear the status
  • ❓ it does not clear the execution count, duration time, etc
  • ❓ it does interrupt the active execution
  • ❓ it does not make the document dirty

It's not clear to extension authors who should handle what: should the core modify the status when outputs are cleared? should the core make the document dirty? If the extensions are responsible for that, what events should they listen to?

cc @roblourens @DonJayamanne

@rebornix rebornix added under-discussion Issue is under discussion for relevance, priority, approach notebook labels Jul 31, 2020
@rebornix rebornix added this to the July 2020 milestone Jul 31, 2020
@DonJayamanne
Contributor

does interrupt the active execution

For Jupyter we don't want it to interrupt.
E.g. assume you are executing a for loop with a print statement inside it. It's possible to clear the output halfway through the execution so that the output only displays the latest. Similar to clearing a console window or the debug console.

To my knowledge changes to output do not make a document dirty today, hence extensions should handle these and accordingly trigger a dirty change.

Also, when a Jupyter notebook is not trusted we clear the output today; once trusted we restore all output, and the document is not marked dirty even though we change the output. Though that's something that can change if VS Code were to mark docs as dirty if output is changed.

@roblourens
Member

My own current thinking is that we would clear the outputs and the run state and status message, except if the run state is "running", in which case we clear the outputs but leave those metadata properties alone and assume that the extension will update them soon when execution is finished. Does that sound ok to you?

Some alternatives

  • Expect extensions to implement this command if they want it
  • Add a basic implementation but expect extensions to override that command if they want to tweak the behavior in any way
  • Add an optional method to the notebook kernel that extensions can implement to make it a little easier for extensions to provide their own implementation

@DonJayamanne
Contributor

Alternatives sound great.

@rebornix
Member Author

@jrieken, @roblourens and I discussed the above challenges, and one outcome is that we should try to make the metadata minimal and see if it's possible to move execution-status-related things out of the metadata and merge them with outputs, as they all come from execution.

For example, firstly we move the execution related ones out

export interface NotebookCellMetadata {
	contentEditable?: boolean;
	runnable?: boolean;
	hasExecutionOrder?: boolean;
	breakpointMargin?: boolean;
	inputCollapsed?: boolean;
	outputCollapsed?: boolean;
	custom?: { [key: string]: any };
}

interface IExecutionState {
	executionOrder?: number;
	statusMessage?: string;
	runState?: NotebookCellRunState;
	runStartTime?: number;
	lastRunDuration?: number;
}

and then have a new property for execution results

export interface NotebookCell {
	readonly notebook: NotebookDocument;
	readonly uri: Uri;
	readonly cellKind: CellKind;
	readonly document: TextDocument;
	language: string;
	executionResult: {
		state: IExecutionState;
		outputs: Output[];
	} | undefined;
	metadata: NotebookCellMetadata;
}

The core action "Clear Outputs" will clear the whole executionResult.
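
As a rough illustration of what clearing the whole executionResult would mean under this model, here is a minimal self-contained sketch. The names `ResultState`, `ResultCell` and `clearExecutionResult` are hypothetical stand-ins rather than VS Code API, and outputs are simplified to strings.

```typescript
// Hypothetical, trimmed-down versions of the types proposed above.
interface ResultState {
	executionOrder?: number;
	statusMessage?: string;
	runState?: "idle" | "running" | "success" | "error";
	runStartTime?: number;
	lastRunDuration?: number;
}

interface ResultCell {
	language: string;
	executionResult: { state: ResultState; outputs: string[] } | undefined;
	metadata: { runnable?: boolean };
}

// "Clear Outputs" under this model: a single assignment drops the outputs
// and the execution state together, and leaves the remaining metadata alone.
function clearExecutionResult(cell: ResultCell): void {
	cell.executionResult = undefined;
}

const resultCell: ResultCell = {
	language: "python",
	executionResult: {
		state: { executionOrder: 3, runState: "success", lastRunDuration: 120 },
		outputs: ["hello"],
	},
	metadata: { runnable: true },
};
clearExecutionResult(resultCell);
```

Because state and outputs live in one property, there is no way to end up with cleared outputs but a stale status message, which is the ambiguity the issue description started from.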

No matter which approach we take, the "Clear Outputs" action is opinionated, so we may want to explore approaches that allow extensions to control the behavior.

@roblourens
Member

roblourens commented Aug 12, 2020

Also related: #100388

More thoughts:

There are three cases for working with the execution state: reading (saving), writing (execution), and initialization. We should make executionResult entirely readonly and only editable through some limited API. For initialization, I don't know, it probably has to be a free-for-all in letting the content provider initialize the execution result. We can take inspiration from the GitHub issue notebooks' NotebookCellExecution and give the extension something with an API like this.

class NotebookCellExecution {
	// Need to support sending multiple cancel signals until it works. Or use a normal token and handle jupyter's case some other way
	cancellationToken: todo;

	// Allow updating outputs during execution
	pushOutputs(outputs: vscode.CellOutput[]) { }
	resolve(executionOrder?: number, message?: string): void { }
	reject(err: any): void { }
}

interface NotebookKernel {
	// [...]

	// executeCell passes a NotebookCellExecution object that the kernel will resolve or reject
	executeCell(document: NotebookDocument, cell: NotebookCell, cellExecution: NotebookCellExecution): void;

	// But the kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
	// We would want the kernel to be able to trigger this, but not other extensions.
	// And it bypasses the runnable metadata.
	// So the kernel gets this event that will trigger an executeCell call when fired.
	onDidCellBeginExecuting: Event<{ cell: NotebookCell; runStartTime?: number; }>;
}

This is a minor thing but for liveshare you would probably want the extension to determine the duration, not vscode. It would be confusing to see different durations on different machines.

Also if we really want to get rid of invalid states, we should do more to split up Code-type and Markdown-type cells.

@jrieken
Member

jrieken commented Aug 12, 2020

No matter which approach we take, the "Clear Outputs" action is opinionated , we may want to explore approaches that allow extensions to control the behavior.

I like @roblourens' suggestion of having an optional clear function that an extension can implement. The presence of the function enables the UI (which we own) but the implementation/behaviour is entirely up to the extension. I think that's a good alternative in case we cannot agree on a generic "clear output" behaviour.
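
One way to picture "presence of the function enables the UI" is the sketch below. It is only an illustration under assumptions: `ClearableKernel`, `clearCellOutputs`, `isClearOutputsEnabled` and `runClearOutputs` are hypothetical names, not part of any proposed or shipped API.

```typescript
interface ClearableCell {
	outputs: string[];
	statusMessage?: string;
}

interface ClearableKernel {
	label: string;
	// Optional (hypothetical) hook: when present, the host shows the
	// "Clear Outputs" action and delegates the behaviour to the extension.
	clearCellOutputs?(cell: ClearableCell): void;
}

// Host side: the action is enabled only if the kernel implements the hook.
function isClearOutputsEnabled(kernel: ClearableKernel): boolean {
	return typeof kernel.clearCellOutputs === "function";
}

// Host side: run the action; returns false when the kernel has no hook.
function runClearOutputs(kernel: ClearableKernel, cell: ClearableCell): boolean {
	if (!kernel.clearCellOutputs) {
		return false;
	}
	kernel.clearCellOutputs(cell);
	return true;
}

// Extension side: a Jupyter-like kernel that clears outputs but keeps status.
const jupyterLikeKernel: ClearableKernel = {
	label: "jupyter-like",
	clearCellOutputs(cell: ClearableCell): void {
		cell.outputs = [];
	},
};

// A kernel without the hook: the action would stay hidden or disabled.
const plainKernel: ClearableKernel = { label: "plain" };
```

The host owns the button and the enablement check, while everything that happens to the cell is decided inside the extension's implementation.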

@jrieken
Member

jrieken commented Aug 12, 2020

Dragging @roblourens' proposal a little further. Let the event emit an execution object, and make the execution object a managed object

// managed object, call `createNotebookCellExecution`
interface NotebookCellExecution {
	//...
	cancellationToken: CancellationToken;

	progress: Progress<NotebookCellExecutionProgress>;

	// Allow updating outputs during execution
	pushOutputs(outputs: vscode.CellOutput[]): void;

	resolve(executionOrder?: number, message?: string): void;

	reject(err: any): void;
}


interface NotebookKernel {
	// [...]

	// executeCell passes a NotebookCellExecution object that the kernel will resolve or reject
	executeCell(document: NotebookDocument, cell: NotebookCell): NotebookCellExecution;

	// But the kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
	// We would want the kernel to be able to trigger this, but not other extensions.
	// And it bypasses the runnable metadata.
	// So the kernel gets this event that will trigger an executeCell call when fired.
	onDidStartExecution: Event<NotebookCellExecution>;
}

namespace notebooks {
	export declare function createNotebookCellExecution(cell: NotebookCell, ...more: NotebookCell[]): NotebookCellExecution;
}

@jrieken
Member

jrieken commented Aug 12, 2020

The ugly part (which we have avoided in other APIs) is that implementors of executeCell must call createNotebookCellExecution inside their body. Not nice...
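
To make that constraint concrete, here is a small self-contained sketch of a host-managed execution object that the kernel has to create inside its own `executeCell`. Everything in it (`ManagedCell`, `ManagedState`, `executionStates`, `createManagedExecution`) is a hypothetical simplification, with outputs reduced to strings and no cancellation or progress.

```typescript
interface ManagedCell {
	id: string;
	outputs: string[];
}

type ManagedState =
	| { kind: "running" }
	| { kind: "resolved"; executionOrder: number | undefined; message: string | undefined }
	| { kind: "rejected"; error: unknown };

interface ManagedExecution {
	readonly cell: ManagedCell;
	pushOutputs(outputs: string[]): void;
	resolve(executionOrder?: number, message?: string): void;
	reject(err: unknown): void;
}

// Host-owned bookkeeping: the host always knows which cells are executing.
const executionStates = new Map<string, ManagedState>();

// Stand-in for the proposed factory. The returned object only has an effect
// while the host still considers the execution to be running.
function createManagedExecution(cell: ManagedCell): ManagedExecution {
	executionStates.set(cell.id, { kind: "running" });
	const isRunning = (): boolean => executionStates.get(cell.id)?.kind === "running";
	return {
		cell,
		pushOutputs(outputs: string[]): void {
			if (isRunning()) {
				cell.outputs.push(...outputs);
			}
		},
		resolve(executionOrder?: number, message?: string): void {
			if (isRunning()) {
				executionStates.set(cell.id, { kind: "resolved", executionOrder, message });
			}
		},
		reject(error: unknown): void {
			if (isRunning()) {
				executionStates.set(cell.id, { kind: "rejected", error });
			}
		},
	};
}

// Kernel side: the implementor must remember to call the factory itself.
function executeCell(cell: ManagedCell): ManagedExecution {
	const execution = createManagedExecution(cell);
	execution.pushOutputs(["42"]);
	execution.resolve(1, "done");
	return execution;
}
```

Nothing in the type system forces `executeCell` to go through `createManagedExecution`, which is the "not nice" part: a kernel could return any object with the right shape and the host's bookkeeping would never hear about it.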

@roblourens
Member

I don't like that anybody can call createNotebookCellExecution, only the kernel should be able to do that. I was trying to figure out how to hand over some factory to the kernel but don't know how to do that in a nice way.

@roblourens
Member

I guess if the execution object doesn't do anything outside of the kernel methods, it's ok. Thanks for writing this out though, that helps clear it up a lot.

@jrieken
Member

jrieken commented Aug 13, 2020

A slight variant would be to allow assigning the NotebookCellExecution to a cell, e.g. an extension could do

const exec = vscode.notebook.createNotebookCellExecution(...);
cell.execution = exec;

exec.pushOutput(...)
exec.done(new Error("didn't know what to do"))

That would be closer to today's approach but be more strictly typed (moves this from metadata to a "topic type"). Though, it doesn't address your concern that basically any extension can "execute" a cell. If we want to enforce execution to be strictly kernel-only then we need to find a way to pass some kind of execution factory to the kernel, maybe a function on the kernel that signals that it is the selected kernel for a notebook and that includes a mechanism to set execution.

@DonJayamanne
Contributor

A few comments (context = Jupyter/Python extension):

  • pushOutput
    As part of executing a cell, it's possible some existing output can get cleared.
    E.g. halfway through execution, the output can get cleared and replaced with the new output.
  • pushOutput
    As part of executing a cell, it's possible for output from one cell to cause an update of output in another cell (even though that other cell is not running).
  • I do like the addition of onDidStartExecution
    This is something we need outside the Python extension (https://github.com/microsoft/vscode-gather)
    I.e. I'd prefer this to be accessible for other extensions (to monitor execution state of current notebook).
  • When looking at NotebookCellExecution
    I'm not sure about resolve & reject.
    This makes the assumption that executionOrders are not available if there's an error.
    Similarly this makes the assumption that message will not be updated if there's an error.

To me the NotebookCellExecution interface looks very much like a cancellable promise or a Task in .NET TPL.

  • It provides some notion of the state of the task
  • It allows us to cancel the pending task

Based on this observation I'd propose the following:

interface NotebookCellExecution {
	cell: NotebookCell;
	cancel(): void;
	done: Promise<void>;
}

interface NotebookExecution {
	document: NotebookDocument;
	cancel(): void;
	done: Promise<void>;
}

interface NotebookKernel {
	executeCell(document: NotebookDocument, cell: NotebookCell): void;
	
	// Kernel must always fire this event when running a cell.
	// This also allows VSCode to cancel an executing cell.
	onDidStartCellExecution: Event<NotebookCellExecution>;
	// Triggered for notebook (all cells), VSC can use this to cancel notebook execution.
	onDidStartNotebookExecution: Event<NotebookExecution>;

	// The `cancel` method on the above two interfaces `possibly` removes the need for `NotebookKernel.cancelExecution` & `NotebookKernel.cancelAllCellsExecution`
}
  • NotebookKernel.executeCell
    • For all intents and purposes, this method could be treated as a noop by VS Code.
    • It's just a way for VS Code to notify the kernel that it needs to execute a cell.
    • Note: In the case of Jupyter (Python extension) we queue cells for execution (this is an implementation detail).
    • Thus events such as onDidStartCellExecution are what tell VS Code whether a cell has actually started executing, which is more interesting (at this point, VS Code can update the cell UI with a spinner and a cancel button, etc).
  • NotebookCellExecution.done instead of NotebookCellExecution.progress (or both).
    • I personally feel knowing whether a cell execution has completed would be more important than monitoring the progress (at least from an extension perspective, this allows me to do some post processing on output as a separate extension, whereas progress is UI focused).
    • Progress would probably be more useful for core extensions (Liveshare & VSCode to update UI)
  • NotebookCellExecution.pushOutput
    • Removed, as I feel different extensions might want to deal with output in different ways
    • E.g. in Python extension we might want to clear output or clear and then append or append.
    • Also not having an explicit function to update outputs is better, else having that leads to the assumption that it's the only way to update output.
  • Updating metadata
    • Updating metadata would be handled by the kernel (e.g. executionOrder, message, etc)

@DonJayamanne
Contributor

we may want to explore approaches that allow extensions to control the behavior.

Thanks, we (Python extension) will need this, as clearing cells during execution should only clear the output and nothing else.

@jrieken
Member

jrieken commented Aug 13, 2020

I like where this is going. A few followup questions:

  • Assuming executeCell doesn't return anything, what should VS Code do after calling the function and before receiving the event? The extension host could be blocked or queuing cell execution might be super slow, and during that time we are blind. Should we have some implicit contract that pressing the button "disables" it until an event is received? Should that event contain some correlation data to know that the event was generated in response to our call?
  • We would like to have a model where output is the result of an execution and not a public thing anyone can write to. For instance Cell#output would be readonly and it would change as execution happens/finishes. That's how we landed on pushOutput but we could extend that to support reset and writing other cells output.

@DonJayamanne
Contributor

DonJayamanne commented Aug 13, 2020

Should that event contain some correlation data to know that the event was generated in response to our call?

Hmm, good issues.
How about the following:

  • The correlation between queueCell and onDidStartCellExecution is a list
  • As soon as you call queueCell we push something onto that list,
  • & when we get onDidStartCellExecution fired, it would pertain to the oldest call to queueCell (i.e. this is an implementation detail)
interface NotebookKernel {
	// Probably rename as well, then it's obvious that it will not start it.
	// Obvious to extension authors.
	// Triggering of the event `onDidStartCellExecution` is when the actual execution takes place.
	queueCell(document: NotebookDocument, cell: NotebookCell): Promise<void>;
}
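
The oldest-call-first correlation described above could look roughly like this. It is a sketch under assumptions: `ExecutionCorrelator` and its methods are hypothetical, and it matches the oldest pending call for the same cell (rather than the oldest call overall) so that a kernel-initiated start event for some other cell is not mistaken for a response.

```typescript
interface QueuedRequest {
	requestId: number;
	cellId: string;
}

class ExecutionCorrelator {
	private readonly pending: QueuedRequest[] = [];
	private nextRequestId = 1;

	// Called by the host right when it invokes queueCell on the kernel.
	recordQueueCall(cellId: string): number {
		const requestId = this.nextRequestId++;
		this.pending.push({ requestId, cellId });
		return requestId;
	}

	// Called when the kernel fires onDidStartCellExecution. Returns the id of
	// the oldest pending queueCell call for that cell, or undefined when the
	// execution was not requested by the host (e.g. started by the kernel).
	matchStartEvent(cellId: string): number | undefined {
		const index = this.pending.findIndex((request) => request.cellId === cellId);
		if (index === -1) {
			return undefined;
		}
		const [match] = this.pending.splice(index, 1);
		return match?.requestId;
	}

	// Number of queueCell calls that have not been answered by an event yet;
	// the host could keep the run button disabled while this is non-zero.
	get pendingCount(): number {
		return this.pending.length;
	}
}
```

A host could use `pendingCount` to implement the implicit "button is disabled until an event arrives" contract raised in the question above.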

would be readonly and it would change as execution happens/finishes. That's how we landed on pushOutput but we could extend that to support reset and writing other cells output.

Interesting, I guess this rules out the ability for other extensions to update a notebook.
Doesn't this end up with a confusing set of APIs: only kernels can update output, but other extensions can update the rest of the metadata (Notebook, Cell and Output metadata)?

Related issue with regards to Untrusted Notebooks

Today in the Python extension when opening a notebook, we do not display the output, as it could contain malicious JS. We display a prompt to the user and they click a button to trust the notebook.
As a result of this action, we restore the cell output.
  • Hence we are using the ability to update cell output without affecting the dirty state
  • We are updating cell output without executing cells

Alternate solution:

  • When a notebook is trusted, we can optionally mark the document as dirty and immediately revert the notebook. However this is kinda hacky, as an auto save could have been performed by VSCode in the interim.

What I'm getting at is, if we're not able to update cell output without executing cells, we'd need a way to hide cells and then restore the cells (either updating the cells or reloading the notebook - even though no changes have been made to the document).

@DonJayamanne
Contributor

Related issue with regards to Untrusted Notebooks

Let's try to address that separately, just thought you should know how we are using the existing API.

We would like to have a model where output is the result of an execution and not a public thing anyone can write to

Hmm, in this case your original proposal works better, where VSC provides an instance of NotebookCellExecution.

@roblourens
Member

its possible for output from one cell to cause an update of output in another cell (even though that other cell is not running).

Can you say more about this? Are you talking about the sort of case like with ipywidgets where an output has code that can send a message to code in some other output? If so, that's fine and this API won't block that. Or, are you suggesting that there is a case where a totally new output will be added to a cell that is not executing? If so, I don't understand that case.

I do like the addition of onDidStartExecution
This is something we need outside the Python extension (https://github.com/microsoft/vscode-gather)
I.e. I'd prefer this to be accessible for other extensions (to monitor execution state of current notebook).

This is for triggering executions, not monitoring execution state. That info should still be readable to other things outside of the kernel. Do you want to get an event when a cell starts executing?

(at least from an extension perspective, this allows me to do some post processing on output as a separate extension, whereas progress is UI focused).

Would also like to hear more about what you mean here

Doesn't this end up with a confusing set of APIs: only kernels can update output, but other extensions can update the rest of the metadata (Notebook, Cell and Output metadata)?

Most "metadata" that we have right now is related to execution and all of that will be restricted to the kernel, which makes sense as it's the only thing that knows how to execute a cell. Whatever is left, I guess things like editable and runnable, yeah current thinking is that you could write a command that modifies these from any random extension, I think that's ok though.

Let's not worry too much about untrusted notebooks right now. I think VS Code should handle that in a nice coherent way in general, and we shouldn't open the API just for that case.

@DonJayamanne
Contributor

Or, are you suggesting that there is a case where a totally new output will be added to a cell that is not executing

No this isn't the case, we have scenarios where existing output in other cells is updated.
I.e. no new output is added. Hence what we have today is ok, provided extensions can update output in other cells.

Do you want to get an event when a cell starts executing?

Yes.

I think VS Code should handle that in a nice coherent way in general, and we shouldn't open the API just for that case.

I agree, I realized that I'd be derailing this conversation, hence my subsequent suggestion to ignore untrusted notebooks.

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

@roblourens I'm confused on this part here:

	// When the play button is clicked, the cell is immediately running. Don't wait for a roundtrip to show the spinner.
	// vscode needs to create the execution object for that to make sense.
	executeCell(document: NotebookDocument, cell: NotebookCell, execution: NotebookCellExecution): void;

If there's a onDidStartExecution event, why would you need the execution parameter? Doesn't the extension create it when it fires the onDidStartExecution event?

@DonJayamanne
Contributor

Doesn't the extension create it when it fires the onDidStartExecution event?

Yes, at that point, VS Code will call executeCell with the same NotebookCellExecution that was created by the extension.
This way, there's only one entry point for executing cells and that's executeCell.
The onDidStartExecution is kind of a trigger for extensions to tell VS Code that the extension has manually started executing a cell (& not the user).

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

If onDidStartExecution is the way an extension indicates its own execution, perhaps the createNotebookCellExecution is unnecessary? VS Code would generate the cell execution item.

So it would look more like this:

interface NotebookKernel {
	// When the play button is clicked, the cell is immediately running. Don't wait for a roundtrip to show the spinner. 
	// vscode needs to create the execution object for that to make sense.
	executeCell(document: NotebookDocument, cell: NotebookCell, execution: NotebookCellExecution): void;

	// The kernel also needs a way to trigger a cell execution for "run all cells", liveshare, attaching to a kernel with a running cell, etc
	// We would want the kernel to be able to trigger this, but not other extensions.
	// And it bypasses the 'runnable' metadata.
	// So the kernel gets this event that will trigger an executeCell call when fired.
	onDidStartExecution: Event<NotebookCell>;

}

@rebornix
Member Author

rebornix commented Aug 18, 2020

❤️ for the workflow and it feels natural, here is the way I interpret the API

  • when users start Run Cell from the UI or commands, the core generates a NotebookCellExecution with executionStartTime filled in automatically, run executeCell
  • a kernel can emit event onDidStartExecution: Event<NotebookCellExecution> with executionStartTime prefilled, which will at the end run executeCell
  • when execution finishes/resolves, it returns outputs and result: NotebookCellExecutionResult.
  • NotebookCell#executionData.outputs and NotebookCell#executionData.result are updated accordingly
  • runStartTime and lastRunDuration are calculated based on execution result (runStartTime and the time when the execution resolves)
interface NotebookCellExecutionData {
	outputs: Readonly<CellOutput[]>;
	result: Readonly<{
		executionOrder?: number;
		message?: string;
		resultKind: NotebookCellExecutionResultKind;
	}>;
	runStartTime?: number;
	lastRunDuration?: number;
}

I was confused by the grouping of the properties in NotebookCellExecutionData, but after thinking it through I think they make sense

  • outputs is the result of the execution of the cell content (execute_result in Jupyter)
  • NotebookCellExecutionResult { executionOrder?: number; message?: string; resultKind: NotebookCellExecutionResultKind; } is the meta info of the execution (execute_reply in Jupyter)
  • runStartTime and lastRunDuration are used to calculate the elapsed time.

Maybe they can have better names to reflect how they are generated.

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

I still think you could eliminate the need for the createNotebookCellExecution. Start time on the NotebookCellExecution would be generated when the event fired.

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

I think I know the answer to this question already, but would the python extension use 'onDidStartExecution' to modify the output of another cell in the document? Seems kinda weird. The user is executing a cell and we fire an execution for another cell.

This message here is the specific use case for why this is necessary:
https://jupyter-client.readthedocs.io/en/stable/messaging.html#update-display-data

Jupyter sends that to update a previous cell's output.
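
For readers unfamiliar with that message, the following sketch shows the effect it has on a notebook model: every output that was created with the same display id gets its data replaced, regardless of which cell it lives in. The message content shape (`data`, `metadata`, `transient.display_id`) follows the Jupyter messaging documentation linked above; the cell and output types and the function `applyUpdateDisplayData` are hypothetical simplifications, with MIME bundles reduced to string values.

```typescript
interface DisplayOutput {
	// Set when the output was produced by a display call with a display id.
	displayId?: string;
	data: Record<string, string>;
}

interface DisplayCell {
	id: string;
	outputs: DisplayOutput[];
}

// Content of an update_display_data message (values simplified to strings).
interface UpdateDisplayDataContent {
	data: Record<string, string>;
	metadata: Record<string, unknown>;
	transient: { display_id: string };
}

// Replaces the data of every output with a matching display id and returns
// the ids of the cells that were touched. None of them needs to be running.
function applyUpdateDisplayData(cells: DisplayCell[], content: UpdateDisplayDataContent): string[] {
	const touchedCellIds: string[] = [];
	for (const cell of cells) {
		let changed = false;
		for (const output of cell.outputs) {
			if (output.displayId === content.transient.display_id) {
				output.data = content.data;
				changed = true;
			}
		}
		if (changed) {
			touchedCellIds.push(cell.id);
		}
	}
	return touchedCellIds;
}
```

The point for the API discussion is the return value: the set of touched cells is determined by display ids, not by which cell is currently executing.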

@roblourens
Member

This is confusing because we keep coming back to this topic of when outputs can actually be updated. When this came up just above in the thread, I thought we established that scenario was only for other creative usages of the notebook API. But looking at this doc and talking to Peng, it seems clear that a basic capability of Jupyter is updating the output of cell B while executing cell A (but only during an execution). Would really appreciate someone sharing a notebook that does this just so I can see it in action.

But, that doesn't mean that scenario has to be explicitly supported in our API. The API design question that is hard to answer is, are we modeling Jupyter/the superset of all notebook functionality in the world, or are we modeling a simple ideal notebook that mostly matches real notebooks but may not be a perfect fit?

In the first case, let's go with something like the API above where any cell's output can be updated during an execution. In the second, we can just say that in the case of update_display_data, the Python extension can hack around it by creating an execution for that other cell, updating its outputs, and resolving it immediately, setting the same duration/result/message so the rest of the UI stays the same. This is a tempting option but then hacks like this can conflict with other things like the change we made to show a progress spinner for a minimum of X ms, and we need to consider the hack a golden path that we don't break.

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

Here's example code (although it doesn't work with our extension at the moment)
https://mindtrove.info/jupyter-tidbit-display-handles/#:~:text=IPython%27s%20display%20%28%29%20function%20can%20return%20a%20DisplayHandle,from%20any%20other%20cell%20in%20a%20Jupyter%20Notebook.

I've fixed the bug in a branch though. This is what it looks like in action (it's actually updating two cells at once):
[Image: update]

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

It should be noted that IPywidgets do this sort of thing constantly.

@roblourens
Member

Thanks. I understand the way that ipywidgets run code in the same context and communicate directly; this is different from the updates coming from the kernel (though maybe ipywidgets do that too, I guess).

@rchiodo
Contributor

rchiodo commented Aug 18, 2020

Yes IPyWidgets can do that too. They'll send update_display_data messages to other cells.

@roblourens
Member

roblourens commented Aug 18, 2020

Can they do that at any time or only during an "execution"?

@rebornix
Member Author

I think there might be a case where you have an ipywidget slider, which can send out update_display_data events, and update a static text output in another cell. It's not part of any execution.

@rchiodo
Contributor

rchiodo commented Aug 19, 2020

Yes @rebornix is correct. At any point messages can come in. We have to listen to update_display_data all the time.

@roblourens
Member

roblourens commented Aug 19, 2020

Peng and I talked for a while about what this means. And per #103713 (comment) we can design the model for this or we can let Python work around it with a "fake" execution. But I don't like such a major hack becoming a recommended way to do something this essential.

Here is one basic concept

interface CellExecutionDataAccessor {
	updateCellOutputs(cell: NotebookCell, outputs: CellOutput[]): void;
	createNotebookCellExecution(cell: NotebookCell, executionStartTime?: number): NotebookCellExecution;
}

interface NotebookKernelProvider<T extends NotebookKernel = NotebookKernel> {
	resolveKernel?(kernel: T, document: NotebookDocument, webview: NotebookCommunication, cellExecutionDataAccessor: CellExecutionDataAccessor, token: CancellationToken): ProviderResult<void>;
}

This uses the resolveKernel call that already exists to give the extension an accessor for the kernel's private mutation methods. Note this is also moving createNotebookCellExecution to there. When resolveKernel is called, it's handed a CellExecutionDataAccessor which is associated with Kernel+Document, not just the kernel instance. This gets increasingly complex but it lets us scope the output mutations to just the kernel at least.
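
The scoping idea can be illustrated with a small sketch in which the host builds an accessor bound to one notebook and hands it to the kernel when it is resolved. All names here (`AccessorNotebook`, `ScopedCellAccessor`, `createScopedAccessor`, `AccessorKernel`) are hypothetical, the kernel half of the Kernel+Document association is left out, and outputs are simplified to strings.

```typescript
interface AccessorCell {
	id: string;
	outputs: string[];
}

interface AccessorNotebook {
	uri: string;
	cells: AccessorCell[];
}

interface ScopedCellAccessor {
	updateCellOutputs(cell: AccessorCell, outputs: string[]): void;
}

// Host side: the accessor closes over one notebook and refuses to touch
// cells that belong to any other document.
function createScopedAccessor(notebook: AccessorNotebook): ScopedCellAccessor {
	return {
		updateCellOutputs(cell: AccessorCell, outputs: string[]): void {
			if (notebook.cells.indexOf(cell) === -1) {
				throw new Error(`cell ${cell.id} does not belong to ${notebook.uri}`);
			}
			cell.outputs = [...outputs];
		},
	};
}

// Extension side: the kernel only gets mutation rights once it is resolved
// for a notebook, and only through the accessor it was handed.
class AccessorKernel {
	private accessor: ScopedCellAccessor | undefined;

	resolveKernel(accessor: ScopedCellAccessor): void {
		this.accessor = accessor;
	}

	// E.g. a kernel message that updates a cell which is not executing.
	handleOutputUpdate(cell: AccessorCell, outputs: string[]): boolean {
		if (!this.accessor) {
			return false;
		}
		this.accessor.updateCellOutputs(cell, outputs);
		return true;
	}
}
```

Other extensions never see the accessor, so in this sketch they have no way to write outputs, while the resolved kernel can update any cell of its own notebook at any time.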

We flattened execution data

// --- READ API

interface NotebookCellExecutionData {
	readonly outputs: Readonly<CellOutput[]>;
	readonly resultKind: NotebookCellExecutionResultKind;
	readonly executionOrder?: number;
	readonly message?: string;
	readonly runStartTime?: number;
	readonly lastRunDuration?: number;
}

interface NotebookCell {
	readonly executionData: Readonly<NotebookCellExecutionData> | undefined;
}

Answering the questions above about onDidStartExecution vs executeCell, we have the API such that firing the event will not trigger an executeCell call. It's simpler this way, so every time executeCell is called, the extension can always trigger a cell execution at that point. Otherwise, sometimes when that method is called, you would trigger a cell execution, and sometimes (when it's the result of the extension firing the event) it would not trigger an execution, which is confusing.

@rchiodo
Contributor

rchiodo commented Aug 19, 2020

Thanks @roblourens. The CellExecutionDataAccessor seems fine to me. The reason for this instead of just letting arbitrary updating is so it's more explicit?

With regards to the onDidStartExecution, this means you'll use this to mark a cell as executing I assume? So if there's some external way to run a cell (like say run all cells), we'll fire onDidStartExecution for each and the UI will update to show executing.

@DonJayamanne
Contributor

DonJayamanne commented Aug 19, 2020

Note:
I don't see a way to update the executionOrder, message, lastStartTime and lastRunDuration.
Oops, found that it's using NotebookCellExecution.

Based on the API, the executionStartTime is readonly and can only be set when calling createNotebookCellExecution.

However this doesn't work for us
E.g. when the user hits run cell, with a simple print("Hello World")

  • We want to immediately indicate the cell has started running
  • However it's possible we're still busy starting the Jupyter kernel (so technically the cell has not yet been sent to the kernel)
  • After say 5 seconds, the kernel starts, now the extension sends the cell code to the Jupyter kernel
  • After execution, it will display >5s as the time taken to execute the print statement

This is misleading.
What we'd like is

  • We'd like to mark a cell as being busy (queued for execution)
  • & control when the cell actually starts

This makes a lot of sense when we start running all cells in a notebook: we might want to mark all cells as busy, however only the first cell is executing and the others are marked as busy to let the user know that we are processing them (whether we run all in parallel or in sequence is an implementation detail, i.e. up to extensions).

@roblourens
Member

This API lets you set the total duration if you want to run a stopwatch yourself. You can use that to at least end up with the correct duration. It does not give you a way to determine when the timer shows up and starts counting.
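
An extension-side stopwatch of that kind might look like the sketch below. It is an assumption-laden illustration: `ExecutionStopwatch`, `markQueued`, `markStarted`, `finish` and `fakeClock` are hypothetical names, and the clock is injected so the example is deterministic (a real extension would presumably pass something like `Date.now`).

```typescript
type Clock = () => number;

class ExecutionStopwatch {
	private readonly now: Clock;
	private queuedAt: number | undefined;
	private startedAt: number | undefined;

	constructor(now: Clock) {
		this.now = now;
	}

	// The user clicked run; the kernel may still be starting up.
	markQueued(): void {
		this.queuedAt = this.now();
	}

	// The cell code was actually sent to the kernel.
	markStarted(): void {
		this.startedAt = this.now();
	}

	// Splits the elapsed time into waiting and running, so that only the
	// running part needs to be reported as the cell's duration.
	finish(): { waitedMs: number; ranMs: number } {
		const end = this.now();
		const started = this.startedAt ?? end;
		const queued = this.queuedAt ?? started;
		return { waitedMs: started - queued, ranMs: end - started };
	}
}

// Deterministic clock for the example: returns the given ticks in order.
function fakeClock(ticks: number[]): Clock {
	let index = 0;
	return () => ticks[Math.min(index++, ticks.length - 1)] ?? 0;
}

// Queued at 0 ms, kernel ready at 5000 ms, print finished at 5020 ms.
const demoStopwatch = new ExecutionStopwatch(fakeClock([0, 5000, 5020]));
demoStopwatch.markQueued();
demoStopwatch.markStarted();
const demoTiming = demoStopwatch.finish();
```

With the numbers from the scenario above, the extension would report `demoTiming.ranMs` (20 ms) as the duration instead of the full 5020 ms since the click.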

We talked about having some "queued" state in the past, I thought that we concluded it wasn't really necessary. But if we want to add that, we can figure out how to take an execution through queued/running states.

This is sort of in conflict with the idea we had that the cell UI should change immediately on clicking the run button (like without a round trip to the extension host).

@DonJayamanne
Contributor

DonJayamanne commented Aug 19, 2020

Will discuss offline and update this

@rchiodo
Contributor

rchiodo commented Aug 19, 2020

My opinion is to display the execution time for the cell as we saw it. The magic commands do this but without the network overhead but I don't think anybody is that picky. Basically this option:

Should we support both? Out of the box we have a good approximation of times, and if users use the magic commands, then that will work as well (I believe the majority of users won't really care about this stuff, though I could be completely wrong).

@DonJayamanne
Contributor

thanks @rchiodo

This is sort of in conflict with the idea we had that the cell UI should change immediately on clicking the run button (like without a round trip to the extension host).

@roblourens
Can we not update the cell UI state (just the icon), change from play to stop, and leave the timer for the extension to update via some other means?

@roblourens
Member

roblourens commented Aug 19, 2020

Yeah, we will have to give some mechanism for starting the timer.

After more discussion today, the conclusion is that we should table this discussion for now and revisit it later. I think the main priority right now is to continue making progress towards a nice stable notebook experience with Jupyter and our other partners. We have a flexible API right now that we have implemented and you have implemented, and it would be best to keep that and determine what the ideal API is later on when things settle down and stabilize and we have an even better understanding of everyone's scenarios.

For future reference, my code samples above represent an up to date proposal, with the exception of what we just said about the timer. And also, we discussed loosening it up a bit more and making the output updates and even the execution updates just part of the general edit API.

Thanks for your feedback in talking through this with us @DonJayamanne and @rchiodo, I think we learned a lot through this exploration.

@roblourens
Member

Ok, taking this issue way back to its roots for a moment. I pushed a change for "clear outputs" so that github issues and, I think, python, can remove their hacks to clear execution state when outputs.length goes to 0. The logic is to wipe runState, runStartTime, lastRunDuration, and statusMessage in the "clear outputs" command UNLESS the cell's runState is Running. So then it should work to clear the previous outputs while a cell is running.
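
The rule described above can be restated as a short sketch. The types and the function `clearCellOutputs` are hypothetical simplifications (run state as a string union, outputs as strings) and not the actual core implementation. Only the four properties named in the comment are wiped; `executionOrder` is left alone here because the comment does not mention it.

```typescript
type ClearRunState = "idle" | "running" | "success" | "error";

interface ClearCellMetadata {
	runState?: ClearRunState;
	runStartTime?: number;
	lastRunDuration?: number;
	statusMessage?: string;
	executionOrder?: number;
}

interface ClearCell {
	outputs: string[];
	metadata: ClearCellMetadata;
}

function clearCellOutputs(cell: ClearCell): void {
	// Outputs are always cleared.
	cell.outputs = [];

	// While the cell is running, leave the execution metadata alone; the
	// extension is expected to update it when the execution finishes.
	if (cell.metadata.runState === "running") {
		return;
	}

	delete cell.metadata.runState;
	delete cell.metadata.runStartTime;
	delete cell.metadata.lastRunDuration;
	delete cell.metadata.statusMessage;
}

const finishedCell: ClearCell = {
	outputs: ["1"],
	metadata: { runState: "success", lastRunDuration: 12, statusMessage: "ok", executionOrder: 4 },
};
const runningCell: ClearCell = {
	outputs: ["partial"],
	metadata: { runState: "running", runStartTime: 1000 },
};
clearCellOutputs(finishedCell);
clearCellOutputs(runningCell);
```

After these calls both cells have no outputs, the finished cell has lost its run state, duration and status message, and the running cell still reports `running` with its original start time.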

@jrieken
Member

jrieken commented Sep 1, 2020

Let's close this

5 participants