Trim notebook large output for better performance #9594

echarles · 2021-01-11T11:51:40Z

References

Code changes

Add a maximumTopBottomOutput notebook setting and trim outputs that are larger than a predefined multiple of that maximumTopBottomOutput value. A relevant message is shown to the user. If the user runs the cell again, they will get the complete output.

User-facing changes

When opening a notebook, the user is shown a message for large outputs.

Backwards-incompatible changes

None.

jupyterlab-dev-mode · 2021-01-11T11:51:51Z

Thanks for making a pull request to JupyterLab!

To try out this branch on binder, follow this link:

echarles · 2021-01-11T11:54:12Z

@goanpeca @isabela-pf Thx for your reviews and feedbacks.

mlucool

Suggestion: move this to the body. Putting it in the ctor will help with page load, but won't prevent a notebook slowdown from a large number of DOM nodes accidentally being added when a cell outputs too much content.

mlucool · 2021-01-11T14:07:36Z

packages/outputarea/src/widget.ts

+              <div style="margin: 10px">
+                <pre>Output of this cell has been trimmed on the initial display.</pre>
+                <pre>Total outputs is ${model.length}, displaying the first ${maximumTopBottomOutput} top and last ${maximumTopBottomOutput} bottom outputs.</pre>
+                <pre>Run again this cell to get the complete output.</pre>


Another idea is adding a button to click to show all output for this cell. Maybe that's v2?

I have thought about that indeed for a v2 iteration.

mlucool · 2021-01-11T14:08:09Z

packages/outputarea/src/widget.ts

-    for (let i = 0; i < model.length; i++) {
-      const output = model.get(i);
-      this._insertOutput(i, output);
+    const maximumTopBottomOutput = options.maximumTopBottomOutput || 10;


What if options.maximumTopBottomOutput === 0 disabled this feature so it is backward compatible?

Great idea! will do
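
One detail behind this suggestion: with the `||` fallback in the diff above, an explicit `0` is falsy and silently becomes `10`, so it could never act as an off switch. Below is a minimal sketch of one way to resolve the setting. `resolveMaxOutputs` is a hypothetical helper name, not something from the PR, and treating zero or negative values as "no trimming" is an assumption based on the idea discussed here.

```typescript
// Hypothetical helper (not from the PR): resolve the trimming limit.
// Returns Infinity when trimming is disabled.
function resolveMaxOutputs(
  setting: number | undefined,
  fallback: number = 10
): number {
  // Fall back only when the setting is absent. `setting || fallback`
  // would also replace an explicit 0, because 0 is falsy.
  const value = setting === undefined ? fallback : setting;
  // Zero or negative means "show everything" (backward compatible).
  return value > 0 ? value : Infinity;
}

console.log(resolveMaxOutputs(undefined)); // 10
console.log(resolveMaxOutputs(0)); // Infinity
console.log(resolveMaxOutputs(25)); // 25
```

Returning `Infinity` for the disabled case keeps a later comparison such as `model.length > limit` valid without a separate flag.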

echarles · 2021-01-11T14:52:13Z

Suggestion: move this to the body. Putting it in the ctor will help with page load, but won't prevent a notebook slowdown from a large number of DOM nodes accidentally being added when a cell outputs too much content.

@mlucool Can you elaborate a bit on this? I am not sure I fully understand. Thx

mlucool · 2021-01-11T18:58:12Z

@mlucool Can you elaborate a bit on this? I am not sure I fully understand. Thx

My understanding of this change is that when a user opens a notebook, only the first and last N items are shown for a cell. The problem is if you run a cell and it outputs 10k outputs, a user's notebook may still freeze up due to the number of outputs rendered. By moving the logic out of the ctor (e.g. into _insertOutput) it saves the user from a footgun.

echarles · 2021-01-11T19:19:07Z

My understanding of this change is that when a user opens a notebook, only the first and last N items are shown for a cell. The problem is if you run a cell and it outputs 10k outputs, a user's notebook may still freeze up due to the number of outputs rendered. By moving the logic out of the ctor (e.g. into _insertOutput) it saves the user from a footgun

OK, I see. I was thinking of the rerun as a workaround that lets the user still see the output, pending the next dev iteration, which would show a button to reveal the hidden content. I may come back tomorrow with both: moving the logic to _insertOutput and adding a button (I need to test a bit how to implement that and how it potentially affects performance).

echarles · 2021-01-12T13:06:11Z

@mlucool I have implemented the reveal of the trimmed outputs when the user clicks on the message, plus the backwards compatibility if maximumTopBottomOutput is zero (or negative) (see the screencast hereafter).

For moving the logic to insertOutput so that outputs are also trimmed on code execution, I reread https://jupyter-client.readthedocs.io/en/stable/messaging.html to make sure we can detect the end of the execution. I have not implemented that yet, as the output area would need to watch the shell channel for an execute_reply message (which it does not do for now). This leads me to 2 questions:

1. As a user, if I request a cell execution, I explicitly want all the outputs (but that is just me; I guess asking more users would generate a variety of answers).
2. There is a risk that the execute_reply message does not come through, and that the output area will not show the terminating lines (@Carreau can maybe tell more about that risk).

mlucool · 2021-01-12T15:24:12Z

@mlucool I have implemented the reveal of the trimmed outputs when the user clicks on the message, plus the backwards compatibility if maximumTopBottomOutput is zero (or negative) (see the screencast hereafter).

Looks great!

As a user, if I request a cell execution, I explicitly want all the outputs (but that is just me; I guess asking more users would generate a variety of answers).

I don't think this is always true. If you do x + 1, then certainly show the output. If you run run_work_on_100000_machines() and all 100k return the same error, you really only want the first and last few. In general, at some point too much output is more noise than value (and Lab's performance will degrade).

There is a risk that the execute_reply message does not come through, and that the output area will not show the terminating lines (@Carreau can maybe tell more about that risk).

To be clearer, I am suggesting as data comes in we do something like this:

Message 1->2N output as is (no change)
Message 2N+1: Output: 1...N, Message, N+2...2N+1
Message 2N+2: Output: 1...N, Message, N+3...2N+2
...
Message 3N: Output: 1...N, Message, 2N+1...3N

That is, as each new message comes in, start replacing the oldest of the tail after some threshold.
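
The scheme above can be sketched as a pure function of how many outputs have arrived so far. This is only an illustration under stated assumptions: `visibleOutputs` and its return shape are hypothetical names, indices are 1-based to match the listing, and `n` is the per-side count N.

```typescript
// Illustrative only: which 1-based output indices stay visible after
// `total` outputs have arrived, keeping `n` at the head and `n` at the tail.
interface VisibleOutputs {
  head: number[];
  tail: number[];
  trimmed: boolean; // true when the message is shown between head and tail
}

function range(from: number, to: number): number[] {
  const out: number[] = [];
  for (let i = from; i <= to; i++) {
    out.push(i);
  }
  return out;
}

function visibleOutputs(total: number, n: number): VisibleOutputs {
  if (n <= 0 || total <= 2 * n) {
    // Up to 2N outputs (or trimming disabled): show everything, no message.
    return { head: range(1, total), tail: [], trimmed: false };
  }
  // Beyond 2N: fixed head 1..N, then the message, then a sliding tail
  // made of the last N outputs.
  return {
    head: range(1, n),
    tail: range(total - n + 1, total),
    trimmed: true
  };
}

// With N = 3, the 7th output (2N+1) is the first one that triggers trimming.
console.log(visibleOutputs(7, 3));
```

Because the result depends only on the count received so far, each new output can be handled as it arrives, without waiting for the end of the execution.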

echarles · 2021-01-12T16:23:15Z

I don't think this is always true. If you do x + 1, then certainly show the output. If you run run_work_on_100000_machines() and all 100k return the same error, you really only want the first and last few. In general, at some point too much output is more noise than value (and Lab's performance will degrade).

True.

That is, as each new message comes in, start replacing the oldest of the tail after some threshold.

Sounds like a great idea. The tail will change live; if the user wants to see the body, they can always click on the message.

Let me give it a try.

Carreau · 2021-01-12T17:15:58Z

2. There is a risk that the execute_reply message does not come through, and that the output area will not show the terminating lines (@Carreau can maybe tell more about that risk).

Most of the risk is if the network breaks or if the kernel crashes, so I think that should be fine.

Note that there is also:

--NotebookApp.iopub_data_rate_limit=<Float>
    (bytes/sec) Maximum rate at which stream output can be sent on iopub before
    they are limited.
    Default: 1000000
--NotebookApp.iopub_msg_rate_limit=<Float>
    (msgs/sec) Maximum rate at which messages can be sent on iopub before they
    are limited.
    Default: 1000

But those limit the rate to protect against an accidental while True loop in a teaching context.
Most of those might also be completely fixed with a server-side model in the long run.

echarles · 2021-01-13T15:16:09Z

The tail of the output area is now showing the new outputs in a "stream" way on cell run.

echarles · 2021-01-13T18:03:24Z

FYI I am working to take into account the kernel execute_reply so the above behavior is also applicable to new cells.

ellisonbg · 2021-01-13T18:35:07Z

A few comments:

In the first versions of JupyterLab we explored more aggressive output collapsing and got some pretty strong pushback. I don't have a link to the specific issues but am hesitant to ship something that doesn't address users who are used to how the classic notebook works with large output.
Requiring a user to rerun a cell is going to introduce a massive amount of pain. The reason is that long output is often associated with a long-running cell. Imagine how frustrating it would be to run a cell overnight, not see the output you need to see, and have to rerun the cell again. While I am open to this type of approach overall, I think we have to have an approach that allows the user to perform a simple action (click a button) to instantly view all the output.
Because output can be a mix of visual and textual, I think it is important for us to work hard on the visual treatment of this UI.
Not sure 10 is large enough. Do we have any performance data to help guide our decision on the right cut point?
How is the limit of 10 calculated? What counts towards that limit?

echarles · 2021-01-13T19:07:07Z

In the first versions of JupyterLab we explored more aggressive output collapsing and got some pretty strong pushback. I don't have a link to the specific issues but am hesitant to ship something that doesn't address users who are used to how the classic notebook works with large output.

If this is a show-stopper, we can set the maximumTopBottomOutput setting to 0 to ensure backwards compatibility.

Requiring a user to rerun a cell is going to introduce a massive amount of pain. The reason is that long output is often associated with a long-running cell. Imagine how frustrating it would be to run a cell overnight, not see the output you need to see, and have to rerun the cell again. While I am open to this type of approach overall, I think we have to have an approach that allows the user to perform a simple action (click a button) to instantly view all the output.

The user does not need to rerun to see the output; they just have to click on the message to see the trimmed outputs.

Because output can be a mix of visual and textual, I think it is important for us to work hard on the visual treatment of this UI.

This aims to address outputs with many top-level divs (e.g. more than 100), so even if the cell has multiple visual outputs, you would not get those 100 visuals.

Not sure 10 is large enough. Do we have any performance data to help guide our decision on the right cut point?

We have benchmarks in https://github.com/jupyterlab/benchmarks to generate such figures, but it would take quite some work to run them and extract the details. 10 sounds like a reasonable number. We could go to 20 to be sure we don't strip out most of the cells. This is also configurable, and the goal is to update that number based on the subsequent RC releases.

How is the limit of 10 calculated? What counts towards that limit?

See previous answer.

echarles · 2021-01-13T19:13:26Z

BTW I need to update the shown message, as it says the user has to rerun to see the outputs: the user just has to click to see the outputs.

mlucool · 2021-01-13T19:36:36Z

I think maximumTopBottomOutput is hard to understand, as setting this to 10 gives you 20 outputs. It may be clearer to name this maxNumberOutputs and use maxNumberOutputs/2 on each side of the message?

echarles · 2021-01-14T08:05:04Z

@mlucool I have changed the setting name to maxNumberOutputs. While ensuring the trimmed output was working fine for a new cell listening to the execute_reply message, I came across the following 2 issues.

First, let's consider that the number of output messages will be 17 and that it is the first time the user runs the cell.

In that case, the model is not yet populated, and we also don't know in advance that it will be 17.
When the cell is run, the kernel messages come in. We can monitor them and say that after the 10th, we show the clickable info message, and only show the last message to show the activity (see previous screencast).
Then we receive the 17th message and, with the execute_reply kernel message, we know the execution is over. In that case, we would need to remove the clickable info message and display the last 7 outputs.

This is not a great user experience, as we show the clickable info message and then immediately hide it.

There is also a second issue. When listening for the execute_reply kernel message, the output area is still inserting widgets after the reception of that kernel message (see the following example in the case of 1000 outputs). It is difficult, if not impossible, to know exactly when the last message is received.

widget.ts:458 ---- 873
widget.ts:458 ---- 874
widget.ts:458 widget.ts:636 --- {header: {…}, msg_id: "5043dda5-97048cdeaef6c9d9327788ff_3044", msg_type: "execute_reply", parent_header: {…}, metadata: {…}, …}
widget.ts:458 ---- 876
widget.ts:458 ---- 877
widget.ts:458 ---- 878
widget.ts:458 ---- 879

Unless we find a way to overcome the above 2 issues, I would suggest reverting to the initial approach, where we only show the message on already populated cells. When the user opens the notebook, we get a performance gain, as we can apply the trimming logic.

The downside is that the user will still see large outputs on runCell or runAllCells, but in terms of performance for the notebook rendering, I don't see a real issue as the notebook is already loaded. We only update the DOM for part of the cell outputs, which is OK for performance.

WDYT?

mlucool · 2021-01-14T14:40:14Z

I don't see real issue as the notebook is already loaded.

I opened the notebook in your second screenshot in the first update (i.e. the one that says NameError) in both Lab and Notebook. Even after the page load, Notebook stays somewhat usable and I can rerun/clear a cell. Lab basically becomes unresponsive to most actions. While I don't have hard data on this, I have a hunch this is due to the CSS/wrapping each output in Lumino widgets. This means that making this PR work for both load and runtime is pretty important to making Lab stable. This state isn't hard to get into if you have a cell that does something highly parallelized and all the tasks have errors.

To address the two issues you raised:
Assume N=20 and we show N/2 before/after. I think the problem is you aren't always showing the last N/2. That is, this should always have the correct output even if you don't know whether more messages will come. So in the case described, 1-17 should be rendered. This holds for all outputs 1-20. At output 21, you'd see 1-10, msg, 12-21. Then at output 22 it would show 1-10, msg, 13-22. There is never a time when this would need future information to display correctly.

This means that the clickable info is never removed, because it is only added after we are sure it will be needed. This should also remember that if a user clicked show all output when output 21 came, then when output 22 comes, we do not hide anything.
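
A small stateful sketch of the behavior described here, including the "remember the click" part. This is not the PR's implementation: `TrimmedOutputs`, `add`, `revealAll` and `render` are hypothetical names, and the sketch assumes an even maxNumberOutputs of at least 2, split evenly on each side of the message.

```typescript
// Hypothetical model of an output area (not the PR's code).
class TrimmedOutputs {
  private maxNumberOutputs: number;
  private total: number = 0;
  private revealed: boolean = false;

  constructor(maxNumberOutputs: number) {
    this.maxNumberOutputs = maxNumberOutputs;
  }

  // Called for each incoming output; needs no knowledge of future messages.
  add(): void {
    this.total += 1;
  }

  // Called when the user clicks the message; remembered for later outputs.
  revealAll(): void {
    this.revealed = true;
  }

  // Describe what is currently rendered, e.g. "1-10, msg, 12-21".
  render(): string {
    if (this.total === 0) {
      return "";
    }
    const max = this.maxNumberOutputs;
    if (max <= 0 || this.revealed || this.total <= max) {
      // Trimming disabled, user asked for everything, or still under the limit.
      return `1-${this.total}`;
    }
    const half = Math.floor(max / 2);
    return `1-${half}, msg, ${this.total - half + 1}-${this.total}`;
  }
}

const area = new TrimmedOutputs(20);
for (let i = 0; i < 21; i++) {
  area.add();
}
console.log(area.render()); // 1-10, msg, 12-21
area.revealAll();
area.add();
console.log(area.render()); // 1-22
```

The message only appears once the limit is exceeded, so it never has to be retracted, and a click stays in effect for outputs that arrive afterwards.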

ellisonbg · 2021-01-14T21:46:35Z

Thanks for clarifying points. Another set of questions:

* Are we storing the state related to this in the notebook metadata? What happens if a user reopens a notebook?
* How does this state interact with the other state (scrolling, collapsed) that determines how outputs are rendered and handled?

goanpeca · 2021-01-15T00:17:51Z

Are we storing the state related to this in the notebook metadata?

Nope.

What happens if a user reopens a notebook?

By default the trimmed outputs will be hidden unless the user clicks the message.

How does this state interact with the other state (scrolling, collapsed) that determines how outputs are rendered and handled?

Good question, not sure!

echarles · 2021-01-15T06:31:50Z

How does this state interact with the other state (scrolling, collapsed) that determines how outputs are rendered and handled?

If the cell output is collapsed, the user can uncollapse it. Depending on the output length, the cell will be shown with or without trimmed outputs. If some outputs have been trimmed, the user can still click on the message and see them like before.

For the scroll behavior, the trimmed outputs are not shown and not in the DOM, so they will not be impacted. Only on reveal would they be.

echarles · 2021-01-15T10:00:18Z

@goanpeca Thx a lot for your review. I have pushed the needed changes.

@ellisonbg Thx for the questions. Anything else we need to provide? I propose to leave this PR open for a few days (this Friday and the weekend, so until Monday 18 Jan included). If no more concerns are raised, we'd then like to get this merged to release a JupyterLab 2.3.0rc0 version with this in, as discussed during the last weekly JupyterLab meeting.

echarles · 2021-01-19T07:46:44Z

@goanpeca This PR is ready for final review (lint is fixed; the other notebook js failure is unrelated to these changes and linked to flaky tests previously seen on the 2.x branches).

goanpeca

Thanks a lot for this work @echarles !

All works as expected

echarles · 2021-01-20T18:23:28Z

@ellisonbg From today's community meeting, I understand you may have improvement proposals to make sure the message is seen by the user (either via CSS or via a right-click menu), and you also said that it should prolly not block the merge, as we can iterate on the message quality in the next Release Candidates.

We have planned resources to release 2.3.0RC0 tomorrow. As discussed, it would be much appreciated to get any feedback (merge as is or not). As an option, we can set the maxNumberOutputs setting to 0, which would ensure backwards compatibility (all outputs will be shown, no message to the user), but we would prefer having the value set to a non-zero value. We could maybe go to 100 instead of 20 so that fewer cells fall into the trimmed output case.

ellisonbg · 2021-01-20T18:59:17Z

I am supportive of this being merged in its current state (pending any code reviews) with a maxNumberOutputs set to a non-zero value (maybe a compromise of 50). Longer term I think we should explore how we can make this more usable:

Improved styling of the message.
Context menu items on the output and notebook to show all the outputs.
An indicator on the output prompt area to show the user there is more output that is being hidden, so that they can tell this without scrolling as much.

Nice work though!

echarles · 2021-01-20T19:06:23Z

Thx @ellisonbg for your intake and as usual great inputs, reviews and feedbacks to make jupyterlab a greater piece of software for the users.

@isabela-pf At some point, we can get advice from you on the actions listed by Brian: improved styling of the message, and/or context menu items on the output and notebook to show all the outputs, and/or an indicator on the output prompt area to show the user there is more output that is being hidden so that they can tell this without scrolling as much... and/or anything you think would make a great user experience.

echarles · 2021-01-21T06:46:41Z

maxNumberOutputs is now set to 50. I have opened #9652 for the user interaction enhancement.

goanpeca · 2021-01-21T15:06:05Z

Thanks for this amazing work @echarles and all reviewers for their comments. Merging now.

Will cut a release with @blink1073 in a couple of hours!

Cheers

isabela-pf · 2021-01-22T00:48:12Z

Just to ping everyone here who might be interested, I commented on #9593 since this PR has been merged.

uribe-convers · 2021-04-07T18:15:40Z

Hi @goanpeca and @mlucool,

this feature is very helpful but for the work I do, I need to turn it off. I manage multiple users and would prefer to set the default to maxNumberOutputs: 0 as the users build the docker container with Jupyter.

Is there a way I can set this default value using the jupyter_notebook_config.py file? I don't see it in the options available in the docs.

Thanks for the help!
Simon

mlucool · 2021-04-07T18:18:34Z

All defaults should be overridable with overrides.json. I too learned about this great feature only recently!

uribe-convers · 2021-04-07T19:36:32Z

Thanks @mlucool, that's a cool way of setting user preferences! However, I haven't been able to get it working...
I created a settings dir and overrides.json file in my environment /root/home/my-user/miniconda3/envs/env/share/jupyter/lab/settings/overrides.json with the text: {"maxNumberOutputs": 0}.

I restarted the Jupyter server and refreshed the browser tab but still don't see the changes. Is there anything else I need to do?
Thank you!

mlucool · 2021-04-07T19:39:41Z

I didn't test this, but you'll want something like this (you need to scope the settings):

    "@jupyterlab/notebook-extension:tracker": {
         "maxNumberOutputs": 0
     }

uribe-convers · 2021-04-07T21:05:56Z

Thanks @mlucool, I appreciate the help! I've tried to put the code below in /root/home/my-user/miniconda3/envs/env/share/jupyter/lab/settings/overrides.json and in /root/home/.jupyter/lab/settings/overrides.json and nothing changes, but I'll keep trying! :)

{
  "@jupyterlab/notebook-extension:tracker": {
    "maxNumberOutputs": 10
    }
}

jasongrout · 2021-04-07T21:43:16Z

Thanks @mlucool, I appreciate the help! I've tried to put the code below in /root/home/my-user/miniconda3/envs/env/share/jupyter/lab/settings/overrides.json and in /root/home/.jupyter/lab/settings/overrides.json and nothing changes, but I'll keep trying! :)

I would suggest trying the example in the docs (changing the default theme) first to make sure that you have the file location correct first, then working on getting the right setting name.

uribe-convers · 2021-04-08T01:30:17Z

yeah, totally @jasongrout—I've tried that too. Thanks

trim large notebook outputs

d9fb9ea

echarles changed the base branch from master to 2.3.x January 11, 2021 11:51

echarles mentioned this pull request Jan 11, 2021

WIP: Hide many outputs and only display top and bottom based on settings #9184

Closed

github-actions bot added pkg:cells pkg:notebook pkg:outputarea labels Jan 11, 2021

echarles changed the title ~~Performance/trim output~~ Trim notebook large output for better performance Jan 11, 2021

mlucool suggested changes Jan 11, 2021

echarles added 2 commits January 12, 2021 13:50

clickable message to untrim the output area

4ad9f31

clickable message to untrim the output area

d2a201b

echarles added 2 commits January 13, 2021 14:17

move output trim from constructor to insertOutput

eba171e

Streaming output trimed tail

4cc3049

Use maximumTopBottomOutput to maxNumberOutputs

5fea4e5

echarles added 2 commits January 15, 2021 08:20

minor improvement for the trimmed output code based on @goanpeca review

29a6a19

ensure backwords compatibility if maxNumberOutputs setting is set to 0

739c823

echarles mentioned this pull request Jan 15, 2021

Trim large notebook outputs for better performance #9593

Closed

revert tsconfigdoc.json

a7369f3

goanpeca approved these changes Jan 19, 2021

jasongrout mentioned this pull request Jan 20, 2021

Weekly Dev Meetings: Jan-Jul 2021 jupyterlab/frontends-team-compass#117

Closed

echarles mentioned this pull request Jan 21, 2021

Better interaction with the user in case of Trimmed Output #9652

Open

Set default value for maxNumberOutputs to 50

d9b9fb2

goanpeca merged commit 2aafad4 into jupyterlab:2.3.x Jan 21, 2021

mlucool mentioned this pull request Jun 11, 2021

Perf: Add virtual Notebook for delayed cell rendering #10131

Merged

github-actions bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Oct 6, 2021

github-actions bot locked as resolved and limited conversation to collaborators Oct 6, 2021
