Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving page load time #380

Closed
mlucool opened this issue Feb 28, 2023 · 4 comments
Closed

Improving page load time #380

mlucool opened this issue Feb 28, 2023 · 4 comments

Comments

@mlucool
Copy link
Contributor

mlucool commented Feb 28, 2023

Problem

As I was debugging jupyter-server/jupyter_server#312, I found that certain lab APIs used for pageload are called many times without caching. This leads to a large increase in pageload time in at least some circumstances, namely if the FS is slow, this doubles my page load time for a bare bones jupyterlab from ~3s to ~6s.

This can be traced to get_settings. Using %%time:

%%time
x = get_settings(app_settings_dir=app_settings_dir, labextensions_path=labextensions_path, overrides= {}, schema_name='', schemas_dir=schemas_dir, settings_dir=settings_dir);

With a settings_dir that is fast (SSD), I see:

CPU times: user 40.2 ms, sys: 4.28 ms, total: 44.5 ms
Wall time: 69.3 ms

With my setting_dir on a slower file system (NFS):

CPU times: user 323 ms, sys: 9.66 ms, total: 333 ms
Wall time: 401 ms

Why this matters is this API is called 17 times to load the page leading to up to (401-70)*17=5.6s more time to load the page (this is a max, in practice some settings are faster to load than schema_name='')
image

Digging deeper via pyinsturment, the cause for me is mostly in _get_user_settings:
image
Surprisingly this is blaming loads, which does not match my mental model, but json5.loads is not known to be fast (and often it is passing in a schema that never changes as its the default and not from userspace).

The fact that there are 17 calls adds a bit more pain as I think there is some HTTP1.1 head of line blocking and/or tornado being single threaded blocking going on which forces this to be serialized.

Creating reproducers here is a bit hard, but adding this here seems to simulate it ok-ish (I'm not sure it's realistic):

    import time
    time.sleep(0.1)

    if os.path.exists(path):
        stat = os.stat(path)

Proposed Solution

There are a number of ways to fix this. To squeeze as much performance as we can from this, I propose we:

  1. Make an API that Jupyterlab and query all data it needs in one shot
  2. Store all this data in memory in a cache. We can use a filewatcher to know what it is updated from outside jupyterlab_server. This is a small number of very infrequently changed files.

With this change I suspect in my fast setup we'd save ~1s of the 3s page load time and 5s on my slow setup, which is pretty substantial!

CC @afshin

@brichet
Copy link
Contributor

brichet commented Mar 13, 2023

Thanks @mlucool for reporting this.

Jupyterlab already has a settingsRegistry that stores all loaded settings.

If I'm not mistaken, the way it works currently is:

  1. gets all the settings from the server (the first request at top of the list)
  2. keeps in the registry only those that don't have jupyter.lab.transform=true (or have their transform() function already loaded, which I don't think can happen at this point)
  3. every plugin with a transform() function will request its settings again because it has not been registered
  4. when the shell is restored, the settings are requested again (the last settings?... in the list)

Also, some plugins request a hard reload on events (e.g. shortcuts-extension:shortcuts, lsp-extension:plugin, notebook-extension:panel, docmanager-extension:plugin)


I think the [2, 3] could be fixed by storing the settings requested from the server but not loaded (waiting for the transform() function).

Regarding [4]:
https://github.com/jupyterlab/jupyterlab/blob/400d98fd12863c3ed140a7a4f423b5ec9378a563/packages/apputils-extension/src/settingsplugin.ts#L40-L41

  • I don't know in what situation we could have missed some settings in the first request, and so if this request is still necessary
  • it seems not optimized anyway. All the settings are requested again only to get theirs ID. If one of the plugin's ID is not in the registry, its settings will be re-requested to the server.

@fcollonval
Copy link
Member

@brichet I think this is resolved by the two PR you made, isn't it?

@brichet
Copy link
Contributor

brichet commented Apr 25, 2023

It's supposed too.
@mlucool if you want to do more tests, it has been released in jupyterlab=v4.0.0b1 and jupyterlab_server=2.22.0

@fcollonval
Copy link
Member

Closing as resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants