Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Cleanup kernel for dashboards run within Jupyter Notebook #117

Closed
parente opened this issue Nov 7, 2015 · 26 comments
Closed

Cleanup kernel for dashboards run within Jupyter Notebook #117

parente opened this issue Nov 7, 2015 · 26 comments

Comments

@parente
Copy link
Member

parente commented Nov 7, 2015

Deploying completely standalone dashboards can be overweight for situations in which the notebook author wants to quickly share a series of "micro-dashboards" for a couple of trusted users (e.g., teammates) to play with. It would be nice to hand users a URL to a notebook within Jupyter and have it appear as a dashboard. We already do this somewhat with the ?dashboard query arg which hides everything except the Jupyter menubar. A few more actions are required to get a functioning dashboard for every user that hits the notebook.

  • Hide all chrome, including the menubar.
  • Spawn a completely new kernel to stay independent from other users. (Is this already supported in the notebook?)
  • Run all cells. (This has security implications.)
@parente
Copy link
Member Author

parente commented Nov 7, 2015

Another approach would be to piggyback on the "Deploy as > Local dashboard" which was originally intended for notebook authors to see what their dashboards would look like when deployed. They address all three bullets in the issue description. However, they "leak" kernels. A visit to the dashboard spawns a kernel but nothing ever cleans the kernel up when the user navigates away.

Perhaps cleanup is fixable in the dashboard extension? Periodically check which kernels have connected websocket clients and cleanup those that have none?

@dalogsdon
Copy link
Contributor

Cleanup is an interesting issue. @jhpedemonte and I discussed this a little bit as we were investigating the Thebe alternative in #105. For forever-running processes, such as Spark Streaming, when do we clean up? If the main usage of the dashboard "preview" is to see the layout with less focus on actual content, then navigation away from the page could be ok.

For the deployed case, we discussed manually shutting down standalone dashboard kernels as you can do in the Notebook already. That could be an alternative in this case too.

@parente
Copy link
Member Author

parente commented Nov 9, 2015

Cleanup has been something we've punted to the container level with tmpnb. When the user has not had interaction with a kernel for a while, the container hosting it goes away.

I think forever-running processes are actually misplaced within the same kernel / container as the dashboard. I know we're doing that in our demos, but I want to believe we only do it for educational purposes. If we wanted to deploy a dashboard that we wanted to scale even to a dozen users, I wouldn't want each one doing its own streaming. Instead, I'd expect the stream to be in its own process with an API and the dashboard notebooks just to sample it.

Today, if I want to build that standalone API, I've got to step outside of the notebook environment. But I think this where kernel gateway and notebooks that becomes micro services can get interesting.

@jhpedemonte
Copy link
Collaborator

That makes sense. Our (potential) future work then would be to make those steps as easy as possible for a dev.

@dalogsdon
Copy link
Contributor

Course of action based on discussion above:

  1. Expose "Deploy as > Local dashboard" page as some URL
  2. Start new kernel
  3. Run all code
  4. Kill kernel when page or websocket connection is lost

@parente
Copy link
Member Author

parente commented Nov 11, 2015

Expose "Deploy as > Local dashboard" page as some URL

Not sure we need this one. If user has ability to log into the Jupyter server, they can just hit the URL of the locally deployed dashboard. I'm OK with the notebook author still controlling when that app is deployed.

Kill kernel when page or websocket connection is lost

Immediately when lost is too quick. WS disconnects happen all the time. tmpnb works on a reaping cycle after a period of inactivty.

@dalogsdon
Copy link
Contributor

So then this issue is just to clean up kernels spawned by locally deployed dashboards?

@parente
Copy link
Member Author

parente commented Nov 12, 2015

Yes. Either:

  1. automatically, when there are no active websockets associated with a kernel for some configurable period of time
  2. manually, via the Running tab in the Jupyter notebook UI

I believe the reason the kernel doesn't appear in the UI for the second approach is because there is no associated session resource. Thebe only requests the kernel, not a session to go with it (and rightly so because there is no associated notebook with the kernel).

@jhpedemonte
Copy link
Collaborator

Regarding the Running tab in notebook UI: how do you feel about adding a 'Dashboards' section there, to go with 'Notebook' and 'Terminals'? We might be able to do it as part of our extension.

@jhpedemonte
Copy link
Collaborator

The refresh in the Running tab isn't extensible at all, not even by hacking. We could just add our own 'Dashboards' section and manage it ourselves. Add our own timer and event handler on the refresh button. But we may need to add an /api/dashboards server extension point, to query the running Dashboard kernels.

@parente
Copy link
Member Author

parente commented Nov 13, 2015

Definitely don't want to add new UI to the notebook frontpage (also called dashboard just to be confusing) given the roadmap of where things are headed with phosphor. That just introduces more things that need to be ported.

I was more wondering if, once we're based on jupyter-js-services, if we could make an /api/session request and pass along the filename of the dashboard HTML file plus a little UUID to allow for multiple instances, instead of a proper notebook to get the associated kernel tracked by the UI automatically.

@parente parente mentioned this issue Nov 16, 2015
@parente parente changed the title Additional options for immediate dashboard view Cleanup kernel for dashboards run within Jupyter Notebook Nov 16, 2015
jhpedemonte added a commit to jhpedemonte/dashboards that referenced this issue Nov 16, 2015
When deploying a dashboard, use the Notebook REST API to create a new Notebook Session. This allows the dashboard kernel to show in the list of currently running notebooks.

(ref jupyter#117)

:chicken:

(c) Copyright IBM Corp. 2015
@dalogsdon
Copy link
Contributor

To begin, we create a new notebook session so that the dashboard kernel shows in the Running Notebooks list.

image

@parente We have a fallback to use kernel API directly in case the server does not have a Sessions API. We tested against tmpnb.org, but it has the sessions API since it spawns a notebook server. Is there any scenario that would only have the kernel API?

@parente
Copy link
Member Author

parente commented Nov 17, 2015

Nice!

Kernel gateway doesn't have a session API exposed, only kernels, because it's meant to be headless. I would claim notebook within tmpnb.org is also meant to be headless since tmpnb does the cleanup at the container level. How hard would it be to only create the session when not in tmpnb mode (or whatever the equivalent setting is in the new dashboard branch)?

Also, since the goal here would be to have multiple kernels spawned for a dashboard for independent users, can you spawn multiple sessions for the same dashboard file with the same name? By tacking on a ID?

@jhpedemonte
Copy link
Collaborator

How hard would it be to only create the session when not in tmpnb mode

Not hard. There's already a tmpnb_mode boolean we can use.

can you spawn multiple sessions for the same dashboard file with the same name? By tacking on a ID?

We can. One issue, though, is that the Notebook UI makes a link out of the path passed to the session API. We take advantage of that and use the full dashboard path so clicking on the link in the Running tab will correctly open the dashboard. If we add an ID, it would have to be compatible with this feature, for now. Maybe something like /files/local_dashboards/foobar/index.html?kernelid=1?

@jhpedemonte
Copy link
Collaborator

Seems to work:

screen shot 2015-11-17 at 9 19 19 am

@parente
Copy link
Member Author

parente commented Nov 17, 2015

Works for me, but maybe a diff arg name would be better because it's not really the kernel ID. dashboardid? instanceid?

@jhpedemonte
Copy link
Collaborator

Arg, nevermind. It seems that clicking on those links opens up a URL that looks like this: .../index.html%3Fkernelid%3D1. Notebook is not properly unescaping.

@parente
Copy link
Member Author

parente commented Nov 17, 2015

Can you have repeat entries? If so, then forget the ID. I assumed (probably improperly) that if you had repeats of the same name, only one would show. If they all show but have the same name, that's fine. They're all indistinguishable at a "who's using it?" level anyway.

@dalogsdon
Copy link
Contributor

The issue with the URL encoding is that they split the URL on / and encode the components, but they fail to handle the ? and &, so the query gets encoded.

Do we care about the actual links in this case since we only did this for shutdown purposes? Currently they will open that deployed dashboard and start a new kernel. Since duplicates are allowed in the list, this makes the only purpose of the kernelid portion to link to an existing kernel, which we don't care about.

@parente
Copy link
Member Author

parente commented Nov 17, 2015

Ah, right. Every click starts a new kernel, doesn't return you to the existing one. So, yea, in non-tmpnb mode (or whatever the new equiv is), start a session, just dupe the bare filename, every click starts a new one in notebook UI, don't bother with an ID, and the prime use we tell everyone is for the "shutdown" button.

Hacky, but that's the point. We're turning existing notebook server into a local dashboard host. That's a fine option ahead of full-on deployment at scale.

@dalogsdon
Copy link
Contributor

Sorry, misread "Can you have repeat entries?" as "You can have repeat entries". Looks like duplicates are not allowed in the list. If you attempt to POST api/sessions with a duplicate path, it returns the existing session.

We will need to use the id. We could open an issue on notebook to encode query params properly in an attempt to get the id working. Since we cannot extend the "dashboard" (tree) page, it could be tricky to hack around the issue. We could have the links not work and only use the Shutdown button for the time being and wait for the issue to be fixed, or we could try to come up with another solution.

@dalogsdon
Copy link
Contributor

Also, if we do get the links working, we could make use of the generated IDs to connect to the correct kernel if desired. Currently we store the path in local storage, so only one kernel would work on one machine (browser).

@parente
Copy link
Member Author

parente commented Nov 17, 2015

We could open an issue on notebook to encode query params properly in an attempt to get the id working.

Might be a hard sell since the main use case is notebooks and they don't have query params. But a PR with the simple fix might find its way in, in the interest of future proofing.

We could have the links not work and only use the Shutdown button for the time being and wait for the issue to be fixed.

I think this is fine for now. Especially since this PR isn't going anywhere until the thebe replacement work lands. Rather move onto other enhancements / issues.

dalogsdon added a commit to jhpedemonte/dashboards that referenced this issue Nov 17, 2015
If in tmpnb mode, create a new kernel rather than using session API.

ref jupyter#117

(c) Copyright IBM Corp. 2015
jhpedemonte added a commit to jhpedemonte/dashboards that referenced this issue Nov 17, 2015
When deploying a dashboard, use the Notebook REST API to create a new Notebook Session. This allows the dashboard kernel to show in the list of currently running notebooks.

(ref jupyter#117)

:chicken:

(c) Copyright IBM Corp. 2015
dalogsdon added a commit to jhpedemonte/dashboards that referenced this issue Nov 17, 2015
If in tmpnb mode, create a new kernel rather than using session API.

ref jupyter#117

(c) Copyright IBM Corp. 2015
@jhpedemonte
Copy link
Collaborator

Now creating a unique session for each user (stored in localStorage).

@parente
Copy link
Member Author

parente commented Nov 19, 2015

Unassigning until unblocked in #105.

@parente
Copy link
Member Author

parente commented Dec 8, 2015

This work is now part of jupyter/dashboards_bundlers#3. The associated PR #127 can remain open here until we no longer need it.

@parente parente closed this as completed Dec 8, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants