This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

Track things needed to use JupyterHub for tmpnb #255

Open
1 of 3 tasks
yuvipanda opened this issue Dec 30, 2016 · 11 comments

Comments

@yuvipanda

yuvipanda commented Dec 30, 2016

This ticket tracks the features required in JupyterHub and friends before we can replicate all the features of tmpnb using a particular configuration of JupyterHub.

  • Login without requiring any authentication - TmpAuthenticator
  • Super quick (<3s?) response time between hitting URL and seeing notebook interface
  • Culler that keeps user list clean of dead users / inactive servers
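The culler item boils down to one decision: which servers have been idle long enough to remove. The following is a minimal sketch of only that selection step. `UserServer` and `find_idle` are hypothetical names, and the record stands in for the per-user activity data a real culler would read from the Hub's REST API before stopping servers and deleting users.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class UserServer:
    # Hypothetical record: what a culler might read from the Hub's user list.
    name: str
    last_activity: datetime


def find_idle(servers, now, timeout):
    """Return the names of servers whose last activity is older than `timeout`."""
    return [s.name for s in servers if now - s.last_activity > timeout]


if __name__ == "__main__":
    now = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
    servers = [
        UserServer("active-user", now - timedelta(minutes=2)),
        UserServer("stale-user", now - timedelta(hours=2)),
    ]
    # Anything idle for over an hour is a candidate for culling.
    print(find_idle(servers, now, timeout=timedelta(hours=1)))  # ['stale-user']
```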
yuvipanda changed the title from "Move to using JupyterHub" to "Track things needed to use JupyterHub for tmpnb" on Dec 30, 2016
@yuvipanda
Author

https://github.com/yuvipanda/jupyterhub-tmpauthenticator will take care of the authenticator.

Next, we need to test whether we can meet our performance target without pooling. If we can't, then we need to implement pooling! The best way to do that would be as a mixin that can be used with any spawner. Modifications to jupyter-singleuser will also be required to support pooling.
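To make the mixin idea concrete, here is a minimal sketch, assuming the host class exposes an async `launch()` that starts one generic, not-yet-assigned server. `PoolingMixin`, `launch`, `fill_pool`, `pool_size` and `FakeSpawner` are all hypothetical; this is not JupyterHub's `Spawner` API, whose `start()` has its own contract.

```python
import asyncio
import collections
import itertools


class PoolingMixin:
    """Hypothetical mixin: keep `pool_size` generic servers warm and hand one out per request.

    The class it is mixed into must define an async `launch()` that starts one
    not-yet-assigned server and returns a handle for it.
    """

    pool_size = 2

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._pool = collections.deque()
        self._pending = 0         # launches in flight, so concurrent refills don't overfill
        self._refill_task = None  # keep a reference so the background task isn't garbage-collected

    async def fill_pool(self):
        # Top the pool up to pool_size warm servers.
        while len(self._pool) + self._pending < self.pool_size:
            self._pending += 1
            try:
                server = await self.launch()
            finally:
                self._pending -= 1
            self._pool.append(server)

    async def start(self):
        # Use a warm server if one is ready; otherwise fall back to a cold launch.
        server = self._pool.popleft() if self._pool else await self.launch()
        # Refill in the background so the next request also finds a warm server.
        self._refill_task = asyncio.create_task(self.fill_pool())
        return server


class FakeSpawner(PoolingMixin):
    """Stand-in spawner whose launch() just hands out numbered names."""

    def __init__(self):
        super().__init__()
        self._ids = itertools.count()

    async def launch(self):
        await asyncio.sleep(0)  # where a real spawner would start a container
        return f"server-{next(self._ids)}"


async def demo():
    spawner = FakeSpawner()
    await spawner.fill_pool()      # pre-warm: server-0, server-1
    first = await spawner.start()  # served from the pool
    await spawner._refill_task     # wait for the background refill
    return first, list(spawner._pool)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('server-0', ['server-1', 'server-2'])
```

For simplicity the pool lives on one spawner instance here. Since JupyterHub creates a spawner per user, a real pool would have to be shared across users, which is part of why a generic mixin is harder than it looks.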

@yuvipanda
Author

@rgbkrk I'm specifically interested in finding out what you think about the 'if we can get notebooks to start in under x seconds (for some value of x), we do not need pooling' line of thinking. How low would 'x' have to be?

@rgbkrk
Member

rgbkrk commented Dec 30, 2016

I'm specifically interested in finding out what you think about the 'if we can get notebooks to start in under x seconds (for some value of x), we do not need pooling' line of thinking. How low would 'x' have to be?

500ms

It's not starting a single server that's the problem - it's the stampeding herd. Even the current ~3s to launch a single server isn't too bad on its own; it adds up once more users pile in. When 10 users show up around the same time, the tenth user tends to wait 30s because of how Docker operates. Couple that with the typical spikes in load we had on the nature demo and now have on try.jupyter.org, and you're looking at waits of several minutes for something that should be a fast demo.
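The arithmetic behind the 30s figure can be sketched as follows, assuming (as the numbers in the comment suggest) that launches are effectively serialized at ~3s each. The 0.5s hand-off borrows the 500ms target above, and `pool_size` is a hypothetical parameter. The pooled model also ignores refills that happen during the burst, so it is only a rough comparison.

```python
def serialized_wait(position, launch_seconds=3.0):
    """Seconds until the user at `position` (1-based) gets a server if launches run one at a time."""
    return position * launch_seconds


def pooled_wait(position, pool_size, launch_seconds=3.0, handoff_seconds=0.5):
    """Same burst, but with `pool_size` servers already started before it arrived."""
    if position <= pool_size:
        return handoff_seconds  # warm server: only the hand-off cost
    # Once the pool is drained, the remaining users queue for serialized launches.
    return (position - pool_size) * launch_seconds


if __name__ == "__main__":
    # Tenth user: 30.0s without a pool, 0.5s with a pool of 10.
    print(serialized_wait(10), pooled_wait(10, pool_size=10))
```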

@yuvipanda
Author

@rgbkrk OK, so it's ultimately 'X percentile start time under Ys, for up to Z concurrent servers per second' - so it is a measure of latency and throughput rather than just latency. X could be 90-95, Y could be 1-3s, and Z could be anywhere from a hundred to a few thousand. Would those numbers and definitions be acceptable for you? If not, which numbers would be, or is the definition not what you had in mind?
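Checking a measured run against this definition could look like the sketch below. It assumes the nearest-rank percentile method (one common choice among several). The defaults `pct=95` and `limit_seconds=3.0` are picked from the ranges proposed above. Z does not appear in the code because it describes the load under which `start_times` must be collected.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_target(start_times, pct=95, limit_seconds=3.0):
    """True if the pct-th percentile start time is within limit_seconds.

    start_times should be measured while driving the hub at the target rate Z.
    """
    return percentile(start_times, pct) <= limit_seconds


if __name__ == "__main__":
    # 19 fast starts and one slow outlier.
    times = [1.0] * 19 + [10.0]
    print(percentile(times, 95), meets_target(times, pct=95))    # 1.0 True
    print(percentile(times, 100), meets_target(times, pct=100))  # 10.0 False
```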

@rgbkrk
Member

rgbkrk commented Dec 30, 2016

Yeah this is a great definition, thanks @yuvipanda

@minrk
Member

minrk commented Jan 2, 2017

Modifications to jupyter-singleuser will also be required to support pooling.

More precisely, modifications to the container entrypoint would be needed, not necessarily to jupyterhub-singleuser. This could mean modifying jupyterhub-singleuser, or it could be a different entrypoint that stays in a preflight stage until it is removed from the pool and assigned to a specific user, at which point it launches jupyterhub-singleuser.

In general, the pool entrypoint will:

  1. allocate resources
  2. perform common setup

and Spawner.start will:

  1. perform user-specific setup (e.g. setting uid, API token, mounting user volumes, etc.)
  2. finish launching single-user server

Only some cases will be able to get all the way to launching the notebook server in the preflight stage, where no user-specific action (e.g. uid, starting in a not-yet-mounted working dir) is needed. tmpnb is one such case, though.
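The two-phase split can be modeled in a few lines, as a sketch only: `PooledContainer`, `preflight`, `assign` and `launch_early` are hypothetical names, and each step is just recorded as a string. `launch_early=True` corresponds to the tmpnb-like case where the server can already be started during preflight.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PooledContainer:
    """Hypothetical pooled container that just records which setup steps ran."""
    steps: List[str] = field(default_factory=list)
    user: Optional[str] = None


def preflight(container, launch_early):
    # Pool entrypoint: identical for every user, so it can run before anyone asks.
    container.steps += ["allocate resources", "common setup"]
    if launch_early:
        # tmpnb-like case: nothing user-specific is needed, so start the server now.
        container.steps.append("launch single-user server")


def assign(container, user, launch_early):
    # Spawner.start side: runs only once the container is taken from the pool.
    container.user = user
    if not launch_early:
        container.steps.append(f"user-specific setup for {user}")  # e.g. uid, API token, volumes
        container.steps.append("launch single-user server")


if __name__ == "__main__":
    for early in (True, False):
        c = PooledContainer()
        preflight(c, launch_early=early)
        assign(c, "alice", launch_early=early)
        print(early, c.steps)
```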

It's not clear to me how a Mixin approach would work, since there is so much that would be specific to any given Spawner implementation. What are you envisioning there?

@yuvipanda
Author

yuvipanda commented Jan 3, 2017 via email

@rgbkrk
Member

rgbkrk commented Jan 3, 2017

At the very least, it wouldn't be too bad if the first pooling spawner were named TmpnbPoolingSpawner.

@yuvipanda
Author

I have deployed a tmpnb-style JupyterHub (URL private, ask me for it!), and it starts up pretty quickly even without pooling. @rgbkrk approves so far. The next step is to aim 100 simulated users at it and track start times.
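A load test of this kind could be structured as in the following sketch, which fires all simulated users at once with `asyncio.gather` and times each one. `fake_spawn`, `timed_start` and `simulate` are hypothetical. A real run would replace `fake_spawn` with a request to the hub URL that waits until the notebook interface responds. The collected times are what a percentile target would be checked against.

```python
import asyncio
import time


async def fake_spawn(user_id):
    # Stand-in for "hit the hub URL and wait for the notebook UI"; hypothetical.
    await asyncio.sleep(0.01)


async def timed_start(user_id, spawn):
    # Measure one simulated user's wait from request to usable server.
    t0 = time.perf_counter()
    await spawn(user_id)
    return time.perf_counter() - t0


async def simulate(n_users, spawn=fake_spawn):
    # Fire all n_users requests at once and collect each one's start time.
    return await asyncio.gather(*(timed_start(i, spawn) for i in range(n_users)))


if __name__ == "__main__":
    times = asyncio.run(simulate(100))
    print(f"{len(times)} users, slowest start: {max(times):.3f}s")
```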

@willingc
Member

willingc commented Mar 6, 2017

@yuvipanda Nice!

@akhmerov
Member

akhmerov commented May 20, 2017

Perhaps a relevant aspect is allowing JupyterHub to limit the number of concurrent servers. Requesting an unbounded number of servers is, after all, the easiest way to DoS any tmpnb setup.
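Because a tmpnb-style hub gives every visitor a fresh temporary user, a per-user quota would not stop a flood; the cap has to be global. The following is a minimal sketch of such a guard. `ServerLimiter`, `try_start` and `max_active` are hypothetical names, not JupyterHub options. It rejects requests over the cap rather than queueing them, so a burst of requests cannot build up an unbounded backlog.

```python
class ServerLimiter:
    """Hypothetical guard that refuses new servers once `max_active` are running."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()

    def try_start(self, user):
        if user in self.active:
            return True   # already running; no new server needed
        if len(self.active) >= self.max_active:
            return False  # reject instead of queueing, so a flood cannot pile up
        self.active.add(user)
        return True

    def stop(self, user):
        # Called when a server is culled or shut down, freeing a slot.
        self.active.discard(user)


if __name__ == "__main__":
    limiter = ServerLimiter(max_active=2)
    print([limiter.try_start(u) for u in ("a", "b", "c")])  # [True, True, False]
```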


5 participants