Track things needed to use JupyterHub for tmpnb #255
https://github.com/yuvipanda/jupyterhub-tmpauthenticator will take care of the authenticator. Now we need to test whether there is an easy way to hit our performance target without pooling. If we can't, then we need to implement pooling! The best way to do that would be as a mixin that can be used with any spawner. Modifications to jupyterhub-singleuser will also be required to support pooling.
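One way to read the mixin idea: a class that depends only on the host spawner exposing some "launch a fresh server" coroutine, and keeps a few of those servers warm so that `start()` can hand one out immediately. This is a minimal sketch under that assumption. `DemoSpawner`, `launch_server()`, `PoolingMixin`, `refill()` and `pool_size` are hypothetical names, not JupyterHub's Spawner API, and the fake launch just sleeps.

```python
import asyncio
import itertools


class DemoSpawner:
    """Hypothetical concrete spawner; stands in for a Docker-backed one."""

    _ports = itertools.count(9000)  # shared port counter for the fake servers

    async def launch_server(self) -> str:
        await asyncio.sleep(0.01)  # stand-in for the slow container start
        return f"http://127.0.0.1:{next(self._ports)}"

    async def start(self) -> str:
        return await self.launch_server()


class PoolingMixin:
    """Hypothetical mixin: serve start() from a pool of pre-launched servers.

    It only assumes the host class has an async launch_server() returning a URL.
    """

    pool_size = 2

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.pool = []  # one pool per spawner class, shared by all its instances

    async def refill(self) -> None:
        """Top the pool up; meant to run outside any user's critical path."""
        while len(self.pool) < self.pool_size:
            self.pool.append(await self.launch_server())

    async def start(self) -> str:
        if self.pool:
            return self.pool.pop(0)  # warm hit: the user skips the launch
        return await super().start()  # pool empty: fall back to a cold start


class PooledDemoSpawner(PoolingMixin, DemoSpawner):
    pass
```

The hard part, as discussed in this thread, is that a real spawner's launch step usually mixes shared setup with per-user setup (uid, API token, volumes), so a cleanly poolable `launch_server()` only exists when nothing user-specific is needed up front.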
@rgbkrk I'm specifically interested in finding out what you think about the 'if we can get notebooks to start in under x seconds (for some value of x), we do not need pooling' line of thinking. How low would 'x' have to be?
500ms. It's not starting a single server that's the problem; it's the stampeding herd. Even the current ~3s to launch a single server isn't too bad on its own, but it adds up once more users are piling in. When 10 users show up around the same time, the tenth user tends to wait 30s because of how Docker operates. Couple that with the typical load spikes we had on the Nature demo and now have on try.jupyter.org, and you're talking about waits of several minutes for something that should be a fast demo.
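The 30s figure is consistent with launches that effectively run one at a time: if each takes ~3s and 10 requests arrive together, the k-th user waits about k × 3s. A back-of-the-envelope model, assuming strict serialization (a simplification of whatever Docker is actually doing) and, for comparison, a hypothetical pool of warm servers that skip the launch entirely and is not refilled during the burst:

```python
from typing import List


def serialized_waits(n_users: int, launch_seconds: float) -> List[float]:
    """Wait for each of n_users arriving together, if launches run one at a time."""
    return [k * launch_seconds for k in range(1, n_users + 1)]


def pooled_waits(n_users: int, launch_seconds: float, pool_size: int) -> List[float]:
    """Same, but the first pool_size users get a warm server with no launch wait."""
    return [
        0.0 if k <= pool_size else (k - pool_size) * launch_seconds
        for k in range(1, n_users + 1)
    ]


waits = serialized_waits(10, 3.0)
print(waits[-1])                     # 30.0: the tenth user's wait
print(sum(waits) / len(waits))       # 16.5: mean wait across the burst
print(pooled_waits(10, 3.0, 5)[-1])  # 15.0: tenth user's wait with 5 warm servers
```

Under this model the worst-case wait grows linearly with the burst size, which is why a fast single launch alone does not fix the herd problem.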
@rgbkrk Ok, so it's ultimately 'X percentile start time under Y s, for up to Z concurrent servers per second', so it is a measure of latency and throughput rather than just latency. X could be 90-95, Y could be 1-3s, and Z could be anywhere from a hundred to a few thousand. Would those numbers and definitions be acceptable to you? If not, which numbers would be, or is the definition not what you had in mind?
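One possible way to turn that definition into a pass/fail check, given a list of `(requested_at, ready_at)` timestamps from a load test. The nearest-rank percentile and measuring the request rate as arrivals over the span of request times are choices of this sketch, not something the thread specifies; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Start = Tuple[float, float]  # (requested_at, ready_at), seconds on one clock


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(starts: List[Start], x_pct: float) -> Tuple[float, float]:
    """Return (X-th percentile start latency, observed request rate per second)."""
    latencies = [ready - requested for requested, ready in starts]
    requested = sorted(r for r, _ in starts)
    span = requested[-1] - requested[0]
    rate = (len(requested) - 1) / span if span > 0 else math.inf
    return percentile(latencies, x_pct), rate


def meets_target(starts: List[Start], x_pct: float, y_seconds: float,
                 z_per_second: float) -> bool:
    """True if the run reached rate Z and its X-th percentile start time is under Y."""
    p_latency, rate = summarize(starts, x_pct)
    return rate >= z_per_second and p_latency < y_seconds
```

The rate check matters because a run that never reaches Z requests per second says nothing about latency at Z.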
Yeah, this is a great definition, thanks @yuvipanda
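More precisely, modifications to the container entrypoint would be needed, not necessarily to jupyterhub-singleuser. This could be modifications to jupyterhub-singleuser, or it could be a different entrypoint that stays in a pre-flight stage until it is removed from the pool and assigned to a specific user, at which point it launches jupyterhub-singleuser.
In general, the pool entrypoint will:
1. allocate resources
2. perform common setup
and Spawner.start will:
1. perform user-specific setup (e.g. setting uid, API token, mounting user volumes, etc.)
2. finish launching the single-user server
Only some cases will be able to get all the way to launching the notebook server in the pre-flight stage, where no user-specific action (e.g. uid, starting in a not-yet-mounted working dir) is needed. tmpnb is one such case, though. It's not clear to me how a mixin approach would work, since so much would be specific to any given Spawner implementation. What are you envisioning there?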
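The split between the pool entrypoint and Spawner.start can be sketched as a two-phase lifecycle: shared setup runs first, then the container blocks until it is handed a user. In this sketch an `asyncio.Queue` stands in for whatever channel a real deployment would use to deliver the assignment; `Assignment`, `PoolEntrypoint`, `inbox` and `pooled_start()` are all hypothetical names, and the "setup" steps only append to a log.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assignment:
    """Hypothetical per-user settings sent to a pooled container."""
    user: str
    api_token: str
    uid: Optional[int] = None  # None: no uid switch needed (the tmpnb-like case)
    volumes: List[str] = field(default_factory=list)


class PoolEntrypoint:
    """Hypothetical container entrypoint with a pre-flight stage.

    Create it inside a running event loop (it owns an asyncio.Queue).
    """

    def __init__(self) -> None:
        self.log: List[str] = []
        self.inbox: "asyncio.Queue[Assignment]" = asyncio.Queue()

    async def run(self) -> str:
        # Pre-flight: work that is the same for every user.
        self.log.append("allocate resources")
        self.log.append("common setup")
        # Sit in the pool until Spawner.start hands this container to a user.
        a = await self.inbox.get()
        # User-specific setup, then the real launch.
        if a.uid is not None:
            self.log.append(f"set uid {a.uid}")
        for path in a.volumes:
            self.log.append(f"mount {path}")
        self.log.append(f"launch jupyterhub-singleuser for {a.user}")
        return a.user


async def pooled_start(idle: List[PoolEntrypoint], assignment: Assignment) -> PoolEntrypoint:
    """Hypothetical Spawner.start body: claim an idle container and assign it."""
    entrypoint = idle.pop(0)
    await entrypoint.inbox.put(assignment)
    return entrypoint
```

When `uid` is None and there are no volumes, nearly everything after `inbox.get()` could move into pre-flight, which is why tmpnb is the easy case.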
I'm going to first make sure I can measure the three variables I want, and then attempt to do this *without* pooling. If that doesn't work, I'll look at pooling and see what I can do with pure mixins (ideally!), or failing that, with subclasses for a few spawners. I haven't gotten that far yet; I think step 1 for me is to establish a way to measure these things reliably.
On Mon, Jan 2, 2017 at 4:34 AM, Min RK wrote:
>> Modifications to jupyterhub-singleuser will also be required to support pooling.
>
> More precisely, modifications to the container entrypoint would be needed, not necessarily jupyterhub-singleuser. This could be modifications to jupyterhub-singleuser, or it could be a different entrypoint that is in a pre-flight stage until removed from the pool and assigned to a specific user, at which point it launches jupyterhub-singleuser.
>
> In general, the pool entrypoint will:
> 1. allocate resources
> 2. perform common setup
>
> and Spawner.start will:
> 1. perform user-specific setup (e.g. setting uid, API token, mounting user volumes, etc.)
> 2. finish launching single-user server
>
> Only some cases will be able to get all the way to launching the notebook server in the preflight stage, where no user-specific action (e.g. uid, starting in a not-yet-mounted working dir) is needed. tmpnb is one such case, though.
>
> It's not clear to me how a Mixin approach would work, since there is so much that would be specific to any given Spawner implementation. What are you envisioning there?
--
Yuvi Panda T
http://yuvi.in/blog
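A first step toward measuring start latency under a burst can be sketched with asyncio: fire N starts at once and record each one's request-to-ready time. `measure_burst()` and `make_serialized_spawn()` are hypothetical helpers. The fake backend serializes launches behind a lock to mimic the stampede described earlier in the thread; a real harness would instead request servers through the Hub and wait until each one responds.

```python
import asyncio
import time
from typing import Awaitable, Callable, List

Spawn = Callable[[str], Awaitable[None]]


async def measure_burst(spawn: Spawn, n_users: int) -> List[float]:
    """Start n_users spawns at once; return each one's request-to-ready latency."""

    async def one(user: str) -> float:
        t0 = time.perf_counter()
        await spawn(user)
        return time.perf_counter() - t0

    return list(await asyncio.gather(*(one(f"user-{i}") for i in range(n_users))))


def make_serialized_spawn(launch_seconds: float) -> Spawn:
    """Hypothetical backend that can only launch one server at a time.

    Call this inside a running event loop, since it creates an asyncio.Lock.
    """
    lock = asyncio.Lock()

    async def spawn(user: str) -> None:
        async with lock:
            await asyncio.sleep(launch_seconds)  # stand-in for one launch

    return spawn
```

Swapping in a spawn function that does not serialize (or one backed by a warm pool) and comparing the latency lists gives a direct before/after measurement.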
At the very least, it's not too bad if the first pooling spawner was named
I have deployed a tmpnb-style JupyterHub (URL private, ask me for it!), and it starts up pretty quickly even with no pooling. @rgbkrk approves so far. The next step is to point 100 simulated users at it and track start times.
@yuvipanda Nice!
Perhaps a relevant aspect is allowing JupyterHub to limit the number of concurrent servers. Requesting an unbounded number of servers is, after all, the easiest way to DoS any tmpnb setup.
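A cap like this can be sketched as a simple admission check in front of the spawner. `ServerLimiter`, `try_acquire()` and `release()` are hypothetical names, and refusing (rather than queueing) requests over the cap is a design choice of this sketch: it keeps latency bounded for admitted users at the cost of turning some users away.

```python
from typing import Set


class ServerLimiter:
    """Hypothetical admission check: refuse new servers beyond a fixed cap."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active: Set[str] = set()

    def try_acquire(self, user: str) -> bool:
        """Admit user if they already have a server or there is room for one."""
        if user in self.active:
            return True  # existing server: no new resources needed
        if len(self.active) >= self.max_active:
            return False  # at capacity: caller should show a "full" page
        self.active.add(user)
        return True

    def release(self, user: str) -> None:
        """Call when a user's server is stopped or culled."""
        self.active.discard(user)
```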
This ticket tracks the features required in JupyterHub and friends before we can replicate all the features of tmpnb using a particular configuration of JupyterHub.