Speed up node spin-up with placeholders #643
Comments
(Adding some additional info to this as I believe it doesn't need its own new issue. Sorry it's a bit unpolished.) EDIT FROM CHRIS: added to the top comment
Thanks @GeorgianaElena for providing this helpful context! I've taken your comment and incorporated it into the top comment, so that we can keep all of the information in one place. I hope that's OK! I seem to remember a conversation with @yuvipanda and @consideRatio where they said that the user placeholders were not working as well as they thought they were, but I don't know if that was unique to one deployment, or one cloud provider, etc.
Yuvi and I have deliberated a lot about this, and I'd like to avoid rehashing the technical details motivating this suggestion within this issue, but the summarized suggestions are:
IIRC there were also discussions about node placeholders instead of user placeholders.
Update: UToronto folks opened a ticket about cluster scale-up duration that I believe caused spawn timeouts for them: https://2i2c.freshdesk.com/a/tickets/79. According to the report, this wasn't an isolated event. Since the utoronto hub is heavily used, I believe this will continue to happen, so we should prioritize this task.
I added a note about it. I think we have a quite distinct fingerprint of the issue:
I responded about this in the ticket.
Thanks a lot @consideRatio! Since it's a different type of beast and not at all what I thought at first, I'll open a separate issue to discuss it further.
Description
We should reduce the time it takes to spin up our nodes by using placeholders.
Faster node spin-up would make our hubs perform better whenever there are spikes in activity, or whenever a user triggers a scale-up event in general. It would help our hubs feel speedier.
Guide for implementation
➡️ Great info about user placeholders in the z2jh docs here and discourse
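For reference, here is a minimal sketch of what enabling user placeholders looks like in z2jh Helm values, based on the `scheduling` options described in the z2jh docs. The replica count is an illustrative assumption, not a recommendation, and in a deployment where z2jh is a subchart these keys would presumably sit under a parent key such as `jupyterhub:`.

```yaml
# Sketch only: z2jh Helm chart values for user placeholders.
scheduling:
  podPriority:
    # Pod priority lets real user pods preempt (evict) placeholder pods.
    enabled: true
  userPlaceholder:
    enabled: true
    # Hypothetical value: number of dummy user pods kept running so that
    # spare capacity is already available when real users arrive.
    replicas: 4
```

The idea is that evicted placeholder pods become pending and trigger the cluster autoscaler ahead of time, so real users are less likely to wait for a new node.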
Performance
To understand whether this would make a big impact on performance, we could try analyzing the log files from the old U.Toronto hub and comparing them with those from the new one.
Implementing across cloud providers
It's unclear whether this would behave the same way across the major cloud providers. Here's what we know about each:
GKE: According to this note in the z2jh docs, this should work on GKE at least.
Azure: The original UToronto cluster runs on Azure, and had been using user placeholders ➡️ utoronto-2i2c/jupyterhub-deploy@1c7fa04 (that's not the case anymore since we've migrated it to the pilot-hubs infra as part of #638)
AWS: Not sure if/how this works
Updates and tasks
Updates
No response