
Launch user sessions in multiple clusters from a single hub #7

choldgraf opened this issue Jan 12, 2022 · 4 comments


Comments

@choldgraf (Member) commented Jan 12, 2022

Description of problem and opportunity to address it

Problem description

When communities have datasets or resources spread across multiple cloud locations (different data centers, cloud providers, etc.), they currently must deploy one JupyterHub per location to provide access to the resources in each one.
This creates a few problems:

  • The hub configurations, user lists, etc. are spread across multiple places, which adds unnecessary complexity to set up and operate
  • All billing for a hub is tied to a single cloud account - whichever one pays for that hub's infrastructure
  • There are extra operational and setup costs associated with running infrastructure on each of these providers

Proposed solution
We should make it possible for a single hub to launch interactive sessions in multiple cloud locations, not only in the location where the hub itself is running.

This would allow communities to use a single hub as a "launch pad" for infrastructure elsewhere. It would reduce the complexity of running multiple hubs at once, and could give communities a way to divide their interactive sessions across billing accounts.

Implementation guide and constraints

Tech implementation

One likely candidate to make this possible is to define a new JupyterHub Spawner that knows how to talk to other Kubernetes clusters, along with some kind of process that can live on those clusters and "listen" for requests to launch interactive sessions. Then the spawner would request a session on a remote cluster, and direct the person there.
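
To make this concrete, here is a rough sketch of what such a spawner could look like. This is an illustration only, not the actual implementation: CLUSTERS and launch_on_cluster() are hypothetical placeholders for "a registry of remote clusters" and "whatever process on the remote cluster actually starts the session".

```python
# jupyterhub_config.py -- an illustrative sketch, not a working spawner.
from jupyterhub.spawner import Spawner

# Hypothetical registry of clusters the hub can dispatch to.
CLUSTERS = {
    "gcp-us-central1": {"kubeconfig_context": "gcp-us-central1"},
    "aws-us-west-2": {"kubeconfig_context": "aws-us-west-2"},
}


async def launch_on_cluster(cluster, username, env):
    """Hypothetical helper: create the user's pod on the remote cluster
    (e.g. via the Kubernetes API using cluster["kubeconfig_context"]) and
    return an (ip, port) that the hub's proxy can route traffic to."""
    raise NotImplementedError


class MultiClusterSpawner(Spawner):
    """Let the user pick a target cluster, then start their server there."""

    def options_from_form(self, formdata):
        # Record which cluster the user chose on the spawn page.
        return {"cluster": formdata["cluster"][0]}

    async def start(self):
        cluster = CLUSTERS[self.user_options["cluster"]]
        ip, port = await launch_on_cluster(cluster, self.user.name, self.get_env())
        return ip, port

    async def poll(self):
        # None while the remote session is running, an exit status otherwise.
        return None

    async def stop(self):
        # Tear down the remote pod and any routing created for this user.
        pass


c.JupyterHub.spawner_class = MultiClusterSpawner

# Minimal dropdown on the spawn page so people choose where to launch.
c.MultiClusterSpawner.options_form = (
    '<select name="cluster">'
    + "".join(f'<option value="{name}">{name}</option>' for name in CLUSTERS)
    + "</select>"
)
```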

Considerations

  • What to do about filesystems for daily use? It will confuse people if the location where they launch a session also changes the files available to them.
    • Could we treat one file system as the "source of truth" for them and encourage them to keep this one updated?
    • Could we make it easy to interact with external services like GitHub so people don't rely on a cluster's NFS to store their work? (One possible approach is sketched after this list.)
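
On that second consideration, one possible (untested) approach is to pull a shared Git repository into each session when it starts, so that which cluster's home directory a person lands on matters less. Below is a sketch assuming the hub uses KubeSpawner and its lifecycle_hooks option; the repository URL and target directory are made-up placeholders:

```python
# jupyterhub_config.py -- a sketch of one option, assuming KubeSpawner.
# The repository URL and target directory are placeholders.
c.KubeSpawner.lifecycle_hooks = {
    "postStart": {
        "exec": {
            "command": [
                "sh",
                "-c",
                # Clone the shared repo on first start, or update it on later starts.
                "git clone https://github.com/example-org/shared-notebooks ~/shared-notebooks"
                " || (cd ~/shared-notebooks && git pull)",
            ]
        }
    }
}
```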

Driving test cases

@rabernat needs a few hubs that are similar flavors of a Pangeo hub, each attached to a different pot of money. Rather than providing one hub per case, we could use this as an opportunity to prototype the multi-cluster launcher described here.

Updates and ongoing work

  • We've got a first version of the multi-cluster spawner here: https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner
  • Next step is to deploy this on a hub setup. We're hoping to use the next set of hubs for @rabernat for this.
  • Currently waiting on cloud credits from Google that will power those hubs
  • We also have an offer of credits from the Azure Planetary Computer team. We should decide if we want to use them.
@damianavila

The filesystem issue is key and probably not easy to solve.
I'm wondering if there is also some existing abstraction that could interact with the underlying NFS layers from the different cloud providers... in that scenario, we would have a multi-spawner to select the cluster where you want to spawn and a multi-storage layer to select where to persist the things you are working on.
Alternatively, we could push on @rabernat's previously discussed idea of going without a "filesystem" and changing people's filesystem-based mindset along the way (which would be the most difficult part, IMHO).

@yuvipanda (Member)

Unfortunately cross-DC NFS is not really viable for reliability, performance and security reasons :(

I think step 1 would likely just involve a per-cluster home directory. We could augment it with a shared directory that is sync'd across all the clouds, via either FUSE or something like https://rclone.org/.
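
As an illustration of what that shared, synced directory could look like (a sketch only: the rclone remote names and bucket paths below are made up and would be defined in an rclone.conf), a small one-way sync loop could run as a sidecar or a cron job:

```python
# sync_shared.py -- illustrative sketch of a one-way rclone sync loop.
# "gcs-shared" and "s3-shared" are hypothetical rclone remotes; the first
# is treated as the source of truth for the shared directory.
import subprocess
import time

SOURCE = "gcs-shared:shared-data"
MIRRORS = ["s3-shared:shared-data"]


def sync_once():
    for dest in MIRRORS:
        # One-way sync so every mirror ends up matching the source of truth.
        subprocess.run(["rclone", "sync", SOURCE, dest], check=True)


if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(15 * 60)  # re-sync every 15 minutes
```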

I've made a release of the spawner already at https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner, and am waiting for cloud credits to land before I can do a deployment.

@consideRatio (Member)

2i2c team sprint meeting notes:

  • Columbia "LEAP" project credits have arrived in a GCP account
  • Yuvi could start working on this next week: setting up a GCP-based cluster

@choldgraf (Member, Author)

Update: putting a pin in this one for a bit

@yuvipanda and I just had a conversation about this work, and we agreed that it'd be best to prioritize some other development efforts first before we complete this one, especially since the LEAP hub needed to be deployed quickly enough that we just did it "the old fashioned way".

We're going to focus on those two pieces of work first, and will revisit this one at a later date.
