
Launch user sessions in multiple clusters from a single hub #7

choldgraf opened this issue Jan 12, 2022 · 4 comments


Comments

@choldgraf (Member) commented Jan 12, 2022

Description of problem and opportunity to address it

Problem description

When communities have datasets or resources spread across multiple cloud locations (different data centers, cloud providers, etc.), they currently must deploy one JupyterHub per location to provide access to the resources in each one.
This creates a few problems:

  • The hub configurations, user lists, etc. are spread across multiple places, which adds unnecessary complexity to set up and operate
  • All billing for a hub is tied to a single cloud account - whichever one pays for that hub's infrastructure
  • There are extra operational and setup costs associated with running infrastructure on each of these providers

Proposed solution
We should make it possible for a single hub to launch interactive sessions in multiple cloud locations, not only in the location where the hub itself is running.

This would allow communities to use a single hub as a "launch pad" for infrastructure elsewhere. It would reduce the complexity of running multiple hubs at once, and could give communities a way to divide their interactive sessions across billing accounts.

Implementation guide and constraints

Tech implementation

One likely candidate to make this possible is to define a new JupyterHub Spawner that knows how to talk to other Kubernetes clusters, along with some kind of process that can live on those clusters and "listen" for requests to launch interactive sessions. Then the spawner would request a session on a remote cluster, and direct the person there.
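
To make this concrete, here is a rough sketch of what such a spawner could look like. This is an illustration only, not the actual implementation: CLUSTERS and launch_on_cluster() are hypothetical placeholders for "a registry of remote clusters" and "whatever process on the remote cluster actually starts the session".

```python
# jupyterhub_config.py -- an illustrative sketch, not a working spawner.
from jupyterhub.spawner import Spawner

# Hypothetical registry of clusters the hub can dispatch to.
CLUSTERS = {
    "gcp-us-central1": {"kubeconfig_context": "gcp-us-central1"},
    "aws-us-west-2": {"kubeconfig_context": "aws-us-west-2"},
}


async def launch_on_cluster(cluster, username, env):
    """Hypothetical helper: create the user's pod on the remote cluster
    (e.g. via the Kubernetes API using cluster["kubeconfig_context"]) and
    return an (ip, port) that the hub's proxy can route traffic to."""
    raise NotImplementedError


class MultiClusterSpawner(Spawner):
    """Let the user pick a target cluster, then start their server there."""

    def options_from_form(self, formdata):
        # Record which cluster the user chose on the spawn page.
        return {"cluster": formdata["cluster"][0]}

    async def start(self):
        cluster = CLUSTERS[self.user_options["cluster"]]
        ip, port = await launch_on_cluster(cluster, self.user.name, self.get_env())
        return ip, port

    async def poll(self):
        # None while the remote session is running, an exit status otherwise.
        return None

    async def stop(self):
        # Tear down the remote pod and any routing created for this user.
        pass


c.JupyterHub.spawner_class = MultiClusterSpawner

# Minimal dropdown on the spawn page so people choose where to launch.
c.MultiClusterSpawner.options_form = (
    '<select name="cluster">'
    + "".join(f'<option value="{name}">{name}</option>' for name in CLUSTERS)
    + "</select>"
)
```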

Considerations

  • What to do about filesystems for daily use? It will confuse people if the location where they launch a session also changes the files available to them.
    • Could we treat one file system as the "source of truth" for them and encourage them to keep this one updated?
    • Could we make it easy to interact with external services like GitHub so people don't rely on a cluster's NFS to store their work? (One possible approach is sketched after this list.)
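
On that second consideration, one possible (untested) approach is to pull a shared Git repository into each session when it starts, so that which cluster's home directory a person lands on matters less. Below is a sketch assuming the hub uses KubeSpawner and its lifecycle_hooks option; the repository URL and target directory are made-up placeholders:

```python
# jupyterhub_config.py -- a sketch of one option, assuming KubeSpawner.
# The repository URL and target directory are placeholders.
c.KubeSpawner.lifecycle_hooks = {
    "postStart": {
        "exec": {
            "command": [
                "sh",
                "-c",
                # Clone the shared repo on first start, or update it on later starts.
                "git clone https://github.com/example-org/shared-notebooks ~/shared-notebooks"
                " || (cd ~/shared-notebooks && git pull)",
            ]
        }
    }
}
```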

Driving test cases

@rabernat needs a few hubs that are similar flavors of a Pangeo hub, each attached to a different pot of money. Rather than providing one hub per case, we could use this as an opportunity to prototype the multi-cluster launcher described here.

Updates and ongoing work

  • We've got a first version of the multi-cluster spawner here: https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner
  • Next step is to deploy this on a hub setup. We're hoping to use the next set of hubs for @rabernat for this.
  • Currently waiting on cloud credits from Google that will power those hubs
  • We also have an offer of credits from the Azure Planetary Computer team. We should decide if we want to use them.
@damianavila

The filesystem issue is key and probably not easy to solve.
I'm wondering if there is also some existing abstraction that could interact with the underlying NFS layers from the different cloud providers... in that scenario, we would have a multi-spawner to select the cluster where you want to spawn and a multi-storage layer to select where to persist the things you are working on.
Alternatively, we could push on @rabernat's previously discussed idea of going without a "filesystem" and changing people's filesystem-based mindset along the way (which would be the most difficult part, IMHO).

@yuvipanda (Member)

Unfortunately cross-DC NFS is not really viable for reliability, performance and security reasons :(

I think step 1 would likely just involve a per-cluster home directory. We could augment it with a shared directory that is sync'd across all the clouds, via either FUSE or something like https://rclone.org/.
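
As an illustration of what that shared, synced directory could look like (a sketch only: the rclone remote names and bucket paths below are made up and would be defined in an rclone.conf), a small one-way sync loop could run as a sidecar or a cron job:

```python
# sync_shared.py -- illustrative sketch of a one-way rclone sync loop.
# "gcs-shared" and "s3-shared" are hypothetical rclone remotes; the first
# is treated as the source of truth for the shared directory.
import subprocess
import time

SOURCE = "gcs-shared:shared-data"
MIRRORS = ["s3-shared:shared-data"]


def sync_once():
    for dest in MIRRORS:
        # One-way sync so every mirror ends up matching the source of truth.
        subprocess.run(["rclone", "sync", SOURCE, dest], check=True)


if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(15 * 60)  # re-sync every 15 minutes
```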

I've made a release of the spawner already at https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner, and am waiting for cloud credits to land before I can do a deployment.

@consideRatio (Member)

2i2c team sprint meeting notes:

  • Columbia "LEAP" project credits have arrived in a GCP account
  • Yuvi could start working on this next week: setting up a GCP-based cluster

@choldgraf (Member, Author)

Update: putting a pin in this one for a bit

@yuvipanda and I just had a conversation about this work, and we agreed that it'd be best to prioritize some other development efforts first before we complete this one, especially since the LEAP hub needed to be deployed quickly enough that we just did it "the old fashioned way".

We're going to focus on those two pieces of work first, and will revisit this one at a later date.
