Deploy and operate a BinderHub for Pangeo #919
cc @rabernat and @sgibson91 - is there anything major here that I am missing? I believe that @sgibson91 is working with @consideRatio on #857 right now, which is laying the foundation for letting us deploy BinderHubs from the Pangeo cluster CI/CD.
@choldgraf I think this is a really nice outline of the work that needs to be done to get us into a position where we are ready to deploy a BinderHub. I'm happy with how this is and will add to the list of tasks as and when they arise.
I'm also linking this issue as a future reminder to myself to ask about container registries for Pangeo Binder, but that is a ways down the road yet.
Here's a left-field suggestion: What if we don't implement a log-in system, because we never host a remote server for the jupyter notebook -- and the experience stays mostly the same? Specifically, could we run the notebooks in the browser in a Wasm Python environment via JupyterLite? Here, the demo notebooks could be hosted on a static webpage.
For context: The use case I had in mind was for distributing low-friction demos that don't require a log-in. This is related to the "whitelist vs blacklist" discussion around log-ins in today's meeting.
@alxmrs if we could define a subset of workflows and/or datasets that were possible to use in JupyterLite, this would definitely be a faster way to onboard people into the Pangeo community. I think the trick will be figuring out the "hand-off" between JupyterLite and a situation where you need a fully-loaded environment, so that it doesn't confuse or frustrate people. But at the very least, it shouldn't be too hard to try a demo out. For example, here's the repository that serves the JupyterLite instance linked from try.jupyter.org: https://github.com/jupyter/try-jupyter. That shouldn't be too hard to replicate for Pangeo's use-case. I bet you could curate a few notebooks that showed off basic functionality to get people started (but it probably wouldn't work for the more advanced things like Dask Gateway, Zarr, etc).
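To make the "subset of workflows" idea concrete, here is a minimal sketch of the kind of cell a browser-only Pangeo demo could run in a JupyterLite (Pyodide) kernel. This is an assumption about what would work client-side, not a tested Pangeo notebook: pure-Python packages such as xarray can be installed in-browser with piplite, while anything that needs Dask Gateway or cloud credentials would still require the hand-off to a full hub.

```python
# Hypothetical first cell of a JupyterLite demo notebook (Pyodide kernel).
# piplite installs pure-Python wheels in the browser; heavyweight pieces like
# dask-gateway or direct cloud-bucket access are where the hand-off to a
# real BinderHub/JupyterHub would be needed.
import piplite
await piplite.install(["xarray"])

import numpy as np
import xarray as xr

# Tiny in-memory dataset, just to show the familiar xarray workflow running
# entirely client-side from a static webpage.
ds = xr.Dataset(
    {"sst": (("time", "lat"), np.random.rand(4, 3))},
    coords={"time": np.arange(4), "lat": [10.0, 20.0, 30.0]},
)
ds["sst"].mean("time")
```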
I'd like to start working on this in the next couple of weeks. @yuvipanda are there any strategy discussions we need to have? Questions I have:
I presume this project might need a dedicated project board to collect all the associated issues.
I am very happy to see this moving forward! 🤩
This will be paid from the same grant that is covering the current GCP Pangeo Hub (EarthCube Pangeo Forge award). So they will go to the same billing account. If it is easier to put everything in one cluster, that's fine with me. From the "hub owner" perspective, it would still be useful to be able to segregate costs for the binder.
So sorry for the delay, @sgibson91.
Regarding 2i2c paying for cloud. I think that this would require a change to the contract that 2i2c has with Pangeo (which currently only covers personnel costs). Can we do two things:
While we wait for @rabernat to update us on the contracting question, I believe the below issue is at least actionable. I will open an issue to track it.
EDIT: Issue is here #1280
Let's get the relationships straight. Pangeo has no contract with anyone. Columbia has a contract with 2i2c. AFAICT there are in fact 3 separate contracts now supporting Pangeo-related things (NSF Earthcube @ Columbia, LEAP @ Columbia, M2LInES @ NYU).
This will be complicated to set up. We have only established such a contract with NYU, not Columbia. It will require considerable administrative overhead. I would estimate 2 months to revise the existing contract. And there is always the possibility that Columbia may reject the proposal that 2i2c will bill us directly for cloud usage. Because the cloud costs for this project are exempt from ICR, it is essential that the cloud bill be segregated from the "services" bill. All that said, I'm fine with trying.
@rabernat thanks for offering to try! I think it'll definitely simplify setup and longer term operations.
Thanks @rabernat for sharpening my language - I agree that we need to be clear about which organizations are on each side of contracts! For this case, it sounds like:
So, how about I ask CS&S to investigate with the Columbia admin whether this would be complicated to set up? If it seems like it will be massively complicated, then we stick with the status quo and kick the can down the road. If it will not be complicated (say, will take ~1 month to set up) then we give this a shot. If we do set this up, we'd also need the following constraints:
I think it would really be easiest if we got two separate invoices. Otherwise our admins will have to split the charge manually between two different accounts.
Hey all - I fleshed out some of the issues around the administrative / cloud payment challenges here, and added that to our list at the top. See some more conversation about that here:
We have a test Binder that is up and running on our pilot-hubs cluster! 🎉 All the infrastructure is there to make this repeatable, including auto-deployment through CI/CD. So the only thing blocking progress on reinstating the Pangeo Binder on GCP is the credits situation with Columbia.
Wanted to note that I heard recently from @cgentemann that there are several communities within the NASA ecosystem that would also benefit from having BinderHubs for their workshops and events. This isn't quite the Pangeo community, but it's a useful datapoint to know where people would find value in these Binder services. The only catch is that all of their data lives in AWS, not in GCP. I don't know how difficult it would be to adapt our infrastructure to AWS as well but just wanted to note this.
At the minute, it's very hacked together to specifically work with Google Artifact Registry for image storage. We should absolutely fix that, but I actually think we could use an EKS cluster with a GAR, since the cluster and registry are connected through a service account that is provided as a username/password in the hub config, rather than any k8s-level connection. It shouldn't be too much effort to get the BinderHub working on AWS; BinderHub is already cloud-agnostic, and it's more about picking the right templates/config from basehub/daskhub to get the features the community needs/wants. Generally, this BinderHub is sort of hacked together because we don't know how #1382 will pan out and it didn't feel beneficial to get a full solution for BinderHub up-and-running when it could all be torn down and refactored in the not-too-distant future.
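For anyone less familiar with that setup, here is a rough sketch of what "a service account provided as a username/password in the hub config" looks like at the BinderHub level. It is written as Python traitlets config for compactness rather than the Helm values the deployment actually uses, and the region, project, repository, and key-file path are all placeholders:

```python
# binderhub_config.py -- illustrative sketch, not the actual 2i2c configuration.
from pathlib import Path

# Push built images to an external registry instead of the local Docker daemon.
c.BinderHub.use_registry = True

# Google Artifact Registry prefix for pushed images (placeholders).
c.BinderHub.image_prefix = "us-central1-docker.pkg.dev/example-project/example-repo/binder-"

# GAR accepts a service-account key as ordinary registry credentials: the
# username is the literal string "_json_key" and the password is the JSON key.
# Because this is plain username/password auth, nothing about it is tied to
# GKE; an EKS cluster could push to the same registry.
c.DockerRegistry.url = "https://us-central1-docker.pkg.dev"
c.DockerRegistry.username = "_json_key"
c.DockerRegistry.password = Path("/etc/secrets/gar-sa-key.json").read_text()
```

In the BinderHub Helm chart the equivalent values live under the `config.BinderHub` and `registry` sections; the Python form above is just the most compact way to show the moving parts.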
That's helpful context! So it sounds like:
Yeah, BUT I also don't want us to start running a whole bunch of hacked-together BinderHubs, as that is just loading us up for a giant migration effort when #1382 takes shape/lands. We should maybe cap ourselves at 2-3 (or some other reasonable number)?
FWIW, we have another zombie binder running on AWS, https://hub.aws-uswest2-binder.pangeo.io/. It is being run by a skeleton crew of @scottyhq. As long as we are looking at AWS, I would be very happy to see a path towards moving this binder into a more stable situation. Perhaps we can kill multiple birds here.
Even when that is possible, it may make sense to also explore AWS ECR. I guess there will be some benefits to having everything in AWS land when it comes time to retrieve/fetch images...
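One practical difference worth flagging if we explore ECR: unlike a long-lived GAR service-account key, ECR hands out Docker credentials that expire after roughly 12 hours, so a BinderHub pushing there would need something to refresh them. Below is a small sketch of fetching those credentials with boto3; the region is a placeholder, and this is only meant to show the shape of the problem, not a proposed implementation:

```python
# Sketch: fetch short-lived Docker credentials for AWS ECR with boto3.
import base64

import boto3

ecr = boto3.client("ecr", region_name="us-west-2")  # placeholder region
auth = ecr.get_authorization_token()["authorizationData"][0]

# The token is base64-encoded "username:password" (the username is always "AWS").
username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
registry_url = auth["proxyEndpoint"]  # e.g. https://<account-id>.dkr.ecr.us-west-2.amazonaws.com
expires_at = auth["expiresAt"]        # typically ~12 hours from issuance

print(f"Push to {registry_url} as {username}; credentials valid until {expires_at}")
```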
Specifying passwords as we have done is the only way binderhub can push to registries right now (I opened jupyterhub/binderhub#1506), and that's also mostly ok in this context I think. I also don't think your GAR setup is too hacky, @sgibson91! It could be extended to AWS without too much difficulty I think. If we have the money to run other binderhubs, I think we can do so now. I agree #1382 is the way to go, but I also worry that's a long way away, and as long as we don't make decisions here that bind possible ways to pursue #1382 I think it's ok to get some more binderhubs running.
I've been discussing this idea on this issue in the team-compass repo: |
Current status:
Given the above caveat about Scott's AWS account, how/where should we track the setup of the AWS account associated with the Columbia grant?
I do not fully understand your question, @sgibson91. Can you elaborate a little bit more? Thanks!
Thanks for the additional context.
That is a really good question; I am not sure of the answer to it...
Right, and if we pass through costs like that, I believe we actually have to change our contract with Columbia, as documented below regarding moving the GCP infrastructure to a 2i2c-managed project
Adding @jmunroe into this conversation because there will be contract amendments involved/needed.
I opened the following upstream issue to track the technical deployment of the infrastructure to mybinder.org-deploy |
Great to see progress on this. Let me know how I can help.
@rabernat I think the biggest way you can help is to work with @jmunroe on the Columbia contract so that we can add cloud billing as a line item on invoicing. That will unblock us on two fronts:
With @yuvipanda we recently learned that Columbia AWS accounts have none of the restrictions of the GCP accounts. Anyone can get access. Does that change the calculation of the tradeoffs here?
The contracting change still needs to happen for the GCP deployment. I think the fact that the AWS restrictions are lighter is why we decided to go with this binder deployment first. But it would be nice to have a sustainable source of credits/money for it.
What's the definition of "sustainable" here? We have about one year of funding on the Moore Foundation award left.
I was just under the impression that this was supposed to be funded from that pot. If I can avoid having to do a migration between AWS projects in the future, I would prefer it.
Sounds good 👍. Just trying to weigh the relative costs of various technical workarounds vs. the cost of amending the subaward. We have lost admin staff at Columbia recently, so our ability to execute complex budgeting actions is really degraded.
I think setting up an AWS account attached to that pot of money is a quick win right now. However, when CUIT didn't respond to support our application to join the InCommon Federation, we ran out of other pathways around amending the subaward for the GCP deployment. I appreciate that it's going to take work, but 2i2c have also been trying to find a way to make working on that deployment less of a headache for a long time and have been repeatedly let down on the Columbia side of operations.
Hey all - I will put together a budget proposal and narrative that includes a line item for cloud costs, and see if we can get this arrangement settled quickly. If we can do this without many months of administrative slowness, then I think it would be worth it in order to reduce the stress of maintaining the infrastructure, and to give us more flexibility in access + configuration that will lead to a better service. I'll report back when we have an idea of how that process goes. My plan will be for 2i2c to include a budget line item for expected cloud costs; this will be a conservative estimate, and we can include in our invoices the actual cloud costs as a direct pass-through. I'll confirm with CS&S that they won't take any indirect costs on top of these cloud infrastructure costs.
Just an update to this thread. The credits Scott offered have now gone: https://discourse.pangeo.io/t/aws-pangeo-jupyterhubs-to-shut-down-friday-march-17/3228 So we need to figure out how else to fund a Pangeo Binder.
I think that means that the funding would need to come from the Columbia grants themselves, is that right? (maybe @rabernat can comment?) If that is the case then I think we have two options:
I likely don't have the capacity to spearhead this, so we'll need somebody (@jmunroe @colliand @damianavila) to track and move this forward.
As I understand from onboarding, not all engineers have Columbia accounts, and we don't have a clear process to request them. From my perspective, if the whole team is not able to support a cluster/hub, then it is not sustainable to have only limited access to it. Ref #1799
Also, it appears Yuvi and I can no longer log in to the Columbia emails we have anyway. Yuvi sent an email to Ryan and Julius. So there are major problems with going the Columbia account route.
Yep, setting up in Columbia land is a no-go. This meta issue will deal with the pieces needed for a pass-through.
Description / problem to solve
Problem description
The Pangeo BinderHub has been down for about a month (due to crypto mining, but also because it did not have the operational support to keep it going sustainably). The Pangeo community made heavy use of their Binder deployment, and it powered a lot of reproducible sharing (e.g., via gallery.pangeo.io).
Proposed solution
We should deploy a BinderHub on the 2i2c deployment infrastructure that can live in parallel with the JupyterHub we run for the Pangeo community. We'll need to make a few modifications to their setup (including using up-to-date BinderHub versions and locking down auth more reliably).
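For a sense of what "locking down auth" can mean in BinderHub terms, here is a minimal sketch using BinderHub's option to require login through its attached JupyterHub. The OAuth callback URL and the GitHub organization are placeholders rather than the actual Pangeo settings, and in a real deployment these values would be split between the BinderHub and JupyterHub configs (or supplied through Helm chart values):

```python
# Illustrative sketch of an authenticated BinderHub; not the Pangeo/2i2c config.

# Require users to sign in via the attached JupyterHub before they can launch.
c.BinderHub.auth_enabled = True

# On the JupyterHub side, restrict who can sign in -- for example, to members
# of a GitHub organization (names below are placeholders).
c.JupyterHub.authenticator_class = "github"
c.GitHubOAuthenticator.oauth_callback_url = "https://binder.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.allowed_organizations = {"example-org"}
```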
What's the value and who would benefit
This would allow the Pangeo community to re-gain the use of their BinderHub, which would benefit many people!
Implementation guide and constraints
There are a few things that we should consider here:
Here's a GitHub issue where @scottyhq describes the environment that was available on the Pangeo BinderHub: pangeo-data/pangeo-binder#195 (comment)
Updates and ongoing work
Here are a few major issues that would need to be tackled as part of this effort:
Admin