New Hub: Geospatial workshop in Ghana #473
Thanks @choldgraf for getting this underway! Very much appreciated! I am not sure I understand a lot of what the "Setup Information" section is asking for (e.g. do I need to provide the hub type, URL, etc.?), but I can fill in a bit of the "Important Information" section:
For two reasons, we may want to extend the hub end date to next year or beyond:
Happy to provide more of the above information with a bit more guidance! Thank you!!
This is great, we should definitely support this! Do you know which funding source we can use for this, @choldgraf?
@yuvipanda that's a good question, here are a few options I can think of:
I think we should start off using the JROST funds, and then try to find credits elsewhere.
I wonder if @scottyhq, @consideRatio, @jhamman, or @rabernat could comment on what kind of cost we might expect for this workshop. If we have ~30-100 users doing "pangeo-style" environmental analysis for 2 weeks, what cloud infrastructure costs could we expect to incur? This feels like it may be similar to the GeoHackWeeks.
I spoke with @rabernat, who mentioned that we could use the Columbia Pangeo credits for this one. I believe that those are on GCP as well. @sgibson91 @yuvipanda, is there any technical challenge to using these credits for this hub? (assuming that it will be a different hub from the "main" Pangeo hubs)
Hmmm hmmm @choldgraf, I'm not feeling confident about cost estimation, as it depends so heavily on how much work users generate through their ability to request compute via Dask clusters, but I'll try to estimate things anyhow. The base cost could be like any other hub for 2 weeks, I guess, but then the Dask worker nodes add to that. They will be configured as spot/preemptible instances that cost ~30% of the original instances, so a 32-core instance, for example, costs roughly 300 USD / month (150 USD / 2 weeks). I'll go ahead and guesstimate that the cost won't go over 1000 USD for Dask worker nodes if ~50 users play around with Dask workers and we limit machines to 32 CPU cores and limit autoscaling to ~10 nodes (320 cores).
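To make the arithmetic behind that guesstimate explicit, here is a back-of-the-envelope sketch. It only reuses the rough figures quoted above (about 300 USD/month per preemptible 32-core node, a cap of 10 nodes, 2 weeks); treating 2 weeks as exactly half a month and modelling usage as a single average "utilization" fraction of the node cap are simplifying assumptions, and the base hub cost is left out.

```python
# Back-of-the-envelope Dask worker cost, using the rough figures quoted above.
PREEMPTIBLE_32_CORE_USD_PER_MONTH = 300.0  # quoted estimate, ~30% of on-demand price
WEEKS_PER_MONTH = 4.0  # simplification: 2 weeks is treated as half a month


def dask_worker_cost(max_nodes: int, weeks: float, utilization: float) -> float:
    """USD cost if, on average, `utilization` of `max_nodes` 32-core nodes are running."""
    per_node = PREEMPTIBLE_32_CORE_USD_PER_MONTH * weeks / WEEKS_PER_MONTH
    return max_nodes * per_node * utilization


def break_even_utilization(budget_usd: float, max_nodes: int, weeks: float) -> float:
    """Largest average utilization of the node cap that stays within `budget_usd`."""
    return budget_usd / dask_worker_cost(max_nodes, weeks, 1.0)


if __name__ == "__main__":
    print(dask_worker_cost(10, 2, 1.0))  # worst case: all 10 nodes up for 2 weeks
    print(break_even_utilization(1000, 10, 2))
```

Running it prints 1500.0 and roughly 0.667: at these prices, keeping all 10 nodes up for the whole two weeks would cost about 1500 USD, so the 1000 USD guess implicitly assumes the worker pool averages no more than about two-thirds of its cap.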
That's a really helpful analysis, @consideRatio, thanks very much :-)
Thanks @choldgraf and @consideRatio for your efforts here! With my limited understanding of all of this, I think what @consideRatio lays out here sounds very reasonable. I don't anticipate many high Dask-usage workloads during the school, since for many participants this will be their first time using Dask or accessing large climate datasets. And especially with so many new Dask users, the CPU and scaling limits @consideRatio mentioned will be very important.
@consideRatio gave a really nice cost estimate above! 🙌🏻 From a technical standpoint, I think we run into the same issue as 2i2c-org/team-compass#136: we don't have billing control of that project.
Just checking for an update here! This year's workshop is coming up very soon, and I'd like to know whether a Hub is likely to be set up and fully functional by July 19th at the latest, or whether I should make alternate plans instead (which would be doable, as long as I know soon). Thanks!
@sgibson91 I believe that we can deploy this hub on Pangeo infrastructure as well, so could the temporary fix for 2i2c-org/team-compass#136 also be applied to this hub?
@choldgraf we now have a bigger blocker on that project, and I've resorted to testing on the GCP project that currently hosts the Pangeo infrastructure (which I can access with my 2i2c account, not Columbia's)
@sgibson91 I think it's fine if we use whatever GCP account we can access to serve the Ghana Hub. If worst comes to worst, we'll use our $5,000 JROST grant to pay for the cloud infrastructure.
Ok, well there's a fresh cluster on the
That'd be super awesome :-)
Hi @paigem! I think the most important questions here to get going are:
We can generate a URL that will be something like
There's a WIP PR open to deploy a Hub in #508 :)
Thank you for your help @rabernat! If restarting the server doesn't help, I will poke around those hubs a little more (this hub is running in the same project for now, so hopefully it should transfer over pretty easily)
I have tried explicitly shutting down the notebook kernels and restarting, and creating a new notebook, all with no luck. |
@paigem can you go to https://coessing.pangeo.2i2c.cloud/hub/admin, click "stop server" next to your name and try again, please? (This is different from killing the kernels; it's more like rebooting your machine)
Thanks for specifying how to shut down my server, @sgibson91. I have shut down my server and logged in again, and I am still getting the same error.
Ok, thanks for bearing with me there. I will see what I can learn from the Pangeo hub deployments regarding this.
No problem at all! Thanks for helping figure this out!
This sounds a bit like pangeo-data/pangeo-cloud-federation#615. I have a few comments in that thread with various commands I ran to grant GCP permissions to Google service accounts and link those Google service accounts with Kubernetes Service Accounts (which are used by the hub). I never did confirm this, but I think there is a potential risk that a user makes requester-pays calls to non-pangeo buckets, which would end up costing money. I never found out if there's a finer-grained way to grant this permission on just certain buckets.
Ok, some good news! I have bootstrapped the pangeo-notebook image so now
Omg, I think I have solved this too! At least now when I run that snippet in a notebook, I don't get any error. @paigem can you confirm? Thank you @rabernat and @TomAugspurger for your helpful input! 🙏🏻
Yes, this is correct
It works!! 😄 Amazing, thank you @sgibson91!
In case you don't know about it, you can use this great website to generate an nbgitpuller link. I generated this one, for example. You can even put your link behind a fancy-looking badge: [![Open with Jupyter](https://img.shields.io/badge/Open%20with-Jupyter-orange?style=for-the-badge&logo=Jupyter)](https://coessing.pangeo.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fpangeo-gallery%2Fphysical-oceanography&urlpath=lab%2Ftree%2Fphysical-oceanography%2F&branch=master)
Thanks @rabernat! Yes, I have already made a link using nbgitpuller (thanks to the 2i2c docs!) to sync files from a GitHub repo, but I like the fancy badge! I assume this badge is something you include in your GitHub repo?
You can certainly put the badge in a repo README. The version I shared was a Markdown version, so it works well on GitHub. But you could put such a badge on any website, such as the workshop website. The HTML version would look like this: <a href="https://coessing.pangeo.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fpangeo-gallery%2Fphysical-oceanography&urlpath=lab%2Ftree%2Fphysical-oceanography%2F&branch=master"><img alt="Open with Jupyter" src="https://img.shields.io/badge/Open%20with-Jupyter-orange?style=for-the-badge&logo=Jupyter" /></a>
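If you need several of these links (one per repo or notebook), the URL and both badge variants can be generated rather than hand-edited. This is a minimal sketch using only the standard library; it assumes nbgitpuller's `repo`, `urlpath`, and `branch` query parameters on the hub's `/hub/user-redirect/git-pull` endpoint, as in the links above, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Shields.io badge image used in the links above
BADGE_IMG = "https://img.shields.io/badge/Open%20with-Jupyter-orange?style=for-the-badge&logo=Jupyter"


def nbgitpuller_link(hub_url: str, repo: str, urlpath: str, branch: str) -> str:
    # urlencode percent-encodes ":" and "/" inside each value (e.g. "/" -> "%2F")
    query = urlencode({"repo": repo, "urlpath": urlpath, "branch": branch})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


def markdown_badge(link: str) -> str:
    # Markdown form, e.g. for a GitHub README
    return f"[![Open with Jupyter]({BADGE_IMG})]({link})"


def html_badge(link: str) -> str:
    # HTML form, e.g. for a workshop website
    return f'<a href="{link}"><img alt="Open with Jupyter" src="{BADGE_IMG}" /></a>'


if __name__ == "__main__":
    link = nbgitpuller_link(
        "https://coessing.pangeo.2i2c.cloud",
        "https://github.com/pangeo-gallery/physical-oceanography",
        "lab/tree/physical-oceanography/",
        "master",
    )
    print(markdown_badge(link))
    print(html_badge(link))
```

With the arguments in the `__main__` block, this reproduces the Markdown and HTML badges quoted in the thread character for character.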
Thanks @rabernat, the HTML version will be very helpful for the website!
@sgibson91 just to confirm: was the fix running this snippet, or something else?
@damianavila I think the code in this comment did it. The key was the k8s annotation, so that it knows to use it. (For ref, because it took me some time to figure this out: the part of the gcloud command inside
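The thread does not spell out the exact commands, so the following is only a hedged sketch of the standard GKE Workload Identity pattern that the pangeo-cloud-federation#615 discussion describes: a Google service account (GSA) grants `roles/iam.workloadIdentityUser` to a member string naming a Kubernetes namespace and service account (KSA), and that KSA carries an `iam.gke.io/gcp-service-account` annotation pointing back at the GSA. All project, namespace, and account names below are hypothetical placeholders, not the values used for this hub.

```python
"""Hypothetical sketch of the two pieces GKE Workload Identity links together.

All names (project, namespace, service accounts) are placeholders.
"""


def gsa_email(gsa_name: str, project_id: str) -> str:
    # Email form of a Google service account (GSA)
    return f"{gsa_name}@{project_id}.iam.gserviceaccount.com"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    # IAM member granted roles/iam.workloadIdentityUser on the GSA.
    # project_id here is the GKE cluster's project (its workload identity pool);
    # the bracketed part names the Kubernetes namespace/service account (KSA).
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotated_service_account(namespace: str, ksa_name: str, gsa: str) -> dict:
    # Kubernetes ServiceAccount manifest; the annotation tells GKE which GSA
    # pods running as this KSA should act as
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa},
        },
    }


if __name__ == "__main__":
    gsa = gsa_email("hub-user-sa", "my-gcp-project")
    print(workload_identity_member("my-gcp-project", "coessing", "user-sa"))
    print(annotated_service_account("coessing", "user-sa", gsa))
```

Both halves are needed: the IAM binding alone lets the KSA impersonate the GSA, but without the annotation the pods never ask to, which matches the observation that the annotation was the key step.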
Thanks for the info, @sgibson91.
Yeah, I'd also like a review of it to make sure we understand what's going on and that we're not unnecessarily granting elevated privileges.
@paigem if you're happy with the state the hub is in now, I'm going to close this ticket.
We are actually trialling a new support framework using FreshDesk, and tickets can be submitted by emailing [email protected]. Are you happy to be a guinea pig and send any issues through this system?
Yes, the Hub is working great! Thank you!! And I'm happy to trial the FreshDesk support framework! :)
Background
@paigem works with @rabernat, and is helping to lead/organize a workshop around geospatial analytics (e.g., the "Pangeo stack") in Ghana. In previous years, they have asked attendees to install things on their local machines, but she would love to have access to cloud infrastructure via 2i2c that supports this workshop.
The team behind this workshop currently does not have funding for infrastructure/services, so this would be a pro-bono case. In my opinion, it is well worth the time investment because it is a great cause and a way to see how our infrastructure could serve people in countries outside North America and Europe.
@paigem could you help us answer some of the questions in the section below?
Setup Information
Important Information
Deploy To Do