[Request deployment] New Hub: QCL aka QuantifiedCarbon #2118

Closed
6 of 8 tasks
ASamarkRoth opened this issue Jan 31, 2023 · 41 comments

@ASamarkRoth

ASamarkRoth commented Jan 31, 2023

Important dates

  • Target start date: As soon as possible.
  • Required start date: As soon as possible.
  • Any important dates for usage:

Hub Authentication Type

GitHub

First Hub Administrators

Daniel Cox, [email protected], @gizmo404
Joseph McKenna, [email protected], @jtkmckenna

[GitHub Auth only] How would you like to manage your users?

Allowing members of a specific GitHub organization team

[GitHub Teams Auth only] Profile restriction based on team membership

@QuantifiedCarbon/jupyterhub

Hub logo image URL

https://avatars.githubusercontent.com/u/124042132?s=400&u=b84f1c7dfd1f9699b2adec7c8eb9ca7b9b2b0a6e&v=4

Hub logo website URL

https://quantifiedcarbon.com/

Hub user image GitHub repository

No response

Hub user image tag and name

to-be-decided

Extra features you'd like to enable

  • Specific cloud provider or datacenter (otherwise GCP)
  • Dedicated Kubernetes cluster
  • Scalable Dask Cluster

(Optional) Preferred cloud provider

GCP, europe-west1. Any of zones b, c, or d would work, but let's pick europe-west1-d.

(Optional) Billing and Cloud account

2i2c creates the GCP project.

Other relevant information to the features above

Based on information requests tracked in https://2i2c.freshdesk.com/a/tickets/485, Erik concluded the following should be configured.

Custom domain name

QCL controls quantifiedcarbon.com and has provided the following CNAME configurations for the staging and production hubs and the Grafana server.

jupyter.quantifiedcarbon.com -> qcl.2i2c.cloud
staging.quantifiedcarbon.com -> staging.qcl.2i2c.cloud
grafana.quantifiedcarbon.com -> grafana.qcl.2i2c.cloud
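
For reference, the three mappings above can be written out as CNAME records. The sketch below is illustrative only: the `CNAME_TARGETS` name and the zone-file style rendering are assumptions, not a description of how QCL's DNS provider or 2i2c stores these records.

```python
# Illustrative only: the requested custom-domain mappings as plain data.
CNAME_TARGETS = {
    "jupyter.quantifiedcarbon.com": "qcl.2i2c.cloud",
    "staging.quantifiedcarbon.com": "staging.qcl.2i2c.cloud",
    "grafana.quantifiedcarbon.com": "grafana.qcl.2i2c.cloud",
}


def zone_lines(targets):
    """Render each mapping as a zone-file style CNAME line.

    The trailing dots mark the names as fully qualified.
    """
    return [f"{name}. IN CNAME {target}." for name, target in targets.items()]


for line in zone_lines(CNAME_TARGETS):
    print(line)
```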

Machine types and profile lists

Core nodes:

  • n2-highmem-2

User nodes:

  • n2-highmem-4
  • n2-highmem-16
  • n2-highcpu-32
  • n2-highcpu-96

singleuser.profileList configuration:

Tasks to deploy the hub

  • 1. Deploy information filled in above
  • 2. Engineer who will deploy the hub is assigned
  • 3. If using GitHub Orgs/Teams Auth, Engineer is given Owner rights to the org to set this up.
  • 4. Initial Hub deployment PR
  • 5. Administrators able to log on -> Hub now in steady-state
@colliand
Contributor

colliand commented Feb 1, 2023

Thanks @ASamarkRoth for filling out the new hub request issue. Based on the anticipated usage scenario, I suggest we move forward with a Research Hub (without Dask) on a Shared Cluster hosted on Google Cloud Platform. Monthly cloud computing costs for your hub's usage will be passed through by Code for Science and Society.

Do you have a preferred data center? Perhaps you need your hub co-located next to large data hosted in a particular place?

Do you have specialized software environment needs? Our documentation contains some information about this question. If you don't have a customized environment or are not sure, 2i2c can proceed with a standard image and provide guidance for you to introduce changes in the future.

@ASamarkRoth
Author

Thanks for your reply.

One of our main usages would be running very computationally heavy programs over a day or two. As I understand it, a shared cluster is not well suited to that.

On data center preferences, I am not sure what ours are. We do care about read/write speed, but we are not currently working with very large data sets.

@jtkmckenna is leading the development of the Docker workflow and images at QCL, so we should be able to manage that ourselves. However, we would like users to be able to choose an image from a link when starting servers.

@colliand
Contributor

colliand commented Feb 1, 2023

Thanks @ASamarkRoth for your reply.

2i2c can deploy a hub for QCL on a dedicated cluster. That said, I suggest we wait for input from @consideRatio or another member of 2i2c's engineering team on whether a shared cluster will be suitable for the usage scenario you described.

Yes, 2i2c has deployed hubs with the ability for the user to select the software image. In future exchanges with your team, we will aim to identify the set of images to offer in the menu.

@colliand
Contributor

colliand commented Feb 1, 2023

I apologize! I see that the request in the anchor comment in the issue was for a dedicated cluster.

@jmunroe
Contributor

jmunroe commented Feb 2, 2023

Hi @ASamarkRoth. I work as a Community Lead at 2i2c.

It might be important to clarify what a "dedicated cluster" means in this context. Even with a "shared cluster", your computational environment would be on its own node/machine -- this is not a shared CPU environment. The "sharing" is for the underlying Kubernetes layer.

Unless there are specific billing or data centre requirements, I suspect that a shared cluster will be suitable.

I am very interested to learn more about your intended workflow. Your message indicated "One of our main usages would be running very computationally heavy programs over a day or two". Would you be able to comment on what type of heavy programs you are targeting? I want to make sure we deploy this in a way that best meets your anticipated needs.

@gizmo404

gizmo404 commented Feb 3, 2023

I can comment a little on the types of workload.
We're largely doing electricity grid expansion modelling using GenX coupled to a number of in-house codes. We have been running GenX using the commercial Gurobi solver. A typical workflow could be running 20-30 consecutive years with GenX and, for each year, running our in-house codes for multiple year scenarios iteratively until an optimal solution is found. In total this forms one complete scenario, and we may need 2-5 of those on each project.
A lot of the work will be smaller subsets of these for testing and development, but towards project deadlines the final runs can take large amounts of computing power, which is currently a bottleneck.
Does this give you more of an idea of our workflow? Perhaps we could also jump on a quick video call to discuss?

@jtkmckenna

@colliand @jmunroe @consideRatio

Is there any additional information we can provide? Several of the programs we will be running are highly parallelized with moderate memory usage.

A major motivation for moving to 2i2c is that we have hit a workflow bottleneck with CPU capacity on our previous setup.

@colliand
Contributor

Thanks @jtkmckenna @gizmo404 @ASamarkRoth and @jmunroe for exchanges about the QCL hub. I am excited to make this work!

I am not an expert, but I understand GenX is mostly managed via Julia rather than Python. 2i2c can deploy software environments that include the ability to launch VS Code from inside JupyterHub with a Julia kernel. It's my understanding that Julia notebooks perform better inside VS Code than they do as Jupyter Notebooks or in JupyterLab.

2i2c has the capacity to deploy hubs with the option to select different hardware profiles. After passing through the login page, users of the QCL hub could be offered a menu of small, medium, large hardware profiles. The small machine might be appropriate for making some edits or exploring a code base while the large may be used for doing a bigger analysis.
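
A menu like this is typically expressed as a KubeSpawner `profile_list`. The sketch below is a hypothetical illustration, not the configuration 2i2c deployed: it builds one profile per user node type from the request and pins each profile to its machine type with a node selector. The helper name `build_profile_list` and the choice of the first entry as the default are assumptions.

```python
# Hypothetical sketch: one profile per user node type listed in the request.
USER_NODE_TYPES = ["n2-highmem-4", "n2-highmem-16", "n2-highcpu-32", "n2-highcpu-96"]


def build_profile_list(machine_types):
    """Return profile_list-style entries, pinning each profile to one machine type."""
    profiles = []
    for index, machine_type in enumerate(machine_types):
        # For these N2 machine types the trailing number is the vCPU count.
        vcpus = int(machine_type.rsplit("-", 1)[1])
        profiles.append(
            {
                "display_name": f"{machine_type} ({vcpus} vCPUs)",
                "slug": machine_type,
                "default": index == 0,  # assumption: first listed type is the default
                "kubespawner_override": {
                    # Well-known Kubernetes node label carrying the instance type.
                    "node_selector": {"node.kubernetes.io/instance-type": machine_type},
                },
            }
        )
    return profiles


profile_list = build_profile_list(USER_NODE_TYPES)
# In a jupyterhub_config.py this could then be assigned as:
# c.KubeSpawner.profile_list = profile_list
```

A real deployment would likely also set CPU and memory requests per profile; those values are left out here because the thread does not state them.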

I like the suggestion from @gizmo404 (and from @ASamarkRoth to our support team) that we get together in a virtual meeting to discuss soon. I'll follow up via email with @ASamarkRoth to arrange for such a meeting.

consideRatio changed the title from "[Request deployment] New Hub: {{ QCL }}" to "[Request deployment] New Hub: QCL aka QuantifiedCarbon" on Mar 2, 2023
@consideRatio
Member

@jtkmckenna @ASamarkRoth @gizmo404 we have the option of providing you with access to Grafana dashboards, but currently only in one of two ways.

  1. Individual user invite with username / password management
  2. GitHub identities are used, and one or more GitHub organizations are authorized access.

I suggest granting access to the entire GitHub organization https://github.com/QuantifiedCarbon. Does that make sense to you? I don't think the data exposed is very sensitive: the total number of running users, their CPU/memory use, etc.

@consideRatio
Member

consideRatio commented Mar 5, 2023

@jtkmckenna @ASamarkRoth @gizmo404 as part of not coupling too tightly with 2i2c, we'd also like to set up GitHub authentication/authorization in your GitHub organization rather than ours. In practice, this means defining the OAuth2 applications in your GitHub organization rather than in the 2i2c-org organization.

  1. If using GitHub Orgs/Teams Auth, Engineer is given Owner rights to the org to set this up.

To set this up, can you temporarily grant Owner rights to the GitHub organization to @pnasrat via https://github.com/orgs/QuantifiedCarbon/people ?

There are alternatives to doing this, such as documenting what you should do yourself instead, but we haven't established such documentation yet.

@jtkmckenna

jtkmckenna commented Mar 6, 2023

@consideRatio @pnasrat, thank you. GitHub organizations will suit us very well. I have sent an invite to pnasrat.

If using GitHub Orgs/Teams Auth, Engineer is given Owner rights to the org to set this up.

Done

@pnasrat
Contributor

pnasrat commented Mar 6, 2023

@jtkmckenna thanks.

Can you confirm if you can log on to the staging hub via github auth at

https://staging.quantifiedcarbon.com/hub/login?next=%2Fhub%2F

@jtkmckenna

Hi @pnasrat,

Unfortunately I am getting a 403 error. I get the landing page:

image

but then when I click to log in, 403:

image

@pnasrat
Contributor

pnasrat commented Mar 6, 2023

Thanks, yes, I see it in the logs as well. I'll take a look at it. I'm new to 2i2c and this is my first time creating a cluster hub, so it's likely an oversight on my part.

I'll continue looking into this tomorrow morning, my time.

@pnasrat
Contributor

pnasrat commented Mar 7, 2023

Ah, I see what happened.

When I was following the instructions to add the appropriate permissions for the QuantifiedCarbon organization to the OAuth app, I paused to update the documentation and had not clicked the grant button.

Could you try again when you get an opportunity?

@jtkmckenna

image

I am in! :)

Shared folder seems to work also:

image

@pnasrat
Contributor

pnasrat commented Mar 7, 2023

@jtkmckenna excellent. I think you've been working on the Docker image side of this for QCL. Do you know yet which images I should be adding? Once I have those, we'll make sure staging is set up as you want and can deploy the production hub.

@jtkmckenna

hi @pnasrat

I have two experimental Jupyter Docker images. They are untested, so I would like the option to use 2i2c Docker images as well.

quay.io/quantifiedcarbon/develop:latest
quay.io/quantifiedcarbon/main:latest

Right now these two images are identical. I am using two paths so that we don't need to keep asking you to point to new tags.

@pnasrat
Contributor

pnasrat commented Mar 8, 2023

Hi @jtkmckenna, we can definitely do that. Why don't I set those images up in the profile list of staging, and you can test them out there? I'll update you when they are deployed.

@jtkmckenna

Hi @pnasrat,

Do we have an ETA on when it will be deployed? I see in the Control Panel -> configurator that I can put in a string for a docker image. Can I already put in our test docker image and start testing?

Thanks

@pnasrat
Contributor

pnasrat commented Mar 13, 2023

@jtkmckenna feel free to experiment with your images on the staging hub using the configurator while I get the proper node pools set up and peer reviewed!

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

@jtkmckenna I wanted to give you a heads up that I am updating the profile list on the staging environment.

@jtkmckenna

@pnasrat Thank you for the heads up. I guess that's why I am getting a 403 error at https://staging.quantifiedcarbon.com/hub/spawn at the moment.
image

@pnasrat
Contributor

pnasrat commented Mar 14, 2023 via email

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

@jtkmckenna please log out via https://staging.quantifiedcarbon.com/hub/logout. I need to debug what is happening, as to the best of my understanding this shouldn't have been affected by the profileList change.

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

My understanding was flawed as the profile list for an org using teams did need explicit access control. @jtkmckenna please logout and try again!

image

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

@jtkmckenna note that I'm actively working on the highcpu profiles, so they may not work currently. I have only tested a small one.

@jtkmckenna

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

OK, it's working for me. Please log out and I'll check the OAuth state.

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

If you click on the hub/admin link you should see the logout option. Let me know once you have logged out. I will probably have to clear the OAuth token, as it's now looking for groups, and the next login should prompt you to reauthorize the app.

@jtkmckenna

I can confirm I am logged out

@pnasrat
Contributor

pnasrat commented Mar 14, 2023

@jtkmckenna I think I see what the issue was - it looks like the allowed_teams config is case sensitive. I apologize. I'm still new to 2i2c and this is the first hub I've brought up so this was definitely an error on my part
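
To illustrate the failure mode described here: if team names are compared as exact strings, a configured entry that differs only in letter case never matches the team GitHub reports. The sketch below is a simplified stand-in, not OAuthenticator's actual implementation. The function name, the `org:team` spelling, and the mis-cased entry are all hypothetical.

```python
def is_allowed(user_teams, allowed_teams):
    """Exact, case-sensitive match between a user's teams and the allowed set."""
    return any(team in allowed_teams for team in user_teams)


# The team requested in this issue, written in an assumed "org:team" form.
user_teams = {"QuantifiedCarbon:jupyterhub"}

# Hypothetical config entry that differs only in capitalization.
miscased = {"quantifiedcarbon:jupyterhub"}
# Entry spelled exactly like the team.
exact = {"QuantifiedCarbon:jupyterhub"}

print(is_allowed(user_teams, miscased))  # False
print(is_allowed(user_teams, exact))     # True
```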

@jtkmckenna

@pnasrat It worked!!! Great success... but then I think I might have managed to lock myself out again?

I logged in this morning, played around with a small node. Compiled some code and ran a few tests with my development docker image (quay.io/quantifiedcarbon/develop:latest), I am happy, so logged out (via the jupyter logout button)... It went back to the landing page... Good

...but when I log in again I get 403 error

After a short time I try again and am invited to re-authenticate the link to github (~10:20 CET). Then get the 403 error again

I am not sure if the oauth_callback code is useful to you:

https://staging.quantifiedcarbon.com/hub/oauth_callback?code=b66999adc00b6adad1ef&state=eyJzdGF0ZV9pZCI6ICI2MTBmM2NlMWExMDI0MThhODZlYjQ5MzhmODdjNmI4YiIsICJuZXh0X3VybCI6ICIifQ%3D%3D
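
As a side note on what this URL carries: the `state` parameter is base64-encoded JSON and can be inspected locally. In this URL it holds only a state id and an empty `next_url`, so it does not explain the 403. The `code` parameter is an OAuth 2.0 authorization code, which is short-lived and intended for single use. The sketch below is a minimal way to decode it. The helper name `decode_state` is an assumption.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

CALLBACK_URL = (
    "https://staging.quantifiedcarbon.com/hub/oauth_callback"
    "?code=b66999adc00b6adad1ef"
    "&state=eyJzdGF0ZV9pZCI6ICI2MTBmM2NlMWExMDI0MThhODZlYjQ5MzhmODdjNmI4YiIsICJuZXh0X3VybCI6ICIifQ%3D%3D"
)


def decode_state(url):
    """Extract the `state` query parameter and decode it as base64-encoded JSON."""
    query = parse_qs(urlparse(url).query)
    raw_state = query["state"][0]  # parse_qs has already undone the %3D escapes
    return json.loads(base64.urlsafe_b64decode(raw_state))


state = decode_state(CALLBACK_URL)
print(sorted(state))
```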

Reporting this now, I will try again in 20-30 minutes

@jtkmckenna

After 10 minutes I get to re-authorize again:

image

But again end up with a 403 error

@pnasrat
Contributor

pnasrat commented Mar 15, 2023

OK, let me look at that this morning (I'm on US/Eastern). I'm sorry that this hasn't been a seamless experience for you, but I'm still learning the infrastructure setup here.

@pnasrat
Contributor

pnasrat commented Mar 15, 2023

Log line for reference [W 2023-03-15 10:16:43.499 JupyterHub github:202] User jtkmckenna is not in allowed org list

@pnasrat
Contributor

pnasrat commented Mar 15, 2023

Note to self in case I end up having to recreate - configurator image is set to quay.io/quantifiedcarbon/develop@sha256:cd68bb08bd41ed60b554296263e9f994041a494daa685719ad8e83ce75d5a011
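
The reference above pins the image by digest rather than by tag. A tag such as `latest` can later point to a new build, while a digest identifies one specific image. The helper below is a minimal sketch for telling the two forms apart. The function name is an assumption and it is not a full parser for every valid image reference.

```python
def split_image_reference(reference):
    """Split an image reference into (repository, tag, digest); missing parts are None."""
    repository, _, digest = reference.partition("@")
    # Only a colon after the last slash separates a tag, so a registry port
    # such as "localhost:5000/img" is not mistaken for a tag.
    last_segment = repository.rsplit("/", 1)[-1]
    tag = None
    if ":" in last_segment:
        repository, _, tag = repository.rpartition(":")
    return repository, tag, digest or None


print(split_image_reference("quay.io/quantifiedcarbon/develop:latest"))
print(split_image_reference(
    "quay.io/quantifiedcarbon/develop"
    "@sha256:cd68bb08bd41ed60b554296263e9f994041a494daa685719ad8e83ce75d5a011"
))
```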

@pnasrat
Contributor

pnasrat commented Mar 15, 2023

I think I see my error - mixup between two different levels of auth configuration. I've pushed a fix that I'm waiting to get code reviewed on.

@jtkmckenna can you test whether it is working consistently for you without 403s, when logging out and back in, and after an elapsed period?

@jtkmckenna

Login, logout, login, logout, login, logout.... looking good
Starting up a node, logout from Jupyter... login, get Jupyter :)

It looks happy. I'll try again after an elapsed period (in about an hour and a half)

@pnasrat
Contributor

pnasrat commented Mar 21, 2023

@jtkmckenna the production hub is also up. Note that we are currently in the process of moving to a different auth provider (that will still use GitHub OAuth for you). This shouldn't impact the setup, but you may need to reauthenticate when that is done.

The production hub is at https://jupyter.quantifiedcarbon.com as requested. Please confirm you can log in to that and I can close this request out. Any additional needs can be handled through our support process.

@jtkmckenna

It's working well for me!
