Add utoronto staging hub #866
cpu: 0.1
memory: 512Mi
limits:
memory: 512Mi
proxy:
chp:
resources:
requests:
cpu: 0.1
memory: 128Mi
limits:
memory: 512Mi
traefik:
resources:
requests:
cpu: 0.1
memory: 256Mi
limits:
memory: 512Mi
These are some resource limits from the old utoronto config. I wasn't sure whether to still use those or go with the default values in the basehub template.
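For reference, here is roughly where those lines sit once the surrounding keys (cut off by the diff hunk above) are included. The key names follow the zero-to-jupyterhub chart; the jupyterhub: wrapper and placing the first block under hub.resources are my assumptions from reading the old config, so treat this as a sketch rather than the exact file contents:

```yaml
jupyterhub:
  hub:
    resources:
      requests:
        cpu: 0.1
        memory: 512Mi
      limits:
        memory: 512Mi
  proxy:
    chp:
      resources:
        requests:
          cpu: 0.1
          memory: 128Mi
        limits:
          memory: 512Mi
    traefik:
      resources:
        requests:
          cpu: 0.1
          memory: 256Mi
        limits:
          memory: 512Mi
```

If we drop these overrides, the hub would simply inherit whatever the basehub template sets for these components.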
IMO we should stick with the current UToronto configuration values
Mmm... can I suggest otherwise, maybe? 😉
I mean, do we know if these values are the proper ones for the current incarnation of our hubs from this repo?
If we set the values without validation (and only because we inherit those values), we may be setting conditions that are not actually needed (nor optimal) and making the whole thing more complex than it should be...
Maybe we can trace back why these values were needed and infer if we need to tweak them here as well before actually doing it?
That was my feeling too @damianavila, especially since I believe the utoronto hub was the first version of our infra.
> Maybe we can trace back why these values were needed and infer if we need to tweak them here as well before actually doing it?
I will dig into it to try to find out if these were tweaked afterwards or were part of the initial deployment.
Ah - I was making an assumption here, let me re-phrase my suggestion:
- If they have explicitly requested those limits, then we should keep them.
- If they have not, and this was just set by us without an explicit rationale for special-casing it, then we should just go with whatever our other configurations use.
From this conversation it sounds like we were the ones that chose those limits, not UToronto, so if that's true then I'm good with option 2.
Looking at the commit history, it seems that the limits were set by us.
But I'm not 100% sure this means that there wasn't a request from utoronto or a limitation observed at some point.
I would love it if @yuvipanda could confirm this 👀
Here is a side-by-side view of the resource differences. Hope this makes it easier to follow.
This looks awesome! A couple of questions to clarify:
- Now that Automating deployments to Azure hubs (#841) is done, will this auto-deploy when we merge?
- This probably doesn't quite close Deploy a new cluster for the U. Toronto community (#853), because we'll need to deploy a prod hub as well, right? Since nobody will be using this hub yet anyway, why don't we deploy both staging and prod in this PR?
grafana:
  ingress:
    hosts:
      - grafana.utoronto.2i2c.cloud
A DNS change needs to be made for this.
This lgtm, @GeorgianaElena. You'll have to make DNS changes for staging.utoronto.2i2c.cloud to point to the IP of the nginx-ingress for this to work. But once that's done, hopefully everything (including auth!) should work.
To deal with DNS issues, my suggested workflow is:
- Run deploy-support locally to deploy nginx-ingress
- Change DNS to point the domains to the nginx-ingress service's external IP
- Then deploy the hub after 5-10 minutes, to allow time for the DNS change to propagate
This should avoid problems with Let's Encrypt trying to get HTTPS working properly.
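As a rough illustration of what has to line up here (the config layout and the IP below are illustrative assumptions, not the actual values):

```yaml
# Illustration only: every domain we configure must already resolve to the
# external IP that deploy-support gives the nginx-ingress service before the
# hub deploy runs, so that Let's Encrypt can complete its challenge.
# The key layout is a hypothetical sketch; 203.0.113.10 is a placeholder IP.
hubs:
  - name: staging
    domain: staging.utoronto.2i2c.cloud
# DNS records to create once the nginx-ingress external IP is known:
#   staging.utoronto.2i2c.cloud   A   203.0.113.10
#   grafana.utoronto.2i2c.cloud   A   203.0.113.10   # for the grafana ingress above
```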
AFAIK, nope, this one is using a raw kubeconfig, and auto-deploy needs a Service Principal to be set up. AFAIK, the terraform stuff for that was merged quite recently, so I am not sure whether the cluster supporting this hub was built before or after that PR was merged. In addition, I think there are still some permission issues to be figured out, according to this comment: #864 (comment)
@damianavila is right, #841 won't work for the utoronto hub because there aren't enough permissions to create a service principal (#864 (review)) :( Also, the cluster was built before that PR.
I believe we should deploy the prod hub after we make sure that everything works on this staging hub. It should be a fairly straightforward task that can happen as part of Step 2 of #638. Also, we might need to take over the prod hub URL too (like we are doing here with the staging one).
Ah, oops! I just noticed that in the original issue #638, there is this piece of context from @yuvipanda: "We now have Azure terraform support in this repo (#800) and can set up a new hub infra with that, to run in parallel with the current setup. The primary infra difference will be the use of Azure File for home directories rather than the hand-spun NFS VM we have there right now - I think this is a big positive change." I believe that NFS is enabled by default rather than Azure File, so that needs to change before merging this.
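To make the NFS-vs-Azure-File point a bit more concrete, here is a hedged sketch of what Azure File backed home directories could look like. Only the Kubernetes azureFile volume type and the zero-to-jupyterhub singleuser.storage keys are standard; the resource names, share name, size, and where exactly this would land in our config are assumptions for illustration:

```yaml
# A PersistentVolume backed by an Azure File share (names/sizes are placeholders).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-azurefile
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    # Secret holding the storage account name and key for the file share.
    secretName: azure-file-creds
    shareName: homes
    readOnly: false
---
# A claim bound to the PV above, in the hub's namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: home-azurefile
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: home-azurefile
  resources:
    requests:
      storage: 100Gi
---
# zero-to-jupyterhub values: mount the shared claim into each user pod,
# with one sub-directory per user, instead of the NFS-backed setup.
jupyterhub:
  singleuser:
    storage:
      type: static
      static:
        pvcName: home-azurefile
        subPath: "home/{username}"
```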
I would like to move this forward since we have a migration due date.
Since I have Yuvi's approval and we can tweak resources later, I will try a merge and check how the deploy goes 🤞
Totally OK with me (but let's not forget about that discussion!! 😉 )
Btw, this is great progress @GeorgianaElena, thanks for pushing forward on it!!
Absolutely, @damianavila!
Closes #867
staging.utoronto.2i2c.cloud to point to the external IP of the new hub (which is the nginx-ingress service's external IP)