
Add utoronto staging hub #866

Merged: 6 commits into 2i2c-org:master, Dec 6, 2021
Conversation

GeorgianaElena (Member) commented Dec 2, 2021:

Closes #867

  • Deploy the support chart for the utoronto cluster per @yuvipanda's comment.
  • Edit the A record for grafana.utoronto.2i2c.cloud to point to the nginx-ingress service's external IP (see the lookup sketch after this list)
  • If everything goes OK in step 1, merge this PR and check that the deploy goes well.
  • Edit the DNS A record of staging.utoronto.2i2c.cloud to point to the external IP of the new hub (which is the nginx-ingress service's external IP)
  • Open an issue about the differences in resource allocation that we have for the utoronto hub and discuss there whether we should stick with those or try to use the defaults we have in the infra repo ➡️ UToronto resource limits #873
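A minimal sketch of that IP lookup, assuming the support chart installs ingress-nginx into a namespace named support with the standard chart labels (both assumptions):

# Print the nginx-ingress LoadBalancer service, including its external IP;
# the namespace and label selector here are assumptions.
kubectl get svc --namespace support \
  --selector app.kubernetes.io/name=ingress-nginx \
  -o wide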

Comment on lines +55 to +73
        cpu: 0.1
        memory: 512Mi
      limits:
        memory: 512Mi
  proxy:
    chp:
      resources:
        requests:
          cpu: 0.1
          memory: 128Mi
        limits:
          memory: 512Mi
    traefik:
      resources:
        requests:
          cpu: 0.1
          memory: 256Mi
        limits:
          memory: 512Mi
GeorgianaElena (PR author):

These are some resource limits from the old utoronto config. I wasn't sure whether to keep them or go with the default values in the basehub template.

choldgraf (Member):

IMO we should stick with the current UToronto configuration values

damianavila (Contributor):

Mmm... can I suggest otherwise, maybe? 😉
I mean, do we know if these values are the right ones for the current incarnation of our hubs in this repo?
If we set the values without validation (and only because we inherit them), we may be setting conditions that are not actually needed (nor optimal), which makes the whole thing more complex than it should be...
Maybe we can trace back why these values were needed and infer if we need to tweak them here as well before actually doing it?

GeorgianaElena (PR author):

That was my feeling too, @damianavila, especially since I believe the utoronto hub was the first version of our infra.

Maybe we can trace back why these values were needed and infer if we need to tweak them here as well before actually doing it?

I will dig into it and try to find out whether these were tweaked afterwards or were part of the initial deployment.

choldgraf (Member) commented Dec 3, 2021:

Ah - I was making an assumption here; let me re-phrase my suggestion:

  • If they have explicitly requested those limits, then we should keep them.
  • If they have not, and this was just set by us without an explicit rationale for special-casing it, then we should just go with whatever our other configurations use.

From this conversation it sounds like we were the ones who chose those limits, not UToronto, so if that's true then I'm good w/ option 2.

GeorgianaElena (PR author):

Looking at the commit history, it seems that the limits were set by us.
But I'm not 100% sure this means there wasn't a request from utoronto or a limitation observed at some point.

I would love it if @yuvipanda could confirm this 👀

Here is a side-by-side view of the resource differences; hopefully it makes them easier to follow:
[screenshot: resource-diffs]

choldgraf (Member) left a comment:

This looks awesome! A couple questions to clarify:


grafana:
  ingress:
    hosts:
      - grafana.utoronto.2i2c.cloud
choldgraf (Member):

A DNS change needs to be made for this.

yuvipanda (Member) left a comment:

This lgtm, @GeorgianaElena. You'll have to make DNS changes for staging.utoronto.2i2c.cloud to point to the IP of the nginx-ingress for this to work. But once that's done, hopefully everything (including auth!) should work.

To deal with DNS issues, my suggested workflow is (sketched below):

  1. Run deploy-support locally to deploy nginx-ingress
  2. Change DNS to point domains to the nginx-ingress service's external IP
  3. Then deploy the hub after 5-10 minutes, to allow time for the DNS change to propagate

This should avoid problems with Let's Encrypt trying to get HTTPS working properly.
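A rough sketch of that workflow; the deployer invocations, namespace, and labels below are assumptions about this repo's tooling rather than confirmed commands:

# 1. Deploy the support chart (nginx-ingress, grafana, ...) for the cluster.
python3 deployer deploy-support utoronto

# 2. Look up the external IP of the nginx-ingress LoadBalancer service,
#    then point the A records (grafana + staging) at it.
kubectl get svc --namespace support \
  --selector app.kubernetes.io/name=ingress-nginx \
  -o wide

# 3. After 5-10 minutes, once DNS has propagated, deploy the hub itself.
python3 deployer deploy utoronto staging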

damianavila (Contributor):

Now that #841 is done, will this auto-deploy when we merge?

AFAIK, nope; this one is using a raw kubeconfig, and auto-deploy needs a Service Principal to be set up. AFAIK, the terraform stuff for that was merged quite recently, so I am not sure whether the cluster supporting this hub was built before or after that PR was merged. In addition, I think there are still some permission issues to be figured out, according to this comment: #864 (comment)

GeorgianaElena (PR author):

Now that #841 is done, will this auto-deploy when we merge?

@damianavila is right; #841 won't work for the utoronto hub because we don't have enough permissions to create a service principal (#864 (review)) :( Also, the cluster was built before that PR was merged.

This probably doesn't quite close #853, because we'll need to deploy a prod hub as well, right? Since nobody will be using this hub yet anyway, why don't we deploy both staging and prod in this PR?

I believe we should deploy the prod hub after we make sure that everything works on this staging hub. It should be a fairly straightforward task that can happen as part of Step 2 of #638. Also, we might need to take over the prod hub URL too (like we are doing here with the staging one).

GeorgianaElena (PR author):

Ah, oops! I just noticed that in the original issue #638 there is this piece of context from @yuvipanda:

We now have Azure terraform support in this repo (#800) and can set up a new hub infra with that, to run in parallel with current setup. The primary infra difference will be the use of Azure File for home directories rather than the hand-spun NFS VM we have there right now - I think this is a big positive change.

I believe that NFS is enabled by default rather than Azure File, so that needs to change before merging this (a rough sketch of an Azure File-backed volume follows below).
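For context, a rough sketch of what an Azure File-backed home-directory volume looks like in plain Kubernetes; in practice the change would go through this repo's terraform/helm config, and every name and size below is a placeholder, not a real value:

# Hypothetical PersistentVolume using the in-tree Azure File driver;
# the secret, share name, and capacity are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: home-directories
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-file-credentials  # holds the storage account name + key
    shareName: homes                    # Azure Files share for user home dirs
    readOnly: false
EOF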

GeorgianaElena (PR author) commented Dec 6, 2021:

I would like to move this forward since we have a migration due date.
I was thinking about the following plan:

  • Deploy the support chart for the utoronto cluster per @yuvipanda's comment.
  • Edit the A record for grafana.utoronto.2i2c.cloud to point to the nginx-ingress service's external IP
  • If everything goes OK in step 1, merge this PR and check that the deploy goes well.
  • Edit the DNS A record of staging.utoronto.2i2c.cloud to point to the external IP of the new hub (which is the nginx-ingress service's external IP)
  • Open an issue about the differences in resource allocation that we have for the utoronto hub and discuss there whether we should stick with those or try to use the defaults we have in the infra repo

GeorgianaElena (PR author):

Since I have Yuvi's approval and we can tweak resources later, I will try a merge and check how the deploy goes 🤞

@GeorgianaElena GeorgianaElena merged commit 0ed463c into 2i2c-org:master Dec 6, 2021
damianavila (Contributor) commented Dec 6, 2021:

Open an issue about the differences in resource allocation that we have for the utoronto hub and discuss there whether we should stick with those or try to use the defaults we have in the infra repo

Totally OK with me (but let's not forget about that discussion!! 😉 )

damianavila (Contributor):

Btw, this is great progress @GeorgianaElena, thanks for pushing forward on it!!

GeorgianaElena (PR author):

Totally OK with me (but let's not forget about that discussion!! )

Absolutely, @damianavila!
