Add carbonplan cluster + hubs #391

Merged (12 commits) on May 14, 2021

Conversation

@yuvipanda (Member) commented on May 10, 2021:

  • staging and prod hubs that are exactly the same,
    with just domain differences
  • Uses traditional autohttps + LoadBalancer to get traffic
    into the cluster. Could be nginx-ingress later on if necessary.
  • Manual DNS entries for staging.carbonplan.2i2c.cloud and
    carbonplan.2i2c.cloud. Initial manual deploy with
    `proxy.https.enabled` set to false to complete deployment,
    fetch externalIP of the proxy-public service, set up DNS,
    then re-deploy with `proxy.https.enabled` set to true
    (see the config sketch below).
  • Simple hacky script to set up EFS properly for a cluster
  • Use image automatically built at https://github.com/carbonplan/trace/tree/main/envs/python-notebook
  • Add ssh keys used to build the cluster
  • Standardize labels used for our nodes - see #136 (Standardize on nodepool labeling conventions)

Depends on #389 and #390

Ref #291
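
A minimal sketch of the two-phase `proxy.https` flip described above, in standard zero-to-jupyterhub-style values (the exact file layout used in this repo and the contact email are assumptions, not taken from this PR):

```yaml
# Phase 1: deploy with https disabled so the LoadBalancer gets an external
# address that DNS can point at. Phase 2: flip `enabled` to true and redeploy.
proxy:
  service:
    type: LoadBalancer            # traffic enters through a cloud load balancer
  https:
    enabled: false                # set to true only after the DNS records resolve
    hosts:
      - staging.carbonplan.2i2c.cloud
    letsencrypt:
      contactEmail: <[email protected]>   # placeholder
```

Once the CNAME records resolve, flipping `enabled` to true and redeploying lets the autohttps pod obtain Let's Encrypt certificates for those hosts.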

@yuvipanda (Member, Author) commented on May 11, 2021:

I've deleted everything and am recreating it from scratch.

  1. cd into the kops directory.

  2. Set a 'state' bucket for kops - it will store current cluster state here: `export KOPS_STATE_STORE=s3://<2i2c>-<hub-name>-kops-state`

  3. Create this bucket: `aws s3 mb $KOPS_STATE_STORE --region <region>`. (FIXME: Make this versioned?)

  4. Render the kops config file: `jsonnet carbonplan.jsonnet -y > carbonplan.kops.yaml`

  5. Delete the last line of the YAML file (the trailing `...`).

  6. Create the cluster config in S3: `kops create -f carbonplan.kops.yaml`

  7. Create an SSH key: `ssh-keygen -f ssh-key`

  8. Create the cluster itself: `kops update cluster carbonplanhub.k8s.local --yes --ssh-public-key ssh-key.pub --admin`. The `--admin` flag will modify your `~/.kube/config` file to point to the new cluster.

  9. Wait for the kops cluster to come up: `kops validate cluster --wait 10m`

  10. In another terminal, work around kubernetes/kops#11199 (Ability to run CoreDNS on master node) with `k -n kube-system patch deployment kube-dns --type json --patch '[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]}]'` and `k -n kube-system patch deployment kube-dns-autoscaler --type json --patch '[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]}]'`. Validation will not pass until this is done.

  11. Create an EFS file system for this hub with `python3 setup-efs.py <cluster-name> <region>` (you need the boto3 python package installed; a sketch of what the script does follows this list). This will output an `fs-` id, which you should use in `basehub.nfsPVC.nfs.serverIP`. It should be something like `fs-<id>.efs.<region>.amazonaws.com`.

  12. cd back to the root of the repo.

  13. Generate a kubeconfig for the deployer: `KUBECONFIG=secrets/carbonplan.yaml kops export kubecfg --admin=730h carbonplanhub.k8s.local`. If you already have a file there, you need to remove it first.

  14. Encrypt the generated kubeconfig: `sops -i -e secrets/carbonplan.yaml`

  15. Set `proxy.https.enabled` to false in carbonplan.clusters.yaml. This creates the hubs without trying to give them HTTPS, so we can create DNS entries for them appropriately.

  16. Deploy the hubs with `python3 deployer deploy carbonplan --skip-hub-health-test`, and wait! It should hopefully complete successfully.

  17. Get the external IP for staging with `kubectl -n staging get svc proxy-public`, and make a DNS CNAME record for staging.carbonplan.2i2c.cloud pointing to it.

  18. Get the external IP for prod with `kubectl -n prod get svc proxy-public`, and make a DNS CNAME record for carbonplan.2i2c.cloud pointing to it.

  19. Wait about 5 minutes to make sure the DNS records actually resolve.

  20. Set `proxy.https.enabled` to true in carbonplan.clusters.yaml, so we can get HTTPS!

  21. Run `python3 deployer deploy carbonplan` again; this should set up HTTPS and run a test!
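
The `setup-efs.py` script referenced in step 11 isn't shown in this thread; the sketch below is a rough guess at what such a script does with boto3 (the creation token, subnet tag filter, and security-group handling are assumptions, and the real script may differ):

```python
import sys
import time

import boto3


def setup_efs(cluster_name, region):
    """Create an EFS file system for a cluster and print its DNS name (sketch)."""
    efs = boto3.client("efs", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)

    # Create a file system; the creation token makes the call idempotent
    fs = efs.create_file_system(CreationToken=f"{cluster_name}-homedirs")
    fs_id = fs["FileSystemId"]

    # Wait until the file system is available before adding mount targets
    while True:
        state = efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0]["LifeCycleState"]
        if state == "available":
            break
        time.sleep(2)

    # Find the subnets kops created for this cluster (the tag key is an assumption)
    subnets = ec2.describe_subnets(
        Filters=[{"Name": "tag:KubernetesCluster", "Values": [cluster_name]}]
    )["Subnets"]

    # One mount target per subnet lets nodes in every AZ reach the file system.
    # Security group wiring is omitted here; the real script must allow NFS (port 2049)
    # from the nodes.
    for subnet in subnets:
        efs.create_mount_target(FileSystemId=fs_id, SubnetId=subnet["SubnetId"])

    print(f"{fs_id}.efs.{region}.amazonaws.com")


if __name__ == "__main__":
    setup_efs(sys.argv[1], sys.argv[2])
```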
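
The dotted path in step 11 maps onto the hub configuration roughly like this (illustrative excerpt only; the surrounding structure of the cluster config file is an assumption):

```yaml
basehub:
  nfsPVC:
    nfs:
      # DNS name built from the fs- id printed by setup-efs.py
      serverIP: fs-<id>.efs.<region>.amazonaws.com
```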

@yuvipanda (Member, Author) commented:

When rotating the master, you might run into kube-dns not evicting since that would violate its PDB. You can temporarily work around this by reducing the availability guarantee:

`k -n kube-system patch pdb kube-dns --type json -p '[{"op": "replace", "path": "/spec/minAvailable", "value": 0}]'`

However, once done, you should undo the change:

`k -n kube-system patch pdb kube-dns --type json -p '[{"op": "replace", "path": "/spec/minAvailable", "value": 1}]'`

@damianavila (Contributor) commented:

OK, I left some comments. I will try to deploy this one from scratch tomorrow and report back on how it goes!

@damianavila (Contributor) commented:

> When rotating the master, you might run into kube-dns not evicting since that would violate its PDB.

I am pretty sure that was the problem preventing my kops rolling-update from succeeding...

From the commit messages in this PR:

We have three sets of labels:

1. What components of a JupyterHub can run here? core / user
2. What components of a dask gateway can run here? core / scheduler /
   worker
3. What are the features of the node pool we care about? For example,
   if we want to be on an r5.xlarge node, we should target the
   existing node.kubernetes.io/instance-type label

This gives us flexibility without adding too much overhead.
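
As an illustration of how these three sets compose on a pod spec (the exact label keys adopted in #136 are not spelled out in this thread, so the `k8s.dask.org/...` key below is an assumption; `node.kubernetes.io/instance-type` is the standard well-known label):

```yaml
# A dask worker pod that must land on an r5.xlarge dask worker node (sketch)
nodeSelector:
  k8s.dask.org/node-purpose: worker             # set 2: dask-gateway component
  node.kubernetes.io/instance-type: r5.xlarge   # set 3: node pool feature
```
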
dask-gateway requires that the image used for it
contains the `dask-gateway` package. The scheduler image is
the same image as the user notebook image, to make sure that
versions match. The previously used image did not have
`dask-gateway` installed.
Otherwise it doesn't know which instance group to scale up
when a pod wants a node with that label.
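
That note presumably refers to declaring the labels on the kops instance groups themselves, so the right group can be scaled up before any node carrying the label exists. A sketch of what that looks like in a kops InstanceGroup spec (group name, machine type, and label key are illustrative assumptions):

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: dask-worker
spec:
  machineType: r5.xlarge
  nodeLabels:
    k8s.dask.org/node-purpose: worker   # matches the pod nodeSelector sketch above
```
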
@yuvipanda changed the title from "Add carbonplan cluster + hubs" to "[WIP] Add carbonplan cluster + hubs" on May 12, 2021
@yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request on May 13, 2021:
With
2i2c-org@3c344a4,
we're trying to normalize labels in our clusters. Primarily,
we want to have three sets of labels that can be composed as
needed.

1. What components of a JupyterHub can run here? core / user
2. What components of a dask gateway can run here? core / scheduler /
   worker
3. What are the features of the node pool we care about? For example,
   if we want to be on an r5.xlarge node, we should target the
   existing node.kubernetes.io/instance-type label

This gives us flexibility without adding too much overhead.

2i2c-org#391 changes the base hub
template to use these new labels. We should change our GKE clusters to
have these labels too before we can deploy that PR. This PR adds
these labels in addition to the existing labels, to avoid any
disruption.
@yuvipanda (Member, Author) commented:

I'll need to deploy and merge #397 first before merging this, so as to not break existing clusters.

@yuvipanda (Member, Author) commented:

With #397, all GKE clusters now have the new labels. I've removed Farallon from CI/CD in #400.

@yuvipanda changed the title from "[WIP] Add carbonplan cluster + hubs" to "Add carbonplan cluster + hubs" on May 14, 2021
@yuvipanda (Member, Author) commented:

OK, this one is ready to merge :)

@damianavila (Contributor) left a comment:

I have been reviewing this one as it was evolving.
Final check: LGTM.
This is great @yuvipanda!
Feel free to push the green button!

@yuvipanda merged commit 020b06a into 2i2c-org:master on May 14, 2021
@yuvipanda (Member, Author) commented:

@damianavila done!

@yuvipanda mentioned this pull request on May 27, 2021