Teach CI how to install Helm charts into a GKE cluster #105
base: main
Conversation
Anticipating that we will add automation to deploy Helm charts into GKE as part of CI, we should have some oversight over which PRs are allowed to use that automation.
GKE Autopilot applies a default CPU request of 50m (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#defaults) and will issue a warning if CPU resources are unset. So let's explicitly apply the default value to quiet the warnings.
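For illustration, making the default explicit might look like this in the chart's values (the `smd` keys are borrowed from the template example later in this thread; the real chart layout may differ):

```yaml
smd:
  deployment:
    resources:
      requests:
        # Same value GKE Autopilot would apply implicitly; stating it
        # explicitly just silences the "CPU resources unset" warning.
        cpu: 50m
```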
LoadBalancer IPs in GKE are public IPs, and as such they take a few minutes to register, which extends the duration of the CI by quite a lot. Since we currently have no use for public IPs for these services, let's change them back to ClusterIPs to speed up the CI.
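A CI-only overlay is one way to express this without touching the base values.yaml (file name and keys hypothetical):

```yaml
# ci-values.yaml: applied only in CI, e.g. `helm install ... -f ci-values.yaml`
smd:
  service:
    # ClusterIP avoids the multi-minute wait for a public LoadBalancer IP
    # to be provisioned; nothing in CI needs external access.
    type: ClusterIP
```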
We have configured a GKE Autopilot cluster for running CI. This PR teaches the CI how to deploy an example Helm chart into that cluster, and runs a simple test to verify that the chart deployment succeeded.
I am not sure how to prevent PRs from running arbitrary code as part of the CI checks. There is no sensitive data in the GKE cluster, since it gets scrubbed between CI runs, so it's more about preventing random PRs from mining bitcoin. I wonder if there is a mechanism in GitHub that delays a CI check from running in a PR until after some human intervention.
@bcfriesen I think that is possible with manual triggers.
Thanks! That's perfect. It looks like I don't have permission to create environments in this repo either, so will have to wait on that too.
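For reference, this is roughly what gating a job on a reviewed environment looks like (the environment name here is illustrative; required reviewers are configured under the repo's Settings → Environments):

```yaml
jobs:
  helm-deploy:
    runs-on: ubuntu-latest
    # A job bound to an environment with required reviewers pauses until a
    # maintainer approves the run, so arbitrary PRs can't start the deploy.
    environment: gke-ci
    steps:
      - uses: actions/checkout@v4
      # ... helm install / test steps ...
```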
Re secret management, we could move the chart out of deployment-recipes into its own repo that LBNL has owner rights to, though AFAIK the `pull_request` trigger should block PRs from forks from accessing them already (versus `pull_request_target`, per https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/).
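Concretely, the difference is just the workflow trigger (branch filter illustrative):

```yaml
on:
  # pull_request runs in the fork's context: no repository secrets and a
  # read-only GITHUB_TOKEN, which is the safe default for fork PRs.
  pull_request:
    branches: [main]
  # pull_request_target runs in the base repo's context WITH secrets, which
  # is what the linked securitylab article warns against.
  # pull_request_target:
```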
```yaml
run: |
  cd ./lbnl/helm;
  helm dependency build;
  bash ./scripts/install.sh ${HELM_INSTALL_NAME} ${NAMESPACE} ${GC_REGISTRY}
```
How do you feel about invoking `helm install` here directly instead of using the script? That'd allow us to use `--set` parameters or a separate overlay values.yaml instead of changing the base one.
Wrapping the Helm commands in scripts prevents other tools that speak Helm (e.g. Argo) from interacting with the chart, so I'd want to avoid them if possible.
The main use case I'd expect for separate tools is for handling environment prerequisites that Helm's built-in features can't handle well. Database initialization is one such case, where it's annoying to spin up a managed cluster prior to install.
I feel great about replacing this with `helm install` - I already don't like that load-bearing script. It exists mainly as a convenience tool to override the ~dozen container image repos without having to invoke `--set` a dozen times. Maybe there is a more Helm-ish way to do that, though.
There's not a great way to have both a default in values.yaml and an effective default from elsewhere, unfortunately. Templates like `{{ .Values.smd.deployment.image.repository }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}` can't be made to ignore the root values.yaml defaults. You can use a function that gives precedence to an optional `.Values.defaultRepository` key, but that sort of hidden behavior is worse: it won't be obvious what the effective value is unless you know the template innards.
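To make the trade-off concrete, that "hidden precedence" variant could be written with Sprig's `default` function, e.g.:

```yaml
# Hypothetical template: an optional global .Values.defaultRepository wins
# over the per-image default, which is exactly the non-obvious behavior
# being argued against here.
image: "{{ .Values.defaultRepository | default .Values.smd.deployment.image.repository }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}"
```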
The overlay values.yaml is equivalent to the repeated `--set`, and you can combine those if you want to keep them separate, something like `helm install /path/to/chart -f override-repositories.yaml -f override-service-type.yaml`.
Ah neat, I like that more. I'll break out that script and try this out instead. Thanks!
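For concreteness, such an overlay file might look like the following (registry path and keys hypothetical, mirroring the earlier examples):

```yaml
# override-repositories.yaml: point every image at the CI registry
smd:
  deployment:
    image:
      repository: us-west1-docker.pkg.dev/my-project/ochami   # hypothetical

# override-service-type.yaml would similarly just set service.type: ClusterIP
```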
```yaml
runs-on: ubuntu-latest
env:
  NAMESPACE: ochami-ci
  GC_REGISTRY: ${{ secrets.GC_PROJECT_REGION }}-docker.pkg.dev/${{ secrets.GC_PROJECT_ID }}/ochami
```
How will we keep these up to date? The GHCR repos will track the latest tags in their upstream repos, but won't push to the Google Cloud registries.
If we want to, for example, test unreleased builds, we could use draft PRs with temporary overrides to an alternate registry/tag pending an upstream release.
This is another shortcoming that I hadn't gotten around to. One option is to teach the existing repos for each container image how to build and push to the GC artifact registry. That would require adding GitHub Secrets to each repo that allow a Google Cloud Build robot to build and push the images as part of CI. I am not a huge fan of this approach.
Another approach would be to mirror each microservice repo in LBNL mirrors, and teach the mirrors how to build and push to the Google Cloud registry. IIUC GitLab has a pull mirroring feature that makes this relatively straightforward. Does GitHub have something like that?
No, we'd have to pull from our end and run our own build job.
OK - so one could recreate the pull mirroring behavior using a robot running in a K8s CronJob that checks for new tags in the various microservice repos, and builds/pushes the container images to the GC Artifact Registry. Or maybe that is too complicated?
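A rough sketch of that shape, under the stated assumptions (the name, image, schedule, and script are all hypothetical placeholders; the real job would also need registry credentials):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: image-mirror
spec:
  schedule: "0 * * * *"   # check for new upstream tags hourly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: mirror
              image: us-west1-docker.pkg.dev/my-project/ochami/mirror-bot:latest  # hypothetical
              # The script would list upstream tags, diff them against the
              # GC Artifact Registry, and build/push anything missing.
              command: ["/bin/sh", "-c", "/scripts/mirror-new-tags.sh"]
```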
IMO the simplest approach would be to just use the original GHCR-hosted images. Do we have a use case to build them separately?
I recall a limitation somewhere in Google Cloud that container image locations were limited to Google Cloud Artifact Registry and to DockerHub, which motivated the desire to build them separately. But I might be mixing up GKE and Google Cloud Build in that regard. It is definitely simpler to just use the GHCR images.
There shouldn't be any such limitation: GKE should be able to pull from any public repo's GHCR just fine. Private GHCR repos are more complicated, but we shouldn't have any.
The Google Cloud Artifact Registry can serve as a pull-through cache mirror, but that's more to deal with DockerHub's free-tier pull limits.
Awesome, thanks! That will reduce a lot of this complexity then.
Addresses #104.
Since I don't have permission to add Secrets to this repo at this time, I will leave this PR in draft form.