Teach CI how to install Helm charts into a GKE cluster #105

Draft · wants to merge 4 commits into base: main

Conversation

@bcfriesen (Contributor)

Addresses #104.

Since I don't have permission to add Secrets to this repo at this time, I will leave this PR in draft form.

We have configured a GKE Autopilot cluster for running CI. This PR teaches the CI how to deploy an example Helm chart into that cluster, and runs a simple test to demonstrate that the chart deployment is successful.

Anticipating that we will add some automation to deploy Helm charts into GKE as part of CI, we should have some supervision over which PRs are allowed to leverage that automation.

GKE Autopilot applies a default CPU request of 50m (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#defaults) and will issue a warning if CPU requests are unset, so let's explicitly apply the default value to quiet the warnings.

LoadBalancer IPs in GKE are public IPs, and as such they take a few minutes to provision, which extends the duration of the CI considerably. Since we currently have no use for public IPs for these services, let's change them back to ClusterIPs to speed up the CI.
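For illustration, the two chart-side changes amount to an overlay along these lines (the key paths here are illustrative, not the chart's actual structure):

smd:
  service:
    type: ClusterIP      # avoid waiting minutes for a public LoadBalancer IP
  deployment:
    resources:
      requests:
        cpu: 50m         # match the GKE Autopilot default to quiet the warnings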
@bcfriesen (Contributor Author)

I am not sure how to prevent PRs from running arbitrary code as part of the CI checks. There is no sensitive data in the GKE cluster, since it gets scrubbed between CI runs, so this is more about preventing random PRs from mining Bitcoin. I wonder if GitHub has a mechanism that holds a CI check in a PR until a human approves it.

@cjh1 (Member) commented Jan 6, 2025

@bcfriesen I think that is possible with manual triggers

@bcfriesen (Contributor Author)

> @bcfriesen I think that is possible with manual triggers

Thanks! That's perfect. It looks like I don't have permission to create environments in this repo either, so I will have to wait on that too.
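For reference, once the environment exists, the gating should just be a matter of pointing the deploy job at it, roughly like this (the environment name gke-ci is made up):

jobs:
  helm-deploy:
    runs-on: ubuntu-latest
    environment: gke-ci   # the job waits for a required reviewer's approval before running
    steps:
      - uses: actions/checkout@v4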

@rainest (Contributor) left a comment


Re secret management, we could move the chart out of deployment-recipes into its own repo that LBNL has owner rights to, though AFAIK the plain pull_request trigger should already block PRs from forks from accessing secrets (versus pull_request_target; see https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/).
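In other words, as long as the workflow keeps the plain trigger, runs from fork PRs shouldn't receive repository secrets:

on:
  pull_request:
    branches: [main]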

run: |
  cd ./lbnl/helm;
  helm dependency build;
  bash ./scripts/install.sh ${HELM_INSTALL_NAME} ${NAMESPACE} ${GC_REGISTRY}
@rainest (Contributor)


How do you feel about invoking helm install here directly instead of using the script? That'd allow us to use --set parameters or a separate overlay values.yaml instead of changing the base one.

Wrapping the Helm commands in scripts prevents other tools that speak Helm (e.g. Argo) from interacting with the chart, so I'd want to avoid them if possible.

The main use case I'd expect for separate tools is for handling environment prerequisites that Helm's built-in features can't handle well. Database initialization is one such case, where it's annoying to spin up a managed cluster prior to install.
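Roughly like this, where the --set key is illustrative rather than the chart's actual value name:

run: |
  helm dependency build ./lbnl/helm
  helm install ${HELM_INSTALL_NAME} ./lbnl/helm \
    --namespace ${NAMESPACE} --create-namespace \
    --set imageRegistry=${GC_REGISTRY}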

@bcfriesen (Contributor Author)


I feel great about replacing this with helm install - I already don't like that load-bearing script. It exists mainly as a convenience tool to override the ~dozen container image repos without having to invoke --set a dozen times. Maybe there is a more Helm-ish way to do that though.

@rainest (Contributor)


There's not a great way to have both a default in values.yaml and an effective default from elsewhere, unfortunately. The templates like

{{ .Values.smd.deployment.image.repository }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}

can't be made to ignore the root values.yaml defaults. You can use a function that gives precedence to an optional .Values.defaultRepository key, but that sort of hidden behavior is worse: it won't be obvious what the effective value is unless you know the template innards.
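For the record, that hidden-default version would look roughly like:

{{- $repo := .Values.defaultRepository | default .Values.smd.deployment.image.repository -}}
image: {{ $repo }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}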

The overlay values.yaml is equivalent to the repeated --set, and you can combine those if you want to keep them separate: something like helm install <release> /path/to/chart -f override-repositories.yaml -f override-service-type.yaml
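Each override file then carries only the values it changes, e.g. an override-repositories.yaml of roughly this shape (the registry path is hypothetical):

smd:
  deployment:
    image:
      repository: us-docker.pkg.dev/some-project/ochami   # hypothetical registry path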

@bcfriesen (Contributor Author)


> The overlay values.yaml is equivalent to the repeated --set, and you can combine those if you want to keep them separate: something like helm install <release> /path/to/chart -f override-repositories.yaml -f override-service-type.yaml

Ah neat, I like that more. I'll break out that script and try this out instead. Thanks!

runs-on: ubuntu-latest
env:
  NAMESPACE: ochami-ci
  GC_REGISTRY: ${{ secrets.GC_PROJECT_REGION }}-docker.pkg.dev/${{ secrets.GC_PROJECT_ID }}/ochami
@rainest (Contributor)


How will we keep these up to date? The GHCR repos will track the latest tags in their upstream repos, but won't push to the Google Cloud registries.

If we want to, for example, test unreleased builds, we could use draft PRs with temporary overrides to an alternate registry/tag pending an upstream release.

@bcfriesen (Contributor Author)


This is another shortcoming that I hadn't gotten around to. One option is to teach the existing repos for each container image how to build and push to the GC artifact registry. That would require adding GitHub Secrets to each repo that allow a Google Cloud Build robot to build and push the images as part of CI. I am not a huge fan of this approach.

Another approach would be to mirror each microservice repo in LBNL mirrors, and teach the mirrors how to build and push to the Google Cloud registry. IIUC GitLab has a pull mirroring feature that makes this relatively straightforward. Does GitHub have something like that?

@rainest (Contributor)


No, we'd have to pull from our end and run our own build job.

@bcfriesen (Contributor Author)


OK - so one could recreate the pull mirroring behavior using a robot running in a K8s CronJob that checks for new tags in the various microservice repos, and builds/pushes the container images to the GC Artifact Registry. Or maybe that is too complicated?
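Sketching that out, it would be something along these lines (all names and the mirror script are hypothetical):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: image-mirror
spec:
  schedule: "0 * * * *"   # check the upstream repos for new tags hourly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mirror
              image: alpine:3   # placeholder; a real builder image would go here
              command: ["/bin/sh", "-c", "/scripts/mirror-and-push.sh"]   # hypothetical script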

@rainest (Contributor)


IMO the simplest approach would be to just use the original GHCR-hosted images. Do we have a use case to build them separately?

@bcfriesen (Contributor Author)


I recall a limitation somewhere in Google Cloud that container image locations were limited to the Google Cloud Artifact Registry and to Docker Hub, which motivated the desire to build them separately. But I might be mixing up GKE and Google Cloud Build in that regard. It is definitely simpler to just use the GHCR images.

@rainest (Contributor)


There shouldn't be any such limitation: GKE should be able to pull from any public GHCR repo fine. Private GHCR repos are more complicated, but we shouldn't have any.

The GCP Artifact Registry can serve as a pull-through cache mirror, but that's more to deal with Docker Hub's free-tier pull limits.

@bcfriesen (Contributor Author)


Awesome, thanks! That will reduce a lot of this complexity then.
