Teach CI how to install Helm charts into a GKE cluster #105

Draft · wants to merge 4 commits into base: main

Conversation

@bcfriesen (Contributor)

Addresses #104.

Since I don't have permission to add Secrets to this repo at this time, I will leave this PR in draft form.

We have configured a GKE Autopilot cluster for running CI. This PR teaches the CI how to deploy an example Helm chart into that cluster, and runs a simple test to demonstrate that the chart deployment is successful.

Anticipating that we will add some automation to deploy Helm charts into GKE as part of CI, we should have some supervision over which PRs are allowed to leverage that automation.

GKE Autopilot applies a default CPU request of 50m (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#defaults) and will issue a warning if CPU requests are unset, so let's explicitly apply the default value to quiet the warnings.

LoadBalancer IPs in GKE are public IPs, and as such they take a few minutes to provision, which extends the duration of the CI considerably. Since we currently have no use for public IPs for these services, let's change them back to ClusterIPs to speed up the CI.
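For illustration, the two chart-side changes amount to an overlay along these lines (the key paths here are illustrative, not the chart's actual structure):

smd:
  service:
    type: ClusterIP      # avoid waiting minutes for a public LoadBalancer IP
  deployment:
    resources:
      requests:
        cpu: 50m         # match the GKE Autopilot default to quiet the warnings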
@bcfriesen (Contributor Author)

I am not sure how to prevent PRs from running arbitrary code as part of the CI checks. There is no sensitive data in the GKE cluster, since it gets scrubbed between CI runs, so this is more about preventing random PRs from mining Bitcoin. I wonder if GitHub has a mechanism that holds a CI check in a PR until a human approves it.

@cjh1 (Member) commented Jan 6, 2025

@bcfriesen I think that is possible with manual triggers

@bcfriesen (Contributor Author)

> @bcfriesen I think that is possible with manual triggers

Thanks! That's perfect. It looks like I don't have permission to create environments in this repo either, so I will have to wait on that too.
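For reference, once the environment exists, the gating should just be a matter of pointing the deploy job at it, roughly like this (the environment name gke-ci is made up):

jobs:
  helm-deploy:
    runs-on: ubuntu-latest
    environment: gke-ci   # the job waits for a required reviewer's approval before running
    steps:
      - uses: actions/checkout@v4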

@rainest (Contributor) left a comment


Re secret management, we could move the chart out of deployment-recipes into its own repo that LBNL has owner rights to, though AFAIK the plain pull_request trigger should already block PRs from forks from accessing secrets (versus pull_request_target; see https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/).
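In other words, as long as the workflow keeps the plain trigger, runs from fork PRs shouldn't receive repository secrets:

on:
  pull_request:
    branches: [main]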

run: |
  cd ./lbnl/helm;
  helm dependency build;
  bash ./scripts/install.sh ${HELM_INSTALL_NAME} ${NAMESPACE} ${GC_REGISTRY}
@rainest (Contributor)


How do you feel about invoking helm install here directly instead of using the script? That'd allow us to use --set parameters or a separate overlay values.yaml instead of changing the base one.

Wrapping the Helm commands in scripts prevents other tools that speak Helm (e.g. Argo) from interacting with the chart, so I'd want to avoid them if possible.

The main use case I'd expect for separate tools is for handling environment prerequisites that Helm's built-in features can't handle well. Database initialization is one such case, where it's annoying to spin up a managed cluster prior to install.
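Roughly like this, where the --set key is illustrative rather than the chart's actual value name:

run: |
  helm dependency build ./lbnl/helm
  helm install ${HELM_INSTALL_NAME} ./lbnl/helm \
    --namespace ${NAMESPACE} --create-namespace \
    --set imageRegistry=${GC_REGISTRY}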

@bcfriesen (Contributor Author)


I feel great about replacing this with helm install - I already don't like that load-bearing script. It exists mainly as a convenience tool to override the ~dozen container image repos without having to invoke --set a dozen times. Maybe there is a more Helm-ish way to do that though.

@rainest (Contributor)


There's not a great way to have both a default in values.yaml and an effective default from elsewhere, unfortunately. The templates like

{{ .Values.smd.deployment.image.repository }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}

can't be made to ignore the root values.yaml defaults. You can use a function that gives precedence to an optional .Values.defaultRepository key, but that sort of hidden behavior is worse: it won't be obvious what the effective value is unless you know the template innards.
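For the record, that hidden-default version would look roughly like:

{{- $repo := .Values.defaultRepository | default .Values.smd.deployment.image.repository -}}
image: {{ $repo }}/{{ .Values.smd.deployment.image.name }}:{{ .Values.smd.deployment.image.tag }}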

The overlay values.yaml is equivalent to the repeated --set, and you can combine those if you want to keep them separate: something like helm install <release> /path/to/chart -f override-repositories.yaml -f override-service-type.yaml
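Each override file then carries only the values it changes, e.g. an override-repositories.yaml of roughly this shape (the registry path is hypothetical):

smd:
  deployment:
    image:
      repository: us-docker.pkg.dev/some-project/ochami   # hypothetical registry path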

@bcfriesen (Contributor Author)


> The overlay values.yaml is equivalent to the repeated --set, and you can combine those if you want to keep them separate: something like helm install <release> /path/to/chart -f override-repositories.yaml -f override-service-type.yaml

Ah neat, I like that more. I'll break out that script and try this out instead. Thanks!

runs-on: ubuntu-latest
env:
  NAMESPACE: ochami-ci
  GC_REGISTRY: ${{ secrets.GC_PROJECT_REGION }}-docker.pkg.dev/${{ secrets.GC_PROJECT_ID }}/ochami
@rainest (Contributor)


How will we keep these up to date? The GHCR repos will track the latest tags in their upstream repos, but won't push to the Google Cloud registries.

If we want to, for example, test unreleased builds, we could use draft PRs with temporary overrides to an alternate registry/tag pending an upstream release.

@bcfriesen (Contributor Author)


This is another shortcoming that I hadn't gotten around to. One option is to teach the existing repos for each container image how to build and push to the GC artifact registry. That would require adding GitHub Secrets to each repo that allow a Google Cloud Build robot to build and push the images as part of CI. I am not a huge fan of this approach.

Another approach would be to mirror each microservice repo in LBNL mirrors, and teach the mirrors how to build and push to the Google Cloud registry. IIUC GitLab has a pull mirroring feature that makes this relatively straightforward. Does GitHub have something like that?

@rainest (Contributor)


No, we'd have to pull from our end and run our own build job.

@bcfriesen (Contributor Author)


OK - so one could recreate the pull mirroring behavior using a robot running in a K8s CronJob that checks for new tags in the various microservice repos, and builds/pushes the container images to the GC Artifact Registry. Or maybe that is too complicated?
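Sketching that out, it would be something along these lines (all names and the mirror script are hypothetical):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: image-mirror
spec:
  schedule: "0 * * * *"   # check the upstream repos for new tags hourly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mirror
              image: alpine:3   # placeholder; a real builder image would go here
              command: ["/bin/sh", "-c", "/scripts/mirror-and-push.sh"]   # hypothetical script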

@rainest (Contributor)


IMO the simplest approach would be to just use the original GHCR-hosted images. Do we have a use case to build them separately?

@bcfriesen (Contributor Author)


I recall a limitation somewhere in Google Cloud that container image locations were limited to the Google Cloud Artifact Registry and to Docker Hub, which motivated the desire to build them separately. But I might be mixing up GKE and Google Cloud Build in that regard. It is definitely simpler to just use the GHCR images.

@rainest (Contributor)


There shouldn't be any such limitation: GKE should be able to pull from any public GHCR repo fine. Private GHCR repos are more complicated, but we shouldn't have any.

The GCP Artifact Registry can serve as a pull-through cache mirror, but that's more to deal with Docker Hub's free-tier pull limits.

@bcfriesen (Contributor Author)


Awesome, thanks! That will reduce a lot of this complexity then.
