[kots]: use Helm for the Installer job #8491
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##             main    #8491   +/-   ##
=======================================
  Coverage   11.17%   11.17%
=======================================
  Files          18       18
  Lines         993      993
=======================================
  Hits          111      111
  Misses        880      880
  Partials        2        2
```
Flags with carried forward coverage won't be shown.
force-pushed from f3c7990 to 25d4216
force-pushed from 25d4216 to ff70da4
force-pushed from ff70da4 to 406678c
Thanks for creating this pull request as a home for the discussion on this topic. 🙏 In general, I like the approach of using Helm to deploy and manage the resources so that we don't need to implement stuff where a solution is already available. I would like to hear the opinions of @csweichel and @geropl about this, in particular whether they see downsides of this approach that we would remedy by implementing our own solution.

Open Questions:

Link to internal discussion for more context: https://gitpod.slack.com/archives/C01KLC56NP7/p1645713248561109
Personally, I don't see this as something that's necessary to be in the Installer. Perhaps the only exception to that is we may want to put a
Unlike when we abstracted the
I did create #8379 to test out a few things on this subject, which led me to realise that we could just use Helm as a way of deploying the Installer's YAML to a cluster.
I understand the motivation, but wonder what the story is for the Installer outside of Replicated. How would we solve this issue for e.g. SaaS or future BYOC installations? Solving this issue with Helm is one way - but it feels almost like a step backwards. How would resource deletion/upgrade be implemented for a Kubernetes operator?
No idea how we would do it with an operator, but that would certainly require more effort and testing than using Helm. Not saying that's a deal-breaker, but this is a practical way of getting this working in fairly short order. My assumption with SaaS is that it's currently handled via ArgoCD - we've certainly had no requirement from @gitpod-io/platform to provide a native way of removing now-orphaned resources (that may also be due to the fact that we're not really removing resources).
We should find out
That's not an unreasonable assumption. But it's that: an assumption. We can use Helm for this. But let's use Helm because it's the right thing to do, because we're aware of the limits this imposes (e.g. limits to custom resource removal handling), and because we're aware of what the alternatives would have been.
@gitpod-io/platform is by far not the only stakeholder in this. It's the product teams who deploy their code. Platform provides services used by some other product teams, but it's the product teams who own the deployment. You need to go and talk to those teams to understand what the requirements are here. Chances are, the teams will defer to platform. That's a problem we're solving these days. But product teams need to carry their operational responsibility, and having an opinion about the behaviour of the installer is part of that.
Since I did #8376, I don't believe we have any custom operators in the Installer anymore. The cert-manager CRDs are installed outside of the Installer and will remain that way as far as I'm concerned. Fair point on the other team implications. I would recommend that we don't do a stable release to Replicated until we have a way of automatically removing unused resources - I do fear lots of support calls into @gitpod-io/engineering-self-hosted on "I'm not using in-cluster DB but there's a
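To make that support scenario concrete, here is a minimal sketch (the `gitpod` namespace and the bare `grep` are assumptions, not the exact resource names) of what an operator could see after switching to an external database when the resources were only ever applied with `kubectl apply`:

```shell
# With plain `kubectl apply` there is no release history, so in-cluster
# database objects from a previous configuration are never cleaned up.
kubectl get statefulsets,pvc -n gitpod | grep -i mysql
# Expect to still see the old StatefulSet and its PersistentVolumeClaim here,
# even though the current config uses an external database.
```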
I was wondering, doesn't Replicated help us OOTB here? I.e. they consume the manifests we produce and apply them against the cluster, no? If so, for Replicated this is not relevant - and for the other users of the Installer (webapp, IDE, workspace) it would be interesting to know how they handle this problem today.
Yes and no. It does for the manifests that we create in full, but that's only things like certs, secrets and other bits of config. For the actual Gitpod application, the gitpod-installer-job.yaml runs a Bash script which creates the config file, edits it and then applies it - Replicated/KOTS doesn't know anything about those resources and so can't help us with removal. The only way we'd be able to achieve that would be to completely abandon the Installer and write all the logic in KOTS (which we don't want to do).
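For context, a rough sketch of the flow described above, using the documented Installer commands (the exact contents of `gitpod-installer-job.yaml` and the file names here are illustrative):

```shell
# Render the Kubernetes YAML from the generated/edited config...
gitpod-installer render --config gitpod.config.yaml > gitpod.yaml

# ...and apply it. KOTS never sees these resources, and `kubectl apply`
# keeps no history, so nothing removed from gitpod.yaml is ever deleted.
kubectl apply -f gitpod.yaml
```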
That's disappointing! It was my understanding hitherto that we render the Kubernetes manifests, and KOTS applies them. What's the point of Replicated if we end up having to implement the hard parts? It looks like in KOTS we are already using the Helm hack. Why not keep this contained to KOTS and think of a proper solution for ourselves, i.e. the other consumers of the installer? Sounds to me like an RFC which discusses those ideas would be in order.
It is disappointing, but it's also a consequence of us having the Installer to generate the manifests. If you recall the initial meetings with Replicated, their initial thought was us generating the manifests from the Installer and then using KOTS to decide which is implemented - we (rightly) decided against that as something that would be unmaintainable. The link I sent in my previous comment was from this PR - what's currently in
The proposed change is for installations via Replicated only. If we were to incorporate this into the Installer, that would be down the line (and would certainly require an RFC).
force-pushed from 406678c to 998f5ac
@gitpod-io/engineering-self-hosted are we in a position to get this merged?
/werft run no-preview
👍 started the job as gitpod-build-sje-kots-helm.6
Like other people mentioned here, I'm not a big fan of using Helm internally again, but not cleaning up unnecessary resources when users update their config feels like a much bigger issue, and we should definitely fix that.
For now, this feels like a good fix, as users never need to interact with any of the Helm stuff here (or even notice it), and it seems to work as expected.
/hold for @corneliusludmann to approve this before we merge.
Writing here because @corneliusludmann pinged initially, and @mrsimonemms and I had Slack discussions about this:
This allows for automatic deletion of resources that are no longer used
force-pushed from 998f5ac to 2ffd035
/werft run no-preview
👍 started the job as gitpod-build-sje-kots-helm.8
Thanks all for the input. My summary of the discussion:
It would be very interesting if Helm handled issues like these as well, since this would be much harder to solve on our own than just removing unnecessary resources. Could you check this, @mrsimonemms?
@corneliusludmann upsettingly, Helm does NOT solve the label issue you raised, and for the same reason - the field is immutable. Personally, I don't think that makes much difference to this PR, as it's a change that happened elsewhere that broke the update from the February to the March release. It would have been nice had Helm solved this, but it doesn't. What it does mean is that we must all be careful when introducing potentially breaking changes against long-running Gitpod clusters. In SaaS, we tend to use a blue/green deployment pattern, but many self-hosted users will want to simply upgrade their existing clusters.
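To illustrate the limit being discussed (the resource name and namespace are hypothetical): label selectors on existing workloads are immutable in Kubernetes, so neither `kubectl apply` nor `helm upgrade` can change them in place.

```shell
# Attempting to change the selector of an existing Deployment...
kubectl patch deployment server -n gitpod --type merge \
  -p '{"spec":{"selector":{"matchLabels":{"component":"server-v2"}}}}'
# ...is rejected by the API server because spec.selector is immutable.
# The same happens when Helm submits the updated manifest during an upgrade:
# the resource has to be deleted and recreated, which is why breaking label
# changes against long-running clusters need special care.
```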
Description
This changes the KOTS Installer job to use Helm for the installation of Gitpod rather than `kubectl apply`. IMPORTANT: this still uses the Gitpod Installer for generating the Kubernetes YAML and is not in any way us going back to using Helm for installing Gitpod.

One thing we have to be aware of with KOTS is that we control what resources third-parties/customers have installed on their clusters under the umbrella of "Gitpod". It is key that we remove resources once they're finished with.

An example of this: if we use `kubectl apply`, there is no history, and any resources that are no longer required (eg, the `mysql-0` statefulset) remain in the cluster. As they are no longer being used, they become orphaned from the Gitpod installation YAML. More importantly, any persistent volume claims remain, and the costs therein.

By switching to `helm upgrade`, we hook into Helm's history. Any resources that are removed between the previous and current installations are deleted.

Helm can be thought of as fulfilling two jobs:

1. generating the Kubernetes YAML from templates
2. applying that YAML to the cluster and tracking it as a release

As we are still using the Installer to generate the YAML (ie, part 1), we're only using Helm for part 2.
The alternative would be to implement our own version of this resource tracking. In reality, we'd likely end up building something very similar to how Helm works. By doing it this way, we reduce the time involved and benefit from all of Helm's testing.
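As a rough sketch of the approach (the chart layout, names and namespace are illustrative; this is not the exact content of the KOTS job script): the Installer still renders the YAML, and Helm is only used as the delivery and tracking mechanism.

```shell
# 1. Render the manifests with the Installer, exactly as before.
gitpod-installer render --config gitpod.config.yaml > rendered.yaml

# 2. Wrap the rendered YAML in a throwaway chart so Helm only performs the
#    "deploy and track" half of its job, not the templating half.
mkdir -p gitpod-chart/templates
cat <<'EOF' > gitpod-chart/Chart.yaml
apiVersion: v2
name: gitpod
version: 1.0.0
EOF
mv rendered.yaml gitpod-chart/templates/gitpod.yaml

# 3. Install/upgrade as a Helm release. Helm keeps a revision history, so
#    resources present in the previous revision but missing from this one
#    are deleted automatically.
helm upgrade --install gitpod ./gitpod-chart \
  --namespace gitpod \
  --create-namespace
```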
How to test
Deploy via KOTS:
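One rough way to check the result after deploying through KOTS (the `gitpod` namespace is an assumption) is that the installation should now show up as a Helm release:

```shell
# The Gitpod resources should now be tracked as a Helm release...
helm list -n gitpod
helm history gitpod -n gitpod

# ...and the workloads should come up as usual.
kubectl get pods -n gitpod
```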
Release Notes
Documentation