
[kots]: use Helm for the Installer job #8491

Merged: 2 commits merged into main on Mar 15, 2022
Conversation

mrsimonemms
Contributor

Description

This changes the KOTS Installer job to use Helm for the installation of Gitpod rather than kubectl apply. IMPORTANT: this still uses the Gitpod Installer to generate the Kubernetes YAML and does not in any way mean we are going back to Helm charts for generating Gitpod's manifests.

One thing we have to be aware of with KOTS is that we control what resources third-parties/customers have installed on their clusters under the umbrella of "Gitpod". It is key that we remove resources once they are no longer needed.

An example of this:

  • day 1: a user installs Gitpod with in-cluster database/registry/object storage as a way of getting up and running quickly. They use this to get familiar with Gitpod and the installation process
  • day 2: once they've decided they're happy, they change their KOTS configuration to migrate to external (and more performant) database/registry/object storage.

If we use kubectl apply, there is no history and any resources that are no longer required (eg, the mysql-0 statefulset) remain in the cluster. As they are no longer being used, they become orphaned from the Gitpod installation YAML. More importantly, any persistent volume claims remain, along with their associated costs.

By switching to helm upgrade, we hook into Helm's history. Any resources that are removed between the previous and current installations are deleted.
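To illustrate the difference (the file, chart and release names below are placeholders, not what the installer job actually uses):

```sh
# Day 2: the rendered manifests no longer contain the in-cluster MySQL StatefulSet.

# kubectl apply keeps no record of what was applied previously, so the old
# StatefulSet (and its PVC) is simply left running, orphaned:
kubectl apply -f rendered-day2.yaml

# helm upgrade compares the new manifests against the stored release history
# and deletes anything that has disappeared since the last release:
helm upgrade --install gitpod ./gitpod-chart
```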

Helm can be thought of as fulfilling two jobs:

  1. generating the YAML from templates
  2. deploying and managing resources

As we are still using the Installer to generate the YAML (ie, part 1), we're only using Helm for part 2.
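As a minimal sketch of that split (paths, chart layout and flags are illustrative, not the exact job script):

```sh
# Part 1: the Gitpod Installer still generates the Kubernetes YAML
gitpod-installer render --config gitpod.config.yaml > gitpod.yaml

# Part 2: wrap the rendered YAML in a minimal chart so that Helm owns
# deployment, release history and cleanup of the resources
mkdir -p gitpod-chart/templates
cat > gitpod-chart/Chart.yaml <<EOF
apiVersion: v2
name: gitpod
version: 1.0.0
EOF
mv gitpod.yaml gitpod-chart/templates/

helm upgrade --install gitpod ./gitpod-chart --namespace gitpod
```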

The alternative is to implement our own version of this behaviour. In reality, we'd likely end up building something very similar to how Helm works. By doing it this way, we reduce the time involved and benefit from all of Helm's testing.

How to test

Deploy via KOTS:

  • install with in-cluster dependencies
  • upgrade to external dependencies - watch the old resources disappear
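A few commands that may help with verification (the release name and namespace are assumptions; adjust them for your installation):

```sh
# Before the upgrade: the in-cluster dependencies exist
kubectl get statefulsets,pvc -n gitpod

# After switching the KOTS config to external dependencies and redeploying:
helm history gitpod -n gitpod             # the job records a new revision
kubectl get statefulsets,pvc -n gitpod    # in-cluster DB/registry/storage resources should be gone
```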

Release Notes

[kots]: use Helm for the Installer job

Documentation

@mrsimonemms mrsimonemms changed the title WIP: [kots]: use Helm for the Installer job WIP: [kots]: use Helm for the Installer job Feb 28, 2022
@codecov

codecov bot commented Feb 28, 2022

Codecov Report

Merging #8491 (998f5ac) into main (0f5182e) will not change coverage.
The diff coverage is n/a.

❗ Current head 998f5ac differs from pull request most recent head 2ffd035. Consider uploading reports for the commit 2ffd035 to get more accurate results

@@           Coverage Diff           @@
##             main    #8491   +/-   ##
=======================================
  Coverage   11.17%   11.17%           
=======================================
  Files          18       18           
  Lines         993      993           
=======================================
  Hits          111      111           
  Misses        880      880           
  Partials        2        2           
Flag                         Coverage Δ
components-gitpod-cli-app    11.17% <ø> (ø)

Flags with carried forward coverage won't be shown.


@roboquat roboquat added size/L and removed size/XXL labels Mar 2, 2022
@roboquat roboquat added size/S and removed size/L labels Mar 2, 2022
@gitpod-io gitpod-io deleted a comment from github-actions bot Mar 2, 2022
@roboquat roboquat added size/M and removed size/S labels Mar 2, 2022
@mrsimonemms mrsimonemms marked this pull request as ready for review March 2, 2022 17:42
@mrsimonemms mrsimonemms requested a review from a team March 2, 2022 17:42
@github-actions github-actions bot added the team: delivery Issue belongs to the self-hosted team label Mar 2, 2022
@mrsimonemms mrsimonemms changed the title WIP: [kots]: use Helm for the Installer job [kots]: use Helm for the Installer job Mar 3, 2022
@corneliusludmann
Contributor

Thanks for creating this pull request as a home for the discussion on this topic. 🙏

In general, I like the approach of using helm to deploy and manage the resources so that we don't need to implement stuff where a solution is already available.

I would like to hear the opinions of @csweichel and @geropl about this, in particular whether they see downsides of this approach that we would remedy by implementing our own solution.

Open Questions:

  • We want to use this approach for the Replicated installation path only, right? Do we still need the functionality in the Installer anyway (for our internal usage)?
  • If yes, would the helm approach just be a temporary solution until we have it in the Installer? Would there be a migration path?

Link to internal discussion for more context: https://gitpod.slack.com/archives/C01KLC56NP7/p1645713248561109

@mrsimonemms
Contributor Author

Personally, I don't see this as something that needs to be in the Installer. Perhaps the only exception is that we may want to put a helm subcommand in the render command to output to a Helm directory structure (to replace this). I'm open-minded on this, but I would probably be in favour of adding it.
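Purely as a hypothetical sketch of the shape such a subcommand could take (nothing like this exists today; the name and flags are made up):

```sh
# Hypothetical: render the Installer output into a Helm chart layout
gitpod-installer render helm --config gitpod.config.yaml --output ./gitpod-chart
helm upgrade --install gitpod ./gitpod-chart
```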

Unlike when we abstracted the helm template command as part of the Installer, I don't see that as necessary in this case. That command needed to be used internally by the Installer, so it was correct to implement it. Adding helm upgrade would add a lot of complexity to the Installer and get us to no better a position than adding "download Helm and run helm upgrade" to the docs.

I did create #8379 to test out a few things on this subject, which led me to realise that we could just use Helm as a way of deploying the Installer's YAML to a cluster.

@csweichel
Contributor

I understand the motivation, but wonder what the story is for the installer outside of Replicated. How would we solve this issue for e.g. SaaS or future BYOC installations? Solving this issue with Helm is one way - but it almost feels like a step backwards.

How would resource deletion/upgrade be implemented for a Kubernetes operator?

@mrsimonemms
Contributor Author

No idea how we would do it on an operator, but that would certainly require more effort and testing than using Helm would. Not saying that's a deal-breaker, but this is a practical way of getting this working in fairly short order.

My assumption with SaaS is that it's currently handled via ArgoCD - we've certainly had no requirement from @gitpod-io/platform to provide a native way of removing now-orphaned resources (that may also be due to the fact that we're not really removing resources).

@csweichel
Contributor

csweichel commented Mar 3, 2022

No idea how we would do it on an operator,

We should find out

but that would certainly require more effort and testing than using Helm would.

That's not an unreasonable assumption. But it's that: an assumption.

We can use helm for this. But let's use Helm because it's the right thing to do, we're aware of the limits this imposes (e.g. limits to custom resource removal handling), and we're aware of what the alternatives would have been.

My assumption with SaaS is that it's currently handled via ArgoCD - we've certainly had no requirement from @gitpod-io/platform to provide a native way of removing now-orphaned resources (that may also be due to the fact that we're not really removing resources).

@gitpod-io/platform is by far not the only stakeholder in this. It's the product teams who deploy their code. Platform provides services used by some other product teams, but it's the product teams who own the deployment. You need to go and talk to those teams to understand what the requirements are here.

Chances are, the teams will defer to platform. That's a problem we're solving these days. But product teams need to carry their operational responsibility, and having an opinion about the behaviour of the installer is part of that.

@mrsimonemms
Contributor Author

We can use helm for this. But let's use Helm because it's the right thing to do, we're aware of the limits this imposes (e.g. limits to custom resource removal handling), and we're aware of what the alternatives would have been.

Since I did #8376, I don't believe we have any custom operators in the Installer anymore. The cert-manager CRDs are installed outside of the Installer and will remain that way as far as I'm concerned.

Fair point on the other team implications. I would recommend that we don't do a stable release to Replicated until we have a way of automatically removing unused resources - I do fear lots of support calls into @gitpod-io/engineering-self-hosted along the lines of "I'm not using the in-cluster DB but there's a mysql-0 pod", plus the associated PVC costs.

@csweichel
Contributor

I was wondering, doesn't Replicated help us OOTB here? I.e. they consume the manifests we produce and apply them against the cluster, no? If so, this is not relevant for Replicated - and for the other users of the installer (webapp, IDE, workspace) it would be interesting to know how they handle this problem today.

@mrsimonemms
Contributor Author

I was wondering, doesn't replicated help us OOTB here?

Yes and no. It does for the manifests that we create in full, but that's only things like certs, secrets and other bits of config.

For the actual Gitpod application, the gitpod-installer-job.yaml runs a Bash script which creates the config file, edits it and then applies it - Replicated/KOTS doesn't know anything about those resources and so can't help us with removal. The only way we'd be able to achieve that would be to completely abandon the Installer and write all the logic in KOTS (which we don't want to do).
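Heavily simplified, the job does something along these lines (the exact commands live in the Bash script; the details here are illustrative):

```sh
# 1. Generate a base config and patch it from the KOTS settings
gitpod-installer init > config.yaml
yq e '.domain = env(DOMAIN)' -i config.yaml   # one of many edits

# 2. Render the manifests and apply them directly - KOTS never sees these
#    resources, so it cannot clean them up when they become unnecessary
gitpod-installer render --config config.yaml | kubectl apply -f -

# 3. With this PR, the final step hands the rendered manifests to Helm
#    instead, so removals between releases are tracked and cleaned up
```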

@csweichel
Contributor

Replicated/KOTS doesn't know anything about those resources and so can't help us with removal

That's disappointing! It was my understanding hitherto that we render the Kubernetes manifests, and KOTS applies them. What's the point of Replicated if we end up having to implement the hard parts?

It looks like in KOTS we are already using the Helm hack. Why not keep this contained to KOTS and think of a proper solution for ourselves, i.e. the other consumer of the installer? Sounds to me like an RFC which discusses those ideas would be in order.

@mrsimonemms
Contributor Author

It is disappointing, but it's also a consequence of us using the Installer to generate the manifests. If you recall the initial meetings with Replicated, their initial thought was that we would generate the manifests from the Installer and then use KOTS to decide which get applied - we (rightly) decided against that as something that would be unmaintainable.

The link I sent in my previous comment was from this PR - what's currently in main is kubectl apply -f -.

The proposed change is for installations via Replicated only. If we were to incorporate this into the Installer, that would be down the line (and would certainly require an RFC)

@mrsimonemms
Contributor Author

@gitpod-io/engineering-self-hosted are we in a position to get this merged?

@mrsimonemms
Contributor Author

mrsimonemms commented Mar 9, 2022

/werft run no-preview

👍 started the job as gitpod-build-sje-kots-helm.6

Pothulapati
Pothulapati previously approved these changes Mar 11, 2022
Contributor

@Pothulapati Pothulapati left a comment


Like other people mentioned here, I'm not a big fan of using Helm internally again, but not cleaning up unnecessary resources when users update their config feels like a much bigger issue, and we should definitely fix that.

For now, this feels like a good fix, as users never need to interact with any of the Helm stuff here (or even notice it), and it seems to work as expected.

/hold for @corneliusludmann to approve this before we merge.

@geropl
Member

geropl commented Mar 11, 2022

Writing here because @corneliusludmann pinged me initially, and @mrsimonemms and I had Slack discussions about this:

  • it feels like a step backwards to me to use helm for applying the changes
  • but I see the benefit of it being easy to use right now
  • I (and we as WebApp) are also looking for ways to apply/roll out our changes to app clusters (with SH in mind): we should definitely start tasks on this and share ideas
  • the only (minor?) concern I have is about migration: given we go with helm now and implement a different solution for all deployment modes later, we will need to take care of migrating from helm to the "new method" in the Replicated installer

Simon Emms added 2 commits March 14, 2022 15:04
This allows for automatic deletion of resources that are no longer used.
@mrsimonemms
Contributor Author

mrsimonemms commented Mar 14, 2022

/werft run no-preview

👍 started the job as gitpod-build-sje-kots-helm.8

@corneliusludmann
Contributor

Thanks all for the input.

My summary of the discussion:

  • It feels quite odd to use Helm for installing the Gitpod resources since we originally created the Installer to get rid of Helm. However, we dropped Helm because of the difficulty of maintaining the Helm templates alongside an increasingly messy values.yaml file. Now we just want to use Helm for managing the resources and let the Installer do the templating job.
  • There are no big concerns besides the fact that it feels odd, and the question of the migration path once we have another solution in the future. The latter can be handled in the Replicated installer job, though.
  • We have a general lack of information around alternatives, requirements across teams and the vision for a general solution. We will start a process that incorporates all teams on this topic next quarter. However, that needs some time, and we need a solution for the Replicated installer now. That shouldn't hold us back from using this approach for the Replicated installer, especially as we have a path to migrate to another approach in the future.

It would be very interesting to know whether Helm handles issues like these as well, since this would be much harder to solve on our own than just removing unnecessary resources. Could you check this, @mrsimonemms?

@mrsimonemms
Contributor Author

@corneliusludmann upsettingly, Helm does NOT solve the label issue you raised, and for the same reason: the field is immutable.

Personally, I don't think that makes much difference to this PR, as it's a change that happened elsewhere which broke the update from the February to the March release. It would have been nice had Helm solved this, but it doesn't.

What it does mean is that we must all be careful when introducing potentially breaking changes against long-running Gitpod clusters. In SaaS, we tend to use a blue/green deployment pattern but many self-hosted users will want to simply upgrade their existing clusters.

@roboquat roboquat merged commit 86b5d8a into main Mar 15, 2022
@roboquat roboquat deleted the sje/kots-helm branch March 15, 2022 16:08
Labels
release-note size/M team: delivery Issue belongs to the self-hosted team