
[kube-prometheus-stack] Use kubectl replace to upgrade prometheuses CRD #1510

Conversation

sathieu
Contributor

@sathieu sathieu commented Nov 17, 2021

What this PR does / why we need it:

The prometheuses CRD has become too large for kubectl apply since prometheus-operator v0.52.0 (#1485): the generated kubectl.kubernetes.io/last-applied-configuration annotation exceeds the 262144-byte limit on metadata annotations.

This PR fixes this in the upgrade notes and when using ArgoCD.
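For reference, the failure and the proposed workaround look roughly like this (a sketch: the file path is an assumption, and replace only works when the CRD already exists in the cluster):

```shell
# Client-side apply fails because kubectl stores the full object in the
# kubectl.kubernetes.io/last-applied-configuration annotation, which for
# this CRD exceeds the 262144-byte annotation limit:
kubectl apply -f charts/kube-prometheus-stack/crds/crd-prometheuses.yaml
# The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid:
# metadata.annotations: Too long: must have at most 262144 bytes

# Workaround documented in the upgrade notes: replace instead of apply.
# Note: kubectl replace only works if the CRD already exists in the cluster.
kubectl replace -f charts/kube-prometheus-stack/crds/crd-prometheuses.yaml
```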

Which issue this PR fixes

Special notes for your reviewer:

Checklist

  • DCO signed
  • Chart Version bumped
  • Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])

mosheavni
mosheavni previously approved these changes Nov 18, 2021
@@ -5,6 +5,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
argocd.argoproj.io/sync-options: Replace=true
Member

I don't think we should add deployment-specific annotations to the CRD.
IMHO, the change in the docs for the manual upgrade should be enough.

Contributor Author

@monotek The problem here is that the CRDs are not templated. There is no way to add annotations except through hacky workarounds.


@sathieu Great idea to add this directly to the CRD; it solves the problem for ArgoCD users and the annotation will simply be ignored for everyone else. The "manual upgrade" isn't really a viable option when using ArgoCD; every sync attempt will fail because the CRD is too large. Without this change, the only way for an ArgoCD user to fix this is to replace everything on each sync operation (risky) or to extract the CRD from the chart so it can be updated locally.

Member

As Helm is not supposed to update CRDs anyway, most people add and manage these CRDs by hand or via some other automated step outside of Helm.
Therefore the CRD does not need such annotations.
In the long term this would also make the CRD kind of messy, if everybody starts adding things for their particular deployment method which in the end are not really needed.

Member

Guys, we have to pay attention to the way the CRDs are updated in this chart. It's not through manual changes like the one above but through an automated process that pulls them from the Prometheus Operator repo. So this change is pointless: it will get overwritten upon the next major update of the CRDs.

Member

@Xtigyro Xtigyro Dec 17, 2021

@steinarox Ah, apologies - I didn't realize we were updating the CRDs after every pull from the Prometheus Operator repo through the script. You are right.

Member

And one of the reasons I didn't realize that was because that script was not meant to modify the CRDs.

Member

@Xtigyro Xtigyro Dec 17, 2021

Maybe the other maintainers have another take on the matter, but I'm with @monotek on this one. The chart is not meant to create new opinionated versions of the external code it uses.

Having said that - @sathieu, how about adding this functionality in a completely optional way? That is to say: add another section to the README.md about tools like ArgoCD, Flux, etc., stating what changes are needed to make the CRDs fully compatible with third-party tools that operate in a non-standard way. Maybe even add a small script that applies the changes upon manual execution?
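Such an opt-in script might be sketched like this (purely hypothetical; assumes mikefarah's yq v4 and the chart's crds/ directory layout):

```shell
#!/bin/sh
# add-argocd-annotations.sh (hypothetical): run manually by ArgoCD users
# after vendoring the chart, so the CRDs shipped in the chart stay untouched.
for crd in charts/kube-prometheus-stack/crds/*.yaml; do
  yq -i '.metadata.annotations."argocd.argoproj.io/sync-options" = "Replace=true"' "$crd"
done
```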

Contributor Author

@Xtigyro There is no easy way to work around this on the ArgoCD side.

Possible correct solutions:

Heavy workaround:

I won't work on this any further, as I have a workaround (thanks to Ansible, see https://gitlab.com/kubitus-project/kubitus-installer/-/issues/70). But that workaround is clearly not easily actionable for everyone.

@brsolomon-deloitte brsolomon-deloitte Dec 22, 2021

As helm is not supposed to update crds anyway, most people add and manage these crds "by hand" or via some other automated step, outside of helm..

What evidence is there that this is how "most people" manage CRDs? The kube-prometheus-stack Helm chart includes the CRDs, and applying/creating CRDs through Helm/ArgoCD is common practice and works.

Great idea to add this directly to the CRD; it solves the problem for ArgoCD users and the annotation will simply be ignored for everyone else.

Agreed; this is a simple and harmless one-line annotation that solves what is otherwise a huge pain for administrators, and it does not affect those not using ArgoCD. Is it really worth taking a philosophical stance against "add[ing] deployment specific annotations to the crd"?

how about adding this functionality in a completely optional way

It's already been pointed out in multiple threads why hackishly editing or sed'ing a CRD YAML is inarguably worse than the simple change proposed here.


@sathieu
Contributor Author

sathieu commented Nov 23, 2021

@monotek While I understand the general rule of avoiding tool-specific annotations, the current situation is broken. Expanding the topic: the CRD upgrade handling is currently a manual step, and this could be improved.

A few charts handling this correctly:

Some other charts use a subchart (like here), but that requires extra manual steps.

If we want to implement a solution like gatekeeper's and velero's, this PR is still needed, as CRDs don't allow templating.

If we want to implement the subchart solution, we'll have a problem, as kube-prometheus-stack uses the CRDs it creates (and we'll also have the upgrade problems).

@itz-Jana

I also just hit this issue and support this change.
I think it is important to have a solution for the problem now even if it is not perfect, as this gives more time to think about and implement a better solution.

@andrewgkew
Contributor

It would be good to separate the README change from the new ArgoCD addition, so that the README can go up first and users can be made aware of the new, larger CRD from prometheus-operator.

Is that team aware of the larger CRD, and do they have plans to shrink it?

@monotek
Member

monotek commented Nov 27, 2021

I guess one of the chart maintainers should decide how to proceed.

@sathieu
Contributor Author

sathieu commented Dec 1, 2021

@monotek What can we do to move this PR forward?

It has fixes for upgrade with kubectl, and install/upgrade with ArgoCD.

Other methods will need other fixes (using kustomize, like cloudnativedaysjp/dreamkast-infra#1262, or anything else), but at least this PR improves things.

While I understand your point about limiting tool-specific annotations (and this should be done when possible), I think that perfect is the enemy of good.

This PR is currently a blocker for us (see here and here)

@sathieu sathieu requested a review from monotek December 1, 2021 08:58
@monotek
Member

monotek commented Dec 2, 2021

@monotek What can we do to move this PR forward?

As I will not approve it, you will have to wait for one of the chart maintainers to decide.

@sathieu sathieu force-pushed the argocd_crd_metadata_too_long branch from 9019841 to d5be1b7 Compare December 3, 2021 16:55
@sathieu
Contributor Author

sathieu commented Dec 3, 2021

@bismarck, @gianrubio, @gkarthiks, @scottrigby, @vsliouniaev, @Xtigyro Please review 🙏. This is a blocker for us currently...

@mrueg
Member

mrueg commented Dec 3, 2021

This change should probably be requested upstream in prometheus-operator. In my opinion this chart should not alter CRDs when importing them, as doing so is potentially error-prone and the chance that someone checks imported CRDs for changes is almost zero.

@sathieu
Contributor Author

sathieu commented Dec 7, 2021

@mrueg Upstream uses kubebuilder, and I don't see any way to add annotations with it.

@sathieu
Contributor Author

sathieu commented Dec 7, 2021

I've reported this upstream: prometheus-operator/prometheus-operator#4439.

@mrueg What about using a kustomize patch instead of sed? This is the method used in crossplane (crossplane/crossplane#1020).
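Such a kustomize step could look something like this (an illustration, not the chart's actual import tooling; the patch is a JSON6902 `add` op, with the `/` in the annotation key escaped as `~1`):

```yaml
# kustomization.yaml (sketch) - pulls the upstream CRD and annotates it,
# instead of post-processing the file with sed
resources:
  - https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
patches:
  - target:
      kind: CustomResourceDefinition
      name: prometheuses.monitoring.coreos.com
    patch: |-
      - op: add
        path: /metadata/annotations/argocd.argoproj.io~1sync-options
        value: Replace=true
```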

@monotek
Member

monotek commented Dec 7, 2021

By the way... server-side apply should also work without error:

kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.52.0/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml

@sathieu
Contributor Author

sathieu commented Dec 7, 2021

@monotek My use case is within ArgoCD (where server-side apply is not implemented, argoproj/argo-cd#2267). The only solutions I see are shrinking the CRD or adding the annotation argocd.argoproj.io/sync-options: Replace=true.

@pblgomez

Too messy; I even have ArgoCD managed by ArgoCD, so modifying the ConfigMap outside of Helm values is not really an option.

@irizzant
Contributor

irizzant commented Dec 23, 2021

It's not messy at all, and the additional plugin gives you flexibility that can be used in a number of different scenarios.
We have ArgoCD managed by itself, and if you installed it with Helm it's just a matter of adding the configManagementPlugins here and creating two files (a kustomization and a Prometheus CRD patch).
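The two files might look roughly like this (a sketch of the approach described above; file names are assumptions):

```yaml
# kustomization.yaml - post-processes the chart output rendered by the plugin
resources:
  - all.yaml                      # manifests produced by the helm template step
patchesStrategicMerge:
  - prometheus-crd-patch.yaml
---
# prometheus-crd-patch.yaml - merges the ArgoCD sync option into the CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: prometheuses.monitoring.coreos.com
  annotations:
    argocd.argoproj.io/sync-options: Replace=true
```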

@altitudems

I think @irizzant's solution is actually great. Appreciate you taking the time to share. It's working great here.

@patrickjahns

One thing with regard to @irizzant's solution: be careful with Helm charts that test for Kubernetes versions.

See the remarks in the ArgoCD discussion on using kustomize together with Helm.

A suggestion from this comment would be to improve it as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  configManagementPlugins: |
    - name: kustomized-remote-helm
      init:
        command: ["/bin/sh", "-c"]
        args: ["helm repo add $HELM_REPO_NAME $HELM_REPO_URL && helm repo update"]  # note: helm repo add needs a name as well as a URL
      generate:
        command: [sh, -c]
        args: ["helm template $OPTIONS --kube-version $KUBE_VERSION --api-versions $KUBE_API_VERSIONS --include-crds > all.yaml && kustomize build"]

@irizzant
Contributor

irizzant commented Jan 4, 2022

The above comment is a good improvement, though ArgoCD does not flag any resource in the kube-prometheus-stack chart as out-of-sync after applying it, meaning that (in my setup) this chart does not produce different output based on the k8s version.

@patrickjahns

The above comment is a good improvement, though ArgoCD does not signal any out-of-sync resource in kube-prometheus-stack chart after applying it meaning that this chart does not use customized output based on k8s version.

Please be aware, that especially for ingress and pdb this is the case for this chart. See:

Helpers:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/_helpers.tpl#L132-L166

Ingress:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus/ingress.yaml#L7-L8

PDB:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus/podDisruptionBudget.yaml#L2

So in your configuration and your use case the k8s version does not make a difference to the final rendered chart; for other users it might very well do so.

@irizzant
Contributor

irizzant commented Jan 4, 2022

Ah, I see. I indeed don't use Ingress and PodDisruptionBudget in my case, so that makes sense.

@FrediWeber
Contributor

@irizzant Please note that even if this solution works for you, Helm discourages the use of helm template for generating resources that are to be deployed directly:
helm/helm#3553 (comment)
helm/helm#3553 (comment)

This issue is a huge blocker for us too and we would be glad, if we could add this annotation as a workaround until a more permanent solution is found upstream (e.g. shrinking the CRD).

@sathieu
Contributor Author

sathieu commented Jan 4, 2022

@FrediWeber ArgoCD uses helm template internally, so the same limitations apply.

@irizzant
Contributor

irizzant commented Jan 4, 2022

@FrediWeber If you read the linked discussion you'll see that using helm template to render the manifests is a solution that works for more people than just me.
Kustomize (and the aforementioned ArgoCD), for example, does exactly that when you enable Helm support via kustomize.buildOptions: --enable-helm.

Also, I don't see where they're discouraging this, since the comment you linked ends with:

But we recognize that many users DO take the output of helm template to integrate parts of Helm into their projects in new and exciting ways. And we're happy to encourage that to improve upon what's been built by others in the community.

helm template is a supported and perfectly legitimate Helm command for rendering manifests. Of course it doesn't expect a running cluster, so it has no way to check which Kubernetes APIs are supported; that's why --kube-version and --api-versions can be passed as parameters.

@FrediWeber
Contributor

FrediWeber commented Jan 4, 2022

Yes, you sure can use it, but it is not guaranteed to work exactly as the chart developers intended. Not many projects test their charts against helm template (e.g. by adding the namespace field to all templates).
But if ArgoCD uses helm template anyway, why not use the - in that case perfectly valid - solution from @irizzant and close this pull request? Maybe we could document it somewhere?

Update:
After reading prometheus-operator/prometheus-operator#4439, I think the best solution would still be to add this annotation to the CRD temporarily. Once the CRD is shrunk upstream, we can remove it again.
Yes, strictly speaking this is a Kubernetes or ArgoCD problem (ArgoCD not yet implementing server-side apply), but we can save so much time and hassle for so many people by just adding this annotation.
So what are the potential downsides of adding it? If we can't find any, we should just do it.

@monotek
Member

monotek commented Jan 4, 2022

There can't be a shrink of the CRD.
If you shrink it, you'll likely remove stuff which is needed.
Even if you only remove some comments now, at some point, with some new version implementing new features, it will likely be too big again.

Please talk to the ArgoCD developers about implementing server-side apply.
That's the way to go - not adjusting the chart for some particular deployment tool.

I still vote against changing the CRD in the chart, no matter if permanently or temporarily.

@irizzant
Contributor

irizzant commented Jan 4, 2022

@FrediWeber
Helm 2 used to have an in-cluster agent (Tiller) to work with charts, but Helm 3 completely removed this component.
Now Helm first renders the templates and values files into manifests (a step identical to helm template, except for the already-mentioned in-cluster checks).
Finally it deploys the rendered manifests using the Kubernetes API.

Consequently, yes, there are differences in manifests rendered with helm template, but if you add the already-mentioned parameters they end up the same. There is no need to test specifically for this.

Adding ArgoCD annotations to the upstream CRD is an innocuous change, and I agree on that, but my suggested approach (with this update) is still a viable way to work with the kube-prometheus-stack chart whether they change the upstream CRD or not.

My personal vote is against changing the CRD in the chart by adding an extraneous annotation, though - but this is not up to me.

Member

@monotek monotek left a comment

Please remove the update of the Python code and use kubectl apply --server-side -f ... instead of kubectl replace, as we're already documenting it this way here: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#from-24x-to-25x

@FrediWeber
Contributor

FrediWeber commented Jan 4, 2022

@monotek I understand that you're hesitant to implement this workaround because ArgoCD is "just one solution" amongst many, and there is the chance that it will never be removed again. On the other side, there is an imminent problem for many ArgoCD users of this chart. Of course this needs to be addressed in ArgoCD (e.g. by adding support for server-side apply), but what is the downside of temporarily adding the annotation? Even if the CRD doesn't get shrunk upstream, it will take some time to implement server-side apply in ArgoCD. This would be a viable workaround - and I say it again - that would spare many users some time, without any downsides.

@irizzant Thank you very much for pointing this out. I actually didn't know that, and your solution seems absolutely fine; we'll also use it if there is no other workaround. I just think that we could implement this workaround and thereby remove the necessity of modifying every ArgoCD instance managing this chart.

@FrediWeber
Contributor

@monotek Would it be okay with you if I wrote something about this in the README, in the "breaking changes" section you linked? I would include the solution from @irizzant. Strictly speaking it is not a problem of this project, but it could help some people.

@jplanza-gr

I've been following along with the discussion here; I was hoping we'd get the ArgoCD annotation added, but I can understand why the maintainers don't want to include something specific to a particular deployment tool. I'd like to offer a compromise: would you folks consider providing a flag in the chart that makes CRD installation optional?

The cert-manager chart uses this to great effect: https://cert-manager.io/docs/installation/helm/#3-install-customresourcedefinitions

The problem they're working against is that if you remove and replace a chart with CRDs, all objects using those custom resources are deleted... this is mighty inconvenient if those resources are certificates that you have to renew.

If ArgoCD users could suppress the installation of the CRDs, it's pretty easy to install them independently and make any necessary annotations or changes. This allows kube-prometheus-stack to stay "vendor agnostic" and still provides a straightforward workaround for those of us who need it.
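To illustrate, a hypothetical toggle in the chart's values (the key name is an assumption, modeled on cert-manager's installCRDs) would look like:

```yaml
# values.yaml (hypothetical - no such flag exists in the chart at this point)
crds:
  enabled: false  # skip the bundled CRDs; manage them in a separate, Replace-synced Application
```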

@monotek
Member

monotek commented Jan 6, 2022

Helm install already supports --skip-crds as an argument out of the box, so there's no need to add it to the chart.
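For example (release and namespace names are placeholders):

```shell
# Install the chart without its bundled CRDs, then manage the CRDs separately
# (e.g. with kubectl apply --server-side, or a dedicated ArgoCD Application):
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --skip-crds
```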

@Cheshirez
Contributor

A workaround with --skip-crds would work, as it's supported by helm template as well.
Regarding ArgoCD: there's a PR to add this ability there: https://github.com/argoproj/argo-cd/pull/8012/files

@irizzant
Contributor

irizzant commented Jan 7, 2022

As already reported here, the suggested way is to use the --server-side kubectl parameter.

The way I see it, another possibility could be adding a Kubernetes Job to the chart to upgrade the CRDs, which would execute the kubectl apply --server-side commands, similar to what is done here.

@jplanza-gr

@monotek @irizzant Have either of you used ArgoCD before? When using a declarative system, we don't have direct access to helm install or kubectl... the entire point is to manage your Helm charts in version control and not run CLI commands directly. The other suggestions in the thread (kustomize, etc.) require additional complexity that would make this installation harder to maintain.

Again, I respect that you don't want to make a vendor-specific change; that completely makes sense. But providing the ability to suppress CRD installation seems like it would be trivial, vendor-neutral, and helpful to the many folks here who want to use Prometheus. Instead, the approach seems to be to hold fast and wait for ArgoCD to innovate around you; is that really in the best interests of the community?



@irizzant
Contributor

irizzant commented Jan 7, 2022

@monotek @irizzant Have either of you used ArgoCD before?

@jplanza-gr No, never! Indeed, this comment was written with absolutely no knowledge of ArgoCD.

When using a declarative system, we don't have direct access to "helm install" or "kubectl"... the entire point is to manage your helm charts with version control and to not run CLI commands directly

Since we are ArgoCD noobs here: if a Job were added to this chart as I suggested here, would you mind explaining what the required changes would be for your Application definition in ArgoCD, or for your ArgoCD setup more generally?

@jplanza-gr

Sure, no problem. Unless there are future changes to ArgoCD, here's what I'd do today.

  1. Disable deployment of the CRDs by the main chart.
  2. Create a new ArgoCD Application to manage just the CRDs, with the following sync options:
    • Replace=true
    • ApplyOutOfSyncOnly=true

When the chart deploys the CRDs, the main problem an ArgoCD user faces is that these sync options are scoped to everything in the chart. This means we run the risk of having resources completely replaced rather than updated. Off the top of my head, the most important thing we could lose is persistent caching of metrics: a sync would run and those resources (including PVs) could be replaced rather than updated in place.

If we can disable CRD installation from the main chart, it's easy to download the CRDs, store them elsewhere, and manage them with the right options. CRDs don't change very often, so we can get the desired behavior for both the CRD and chart resources. As I mentioned above, the "cert-manager" chart uses this to explicitly separate the CRD from the chart; it stores CA-validated certificates in custom resources, so you'd run the risk of destroying all your certificates during any operation that removed the CRDs.

I think the "kustomize" workarounds are viable if you're already using it, but anyone on vanilla Helm 3 charts would need to introduce that just for this one application. Quite frankly, every org has practitioners with varying levels of experience and we'd rather not introduce complexity that will trip up newer folks.

Back to my earlier suggestion: a toggle to deploy the CRDs, with a default of true, would be perfect here; typical users won't notice any change, but those of us who have to work around the issue can do so. Thanks for hearing me out on this.
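The CRD-only Application from step 2 could be sketched as follows (repository URL and paths are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator-crds
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/infra.git  # repo holding the extracted CRDs
    path: prometheus-operator/crds
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    syncOptions:
      - Replace=true             # kubectl replace semantics, avoiding the annotation size limit
      - ApplyOutOfSyncOnly=true
```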

@sathieu sathieu force-pushed the argocd_crd_metadata_too_long branch from d5be1b7 to 923941d Compare January 7, 2022 16:38
@prometheus-community prometheus-community locked as off-topic and limited conversation to collaborators Jan 7, 2022
@monotek
Member

monotek commented Jan 7, 2022

@monotek @irizzant Have either of you used ArgoCD before? When using a declarative system, we don't have direct access to "helm install" or "kubectl"..

No. And reading this issue, there is no reason I would.

The point here is that you are using tooling which does not completely implement all the options of Helm and Kubernetes, but still try to use Helm charts with it.

Maybe you want to try out FluxCD, which supports --skip-crds and also server-side apply.
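For comparison, Flux's HelmRelease can request replace semantics for CRDs declaratively (a sketch; names and intervals are placeholders):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 1h
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
  install:
    crds: CreateReplace   # create CRDs on install, replace them when they change
  upgrade:
    crds: CreateReplace
```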

As we will not merge this, and there are workarounds described above, I will now close this issue and lock the conversation to reduce noise. If you want to discuss the issue further, I suggest using the ArgoCD issue tracker.

@monotek monotek closed this Jan 7, 2022