Helm Operator fails when two different types of Custom Resources have same name #3357
Comments
Hi @pre, thanks for raising this. I will be looking into it and will keep you updated.
@pre Yeah this is a known issue caused by a limitation of Helm. The Helm operator uses Helm's install/upgrade/uninstall code to manage the releases. As a result, the Helm operator uses Helm's release secrets, and therefore the same limitation that applies to Helm releases (no two releases can share the same name in the same namespace) applies to CRs managed by the Helm operator.
Helm controls the release secret name, which is based on the release name, and the release name is taken directly from the CR name. We used to include an encoded version of the CR's UID in the release name, but that caused other issues. We may be able to add logic that, when a release with the same name already exists, checks whether it actually belongs to the CR being reconciled before proceeding.
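A minimal sketch of what such a check might look like, assuming the operator uses Helm's Go SDK (`helm.sh/helm/v3/pkg/action`); the function name `checkReleaseOwnership` and the chart-name comparison are hypothetical illustrations, not the operator's actual code:

```go
// Hypothetical pre-upgrade ownership check; not the operator's actual code.
package releasecheck

import (
	"errors"
	"fmt"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/storage/driver"
)

// checkReleaseOwnership returns nil when no release exists yet (fresh install)
// or when the existing release was installed from expectedChart. Otherwise it
// returns a descriptive error instead of letting the upgrade corrupt the
// other CR's release secret.
func checkReleaseOwnership(cfg *action.Configuration, releaseName, expectedChart string) error {
	rel, err := action.NewGet(cfg).Run(releaseName)
	if errors.Is(err, driver.ErrReleaseNotFound) {
		return nil // no release with this name yet, safe to install
	}
	if err != nil {
		return err
	}
	if rel.Chart != nil && rel.Chart.Metadata != nil && rel.Chart.Metadata.Name != expectedChart {
		return fmt.Errorf(
			"release %q already exists but was installed from chart %q, not %q; "+
				"a CR of another kind with the same name probably owns it",
			releaseName, rel.Chart.Metadata.Name, expectedChart)
	}
	return nil
}
```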
An even better option would be to run a ValidatingAdmissionWebhook that would handle CR create events and check that a release does not already exist with the same name as the CR. This would prevent the CR with the duplicate name from being created in the first place. The downside is that it makes the operator a bit harder to deploy.
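For illustration, a rough sketch of such a webhook's admission handler, assuming plain client-go and the `admission/v1` API; `handleValidate` is a hypothetical name, the label selector relies on Helm 3 labelling its release Secrets with `owner=helm` and `name=<release>`, and TLS/server/deployment wiring is omitted:

```go
// Sketch of a validating webhook handler that rejects a CR create when a Helm
// release with the same name already exists in the target namespace.
package webhook

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func handleValidate(client kubernetes.Interface) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var review admissionv1.AdmissionReview
		if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		req := review.Request
		resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: true}

		// Helm 3 stores release metadata in Secrets labelled owner=helm and
		// name=<release>. If one already exists for this CR name, reject the CR.
		secrets, err := client.CoreV1().Secrets(req.Namespace).List(context.TODO(),
			metav1.ListOptions{LabelSelector: fmt.Sprintf("owner=helm,name=%s", req.Name)})
		if err == nil && len(secrets.Items) > 0 {
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: fmt.Sprintf(
				"a Helm release named %q already exists in namespace %q; pick a different name for this CR",
				req.Name, req.Namespace)}
		}

		review.Response = resp
		_ = json.NewEncoder(w).Encode(&review)
	}
}
```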
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
As an MVP for a better user experience, a helpful error message would be great. The current error is very confusing: if you don't remember that this issue exists, you may end up spending quite a lot of time figuring out what's going on. /remove-lifecycle stale
The {uid} suffix fixes this problem (#1818). To make matters worse, you can't add OpenAPI patterns to enforce a naming scheme on the CR. This leaves us with having to write an admission webhook to enforce a name.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Bug Report
The Helm operator writes Helm-specific metadata about the deployed Custom Resource (an instance of a Chart) into a Kubernetes Secret.
The name of that Secret does not contain the name of the Custom Resource Definition, but it does contain the name of the Custom Resource. As a result, an instance of CRD_A and an instance of CRD_B will corrupt each other's Kubernetes Secret if CR_A and CR_B have the same name.
For example:

- When a Lolcat named `control` is deployed, its Helm Operator will create a Kubernetes Secret named `sh.helm.release.v1.control.v1`.
- When a Doge named `control` is then deployed, its Helm Operator will write to a Kubernetes Secret with the same name, `sh.helm.release.v1.control.v1`.

This secret contains Helm's own internal metadata. Since the metadata in `sh.helm.release.v1.control.v1` was about the Lolcat, the change set for the Doge applied later will cause the Helm Operator to fail.

As a corollary, the name of any Custom Resource backed by a Helm Operator must be unique in a given namespace.
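To make the collision concrete, here is an illustrative sketch of how the Secret name is derived. The helper `releaseSecretName` is hypothetical, but the `sh.helm.release.v1.<name>.v<revision>` pattern matches the Secrets shown above; note that the CR's kind never appears in it:

```go
// Illustrative only: the CR's kind is not part of the Secret name, which is
// why a Lolcat and a Doge both named "control" collide on the same Secret.
package releasecheck

import "fmt"

func releaseSecretName(crName string, revision int) string {
	// Helm 3 release Secrets: sh.helm.release.v1.<releaseName>.v<revision>,
	// and the Helm operator uses the CR name as the release name.
	return fmt.Sprintf("sh.helm.release.v1.%s.v%d", crName, revision)
}
```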
While it is possible to have two independent and de-coupled Custom Resource Definitions (Lolcat and Doge), any instance of a Custom Resource backed by the Helm Operator must have a unique name in that namespace. This is surprising and not obvious to debug when you see the error message for the first time.
Even if this cannot be fixed due to Helm internals, it would save a significant amount of cumulative debugging time if the Helm Operator gave a sensible error message. The current error message refers to Helm metadata for the wrong Custom Resource instance.
In the snippet below, an instance of Doge named `control` fails because an instance of Lolcat named `control` was deployed earlier. As you can see, the name of the secret is `sh.helm.release.v1.control.v6`, since the Lolcat release was already at `v5`. However, it should have been `v1`, since this was the first deployment of the Doge.

After this first message, the later error messages are about incorrectly trying to adopt existing resources (of the other CRD).
What happens is that the Doge has now corrupted the secret, which was about the Lolcat. As a result, the Helm Operator of the Lolcat now has wrong metadata and starts failing with the error below. The problem is that `meta.helm.sh/release-name` is now about the Doge, even though this was the Lolcat's Helm metadata.

Environment
operator-sdk version: v0.18.2
Kubernetes cluster kind: Minikube
Are you writing your operator in ansible, helm, or go?
Helm
Possible Solution
Maybe fix: Include the name of the Custom Resource Definition in the Helm Secret's name?
Remedy: Provide a sensible error message.
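One way the "maybe fix" could be approximated without touching Helm internals is to derive the release name from both the kind and the CR name, since the Secret name follows the release name. This is only a sketch of the idea, not something the operator does today, and changing the scheme would break existing releases, much like the earlier `{uid}` suffix did (#1818):

```go
// Sketch only: a hypothetical release-name scheme that folds the kind into the
// release name (and therefore into the Secret name).
package releasecheck

import (
	"fmt"
	"strings"
)

// releaseNameFor("Lolcat", "control") -> "lolcat-control", whose Secret would be
// sh.helm.release.v1.lolcat-control.v1 and could no longer collide with the
// Secret of a Doge named control.
func releaseNameFor(kind, crName string) string {
	return fmt.Sprintf("%s-%s", strings.ToLower(kind), crName)
}
```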