Set owner reference to secrets created by webhook cert manager #530
@@ -59,6 +62,10 @@ func (c *Command) init() {
c.flagSet = flag.NewFlagSet("", flag.ContinueOnError)
c.flagSet.StringVar(&c.flagConfigFile, "config-file", "",
"Path to a config file to read webhook configs from. This file must be in JSON format.")
c.flagSet.StringVar(&c.flagDeploymentName, "deployment-name", "",
"Name of deployment that is the owner of the secret")
Not clear what "the secret" is. I think we can just say name of the deployment this pod is running in. What we do with that is an impl detail
done
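A minimal sketch of how the flag registration might look with the reworded help text. The exact wording, the `options` struct, the `parseFlags` helper, and the check that the flag is non-empty are all assumptions made for illustration; they are not taken from the merged code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options is a hypothetical container for the parsed flag values.
type options struct {
	configFile     string
	deploymentName string
}

// parseFlags registers both flags on a fresh FlagSet and parses args.
// Rejecting an empty deployment name is an assumption of this sketch.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("", flag.ContinueOnError)
	fs.StringVar(&o.configFile, "config-file", "",
		"Path to a config file to read webhook configs from. This file must be in JSON format.")
	// The help text describes what the value is, not what is done with it.
	fs.StringVar(&o.deploymentName, "deployment-name", "",
		"Name of the deployment this pod is running in.")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.deploymentName == "" {
		return options{}, errors.New("-deployment-name must be set")
	}
	return o, nil
}

func main() {
	o, err := parseFlags([]string{"-deployment-name", "webhook-cert-manager"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("deployment:", o.deploymentName)
}
```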
@@ -251,6 +281,14 @@ func (c *Command) reconcileCertificates(ctx context.Context, clientset kubernete

certSecret.Data[corev1.TLSCertKey] = bundle.Cert
certSecret.Data[corev1.TLSPrivateKeyKey] = bundle.Key
certSecret.OwnerReferences = []metav1.OwnerReference{
add a comment that this is here to update existing secrets that were created before we added ownerReference?
done
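The update path discussed above can be sketched roughly as follows. To keep the example runnable without Kubernetes dependencies, `OwnerReference` and `Secret` are simplified stand-ins for `metav1.OwnerReference` and `corev1.Secret`, and `ensureOwner` is a hypothetical helper. The diff assigns the owner reference unconditionally; returning a "changed" flag is an addition made here for illustration.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// Secret is a simplified stand-in for corev1.Secret.
type Secret struct {
	Name            string
	Data            map[string][]byte
	OwnerReferences []OwnerReference
}

// ensureOwner sets the deployment as the only owner of the secret.
// It reports whether the secret was modified, so a secret created before
// owner references were introduced is backfilled on its next update.
func ensureOwner(s *Secret, owner OwnerReference) bool {
	if len(s.OwnerReferences) == 1 && s.OwnerReferences[0] == owner {
		return false
	}
	s.OwnerReferences = []OwnerReference{owner}
	return true
}

func main() {
	owner := OwnerReference{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Name:       "webhook-cert-manager",
		UID:        "example-uid", // placeholder; the real UID comes from the API server
	}
	// A pre-existing secret that has no owner reference yet.
	legacy := &Secret{Name: "webhook-cert", Data: map[string][]byte{}}

	fmt.Println("first pass changed:", ensureOwner(legacy, owner))
	fmt.Println("second pass changed:", ensureOwner(legacy, owner))
}
```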
LGTM. Did you find out what happens if the cert-manager deployment is deleted?
oh oh I did!! I forgot to put it in here. You were right: the pods just keep them in memory and continue to work. It only becomes problematic if the cert expires or the pod has to restart.
🧀
Changes proposed in this PR:
When the certificate secret is created or updated, set an OwnerReference on the secret that points to the webhook-cert-manager deployment, so that deleting the deployment also deletes the secrets. This addresses the race condition we sometimes see when re-installing Consul on a cluster from which a previous Consul installation was deleted. The cause was that helm delete did not delete the existing secrets containing the certificates. When the controller was created by a new installation, it mounted the existing (stale) secret, and the secret on disk was rotated before the cert watcher started. The controller therefore used certificates signed by a CA different from the CA bundle on the MWC, which led to x509 errors.
This change ensures the secrets are deleted every time, so a new secret is always created during a helm install. It also ensures that an existing secret, when updated, receives the owner reference, so that helm upgrades or installs on a cluster with an existing secret get the desired behavior as well.
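To illustrate the failure mode described above, here is a small standalone sketch using only the Go standard library. It issues a serving certificate from one CA and verifies it against a trust pool that holds a different CA, which is roughly the situation when the controller serves a stale certificate while the MWC carries a new CA bundle. The helper names (`newCA`, `newLeaf`, `verifyAgainst`) and the subject and DNS names are hypothetical; this is not the project's code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate and its private key.
func newCA(name string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newLeaf issues a serving certificate signed by the given CA.
func newLeaf(ca *x509.Certificate, caKey *ecdsa.PrivateKey, dnsName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifyAgainst checks the leaf against a pool that trusts only the given CA.
func verifyAgainst(leaf, ca *x509.Certificate) error {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool})
	return err
}

func main() {
	oldCA, oldKey, err := newCA("ca-old")
	if err != nil {
		panic(err)
	}
	newCACert, _, err := newCA("ca-new")
	if err != nil {
		panic(err)
	}
	// The stale serving cert was signed by the old CA.
	stale, err := newLeaf(oldCA, oldKey, "webhook.example.svc")
	if err != nil {
		panic(err)
	}

	fmt.Println("against old CA:", verifyAgainst(stale, oldCA))
	err = verifyAgainst(stale, newCACert)
	var unknown x509.UnknownAuthorityError
	fmt.Println("against new CA is unknown-authority error:", errors.As(err, &unknown))
}
```

With the owner reference in place, the stale secret is removed together with the deployment, so the mismatched pairing sketched here should not arise on a fresh install.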
How I've tested this PR: hashicorp/consul-helm#987
How I expect reviewers to test this PR: Code review
Checklist: