-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1769030: Replacing operator creates duplicate secrets #1123
Bug 1769030: Replacing operator creates duplicate secrets #1123
Conversation
e2e test inprog |
decb744
to
4979bee
Compare
cadbe2d
to
c43b2ee
Compare
sa.SetNamespace(namespace) | ||
if _, updateErr := o.kubeClient.UpdateServiceAccount(sa); updateErr != nil { | ||
err = errorwrap.Wrapf(updateErr, "error updating service account: %s", sa.GetName()) | ||
return | ||
} | ||
|
||
for _, secret := range preSa.Secrets { | ||
foregroundDelete := metav1.DeletePropagationForeground |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think doing a foreground type of deletion is necessary if there's nothing left referencing these secrets as dependencies, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. This is unnecessary but I thought it would not hurt. Do you think we should just leave it blank then? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you should have nothing passed into the DeleteOptions below.
@@ -116,12 +116,35 @@ func (o *StepEnsurer) EnsureServiceAccount(namespace string, sa *corev1.ServiceA | |||
return | |||
} | |||
|
|||
// Before UpdateServiceAccount and dereference secrets of the older SA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the original bug report have this issue with the same serviceaccount name?
From line 114, this section should only be hit if the serviceaccount doesn't exist at all, so I'm not convinced this will fix the bug.
I think the issue may be that Create
on ServiceAccount
always creates a secret even if the serviceaccount exists? Maybe we should start with a Get
before the initial Create
?
Do you agree, or am I missing how this fixes the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have missed !k8serrors.IsAlreadyExists
on line 114 which reports any SA creation problem other than SA exists. Therefore, the only scenario where my addition hits is if the SA does exists.
The logic of the original code got me at first as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I get for reviewing too quickly 😄 Ignore my previous comment.
However - I really don't think we should be in the business of deleting the serviceaccount secrets.
I did some testing, and the issue is that when we Update
, our SA definition doesn't have the secrets set, which removes the configured secret and causes the SA controller to generate a new one:
$ k -n default create serviceaccount test
serviceaccount/test created
$ k -n default get secrets
NAME TYPE DATA AGE
default-token-67bw8 kubernetes.io/service-account-token 3 3m23s
test-token-9lx4p kubernetes.io/service-account-token 3 3s
# an update that doesn't cause the bug
$ k -n default label serviceaccounts test with=test
serviceaccount/test labeled
# no duplicate secrets
$ k -n default get secrets
NAME TYPE DATA AGE
default-token-67bw8 kubernetes.io/service-account-token 3 6m40s
test-token-9lx4p kubernetes.io/service-account-token 3 3m20s
# an update that causes the bug - remove the secrets block
$ k -n default edit serviceaccounts test
serviceaccount/test edited
# now there are duplicates
$ k -n default get secrets
NAME TYPE DATA AGE
default-token-67bw8 kubernetes.io/service-account-token 3 8m3s
test-token-9lx4p kubernetes.io/service-account-token 3 4m43s
test-token-xwmpg kubernetes.io/service-account-token 3 4s
The solution here shouldn't be to delete the extra secrets, it should be to not create the extra secrets in the first place.
A quick option for now would be to:
- Get the current ServiceAccount (if it exists)
- Add the current secret list to the object from the step
- Then call update with that object - it will have the secret list so a new one won't be generated
But it would be even better to use DeepDerivative
or server-side apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my other thought on the fix too. I eventually chose this to make sure we always have the right token in the secrets. I worry that secrets may change in the install plan rendering old secrets to be invalid. However I might be worrying for nothing. I’ll make a commit to use that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ecordell I liked the server-side approach but it being in beta gives me goosebumps. I am not entirely sure about your suggestion about DeepDerivative
. If you are talking about https://github.com/kubernetes/kubernetes/blob/6be19d85f206d64044ae940540a437c65675a526/third_party/forked/golang/reflect/deep_equal.go#L378 then I do not see how we can use it. If you are talking about the concept. Not every field can be copied (ie: timestamp, etc). If we copy every unset field in the new SA from the existing SA, we need to know exactly what is not copiable. I do not think my understanding of your DeepDerivative
is correct. Therefore, please let me know what you are explicitly suggesting. Thank you!
/hold |
/bugzilla refresh |
@Bowenislandsong: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Bowenislandsong: This pull request references Bugzilla bug 1769030, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
98082e7
to
2fc7268
Compare
/hold cancel |
@Bowenislandsong you may want to use deepderivative to compare the objects. |
/bugzilla refresh |
@ecordell: This pull request references Bugzilla bug 1769030, which is valid. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
/lgtm |
…or the operator's ServiceAccount Cause: OLM catalog ensurer EnsureServiceAccount makes sure the service account is updated when a new version of an operator is present. This happens during ExecutePlan applying InstallPlan to a namespace. If it is an update, fields of service account are updated but the references to older secrets are dropped. Consequence: This process of dereferencing secret fails to clean up the older secrets and result in the secrets pilling up as the operator upgrades. Eventually, there will be too many old secrets laying around and only getting cleaned up when the operator is uninstalled. Fix: We carry over older secrets through updating the service account. We also compare the update using DeepDerivative to see if the update changes any existing fields. If not, we skip the update API call since it will not change anything. Result: Older secretes are again referred in the updated SA and no new secrets are created.
f3d4781
to
22c130d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: awgreene, Bowenislandsong, ecordell, gallettilance The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-gcp-upgrade |
@Bowenislandsong: All pull requests linked via external trackers have merged. Bugzilla bug 1769030 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Bowenislandsong: new pull request created: #1159 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Bowenislandsong: new pull request created: #1160 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Bowenislandsong: cannot checkout release-4.2 cancel: error checking out release-4.2 cancel: exit status 1. output: error: pathspec 'release-4.2 cancel' did not match any file(s) known to git. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Bowenislandsong: #1123 failed to apply on top of branch "release-4.1":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Cause:
OLM catalog ensurer EnsureServiceAccount makes sure the service account
is updated when a new version of an operator is present. This
happens during ExecutePlan applying InstallPlan to a namespace.
If it is an update, fields of service account are updated but the
references to older secrets are dropped.
Consequence:
This process of dereferencing secret fails to clean up the older
secrets and result in the secrets pilling up as the operator upgrades.
Eventually, there will be too many old secrets laying around and only
getting cleaned up when the operator is uninstalled.
Fix:
We carry over older secrets through updating the service account.
Result:
Older secretes are again referred in the updated SA.