-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion #970
Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion #970
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left some feedback.
a.requeueOwnerCSVs(metaObj) | ||
switch metaObj.(type) { | ||
case *corev1.ServiceAccount: | ||
if err := a.opClient.DeleteServiceAccount(metaObj.GetNamespace(), metaObj.GetName(), &metav1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if syncError = a.opClient.DeleteServiceAccount(...
case *corev1.ServiceAccount: | ||
if err := a.opClient.DeleteServiceAccount(metaObj.GetNamespace(), metaObj.GetName(), &metav1.DeleteOptions{}); err != nil { | ||
logger.WithError(err).Warn("cannot delete service account") | ||
syncError = err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You may want to put a break
(or else
) here or you'll get the "Deleted" log output even when the deletion fails.
"name": metaObj.GetName(), | ||
"namespace": metaObj.GetNamespace(), | ||
"self": metaObj.GetSelfLink(), | ||
}) | ||
|
||
// Requeues objects that can't have ownerrefs (cluster -> namespace, cross-namespace) | ||
if ownerutil.IsOwnedByKindLabel(metaObj, v1alpha1.ClusterServiceVersionKind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be some duplicated work. I wonder if we can rearrange things to make it a little cleaner. Does it make sense to move some of this to the requeueOwnerCSVs
method; particularly the existence check and GC enqueuing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code here was already present. The goal was to avoid requeueing CSVs in a deletion scenario. Maybe I can circle back to this later.
@@ -105,6 +105,13 @@ func (q *QueueInformer) metricHandlers() *cache.ResourceEventHandlerFuncs { | |||
} | |||
} | |||
|
|||
func NewQueue(ctx context.Context, options ...Option) (*QueueInformer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor name makes lot more sense than NewQueueInformer
for the use-case, which I think is to have a QueueInformer that doesn't have an informer or indexer behind it. If that's the case, then I think what you want to do is remove the non-nil informer/indexer test in the queueInformerConfig.validate()
method, or create a new validate method that ignores that check. Additionally, you would need to detect a nil indexer and attempt to use the resource nested in the event, rather than getting it from the indexer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up creating a ResourceEvent (with type updated) and putting that on the queue.
ctx, | ||
queueinformer.WithLogger(op.logger), | ||
queueinformer.WithQueue(objGCQueue), | ||
queueinformer.WithSyncer(queueinformer.LegacySyncHandler(op.syncGCObject).ToSyncer()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using the legacy adapter more convenient than implementing a new Syncer
? If that's the case then we may want to remove that interface going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm familiar with the "old" way, I was going to implement that first.
} else { | ||
switch metaObj.(type) { | ||
case *corev1.ServiceAccount, *rbacv1.ClusterRole, *rbacv1.ClusterRoleBinding: | ||
a.objGCQueueSet.Requeue(ns, metaObj.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to send a ResourceEvent
, this will not work as intended since it constructs a key manually. It may be helpful to add a RequeueEvent
method to the ResourceQueue
type.
fc43c40
to
1a59a97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit message is worth reading, didn't repeat it here.
I counted at least six different places CSVs are requeued (seven if you count operator group requeues, which also requeue CSVs). This fact is what requires handling this in a queue rather than just deleting in the CSV deletion handler.
defer r.mutex.RUnlock() | ||
|
||
if queue, ok := r.queueSet[namespace]; ok { | ||
queue.AddRateLimited(resourceEvent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really what I wanted to do here is queue.AddAfter(resourceEvent, 10 * time.Second)
, but I think what is here works okay at least. I figured a timeout wouldn't be allowed, but it would avoid some potential requeueing due to lingering CSVs.
/test e2e-aws-olm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good!
I had some questions mostly about code organization, but I like this approach and the testing for it.
Please add a bug to the title that this is fixing |
This ensures proper resource deletion is done upon CSV deletion. Since this touches a lot of different places, here's a summary of changes made: The RBAC has been modified to be owned by CSV instead of the operator group. An operator group may remain after a CSV is deleted, but the associated resources shouldn't. Similarly, created service accounts were missing an owner reference to the CSV. Due to the large amount of CSV requeueing and potential in progress handling of a CSV, RBAC couldn't be deleted in handleClusterServiceVersionDeletion (because sometimes the RBAC would be recreated by another CSV sync). Instead, a new queue was created for GC-ing resources. The sync loop specifically is used to do deletes so that the loop can return an error (an error being if the CSV is not yet deleted) and will be scheduled to try again later. The requeueing code has been changed to not requeue if the CSV is not in the cache to help not delay the new GC sync loop. The new queue does not utilize an informer or indexer, so the event and the resource are placed directly on the queue rather than relying on the indexer to retrieve by key in the processing loop (processNextWorkItem).
1a59a97
to
565d30c
Compare
@jpeeler: This pull request references Bugzilla bug 1729385, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jpeeler: This pull request references Bugzilla bug 1743345, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@jpeeler: This pull request references Bugzilla bug 1743345, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ecordell, jpeeler The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-aws-olm |
3 similar comments
/test e2e-aws-olm |
/test e2e-aws-olm |
/test e2e-aws-olm |
/retest |
/test unit |
@jpeeler: All pull requests linked via external trackers have merged. Bugzilla bug 1743345 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.1 |
@ecordell: #970 failed to apply on top of branch "release-4.1":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The approach I have here works, but takes a while as the resync interval has to pass. I was trying avoid creating a new queue for requeueing, though that may not have worked anyway since really it'd need to be requeued after CSV deletion. The commented out section did not work due to CSV requeueing and in progress CSV processing.
(The above is now outdated.)