
Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion #970

Merged

Conversation


@jpeeler jpeeler commented Jul 29, 2019

The approach I have here works, but it takes a while since the resync interval has to pass. I was trying to avoid creating a new queue for requeueing, though that may not have worked anyway, since it would really need to be requeued after CSV deletion. The commented-out section did not work due to CSV requeueing and in-progress CSV processing.

(The above is now outdated.)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2019
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 29, 2019
Member

@njhale njhale left a comment


Looks good! Left some feedback.

```go
a.requeueOwnerCSVs(metaObj)
switch metaObj.(type) {
case *corev1.ServiceAccount:
	if err := a.opClient.DeleteServiceAccount(metaObj.GetNamespace(), metaObj.GetName(), &metav1.DeleteOptions{}); err != nil {
```
Member

nit:

```go
if syncError = a.opClient.DeleteServiceAccount(...
```

```go
case *corev1.ServiceAccount:
	if err := a.opClient.DeleteServiceAccount(metaObj.GetNamespace(), metaObj.GetName(), &metav1.DeleteOptions{}); err != nil {
		logger.WithError(err).Warn("cannot delete service account")
		syncError = err
```
Member

nit: You may want to put a `break` (or an `else`) here, or you'll get the "Deleted" log output even when the deletion fails.
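The fix the nit describes can be sketched stdlib-only as follows; `client` and `syncDelete` here are hypothetical stand-ins for the operator client and sync handler, not OLM's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for the operator client; only deletion matters here.
type client struct{ fail bool }

func (c client) DeleteServiceAccount(ns, name string) error {
	if c.fail {
		return errors.New("delete failed")
	}
	return nil
}

// syncDelete mirrors the suggestion: return early on error so the
// "Deleted" log line is only emitted when the deletion actually succeeded.
func syncDelete(c client, ns, name string) (syncError error) {
	if syncError = c.DeleteServiceAccount(ns, name); syncError != nil {
		fmt.Println("cannot delete service account:", syncError)
		return // early return prevents the success log below
	}
	fmt.Printf("Deleted %s/%s\n", ns, name)
	return nil
}

func main() {
	_ = syncDelete(client{fail: false}, "ns", "sa") // logs the deletion
	_ = syncDelete(client{fail: true}, "ns", "sa")  // logs only the warning
}
```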

```go
		"name":      metaObj.GetName(),
		"namespace": metaObj.GetNamespace(),
		"self":      metaObj.GetSelfLink(),
	})

	// Requeues objects that can't have ownerrefs (cluster -> namespace, cross-namespace)
	if ownerutil.IsOwnedByKindLabel(metaObj, v1alpha1.ClusterServiceVersionKind) {
```
Member

There seems to be some duplicated work. I wonder if we can rearrange things to make it a little cleaner. Does it make sense to move some of this to the requeueOwnerCSVs method; particularly the existence check and GC enqueuing?

Author

Most of the code here was already present. The goal was to avoid requeueing CSVs in a deletion scenario. Maybe I can circle back to this later.

```
@@ -105,6 +105,13 @@ func (q *QueueInformer) metricHandlers() *cache.ResourceEventHandlerFuncs {
	}
}

func NewQueue(ctx context.Context, options ...Option) (*QueueInformer, error) {
```
Member

@njhale njhale Aug 6, 2019


This constructor name makes a lot more sense than NewQueueInformer for the use case, which I think is to have a QueueInformer that doesn't have an informer or indexer behind it. If that's the case, then I think what you want to do is remove the non-nil informer/indexer test in the queueInformerConfig.validate() method, or create a new validate method that ignores that check. Additionally, you would need to detect a nil indexer and attempt to use the resource nested in the event, rather than getting it from the indexer.

Author

@jpeeler jpeeler Aug 8, 2019


I ended up creating a ResourceEvent (with type updated) and putting that on the queue.
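A minimal sketch of that resolution, with hypothetical `resourceEvent` and `queue` types standing in for the real ones (OLM uses client-go's workqueue): since no indexer backs the queue, the event carries the resource itself rather than a cache key.

```go
package main

import "fmt"

// Hypothetical event type: with no indexer behind the queue, the event
// must embed the resource directly instead of referencing a cache key.
type eventType string

const eventUpdated eventType = "updated"

type resourceEvent struct {
	kind     eventType
	resource interface{} // the object to GC, carried inline
}

// Minimal FIFO stand-in for a workqueue.
type queue struct{ items []resourceEvent }

func (q *queue) Add(e resourceEvent) { q.items = append(q.items, e) }
func (q *queue) Len() int            { return len(q.items) }
func (q *queue) Pop() resourceEvent {
	e := q.items[0]
	q.items = q.items[1:]
	return e
}

func main() {
	var q queue
	// Enqueue the resource directly; the processing loop reads it from
	// the event instead of looking it up in an indexer.
	q.Add(resourceEvent{kind: eventUpdated, resource: "my-service-account"})
	e := q.Pop()
	fmt.Println(e.kind, e.resource)
}
```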

```go
	ctx,
	queueinformer.WithLogger(op.logger),
	queueinformer.WithQueue(objGCQueue),
	queueinformer.WithSyncer(queueinformer.LegacySyncHandler(op.syncGCObject).ToSyncer()),
```
Member

Is using the legacy adapter more convenient than implementing a new Syncer? If that's the case then we may want to remove that interface going forward.

Author

Since I'm familiar with the "old" way, I was going to implement that first.

```go
} else {
	switch metaObj.(type) {
	case *corev1.ServiceAccount, *rbacv1.ClusterRole, *rbacv1.ClusterRoleBinding:
		a.objGCQueueSet.Requeue(ns, metaObj.GetName())
```
Member

If the goal is to send a ResourceEvent, this will not work as intended since it constructs a key manually. It may be helpful to add a RequeueEvent method to the ResourceQueue type.
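A `RequeueEvent` method along the lines the reviewer suggests could look like this sketch, assuming a hypothetical namespace-keyed `resourceQueueSet` (the real `ResourceQueue` type wraps client-go workqueues):

```go
package main

import "fmt"

// Hypothetical event placed on the queue; carries the resource directly.
type resourceEvent struct {
	resource interface{}
}

// Minimal FIFO stand-in for a workqueue.
type queue struct{ items []resourceEvent }

func (q *queue) Add(e resourceEvent) { q.items = append(q.items, e) }

// resourceQueueSet holds one queue per namespace, as the existing
// Requeue(ns, name) does, but RequeueEvent enqueues a full event
// rather than constructing a "ns/name" string key manually.
type resourceQueueSet struct {
	queueSet map[string]*queue
}

func (r *resourceQueueSet) RequeueEvent(namespace string, e resourceEvent) error {
	q, ok := r.queueSet[namespace]
	if !ok {
		return fmt.Errorf("no queue found for namespace %s", namespace)
	}
	q.Add(e)
	return nil
}

func main() {
	r := &resourceQueueSet{queueSet: map[string]*queue{"ns": {}}}
	if err := r.RequeueEvent("ns", resourceEvent{resource: "sa"}); err != nil {
		panic(err)
	}
	fmt.Println(len(r.queueSet["ns"].items))
}
```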

@jpeeler jpeeler force-pushed the cleanup-roles-sa branch 2 times, most recently from fc43c40 to 1a59a97 Compare August 14, 2019 21:07
Author

@jpeeler jpeeler left a comment


The commit message is worth reading; I didn't repeat it here.

I counted at least six different places CSVs are requeued (seven if you count operator group requeues, which also requeue CSVs). This is why the deletion has to be handled in a queue rather than done directly in the CSV deletion handler.

```go
	defer r.mutex.RUnlock()

	if queue, ok := r.queueSet[namespace]; ok {
		queue.AddRateLimited(resourceEvent)
```
Author

Really what I wanted to do here was `queue.AddAfter(resourceEvent, 10*time.Second)`, but I think what's here works well enough. I figured a timeout wouldn't be allowed, but it would avoid some potential requeueing due to lingering CSVs.

@jpeeler jpeeler changed the title WIP: clean up service account, cluster roles, and cluster role bindings after CSV deletion clean up service account, cluster roles, and cluster role bindings after CSV deletion Aug 14, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 14, 2019

jpeeler commented Aug 15, 2019

/test e2e-aws-olm

Member

@ecordell ecordell left a comment


This looks really good!

I had some questions, mostly about code organization, but I like this approach and the testing for it.

Resolved review threads:
pkg/controller/operators/olm/operator.go (two threads)
pkg/controller/operators/olm/operatorgroup.go
pkg/controller/registry/resolver/rbac.go
pkg/lib/queueinformer/config.go (outdated)
@shawn-hurley
Member

Please add a bug to the title that this is fixing

Jeff Peeler added 2 commits August 19, 2019 12:43
This ensures proper resource deletion is done upon CSV deletion. Since
this touches a lot of different places, here's a summary of changes
made:

The RBAC has been modified to be owned by CSV instead of the operator
group. An operator group may remain after a CSV is deleted, but the
associated resources shouldn't. Similarly, created service accounts were
missing an owner reference to the CSV.

Due to the large amount of CSV requeueing and potential in-progress
handling of a CSV, RBAC couldn't be deleted in
handleClusterServiceVersionDeletion (because sometimes the RBAC would be
recreated by another CSV sync). Instead, a new queue was created for
GC-ing resources. Deletes are performed in the sync loop specifically so
that it can return an error (for example, when the CSV is not yet
deleted) and be scheduled to try again later. The requeueing code has
been changed to not requeue when the CSV is not in the cache, so as not
to delay the new GC sync loop.

The new queue does not utilize an informer or indexer, so the event and
the resource are placed directly on the queue rather than relying on the
indexer to retrieve by key in the processing loop (processNextWorkItem).
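The retry-on-error flow the commit message describes can be sketched as follows; `syncGCObject` and the `cache` type here are simplified, hypothetical stand-ins for the real handler and CSV cache:

```go
package main

import (
	"errors"
	"fmt"
)

// cache stands in for the CSV lister/cache: CSV name -> still present.
type cache map[string]bool

// syncGCObject refuses to delete while the owning CSV is still in the
// cache; returning an error makes the queue retry the event later
// (rate-limited), which is exactly why deletes live in the sync loop.
func syncGCObject(csvCache cache, ownerCSV, resource string) error {
	if csvCache[ownerCSV] {
		return errors.New("owner CSV still exists, retrying later")
	}
	fmt.Println("deleting", resource)
	return nil
}

func main() {
	c := cache{"my-csv": true}
	fmt.Println(syncGCObject(c, "my-csv", "my-clusterrole") != nil) // retry path
	delete(c, "my-csv")
	fmt.Println(syncGCObject(c, "my-csv", "my-clusterrole") == nil) // deletion path
}
```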
@jpeeler jpeeler changed the title clean up service account, cluster roles, and cluster role bindings after CSV deletion Bug 1729385: clean up service account, cluster roles, and cluster role bindings after CSV deletion Aug 19, 2019
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Aug 19, 2019
@openshift-ci-robot
Collaborator

@jpeeler: This pull request references Bugzilla bug 1729385, which is invalid:

  • expected the bug to target the "4.2.0" release, but it targets "4.1.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1729385: clean up service account, cluster roles, and cluster role bindings after CSV deletion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jpeeler jpeeler changed the title Bug 1729385: clean up service account, cluster roles, and cluster role bindings after CSV deletion Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion Aug 19, 2019
@openshift-ci-robot
Collaborator

@jpeeler: This pull request references Bugzilla bug 1743345, which is invalid:

  • expected dependent Bugzilla bug 1729385 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion



jpeeler commented Aug 19, 2019

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 19, 2019
@openshift-ci-robot
Collaborator

@jpeeler: This pull request references Bugzilla bug 1743345, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/bugzilla refresh


@ecordell
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 20, 2019
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ecordell, jpeeler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


jpeeler commented Aug 20, 2019

/test e2e-aws-olm

3 similar comments

jpeeler commented Aug 21, 2019

/test e2e-aws-olm


jpeeler commented Aug 21, 2019

/test e2e-aws-olm


jpeeler commented Aug 21, 2019

/test e2e-aws-olm


jpeeler commented Aug 21, 2019

/retest


jpeeler commented Aug 21, 2019

/test unit

@openshift-merge-robot openshift-merge-robot merged commit 7d6665d into operator-framework:master Aug 22, 2019
@openshift-ci-robot
Collaborator

@jpeeler: All pull requests linked via external trackers have merged. Bugzilla bug 1743345 has been moved to the MODIFIED state.

In response to this:

Bug 1743345: clean up service account, cluster roles, and cluster role bindings after CSV deletion


@ecordell
Member

/cherry-pick release-4.1

@openshift-cherrypick-robot

@ecordell: #970 failed to apply on top of branch "release-4.1":

```
Using index info to reconstruct a base tree...
M	test/e2e/installplan_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/installplan_e2e_test.go
Applying: fix(olm): clean up resources on CSV deletion
Using index info to reconstruct a base tree...
M	pkg/controller/operators/olm/operator.go
M	pkg/controller/operators/olm/operatorgroup.go
A	pkg/lib/queueinformer/config.go
M	pkg/lib/queueinformer/queueinformer.go
M	pkg/lib/queueinformer/queueinformer_operator.go
M	pkg/lib/queueinformer/resourcequeue.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/lib/queueinformer/resourcequeue.go
Auto-merging pkg/lib/queueinformer/queueinformer_operator.go
CONFLICT (content): Merge conflict in pkg/lib/queueinformer/queueinformer_operator.go
Auto-merging pkg/lib/queueinformer/queueinformer.go
CONFLICT (content): Merge conflict in pkg/lib/queueinformer/queueinformer.go
CONFLICT (modify/delete): pkg/lib/queueinformer/config.go deleted in HEAD and modified in fix(olm): clean up resources on CSV deletion. Version fix(olm): clean up resources on CSV deletion of pkg/lib/queueinformer/config.go left in tree.
Auto-merging pkg/controller/operators/olm/operatorgroup.go
CONFLICT (content): Merge conflict in pkg/controller/operators/olm/operatorgroup.go
Auto-merging pkg/controller/operators/olm/operator.go
CONFLICT (content): Merge conflict in pkg/controller/operators/olm/operator.go
error: Failed to merge in the changes.
Patch failed at 0002 fix(olm): clean up resources on CSV deletion
```

In response to this:

/cherry-pick release-4.1

