-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(olm): add deletion monitoring for api services #750
fix(olm): add deletion monitoring for api services #750
Conversation
/retest |
2 similar comments
/retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good!
We do need to absolutely verify that there's never a moment during a CSV upgrade that this GC could be triggered; I don't think this should merge without those tests in place.
I also think that this is important to do - but I'm a little fuzzy on how this is fixing upgrades of packagerserver when upgrading openshift. We shouldn't be deleting the packageserver apiservice during an upgrade, right?
_, err := a.lister.OperatorsV1alpha1().ClusterServiceVersionLister().ClusterServiceVersions(owningNamespace).Get(owningCSVName) | ||
if k8serrors.IsNotFound(err) { | ||
logger.Debug("Deleting api service since owning CSV is not found") | ||
syncError = a.OpClient.DeleteAPIService(apiSvc.GetName(), &metav1.DeleteOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also generate some rolebindings in kube-system for APIServices that we should clean up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to do this. But I admit I'm less concerned about it since I've never seen any stray RBAC cause package server install issues.
logger.Info("syncing APIService") | ||
|
||
serviceLabels := apiSvc.GetLabels() | ||
owningNamespace, owned := serviceLabels[ownerutil.OwnerNamespaceKey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check if labels are nil before accessing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to use GetOwnerByKindLabel.
@@ -349,6 +398,22 @@ func (a *Operator) namespaceAddedOrRemoved(obj interface{}) { | |||
} | |||
} | |||
} | |||
// find any API services owned by this namespace and requeue them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a namespace gets deleted with a CSV in it, I'm pretty sure we get delete
events for CSVs. I think it would be simpler to just queue up apiservices on CSV delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covering the case of deleting the namespace that also contains OLM. There already exists code to handle cleaning up api services upon CSV deletion in the "normal" case:
for _, desc := range clusterServiceVersion.Spec.APIServiceDefinitions.Owned { |
I assume what happens is upon deleting the namespace hosting OLM, OLM doesn't have time to run any clean up.
@@ -349,6 +398,22 @@ func (a *Operator) namespaceAddedOrRemoved(obj interface{}) { | |||
} | |||
} | |||
} | |||
// find any API services owned by this namespace and requeue them | |||
stringSelector := fmt.Sprintf("%v==%v", ownerutil.OwnerNamespaceKey, namespace.GetName()) | |||
labelSelector, err := labels.Parse(stringSelector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a useful thing to factor out into ownerutil, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not sure if the code is even needed. I'm specifically testing for when OLM is being terminated via namespace deletion, which it doesn't look like will allow/guarantee this code to run. Instead, the api service sync loop upon starting up will detect orphaned api services and get rid of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
I'll handle writing the end-to-end test.
/hold cancel |
require.NoError(t, err) | ||
|
||
deleted := make(chan struct{}) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Watch()
instead of polling.
ba876b0
to
80f9874
Compare
80f9874
to
a5e6c48
Compare
Note that the e2e in this PR will pass without the changes. I've been testing by running TestCreateCSVWithOwnedAPIService once and then again without tearing down the cluster. With the way tests currently run, I'm not sure how to automate such a test. |
/retest |
a5e6c48
to
7b1dd07
Compare
7b1dd07
to
29c51cf
Compare
/retest |
3 similar comments
/retest |
/retest |
/retest |
29c51cf
to
fb23834
Compare
/retest |
fb23834
to
6250afb
Compare
/retest |
2 similar comments
/retest |
/retest |
If an api service has owner labels, check to make sure that the namespace and owning CSV still exists. If either does not, delete the api service.
/retest |
/retest ^ to see where we stand with the tests now It might be easier to get #696 merged first, since these tests are written using watches |
/hold holding until master is open for 4.2 😢 |
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alecmerdler, jpeeler The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/test e2e-aws-console-olm |
/retest |
/test e2e-aws-upgrade |
/retest |
If an api service has owner labels, check to make sure that the namespace and owning CSV still exists. If either does not, delete the api service.
This is a follow up to 79b314d. I haven't tested this yet, but maybe the existing e2e is good enough.