fix(olm): add deletion monitoring for api services #750

Merged

jpeeler

@jpeeler jpeeler commented Mar 11, 2019

If an api service has owner labels, check that the owning namespace and the owning CSV still exist. If either does not, delete the api service.

This is a follow-up to 79b314d. I haven't tested this yet, but maybe the existing e2e is good enough.
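
The check described above could be sketched roughly as follows. This is a stdlib-only illustration, not the code in this PR: the label keys, the `cluster` type, and `shouldDelete` are hypothetical stand-ins for OLM's `ownerutil` constants and listers.

```go
package main

import "fmt"

// Hypothetical label keys, for illustration only. The real constants
// live in OLM's ownerutil package.
const (
	ownerNameKey      = "example.owner"
	ownerNamespaceKey = "example.owner.namespace"
)

// cluster is a stand-in for the listers the operator would consult.
type cluster struct {
	namespaces map[string]bool            // existing namespaces
	csvs       map[string]map[string]bool // namespace -> CSV name -> exists
}

// shouldDelete reports whether an API service with the given labels is
// orphaned: it carries owner labels, but the owning namespace or the
// owning CSV no longer exists.
func shouldDelete(c cluster, svcLabels map[string]string) bool {
	ns, hasNS := svcLabels[ownerNamespaceKey]
	name, hasName := svcLabels[ownerNameKey]
	if !hasNS || !hasName {
		return false // no owner labels, so leave the API service alone
	}
	if !c.namespaces[ns] {
		return true // owning namespace is gone
	}
	return !c.csvs[ns][name] // true when the owning CSV is gone
}

func main() {
	c := cluster{
		namespaces: map[string]bool{"operators": true},
		csvs:       map[string]map[string]bool{"operators": {"etcd.v1": true}},
	}
	owned := map[string]string{ownerNamespaceKey: "operators", ownerNameKey: "etcd.v1"}
	stale := map[string]string{ownerNamespaceKey: "gone", ownerNameKey: "etcd.v1"}
	fmt.Println(shouldDelete(c, owned), shouldDelete(c, stale))
}
```

API services without owner labels are deliberately left untouched in this sketch, so only objects that claim a CSV owner are ever candidates for deletion.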

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 11, 2019
@jpeeler
Author

jpeeler commented Mar 11, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 11, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 12, 2019

/retest

Member

@ecordell ecordell left a comment

This is looking good!

We absolutely need to verify that there is never a moment during a CSV upgrade when this GC could be triggered; I don't think this should merge without those tests in place.

I also think this is important to do, but I'm a little fuzzy on how this fixes upgrades of packageserver when upgrading OpenShift. We shouldn't be deleting the packageserver apiservice during an upgrade, right?

_, err := a.lister.OperatorsV1alpha1().ClusterServiceVersionLister().ClusterServiceVersions(owningNamespace).Get(owningCSVName)
if k8serrors.IsNotFound(err) {
	logger.Debug("Deleting api service since owning CSV is not found")
	syncError = a.OpClient.DeleteAPIService(apiSvc.GetName(), &metav1.DeleteOptions{})
Member

We also generate some rolebindings in kube-system for APIServices that we should clean up

Author

I still need to do this. But I admit I'm less concerned about it since I've never seen any stray RBAC cause package server install issues.

logger.Info("syncing APIService")

serviceLabels := apiSvc.GetLabels()
owningNamespace, owned := serviceLabels[ownerutil.OwnerNamespaceKey]
Member

we should check if labels are nil before accessing

Author

Refactored to use GetOwnerByKindLabel.
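
For illustration, a label-based owner lookup along these lines might look like the sketch below. It is not the actual `GetOwnerByKindLabel` from `ownerutil`, whose signature is not shown in this thread; the label keys, `ownerRef`, and `ownerByKindLabel` are hypothetical. One detail that is certain: in Go, reading from a nil map returns the zero value, so the lookup does not panic when an object has no labels.

```go
package main

import "fmt"

// Hypothetical label keys, for illustration only.
const (
	ownerNameKey      = "example.owner"
	ownerNamespaceKey = "example.owner.namespace"
	ownerKindKey      = "example.owner.kind"
)

// ownerRef is a hypothetical result type naming the owning object.
type ownerRef struct {
	Namespace, Name string
}

// ownerByKindLabel returns the owner recorded in the labels only when the
// kind label equals wantKind and both the name and namespace labels are set.
// A nil label map is handled like an empty one.
func ownerByKindLabel(lbls map[string]string, wantKind string) (ownerRef, bool) {
	if lbls[ownerKindKey] != wantKind {
		return ownerRef{}, false
	}
	ns, okNS := lbls[ownerNamespaceKey]
	name, okName := lbls[ownerNameKey]
	if !okNS || !okName {
		return ownerRef{}, false
	}
	return ownerRef{Namespace: ns, Name: name}, true
}

func main() {
	lbls := map[string]string{
		ownerKindKey:      "ClusterServiceVersion",
		ownerNamespaceKey: "operators",
		ownerNameKey:      "etcd.v1",
	}
	owner, ok := ownerByKindLabel(lbls, "ClusterServiceVersion")
	fmt.Println(owner.Namespace, owner.Name, ok)

	// A nil map is safe to read from.
	_, ok = ownerByKindLabel(nil, "ClusterServiceVersion")
	fmt.Println(ok)
}
```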

@@ -349,6 +398,22 @@ func (a *Operator) namespaceAddedOrRemoved(obj interface{}) {
}
}
}
// find any API services owned by this namespace and requeue them
Member

When a namespace gets deleted with a CSV in it, I'm pretty sure we get delete events for CSVs. I think it would be simpler to just queue up apiservices on CSV delete.

Author

This is covering the case of deleting the namespace that also contains OLM. There already exists code to handle cleaning up api services upon CSV deletion in the "normal" case:

for _, desc := range clusterServiceVersion.Spec.APIServiceDefinitions.Owned {

I assume that when the namespace hosting OLM is deleted, OLM doesn't have time to run any cleanup.

@@ -349,6 +398,22 @@ func (a *Operator) namespaceAddedOrRemoved(obj interface{}) {
}
}
}
// find any API services owned by this namespace and requeue them
stringSelector := fmt.Sprintf("%v==%v", ownerutil.OwnerNamespaceKey, namespace.GetName())
labelSelector, err := labels.Parse(stringSelector)
Member

this seems like a useful thing to factor out into ownerutil, no?
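
A factored-out helper of the kind suggested here might look like the following sketch. The function names and the label key are hypothetical, and the matching function is a stdlib-only stand-in; the diff above instead passes the selector string to `labels.Parse`.

```go
package main

import "fmt"

// ownerNamespaceKey is a hypothetical label key, for illustration only.
const ownerNamespaceKey = "example.owner.namespace"

// ownerNamespaceSelector returns an equality-based selector string of the
// form "key==value", the same shape the diff builds with fmt.Sprintf.
func ownerNamespaceSelector(namespace string) string {
	return fmt.Sprintf("%s==%s", ownerNamespaceKey, namespace)
}

// matchesOwnerNamespace is a stdlib-only stand-in for evaluating that
// selector against an object's labels.
func matchesOwnerNamespace(lbls map[string]string, namespace string) bool {
	v, ok := lbls[ownerNamespaceKey]
	return ok && v == namespace
}

func main() {
	fmt.Println(ownerNamespaceSelector("operators"))
	lbls := map[string]string{ownerNamespaceKey: "operators"}
	fmt.Println(matchesOwnerNamespace(lbls, "operators"))
}
```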

Author

Well, I'm not sure the code is even needed. I'm specifically testing the case where OLM is terminated via namespace deletion, which doesn't appear to allow, let alone guarantee, this code to run. Instead, the api service sync loop will detect orphaned api services on startup and get rid of them.

pkg/controller/operators/olm/operator.go (outdated, resolved)
@alecmerdler alecmerdler self-assigned this Mar 12, 2019
Member

@alecmerdler alecmerdler left a comment

/hold
I'll handle writing the end-to-end test.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 12, 2019
@alecmerdler
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2019
require.NoError(t, err)

deleted := make(chan struct{})
go func() {

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2019
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 18, 2019
@jpeeler
Author

jpeeler commented Mar 18, 2019

Note that the e2e in this PR will pass without the changes. I've been testing by running TestCreateCSVWithOwnedAPIService once and then again without tearing down the cluster. With the way tests currently run, I'm not sure how to automate such a test.

@jpeeler
Author

jpeeler commented Mar 19, 2019

/retest

@ecordell ecordell added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 19, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 25, 2019
@jpeeler
Author

jpeeler commented Mar 26, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 26, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 26, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 27, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 28, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 30, 2019

/retest

@jpeeler
Author

jpeeler commented Mar 31, 2019

/retest

@jpeeler
Author

jpeeler commented Apr 3, 2019

/retest

Jeff Peeler and others added 2 commits April 3, 2019 15:00
If an api service has owner labels, check to make sure that the
namespace and owning CSV still exists. If either does not, delete the
api service.
@jpeeler
Author

jpeeler commented Apr 3, 2019

/retest

@ecordell
Member

ecordell commented May 9, 2019

/retest

^ to see where we stand with the tests now

It might be easier to get #696 merged first, since these tests are written using watches

@ecordell
Member

ecordell commented May 9, 2019

/hold

holding until master is open for 4.2 😢

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2019
@ecordell
Member

/hold cancel

Member

@alecmerdler alecmerdler left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 17, 2019
@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alecmerdler, jpeeler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alecmerdler,jpeeler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpeeler
Author

jpeeler commented May 17, 2019

/hold cancel
previous command did not remove the label

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 17, 2019
@jpeeler
Author

jpeeler commented May 20, 2019

/test e2e-aws-console-olm

@jpeeler
Author

jpeeler commented May 20, 2019

/retest

@jpeeler
Author

jpeeler commented May 20, 2019

/test e2e-aws-upgrade

@ecordell
Member

/retest

@openshift-merge-robot openshift-merge-robot merged commit f370461 into operator-framework:master May 21, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants