
Cleanup loadbalancers created by ccm #842

Closed
sbueringer opened this issue Apr 9, 2021 · 35 comments
Labels: kind/feature, lifecycle/rotten

@sbueringer
Member

sbueringer commented Apr 9, 2021

/kind feature

Currently, we only delete the kubeapi load balancer created by Cluster API. Usually, there are additional load balancers created by the CCM, and I would like to delete those on cluster deletion as well. They cannot be deleted by the CCM, because the CCM is no longer running at that point. Cluster deletion gets blocked because the network etc. cannot be deleted while orphaned CCM load balancers are left behind.

Notes:

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 9, 2021
@sbueringer sbueringer changed the title Clean up loadbalancers created by ccm Cleanup loadbalancers created by ccm Apr 9, 2021
@sbueringer
Member Author

/assign @seanschneeweiss
(we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

@k8s-ci-robot
Contributor

@sbueringer: GitHub didn't allow me to assign the following users: seanschneeweiss.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @seanschneeweiss
(we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

/assign
(assigning myself in place of @seanschneeweiss as he cannot be assigned)

@hidekazuna
Contributor

This is an interesting feature. I would like a flag to choose whether to delete or not: OpenStack cloud provider users don't only use load balancers, but also the Cinder CSI driver, an ingress controller, etc. They (and I) may be confused that only load balancers are deleted, even knowing that load balancers belong to the CCM and the other resources to other controllers.

@jichenjc
Contributor

jichenjc commented Apr 9, 2021

If you check https://github.com/kubernetes/cloud-provider-openstack, there are many resources that might be related: KMS, CSI (Cinder, Manila), etc.
I think deleting something not created by CAPO is dangerous, as we might lack the information to decide whether to delete and what to delete.

@sbueringer
Member Author

sbueringer commented Apr 9, 2021

I think if there's a reasonably safe way to identify which load balancers have been created by the CCM of the cluster we just deleted, it's fine to delete them. I wouldn't touch disks etc., as there is a chance we accidentally delete valuable data that way. Load balancers can be re-created if necessary.

The AWS e2e test verifies that the load balancers are deleted and the disks survive the cluster deletion.

The alternative is that you always have to delete the load balancers via the CCM first and then trigger the cluster deletion. Or the cluster deletion gets stuck in the middle and someone has to manually clean up the load balancers in OpenStack.

If we think the regular case is that users want to manually clean up the load balancers, we can make this opt-in instead of opt-out.

@seanschneeweiss
Contributor

seanschneeweiss commented Apr 9, 2021

I've opened PR #844 with the currently used implementation. This is a draft, to be extended with e2e tests and further filters to prevent unwanted deletion of other objects.
Sean Schneeweiss [email protected], Daimler TSS GmbH, legal info/Impressum

@sbueringer
Member Author

/unassign
/assign @seanschneeweiss
(It wasn't possible to assign Sean before because he wasn't a member of the org)

@seanschneeweiss
Contributor

Should there be an additional selection for orphaned load balancers other than the project ID? How can we identify the load balancers of the cluster to be deleted?

@mkjpryor
Contributor

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

@hidekazuna
Contributor

I am still looking into how to identify all the load balancers belonging to the clusters CAPO deploys.
The naming convention is fmt.Sprintf("kube_service_%s_%s_%s", clusterName, service.Namespace, service.Name), but I am not sure what clusterName is.

@jichenjc any comments?

@jichenjc
Contributor

jichenjc commented Sep 14, 2021

@hidekazuna see this
https://github.com/kubernetes/cloud-provider/blob/master/app/core.go#L87

The clusterName comes from ctx.ComponentConfig.KubeCloudShared.ClusterName, which I believe comes from Kubernetes itself.
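
For illustration, here is a minimal sketch of how that naming convention could be used to recognize the OCCM-created load balancers of a given cluster. The helper names are hypothetical and not taken from CAPO or cloud-provider-openstack:

```go
package main

import (
	"fmt"
	"strings"
)

// occmLBPrefix builds the name prefix the OpenStack cloud provider uses for
// Service load balancers: "kube_service_<clusterName>_".
func occmLBPrefix(clusterName string) string {
	return fmt.Sprintf("kube_service_%s_", clusterName)
}

// isOCCMLoadBalancer reports whether a load balancer name matches that
// convention for the given cluster.
func isOCCMLoadBalancer(lbName, clusterName string) bool {
	return strings.HasPrefix(lbName, occmLBPrefix(clusterName))
}

func main() {
	for _, name := range []string{
		"kube_service_mycluster_default_nginx", // created by the OCCM for cluster "mycluster"
		"some-unrelated-loadbalancer",          // anything else is left alone
	} {
		fmt.Printf("%s -> %v\n", name, isOCCMLoadBalancer(name, "mycluster"))
	}
}
```

Whatever value clusterName ends up with has to match what the cloud provider was actually started with, otherwise the prefix match silently finds nothing.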

@iamemilio
Contributor

A trick we have used in our deployments is to apply a tag to resources created by CAPO. That might simplify your problem.

@jichenjc
Contributor

apply a tag to resources created by CAPO

Agreed, creating a tag is much simpler and should be considered the common approach.

But this looks more like an OCCM question: if I understand correctly, this issue is about load balancers created by the OCCM that also need to be cleaned up.

@hidekazuna
Contributor

@jichenjc Thanks for the information.
@iamemilio @jichenjc Yes, the LBs in this issue are created by the CCM, so we cannot use tags.
@seanschneeweiss I think we can identify LBs created by the CCM deployed by CAPO by their names starting with kube_service_<clusterName>.

@jichenjc
Contributor

jichenjc commented Sep 15, 2021

This might become a bigger question: the cloud provider supports PV, PVC and other resource creation through Cinder, Manila, etc., and I am not sure whether the CAPI cluster deletion here should include those as well.

I haven't tried it, e.g. whether this cascades or prevents cluster deletion, so real users running workloads on CAPI might have more to say on this.

@mkjpryor
Contributor

@jichenjc

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

@jichenjc
Contributor

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

Yes, I understand. So, as you indicated in PR #990, maybe the reasonable way for now is to use the naming convention to remove the LBs first, and to ask the CAPI community how they want resources created outside of the CAPI lifecycle to be handled.

I guess it might be the operator's job to consider, or maybe a better way is to do a quick test (e.g. create a CAPI cluster, create PVs/PVCs and LBs on it, then delete it and see what happens).

@hidekazuna
Contributor

hidekazuna commented Sep 16, 2021

I want to deploy k8s clusters in the same network and the same project for testing purposes. I do not want LBs deleted based on the network or the project.

@seanschneeweiss
Contributor

Deleting by name kube_service_<clustername>_<namespace>_<servicename> sounds the most reasonable to me, as it includes the cluster name and we should have no unwanted deletions of other resources.
Maybe a check could be added that

  • counts the number of load balancers before deleting the network (when deleting the cluster),
  • and returns an error or prints a log message to notify about the orphaned load balancers (see the sketch below).
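
A rough sketch of such a check, assuming gophercloud's Octavia (load balancer v2) bindings and client-side filtering by name; this is illustrative only and does not mirror actual CAPO code:

```go
package cleanup

import (
	"fmt"
	"strings"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// checkForOrphanedLBs lists all load balancers visible to the client and
// returns an error naming those that still match the OCCM naming convention
// for the given cluster, so cluster deletion can report them before the
// network removal fails.
func checkForOrphanedLBs(client *gophercloud.ServiceClient, clusterName string) error {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}

	prefix := fmt.Sprintf("kube_service_%s_", clusterName)
	var orphaned []string
	for _, lb := range lbs {
		if strings.HasPrefix(lb.Name, prefix) {
			orphaned = append(orphaned, lb.Name)
		}
	}
	if len(orphaned) > 0 {
		return fmt.Errorf("%d load balancer(s) created by the cloud provider still exist (%s) and will block network deletion",
			len(orphaned), strings.Join(orphaned, ", "))
	}
	return nil
}
```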

@seanschneeweiss
Contributor

The clusterName originates from the cluster-name setting passed to the kube-controller-manager. xref
The default value is kubernetes, but with CAPI/CAPO this value should hopefully be set correctly.

@seanschneeweiss
Contributor

seanschneeweiss commented Jan 18, 2022

I guess that deleting PVs is out of scope. PVs might contain important data and should probably never be deleted automatically or as part of a garbage collection.

Load balancers can be recreated at any time. In addition, orphaned load balancers created by the OCCM are somewhat expected, and they block network deletion.
So I like the suggestion to delete load balancers based on the naming convention kube_service_<clustername>_<namespace>_<servicename>.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

For CAPO it would be good to rely on CAPI to clean up the LoadBalancer services. I don't know if there is any provider where the load balancer is independent of the cluster network itself... I'll have to ask the CAPI developers.

@mkjpryor
Contributor

mkjpryor commented Feb 2, 2022

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

I still think that this seems like reasonable behaviour, right?

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

If CAPO did not create the network, then it doesn't need to delete the network as part of cluster deletion and we don't need to worry about it.

@mkjpryor
Contributor

mkjpryor commented Feb 2, 2022

Just worried we are over-complicating things, and making ourselves brittle, by assuming naming conventions from the CCM...

@seanschneeweiss
Contributor

seanschneeweiss commented Apr 12, 2022

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

@mkjpryor Sounds reasonable, indeed. How can we identify whether CAPO created the network, any suggestion?

@mkjpryor
Contributor

@seanschneeweiss

We already have a gate that decides whether we should delete the network or not. If we are deleting the network, we should delete the load balancers attached to it first.
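
A minimal sketch of that gate, under the same gophercloud assumptions as the earlier sketch and additionally assuming the load balancer objects expose their VIP network ID; how CAPO's real network-deletion code is structured is not shown here:

```go
package cleanup

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// deleteLBsOnNetwork cascade-deletes every load balancer whose VIP lives on
// the given network. It is meant to run only on the path where CAPO created
// the network and is about to delete it.
func deleteLBsOnNetwork(client *gophercloud.ServiceClient, networkID string) error {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}
	for _, lb := range lbs {
		if lb.VipNetworkID != networkID {
			continue // load balancers on other networks (e.g. other clusters) are left alone
		}
		// Cascade delete removes listeners, pools and members together with
		// the load balancer itself.
		if err := loadbalancers.Delete(client, lb.ID, loadbalancers.DeleteOpts{Cascade: true}).ExtractErr(); err != nil {
			return err
		}
	}
	return nil
}
```

This keeps the behaviour described above: clusters on separate networks are unaffected, and pre-existing networks that CAPO does not delete keep their load balancers.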

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2022
@seanschneeweiss
Contributor

/remove-lifecycle stale

Time is a bit scarce at the moment, but I hope to work on this very soon. If someone would like to take over, please do not hesitate to do so.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2022
@nikParasyr
Contributor

I would like something like this, but I'm not sure how much extra logic and code such a feature would add. A different approach, which doesn't require support from CAPO, is for users to use the BeforeClusterDeleteHook from the Lifecycle Hooks. It's still an experimental feature and only works with ClusterClass afaik, but it does allow users to inject arbitrary logic without CAPO needing to support anything specific.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2022
@chrischdi
Copy link
Member

xref: CAPA has a similar feature: https://cluster-api-aws.sigs.k8s.io/topics/external-resource-gc.html

@erkanerol

For CAPO and CAPVCD, we use our own cleaner operator for this issue. I wish we had a flag in CAPO like CAPA.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2023
@github-project-automation github-project-automation bot moved this from Backlog to Done in Cluster API OpenStack Jan 16, 2023