
Cleanup loadbalancers created by ccm #842

Closed
sbueringer opened this issue Apr 9, 2021 · 35 comments
Labels: kind/feature, lifecycle/rotten

@sbueringer
Member

sbueringer commented Apr 9, 2021

/kind feature

Currently, we only delete the kubeapi load balancer created by Cluster API. Usually, there are additional load balancers created by the CCM, and I would like to delete those on cluster deletion as well. They cannot be deleted by the CCM, because the CCM is no longer running at that point. Cluster deletion gets blocked because the network etc. cannot be deleted while orphaned CCM load balancers are left behind.

Notes:

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 9, 2021
@sbueringer sbueringer changed the title Clean up loadbalancers created by ccm Cleanup loadbalancers created by ccm Apr 9, 2021
@sbueringer
Member Author

/assign @seanschneeweiss
(we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

@k8s-ci-robot
Contributor

@sbueringer: GitHub didn't allow me to assign the following users: seanschneeweiss.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @seanschneeweiss
(we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

/assign
(assigning myself in place of @seanschneeweiss as he cannot be assigned)

@hidekazuna
Contributor

This is an interesting feature. I would like a flag to choose whether to delete or not: OpenStack cloud provider users don't only use load balancers, but also the Cinder CSI driver, an ingress controller, etc. They (and I) may be confused that only load balancers are deleted, even knowing that load balancers belong to the CCM and the other resources to other controllers.

@jichenjc
Contributor

jichenjc commented Apr 9, 2021

If you check https://github.com/kubernetes/cloud-provider-openstack, there are many resources that might be related: KMS, CSI (Cinder, Manila), etc.
I think deleting something not created by CAPO is dangerous, as we might lack the information to decide whether to delete and what to delete.

@sbueringer
Member Author

sbueringer commented Apr 9, 2021

I think if there's a reasonably safe way to identify which load balancers have been created by the CCM of the cluster we just deleted, it's fine to delete them. I wouldn't touch disks etc., as there is a chance we accidentally delete valuable data that way. Load balancers can be re-created if necessary.

The AWS e2e test verifies that the load balancers are deleted and the disks survive the cluster deletion.

The alternative is that you always have to delete the load balancers via the CCM first and then trigger the cluster deletion. Or the cluster deletion gets stuck in the middle and someone has to manually clean up the load balancers in OpenStack.

If we think the regular case is that users want to manually clean up the load balancers, we can make this opt-in instead of opt-out.

@seanschneeweiss
Contributor

seanschneeweiss commented Apr 9, 2021

I've opened PR #844 with the currently used implementation. This is a draft, to be extended with e2e tests and further filters to prevent unwanted deletion of other objects.
Sean Schneeweiss [email protected], Daimler TSS GmbH, legal info/Impressum

@sbueringer
Member Author

/unassign
/assign @seanschneeweiss
(It wasn't possible to assign Sean before because he wasn't a member of the org)

@seanschneeweiss
Contributor

Should there be an additional selection for orphaned load balancers other than the project ID? How can we identify the load balancers of the cluster to be deleted?

@mkjpryor
Contributor

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

@hidekazuna
Contributor

I am still looking into how to identify all the load balancers belonging to the clusters CAPO deploys.
The naming convention is fmt.Sprintf("kube_service_%s_%s_%s", clusterName, service.Namespace, service.Name), but I am not sure what clusterName is.

@jichenjc any comments?

@jichenjc
Contributor

jichenjc commented Sep 14, 2021

@hidekazuna see this
https://github.com/kubernetes/cloud-provider/blob/master/app/core.go#L87

The clusterName comes from ctx.ComponentConfig.KubeCloudShared.ClusterName, which I believe comes from Kubernetes itself.
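
For illustration, here is a minimal sketch of how that naming convention could be used to recognize the OCCM-created load balancers of a given cluster. The helper names are hypothetical and not taken from CAPO or cloud-provider-openstack:

```go
package main

import (
	"fmt"
	"strings"
)

// occmLBPrefix builds the name prefix the OpenStack cloud provider uses for
// Service load balancers: "kube_service_<clusterName>_".
func occmLBPrefix(clusterName string) string {
	return fmt.Sprintf("kube_service_%s_", clusterName)
}

// isOCCMLoadBalancer reports whether a load balancer name matches that
// convention for the given cluster.
func isOCCMLoadBalancer(lbName, clusterName string) bool {
	return strings.HasPrefix(lbName, occmLBPrefix(clusterName))
}

func main() {
	for _, name := range []string{
		"kube_service_mycluster_default_nginx", // created by the OCCM for cluster "mycluster"
		"some-unrelated-loadbalancer",          // anything else is left alone
	} {
		fmt.Printf("%s -> %v\n", name, isOCCMLoadBalancer(name, "mycluster"))
	}
}
```

Whatever value clusterName ends up with has to match what the cloud provider was actually started with, otherwise the prefix match silently finds nothing.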

@iamemilio
Contributor

A trick we have used in our deployments is to apply a tag to resources created by CAPO. That might simplify your problem.

@jichenjc
Contributor

apply a tag to resources created by CAPO

Agreed, creating a tag is much simpler and should be considered the common approach.

But this looks more like an OCCM question: if I understand correctly, this issue is about load balancers created by the OCCM that also need to be cleaned up.

@hidekazuna
Contributor

@jichenjc Thanks for the information.
@iamemilio @jichenjc Yes, the LBs in this issue are created by the CCM, so we cannot use tags.
@seanschneeweiss I think we can identify LBs created by the CCM deployed by CAPO by their names starting with kube_service_<clusterName>.

@jichenjc
Contributor

jichenjc commented Sep 15, 2021

This might become a bigger question: the cloud provider supports PV, PVC and other resource creation through Cinder, Manila, etc., and I am not sure whether the CAPI cluster deletion here should include those as well.

I haven't tried it, e.g. whether this cascades or prevents cluster deletion, so real users running workloads on CAPI might have more to say on this.

@mkjpryor
Contributor

@jichenjc

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

@jichenjc
Contributor

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

Yes, I understand. So, as you indicated in PR #990, maybe the reasonable way for now is to use the naming convention to remove the LBs first, and to ask the CAPI community how they want resources created outside of the CAPI lifecycle to be handled.

I guess it might be the operator's job to consider, or maybe a better way is to do a quick test (e.g. create a CAPI cluster, create PVs/PVCs and LBs on it, then delete it and see what happens).

@hidekazuna
Contributor

hidekazuna commented Sep 16, 2021

I want to deploy k8s clusters in the same network and the same project for testing purposes. I do not want LBs deleted based on the network or the project.

@seanschneeweiss
Contributor

Deleting by name kube_service_<clustername>_<namespace>_<servicename> sounds the most reasonable to me, as it includes the cluster name and we should have no unwanted deletions of other resources.
Maybe a check could be added that

  • counts the number of load balancers before deleting the network (when deleting the cluster),
  • and returns an error or prints a log message to notify about the orphaned load balancers (see the sketch below).
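
A rough sketch of such a check, assuming gophercloud's Octavia (load balancer v2) bindings and client-side filtering by name; this is illustrative only and does not mirror actual CAPO code:

```go
package cleanup

import (
	"fmt"
	"strings"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// checkForOrphanedLBs lists all load balancers visible to the client and
// returns an error naming those that still match the OCCM naming convention
// for the given cluster, so cluster deletion can report them before the
// network removal fails.
func checkForOrphanedLBs(client *gophercloud.ServiceClient, clusterName string) error {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}

	prefix := fmt.Sprintf("kube_service_%s_", clusterName)
	var orphaned []string
	for _, lb := range lbs {
		if strings.HasPrefix(lb.Name, prefix) {
			orphaned = append(orphaned, lb.Name)
		}
	}
	if len(orphaned) > 0 {
		return fmt.Errorf("%d load balancer(s) created by the cloud provider still exist (%s) and will block network deletion",
			len(orphaned), strings.Join(orphaned, ", "))
	}
	return nil
}
```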

@seanschneeweiss
Contributor

The clusterName originates from the cluster-name setting passed to the kube-controller-manager. xref
The default value is kubernetes, but with CAPI/CAPO this value should hopefully be set correctly.

@seanschneeweiss
Contributor

seanschneeweiss commented Jan 18, 2022

I guess that deleting PVs is out of scope. PVs might contain important data and should probably never be deleted automatically or as part of a garbage collection.

Load balancers can be recreated at any time. In addition, orphaned load balancers created by the OCCM are somewhat expected, and they block network deletion.
So I like the suggestion to delete load balancers based on the naming convention kube_service_<clustername>_<namespace>_<servicename>.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

For CAPO it would be good to rely on CAPI to clean up the LoadBalancer services. I don't know if there is any provider where the load balancer is independent of the cluster network itself... I'll have to ask the CAPI developers.

@mkjpryor
Contributor

mkjpryor commented Feb 2, 2022

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

I still think that this seems like reasonable behaviour, right?

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

If CAPO did not create the network, then it doesn't need to delete the network as part of cluster deletion and we don't need to worry about it.

@mkjpryor
Contributor

mkjpryor commented Feb 2, 2022

Just worried we are over-complicating things, and making ourselves brittle, by assuming naming conventions from the CCM...

@seanschneeweiss
Contributor

seanschneeweiss commented Apr 12, 2022

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

@mkjpryor Sounds reasonable, indeed. How can we identify whether CAPO created the network, any suggestion?

@mkjpryor
Contributor

@seanschneeweiss

We already have a gate that decides whether we should delete the network or not. If we are deleting the network, we should delete the load balancers attached to it first.
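
A minimal sketch of that gate, under the same gophercloud assumptions as the earlier sketch and additionally assuming the load balancer objects expose their VIP network ID; how CAPO's real network-deletion code is structured is not shown here:

```go
package cleanup

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// deleteLBsOnNetwork cascade-deletes every load balancer whose VIP lives on
// the given network. It is meant to run only on the path where CAPO created
// the network and is about to delete it.
func deleteLBsOnNetwork(client *gophercloud.ServiceClient, networkID string) error {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}
	for _, lb := range lbs {
		if lb.VipNetworkID != networkID {
			continue // load balancers on other networks (e.g. other clusters) are left alone
		}
		// Cascade delete removes listeners, pools and members together with
		// the load balancer itself.
		if err := loadbalancers.Delete(client, lb.ID, loadbalancers.DeleteOpts{Cascade: true}).ExtractErr(); err != nil {
			return err
		}
	}
	return nil
}
```

This keeps the behaviour described above: clusters on separate networks are unaffected, and pre-existing networks that CAPO does not delete keep their load balancers.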

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2022
@seanschneeweiss
Contributor

/remove-lifecycle stale

Time is a bit scarce at the moment, but I hope to work on this very soon. If someone would like to take over, please do not hesitate to do so.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2022
@nikParasyr
Contributor

I would like something like this, but I'm not sure how much extra logic and code such a feature would add. A different approach, which doesn't require support from CAPO, is for users to use the BeforeClusterDeleteHook from the Lifecycle Hooks. It's still an experimental feature and only works with ClusterClass afaik, but it does allow users to inject arbitrary logic without CAPO needing to support anything specific.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2022
@chrischdi
Copy link
Member

xref: CAPA has a similar feature: https://cluster-api-aws.sigs.k8s.io/topics/external-resource-gc.html

@erkanerol

For CAPO and CAPVCD, we use our own cleaner operator for this issue. I wish we had a flag in CAPO like CAPA.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2023
@github-project-automation github-project-automation bot moved this from Backlog to Done in Cluster API OpenStack Jan 16, 2023