AWS NLB linger after they're orphaned #1718

Closed
voor opened this issue May 7, 2020 · 20 comments · Fixed by #3648
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@voor
Member

voor commented May 7, 2020

/kind bug

What steps did you take and what happened:
Workload clusters can create Network Load Balancers instead of Classic ELBs by adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb to a Service (read more). We need to identify those load balancers by tag and delete them as well when the associated workload cluster is destroyed.
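
One way to picture "identify those load balancers by tag" is a small filter over tag data. This is only a sketch, not the CAPA implementation: it assumes the cloud provider marks resources it creates with the tag key kubernetes.io/cluster/<cluster-name> and the value owned, and that the input has the shape of the TagDescriptions list in an ELBv2 DescribeTags response. The function names, the cluster name, and the ARNs are hypothetical placeholders.

```python
from typing import List

# Assumption: resources created for a cluster carry the tag
# "kubernetes.io/cluster/<cluster-name>" with the value "owned".
OWNED_VALUE = "owned"


def cluster_tag_key(cluster_name: str) -> str:
    """Build the (assumed) ownership tag key for a cluster."""
    return f"kubernetes.io/cluster/{cluster_name}"


def owned_load_balancer_arns(tag_descriptions: List[dict], cluster_name: str) -> List[str]:
    """Return the ARNs whose tags mark them as owned by the given cluster.

    tag_descriptions is assumed to look like the "TagDescriptions" list of an
    ELBv2 DescribeTags response:
    [{"ResourceArn": "...", "Tags": [{"Key": "...", "Value": "..."}]}]
    """
    key = cluster_tag_key(cluster_name)
    owned = []
    for desc in tag_descriptions:
        # Flatten the Key/Value pairs into a dict for easy lookup.
        tags = {t["Key"]: t["Value"] for t in desc.get("Tags", [])}
        if tags.get(key) == OWNED_VALUE:
            owned.append(desc["ResourceArn"])
    return owned


# Hypothetical sample data; the ARNs are shortened placeholders.
sample = [
    {
        "ResourceArn": "arn:example:nlb-a",
        "Tags": [{"Key": "kubernetes.io/cluster/workload-1", "Value": "owned"}],
    },
    {
        "ResourceArn": "arn:example:nlb-b",
        "Tags": [{"Key": "kubernetes.io/cluster/workload-2", "Value": "owned"}],
    },
    {"ResourceArn": "arn:example:untagged", "Tags": []},
]

print(owned_load_balancer_arns(sample, "workload-1"))
```

Keeping the filter as a pure function means it can be exercised without AWS credentials; the actual DescribeTags and delete calls would sit around it.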

Referenced here in a conversation on the topic.

What did you expect to happen:
NLBs are deleted alongside the cluster.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-aws version: v0.5.2
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 7, 2020
@vincepri
Member

vincepri commented May 7, 2020

/milestone v0.5.x
/help

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@k8s-ci-robot k8s-ci-robot added this to the v0.5.x milestone May 7, 2020
@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 7, 2020
@bagnaram
Contributor

/assign

@bagnaram
Contributor

It is probably safe to assume that we will need an additional elbv2 service to handle the cleanup of the NLB resources

@randomvariable
Member

randomvariable commented May 18, 2020

The issue is that you essentially need to drain the workload cluster of Services so that the cloud provider tears down the NLBs. It's not specifically a CAPA problem IMHO; the alternative is to add logic to CAPA to handle resources created by the cloud provider integration. Worth adding to the agenda for the meeting, as I'm a bit wary of crossing responsibility boundaries here.

@detiber
Member

detiber commented May 18, 2020

@randomvariable that is a good point. We could remove the current Classic ELB cleanup and move that functionality into core Cluster API, where we could delete all Services w/ Type=LoadBalancer prior to deletion of a given Cluster. That would then cover any similar issues that would arise with other infrastructure providers as well.
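
As a rough illustration of the "delete all Services w/ Type=LoadBalancer" idea, the selection step could look like the sketch below. It is not code from Cluster API: it assumes the input is a ServiceList as returned by the Kubernetes API in JSON form (an items list whose entries have metadata and spec.type), and the function name and sample objects are hypothetical.

```python
from typing import List, Tuple


def load_balancer_services(service_list: dict) -> List[Tuple[str, str]]:
    """Return (namespace, name) for every Service with spec.type == "LoadBalancer".

    service_list is assumed to be a Kubernetes ServiceList decoded from JSON.
    """
    found = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("type") == "LoadBalancer":
            meta = item["metadata"]
            found.append((meta.get("namespace", "default"), meta["name"]))
    return found


# Hypothetical sample: one LoadBalancer Service (annotated for an NLB)
# and one ClusterIP Service that should be left alone.
sample_services = {
    "kind": "ServiceList",
    "items": [
        {
            "metadata": {
                "name": "web",
                "namespace": "apps",
                "annotations": {
                    "service.beta.kubernetes.io/aws-load-balancer-type": "nlb"
                },
            },
            "spec": {"type": "LoadBalancer"},
        },
        {
            "metadata": {"name": "internal", "namespace": "apps"},
            "spec": {"type": "ClusterIP"},
        },
    ],
}

print(load_balancer_services(sample_services))
```

Deleting the selected Services and waiting for them to disappear would let the cloud provider integration tear down its own load balancers, which is the responsibility split discussed above.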

@randomvariable
Member

That could be good. Did suggest to @nckturner, @justinsb, @andrewsykim that we could use the test framework to set up CI for the cloud provider repo. Having CAPI take care of auto-deleting Services of type LoadBalancer would be a neat trick.

@randomvariable
Member

We forgot to discuss this in yesterday's meeting. I'll file an issue to Cluster API and add it to the agenda.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2020
@voor
Member Author

voor commented Nov 12, 2020

/remove-lifecycle stale
/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2020
@voor
Member Author

voor commented Nov 12, 2020

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 12, 2020
@randomvariable randomvariable modified the milestones: v0.6.x, Next Mar 11, 2021
@sedefsavas
Contributor

Closing in favor of kubernetes-sigs/cluster-api#3075

@richardcase
Member

@sedefsavas - this issue is impacting some customers so going to reopen with this plan in mind:

/reopen
/assign
/lifecycle active

@k8s-ci-robot
Contributor

@richardcase: Reopened this issue.


@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Jun 9, 2022
@richardcase
Member

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority labels Jun 9, 2022
@richardcase
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 9, 2022
@richardcase
Member

Just tested this scenario and it does occur. The delete of the cluster gets stuck because of

E0609 15:49:16.022022      14 awsmanagedcontrolplane_controller.go:292] controller/awsmanagedcontrolplane "msg"="error deleting network for AWSManagedControlPlane" "error"="failed to detach internet gateway \"igw-0f81d9e12a5a97bf2\": DependencyViolation: Network vpc-0b06fcdbbc37ab172 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 65dc0fa0-584f-4256-baf5-a2aac2d2dde4" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="AWSManagedControlPlane" "name"="capi-managed-test-control-plane" "namespace"="default"
I0609 15:49:16.022130      14 recorder.go:103] events "msg"="Warning"  "message"="Failed to detach Internet Gateway \"igw-0f81d9e12a5a97bf2\" from VPC \"vpc-0b06fcdbbc37ab172\": DependencyViolation: Network vpc-0b06fcdbbc37ab172 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 65dc0fa0-584f-4256-baf5-a2aac2d2dde4" "object"={"kind":"AWSManagedControlPlane","namespace":"default","name":"capi-managed-test-control-plane","uid":"adefde7f-760d-453d-b81e-cde2461ccdd6","apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","resourceVersion":"20624"} "reason"="FailedDetachInternetGateway"

And the load balancer still exists.
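
The DependencyViolation above suggests an ordering problem: the internet gateway cannot be detached while something in the VPC still holds a public address, and the leftover load balancer is the likely holder. A minimal sketch of expressing that ordering is shown below. The step names are hypothetical and do not correspond to CAPA's reconciler functions; the sketch only shows that load balancer cleanup has to come before the gateway and VPC steps.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical teardown steps. Each key maps to the set of steps that must
# finish before it can run.
TEARDOWN_DEPENDENCIES: Dict[str, Set[str]] = {
    "load_balancers": set(),
    "internet_gateway": {"load_balancers"},
    "vpc": {"internet_gateway"},
}


def teardown_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = teardown_order(TEARDOWN_DEPENDENCIES)
print(order)
```

If a step such as the load balancer cleanup covers only Classic ELBs, any NLBs survive it and the later gateway step fails in the way the log shows.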

@sedefsavas
Contributor

Looks like we only clean up the CCM created ELBs but not NLBs.

@richardcase
Member

> Looks like we only clean up the CCM created ELBs but not NLBs.

@sedefsavas - spot on :)

@richardcase
Member

/milestone v1.5.0

9 participants