Allow garbage collector to delete ec2 instances #4568
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Signed-off-by: Vince Prignano <[email protected]>
Branch updated from 2eb5d82 to 7c2d0b5
/retest
/test ?
@vincepri: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-cluster-api-provider-aws-e2e-eks-gc
Actually, it seems these tests have been failing for quite some time: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-aws#pr-e2e-eks-gc-main
FWIW, I've tested the logic in a custom cluster by creating an EC2 instance outside of CAPA, and it was deleted properly.
@vincepri - I'm curious: in what scenarios would there be an EC2 instance that isn't managed by CAPA itself (i.e. via Machines/MachinePools)? The original idea of the GC was to clean up resources that were created by the CCM, as these can block CAPA from deleting a cluster. For example, an application deployed with a Service of type LoadBalancer.
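Because the GC targets several kinds of CCM-created resources, one way to picture "also delete EC2 instances" is as one more entry in a table keyed by the ARN's service and resource type. The sketch below is illustrative only: `deleterFunc`, `resourceKey`, and the deleter table are hypothetical names, not CAPA's actual GC code, and the deleters just print instead of calling AWS.

```go
package main

import (
	"fmt"
	"strings"
)

// deleterFunc is a hypothetical signature for a per-resource-type cleanup step.
type deleterFunc func(arn string) error

// resourceKey derives a "service:resource-type" key from an ARN such as
// arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0.
func resourceKey(arn string) (string, error) {
	// An ARN has six colon-separated fields; SplitN keeps colons inside the resource part.
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return "", fmt.Errorf("malformed ARN %q", arn)
	}
	// The resource type is whatever precedes the first "/" in the resource field.
	resourceType, _, _ := strings.Cut(parts[5], "/")
	return parts[2] + ":" + resourceType, nil
}

func main() {
	// Hypothetical deleters that only log; a real GC would call the AWS APIs here.
	logDeleter := func(action string) deleterFunc {
		return func(arn string) error {
			fmt.Println(action, arn)
			return nil
		}
	}
	deleters := map[string]deleterFunc{
		"elasticloadbalancing:loadbalancer": logDeleter("delete load balancer"),
		"ec2:security-group":                logDeleter("delete security group"),
		"ec2:instance":                      logDeleter("terminate instance"),
	}

	arn := "arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0"
	key, err := resourceKey(arn)
	if err != nil {
		fmt.Println(err)
		return
	}
	if del, ok := deleters[key]; ok {
		_ = del(arn)
	} else {
		fmt.Println("no deleter registered for", key)
	}
}
```

Keying on `service:type` keeps each deleter small and makes an unhandled resource type visible as "no deleter registered" rather than a silent skip.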
@richardcase A few use cases I had in mind:
We've certainly seen users on AWS in the past create EC2 instances themselves and join them to OpenShift clusters when the tooling within Kube/OpenShift didn't support a feature they needed. I expect we aren't the only people seeing users do that. Imagine that, before we supported, say, EFA networking, a user wanted to use it: what would stop them from building and adding their own EC2 instances to the workload cluster? I don't think anything would, and we should expect that this has happened somewhere before, wherever we have feature gaps.
/milestone v2.3.0
@vincepri: You must be a member of the kubernetes-sigs/cluster-api-provider-aws-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Cluster API Provider AWS Maintainers and have them propose you as an additional delegate for this responsibility.
If the user created the EC2 instances, I think we can argue that the user is responsible for deleting them. However, I think we can also argue that the purpose of the garbage collector is to unblock cluster deletion. If a user creates an EC2 instance, CAPA does not delete it in its ordinary reconciliation, and the instance blocks deletion of the subnet, VPC, etc., and therefore blocks cluster deletion. So we could say that the garbage collector should be extended to clean up all AWS resources that would block cluster deletion. I think the garbage collector must be "best effort," because there are edge cases: for example, some AWS resources may be missing the correct tags, and removing others may require different AWS credentials than the ones the garbage collector has.
@richardcase Is this good to go? |
/test pull-cluster-api-provider-aws-e2e-eks-gc |
Just testing that it's not a flake: /test pull-cluster-api-provider-aws-e2e-eks-gc (it could be that #4575 will be needed; I'm getting the failing test fixed in that PR)
/test pull-cluster-api-provider-aws-e2e-eks-gc |
Looks fine, but are there tests covering this case already? If not, we need to add them.
Circling back to this now that I finally have some time. After another review, I think we probably need:
instanceID := strings.ReplaceAll(resource.ARN.Resource, "instance/", "")
if err := s.deleteEC2Instance(ctx, instanceID); err != nil {
	return fmt.Errorf("deleting EC2 instance %s: %w", instanceID, err)
}
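The excerpt derives the instance ID with `strings.ReplaceAll`, which removes `instance/` anywhere in the resource field and silently accepts ARNs of other EC2 resource types. A stricter variant is sketched below using only the standard library; `instanceIDFromARN` is a hypothetical helper, not part of CAPA, and it assumes the documented ARN layout `arn:partition:service:region:account-id:resource-type/resource-id`.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromARN extracts the ID from an EC2 instance ARN of the form
// arn:<partition>:ec2:<region>:<account-id>:instance/<instance-id>.
func instanceIDFromARN(arn string) (string, error) {
	// Six colon-separated fields; SplitN keeps any colons inside the resource part.
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "ec2" {
		return "", fmt.Errorf("not an EC2 ARN: %q", arn)
	}
	const prefix = "instance/"
	// Only strip a leading "instance/"; anything else is a different resource type.
	if !strings.HasPrefix(parts[5], prefix) {
		return "", fmt.Errorf("ARN %q does not reference an EC2 instance", arn)
	}
	id := strings.TrimPrefix(parts[5], prefix)
	if id == "" {
		return "", fmt.Errorf("ARN %q has an empty instance ID", arn)
	}
	return id, nil
}

func main() {
	for _, arn := range []string{
		"arn:aws:ec2:us-west-2:123456789012:instance/i-0123456789abcdef0",
		"arn:aws:ec2:us-west-2:123456789012:security-group/sg-0123456789abcdef0",
	} {
		id, err := instanceIDFromARN(arn)
		fmt.Println(id, err)
	}
}
```

For well-formed instance ARNs both approaches return the same ID; the difference is that a security-group ARN produces an error here instead of being passed to the instance deleter.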
Sorry, I have a new request here because I stumbled over non-actionable error messages in the existing GC code: they were missing the region, and humans shouldn't have to loop through all accounts and regions to find an object when only its ID is logged. I'm fixing the existing code in a new PR.
Suggested change:
-	return fmt.Errorf("deleting EC2 instance %s: %w", instanceID, err)
+	return fmt.Errorf("deleting EC2 instance %s with ID %s: %w", resource.ARN, instanceID, err)
/milestone v2.4.0
@vincepri: The following test failed, say
What type of PR is this?
/kind feature
What this PR does / why we need it:
In addition to the experimental GC by @richardcase, we should also see if we can scan the CAPA-owned tags to make sure we don't have any leftovers. Thoughts?
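Scanning "the CAPA owned tags" in addition to what the GC already looks for amounts to widening the ownership check. A minimal sketch follows, assuming the cloud-provider convention `kubernetes.io/cluster/<cluster-name>` and the CAPA tag `sigs.k8s.io/cluster-api-provider-aws/cluster/<cluster-name>`, each with the value `owned`; verify both keys against the CAPA source before relying on them. `ownedByCluster` is a hypothetical helper operating on an already-fetched tag map.

```go
package main

import "fmt"

// Assumed tag-key prefixes: the Kubernetes cloud-provider cluster tag and the CAPA
// ownership tag. A resource counts as cluster-owned when either has the value "owned".
const (
	cloudProviderTagPrefix = "kubernetes.io/cluster/"
	capaTagPrefix          = "sigs.k8s.io/cluster-api-provider-aws/cluster/"
	ownedValue             = "owned"
)

// ownedByCluster reports whether tags mark a resource as owned by clusterName,
// via either the cloud-provider tag or the CAPA tag.
func ownedByCluster(tags map[string]string, clusterName string) bool {
	for _, key := range []string{cloudProviderTagPrefix + clusterName, capaTagPrefix + clusterName} {
		if tags[key] == ownedValue {
			return true
		}
	}
	return false
}

func main() {
	ccmLoadBalancer := map[string]string{"kubernetes.io/cluster/demo": "owned"}
	capaInstance := map[string]string{"sigs.k8s.io/cluster-api-provider-aws/cluster/demo": "owned"}
	sharedSubnet := map[string]string{"kubernetes.io/cluster/demo": "shared"}
	fmt.Println(ownedByCluster(ccmLoadBalancer, "demo"))
	fmt.Println(ownedByCluster(capaInstance, "demo"))
	fmt.Println(ownedByCluster(sharedSubnet, "demo"))
}
```

Matching only the `owned` value, and not `shared`, keeps resources that the cluster merely uses (for example, bring-your-own network resources) out of the deletion set.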
Release note: