Machine deletion fails if there is no ELB #1084
Comments
How would we connect to the cluster without an ELB? 🤔
@vincepri Should we reconcile the ELB on "NotFound" and then try deletion?
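A rough sketch of that idea (this is not the CAPA implementation; the function name and the direct use of the AWS SDK classic ELB client are assumptions for illustration):

```go
package elbsketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/elb"
	"github.com/aws/aws-sdk-go/service/elb/elbiface"
)

// reconcileDeleteELB treats a load balancer that is already gone (for example,
// deleted manually in the AWS console) as successfully deleted instead of
// returning an error that blocks Machine/Cluster deletion.
func reconcileDeleteELB(elbSvc elbiface.ELBAPI, name string) error {
	_, err := elbSvc.DescribeLoadBalancers(&elb.DescribeLoadBalancersInput{
		LoadBalancerNames: aws.StringSlice([]string{name}),
	})
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == elb.ErrCodeAccessPointNotFoundException {
		return nil // ELB not found: nothing left to delete
	}
	if err != nil {
		return err
	}
	_, err = elbSvc.DeleteLoadBalancer(&elb.DeleteLoadBalancerInput{
		LoadBalancerName: aws.String(name),
	})
	return err
}
```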
/priority important-soon
It will be difficult for us to account for all situations where someone manually manipulates the AWS resources that CAPA creates and manages. If you make changes (edits or deletes) to these resources, it's probably up to you to resolve any issues, such as this one. I'm inclined to close this. WDYT @detiber @vincepri @sethp-nr @rudoi?
I would also be willing to accept a code change to CAPI that attempts to delete the node up to n times and then gives up, without erroring.
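A minimal sketch of that behavior, assuming a controller-runtime client pointed at the workload cluster (the helper name, retry count, and backoff are made up for illustration, not the eventual CAPI change):

```go
package nodesketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteNodeBestEffort tries to delete the workload cluster Node a bounded
// number of times; if it still fails (e.g. the apiserver is unreachable because
// its ELB was removed), it gives up without returning an error so the Machine
// deletion can proceed.
func deleteNodeBestEffort(ctx context.Context, c client.Client, nodeName string, maxAttempts int) {
	node := &corev1.Node{ObjectMeta: metav1.ObjectMeta{Name: nodeName}}
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := c.Delete(ctx, node)
		if err == nil || apierrors.IsNotFound(err) {
			return // deleted, or already gone
		}
		time.Sleep(time.Second) // brief pause before the next attempt
	}
	// all attempts failed: intentionally swallow the error (best effort)
}
```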
@ncdc I'm a bit torn on it. There are potentially things we could do to work around the load balancer being a blocker the way it is today, such as using a static DNS entry for the apiserver endpoint rather than the LB DNS name. I'm not sure it's something we can do in the short term, but longer term I don't think this should fail in a catastrophic way as it does today.
How about we start with my suggestion of not failing on node deletion and see what else trips us up, if anything?
In the general case of worker nodes, that should probably be fine.
I'm not sure it makes sense to distinguish?
Today, probably not, but once we have control plane management we'd need to ensure that we handle etcd membership properly when deleting a node that is part of the control plane, which, based on the model we are using with cluster-api-upgrade-tool, would require apiserver access.
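For context, a rough sketch of the etcd membership step being referred to, using the etcd clientv3 API (the function, endpoints, and omitted TLS handling are simplified assumptions, not code from cluster-api-upgrade-tool):

```go
package etcdsketch

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

// removeEtcdMember removes the etcd member backing a control plane node before
// that node is deleted, so the remaining cluster keeps a healthy membership.
func removeEtcdMember(ctx context.Context, endpoints []string, memberName string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	members, err := cli.MemberList(ctx)
	if err != nil {
		return err
	}
	for _, m := range members.Members {
		if m.Name == memberName {
			_, err = cli.MemberRemove(ctx, m.ID)
			return err
		}
	}
	return nil // member already removed
}
```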
I filed kubernetes-sigs/cluster-api#1446 to try to delete the Node multiple times, then move on without considering it an error.
The node deletion issue was fixed by kubernetes-sigs/cluster-api#1452, which will be in a future CAPI v0.2.x release. @vivgoyal PTAL and let us know if you think that's sufficient. Thanks!
Looks good to close. I haven't tested it yet, though. If I find any issues, I can always open a new one.
Let's leave it open until you can test it with CAPI v0.2.3 or newer (v1alpha2).
/priority awaiting-evidence
/priority awaiting-more-evidence
This should have been fixed.
/close
@vincepri: Closing this issue.
/kind bug
What steps did you take and what happened:
I deleted AWS resources manually and then initiated machine and cluster deletion from CAPA.
As a result, I saw machine deployment deletions failing continuously because of:
What did you expect to happen:
What I expected was that deletions would succeed since the resources are NOT FOUND.
Anything else you would like to add:
Also, if I delete the ELB, it could be recreated, unless the Cluster is being deleted. That said, it would also mean a different DNS name, which implies that anything referencing the old DNS name would need to be updated, and that may not happen automatically. That includes the kubeconfig secret, but more importantly the client config/kubeadm config for all of the existing Machines in the cluster.
Environment:
- Kubernetes version (use kubectl version): 1.14.1
- OS (e.g. from /etc/os-release):