Cluster Autoscaler stops working when cluster has nodes with invalid server ID #6716
Comments
/area provider/hetzner Thanks for the issue and PR @maksim-paskal! As far as I can tell from the code, this issue is valid, although I did not reproduce it locally.
In that case I would expect hcloud-cloud-controller-manager to delete the
@apricote Thanks for the quick response. Yes, the hcloud-cloud-controller-manager will delete these nodes, but until it does, the cluster-autoscaler can stop handling any Pending pods in the cluster. We manage clusters with more than 50 nodes, and I've observed in the cluster-autoscaler logs that even nodes with a valid ProviderID can generate that message. This might occur when the hcloud-cloud-controller-manager is too busy or unavailable. Regardless, I believe that the cluster-autoscaler should continue processing new Pending pods even if the cluster is experiencing issues with a single node.
Sounds good then. I am not running the cluster-autoscaler anywhere myself, so any feedback from actual users is very valuable :)
Which component are you using?: [email protected]
While using autoscaler in Hetzner Cloud provider:
Autoscaler stops working (continuous loop) with message:
Scenarios to catch this behaviour:
What did you expect to happen?:
The autoscaler must function even when worker nodes do not have valid ProviderIDs. This will enable users to add Hetzner Dedicated worker nodes. Additionally, the cluster should continue to operate even when the server has been physically deleted.
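The expected behaviour can be illustrated with a minimal Go sketch. It is not the actual cluster-autoscaler code: the `node` type, `parseServerID`, and `partitionNodes` are hypothetical names, and the `hcloud://<numeric id>` provider ID format is an assumption made for illustration. The point is only that a node whose ProviderID cannot be parsed is skipped and reported, rather than turned into an error that aborts the whole loop.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// providerIDPrefix is the provider ID scheme assumed here for Hetzner Cloud servers.
const providerIDPrefix = "hcloud://"

// node is a hypothetical, simplified stand-in for a Kubernetes Node object.
type node struct {
	Name       string
	ProviderID string
}

// parseServerID extracts the numeric server ID from a provider ID
// of the assumed form "hcloud://<id>".
func parseServerID(providerID string) (int64, error) {
	if !strings.HasPrefix(providerID, providerIDPrefix) {
		return 0, fmt.Errorf("provider ID %q does not start with %q", providerID, providerIDPrefix)
	}
	id, err := strconv.ParseInt(strings.TrimPrefix(providerID, providerIDPrefix), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("provider ID %q has a non-numeric server ID: %w", providerID, err)
	}
	if id <= 0 {
		return 0, fmt.Errorf("provider ID %q has a non-positive server ID", providerID)
	}
	return id, nil
}

// partitionNodes maps node names to server IDs for nodes with a valid
// provider ID. Nodes with an invalid provider ID are collected in skipped
// instead of failing the whole pass.
func partitionNodes(nodes []node) (managed map[string]int64, skipped []string) {
	managed = make(map[string]int64)
	for _, n := range nodes {
		id, err := parseServerID(n.ProviderID)
		if err != nil {
			// Skip this node only; the remaining nodes are still processed.
			skipped = append(skipped, n.Name)
			continue
		}
		managed[n.Name] = id
	}
	return managed, skipped
}

func main() {
	nodes := []node{
		{Name: "worker-1", ProviderID: "hcloud://123"},
		{Name: "dedicated-1", ProviderID: "example://456"}, // hypothetical non-cloud node
		{Name: "broken-1", ProviderID: ""},
	}
	managed, skipped := partitionNodes(nodes)
	fmt.Println("managed:", managed)
	fmt.Println("skipped:", skipped)
}
```

With this shape, a dedicated server or a node whose server has been deleted ends up in `skipped`, and the nodes with valid IDs are still available for scaling decisions.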