Azure - returning incorrect in-memory size value when spot instance is deleted #7373
Comments
/kind cluster-autoscaler

@adrianmoisey: The label(s) kind/cluster-autoscaler cannot be applied, because the repository doesn't have them. In response to this: /kind cluster-autoscaler
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/area cluster-autoscaler

/triage accepted
Until fixed, one should be able to work around the issue by setting AZURE_GET_VMSS_SIZE_REFRESH_PERIOD.
How would we go about setting AZURE_GET_VMSS_SIZE_REFRESH_PERIOD?
@d3v3l0p3r Add it to the environment variables (with the value in seconds) defined for the container deployment, e.g. using extraEnv in the Helm chart; something like (untested):

extraEnv:
  AZURE_GET_VMSS_SIZE_REFRESH_PERIOD: "300"

This of course only works for a self-hosted (vs. AKS-managed) deployment of the cluster autoscaler.
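For a Helm-managed deployment, the same value can also be passed on the command line. A minimal sketch, assuming the official cluster-autoscaler chart and its extraEnv value; the release name, chart reference, and namespace below are illustrative:

# Set the VMSS size cache refresh period to 300 seconds on an existing release
helm upgrade cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --reuse-values \
  --set-string extraEnv.AZURE_GET_VMSS_SIZE_REFRESH_PERIOD="300"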
Which component are you using?: cluster-autoscaler

What version of the component are you using?:
Component version: 1.31

What k8s version are you using (kubectl version)?: 1.30.5+k3s1

What environment is this in?: Azure
What did you expect to happen?: When a VMSS spot instance is deleted and the node is removed from the cluster, I expect the autoscaler to invalidate its cache.
What happened instead?: Schedulable pods are present; however, the in-memory size is 9 while the actual VMSS size is only 7:
1 filter_out_schedulable.go:78] Schedulable pods present
I1009 02:24:15.536067 1 static_autoscaler.go:557] No unschedulable pods
I1009 02:24:15.536082 1 azure_scale_set.go:217] VMSS: k8-agent-2, returning in-memory size: 0
I1009 02:24:15.536093 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9

--- eventually this will start logging in a loop when the cluster tries to scale down ---

I1009 02:31:59.254556 1 static_autoscaler.go:756] Decreasing size of k8-agent-d2ds_v5, expected=9 current=7 delta=-2
I1009 02:31:59.254570 1 azure_scale_set_instance_cache.go:77] invalidating instanceCache for k8-agent-d2ds_v5
I1009 02:31:59.254579 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9
I1009 02:31:59.254594 1 static_autoscaler.go:469] Some node group target size was fixed, skipping the iteration
How to reproduce it (as minimally and precisely as possible):
Set up a K3s cluster (not using AKS).
Set the provider ID on the nodes to the proper format, i.e. aks:///
Set the kubernetes.azure.com/agentpool node label.
Add the autoscaler tags to the VMSS.
Increase the workload so the autoscaler creates new nodes.
Delete a VMSS instance from Azure (see the example commands after this list).
The in-memory size never refreshes, and new nodes are never created.
I have to restart the cluster-autoscaler pod to scale the cluster back up.
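A rough sketch of the commands involved in verifying the node metadata and deleting a VMSS instance; kubectl and the Azure CLI are assumed to be available, and the angle-bracket values are placeholders:

# Check the provider ID and set the agent-pool label on a node
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
kubectl label node <node-name> kubernetes.azure.com/agentpool=<pool-name>

# Simulate a spot eviction by deleting one instance of the VMSS directly in Azure
az vmss delete-instances --resource-group <resource-group> --name k8-agent-d2ds_v5 --instance-ids <instance-id>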
Anything else we need to know?: