Cluster Autoscaler support for AWS EC2 attribute-based instance selection #5580
This should have been implemented in PR #4588.

What version of CAS are you using?
@bwagner5 Looking at the Terraform code, the instance requirements are in the ASG LT override, not in the LT itself. Could this be the reason? This is the easy way to use the EKS module's self-managed node group. If only the LT is queried, then one has to use either a custom LT or AWS provider resources instead.
It should work both in the LT and as an LT ASG override. Are you able to try it with an LT instead of an LT override, though, just to see?
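For reference, a minimal sketch of what that test could look like, with the attribute-based selection declared directly on the launch template instead of in an ASG override. This is not the reporter's actual config; it assumes a recent Terraform AWS provider, and all resource names and values are illustrative.

```hcl
# Hypothetical sketch: attribute-based instance selection declared directly
# on the launch template, rather than in an ASG mixed-instances override.
resource "aws_launch_template" "attr_based" {
  name_prefix = "smng-attr-"
  image_id    = "ami-0123456789abcdef0" # placeholder; use an EKS-optimized AMI

  # When instance_requirements is set, instance_type must be left unset.
  instance_requirements {
    vcpu_count {
      min = 4
      max = 4
    }
    memory_mib {
      min = 16384
      max = 16384
    }
    excluded_instance_types = ["g*", "d*", "z*", "x*"]
    burstable_performance   = "excluded"
  }
}
```

The idea is that the requirements travel with the LT itself, which is what the suggestion above asks to verify.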
@bwagner5 I saw that your PR was merged into CAS 1.25 and most probably not backported to older versions of CAS, can you please confirm? Here is the data.

Latest Helm chart:

```
$ helm list -n kube-system | grep cluster-autoscaler
cluster-autoscaler   kube-system   4   2023-03-10 07:26:44.405431457 +0000 UTC   deployed   cluster-autoscaler-9.26.0   1.24.0
```

Image:

```
$ k get deploy -n kube-system cluster-autoscaler-aws-cluster-autoscaler -o yaml | yq e '.spec.template.spec.containers[0].image'
registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0
```

EKS version:

```
$ k version --short
Server Version: v1.25.6-eks-48e63af
```

ASG info:

```
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names smng-mixed-2023022810062797600000002d
{
"AutoScalingGroups": [
{
"AutoScalingGroupName": "smng-mixed-2023022810062797600000002d",
"AutoScalingGroupARN": "arn:aws:autoscaling:eu-west-1:<redacted>:autoScalingGroup:<redacted>:autoScalingGroupName/smng-mixed-2023022810062797600000002d",
"MixedInstancesPolicy": {
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateId": "lt-090929890da8f991b",
"LaunchTemplateName": "smng-mixed-20230228100627260600000022",
"Version": "1"
},
"Overrides": [
{
"InstanceRequirements": {
"VCpuCount": {
"Min": 4,
"Max": 4
},
"MemoryMiB": {
"Min": 16384,
"Max": 16384
},
"ExcludedInstanceTypes": [
"g*",
"d*",
"z*",
"x*"
],
"BurstablePerformance": "excluded"
}
}
]
},
        ...
```
Yes, that is correct.
Unfortunately, I'm running into the same issue, even with the latest chart version and the latest 1.26.1 release of the autoscaler. I upgraded from 1.24.0 to see if the problem is gone now, but unfortunately that doesn't seem to be the case. In my case I'm also using attribute-based selection of EC2 instance types.

This issue only occurs if the ASG is scaled to 0 when the autoscaler is starting up. As soon as I scale up to 1 and restart the autoscaler, it works. Someone else also raised a question here: https://devops.stackexchange.com/questions/16833/cluster-autoscaler-crash-unable-to-build-proper-template-node
@spr-mweber3 It worked for me even on 1.25.0.

CAS leader log excerpt:

```
...
I0315 12:57:46.750288 1 expiration_cache.go:103] Entry smng-mixed-2023022810062797600000002d: {name:smng-mixed-2023022810062797600000002d instanceType:m4.xlarge} has expired
...
I0315 13:04:11.967399 1 scale_up.go:477] Best option to resize: smng-mixed-2023022810062797600000002d
I0315 13:04:11.967414 1 scale_up.go:481] Estimated 1 nodes needed in smng-mixed-2023022810062797600000002d
I0315 13:04:11.967440 1 scale_up.go:601] Final scale-up plan: [{smng-mixed-2023022810062797600000002d 0->1 (max: 3)}]
I0315 13:04:11.967461 1 scale_up.go:700] Scale-up: setting group smng-mixed-2023022810062797600000002d size to 1
I0315 13:04:11.967485 1 auto_scaling_groups.go:248] Setting asg smng-mixed-2023022810062797600000002d size to 1
I0315 13:04:11.967780 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"12800f36-344b-4f5e-8e32-1f39380d60db", APIVersion:"v1", ResourceVersion:"21128420", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group smng-mixed-2023022810062797600000002d size to 1 instead of 0 (max: 3)
I0315 13:04:12.118636 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"12800f36-344b-4f5e-8e32-1f39380d60db", APIVersion:"v1", ResourceVersion:"21128420", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group smng-mixed-2023022810062797600000002d size set to 1 instead of 0 (max: 3)
I0315 13:04:12.125654 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"inflate-multi-az-system-comp-6b67d44f55-86jnd", UID:"344075cb-9762-4594-9f73-e9a3171b53f7", APIVersion:"v1", ResourceVersion:"21128440", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{smng-mixed-2023022810062797600000002d 0->1 (max: 3)}]
I0315 13:04:12.133217 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"inflate-multi-az-system-comp-6b67d44f55-gcxgr", UID:"64c3a596-3b19-45e1-91c2-26a049d64473", APIVersion:"v1", ResourceVersion:"21128436", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{smng-mixed-2023022810062797600000002d 0->1 (max: 3)}]
```

The SMNG uses the Terraform code I showed in the initial comment.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
I have the same issue. Is there any update on this thread? Is it resolved in later releases?
/remove-lifecycle rotten
/area provider/aws
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Which component are you using?: Cluster Autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
AWS EC2 has a rich set of instance types. AWS attribute-based instance selection, described here, provides an easy way to specify instance selection for an Auto Scaling Group by giving, for example, the required number of vCPUs and amount of memory.
The following is a Terraform example of using this in the EKS module's self-managed node group:
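(The original snippet is not preserved in this copy of the issue. As a rough sketch of the pattern, the configuration below uses raw aws_launch_template / aws_autoscaling_group resources instead of the EKS module; all names, AMI IDs, subnets, and values are illustrative, chosen to mirror the describe-auto-scaling-groups output shown in the comments above.)

```hcl
# Illustrative sketch: instance requirements expressed as a mixed-instances
# override on the ASG, rather than on the launch template itself.
resource "aws_launch_template" "node" {
  name_prefix = "smng-mixed-"
  image_id    = "ami-0123456789abcdef0" # placeholder; use an EKS-optimized AMI
}

resource "aws_autoscaling_group" "node" {
  name_prefix         = "smng-mixed-"
  min_size            = 0
  max_size            = 3
  desired_capacity    = 0
  vpc_zone_identifier = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholders

  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.node.id
        version            = "$Latest"
      }

      # The requirements live only in the override, which is how the EKS
      # module's self-managed node group wires them up.
      override {
        instance_requirements {
          vcpu_count {
            min = 4
            max = 4
          }
          memory_mib {
            min = 16384
            max = 16384
          }
          excluded_instance_types = ["g*", "d*", "z*", "x*"]
          burstable_performance   = "excluded"
        }
      }
    }
  }
}
```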
Describe the solution you'd like.: At the moment Cluster Autoscaler is not able to create a node template and raises the following error in the leader logs:
Describe any alternative solutions you've considered.: Develop a way to build a proper template node by either:
Additional context.: N/A