Extended resources provided by ASG via tags is not working #5164

Closed
tombokombo opened this issue Sep 5, 2022 · 5 comments · Fixed by #5214 or #5737
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@tombokombo
Contributor

tombokombo commented Sep 5, 2022

Which component are you using?:
autoscaler

What version of the component are you using?:
Both 1.25.0-alpha.0 and 1.23.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-04-13T19:57:43Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
sandbox (AWS)

What did you expect to happen?:
I tried to use an extended resource defined as a tag on the ASG. According to the AWS documentation, the tag key should be k8s.io/cluster-autoscaler/node-template/resources/<resource-name> (https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-1.23.1/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup). This should work at least in node-group-auto-discovery mode. Is anybody using it successfully?
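For reference, a minimal sketch of how such a tag could be applied with the AWS SDK for Go; the ASG name and the resource name (example.com/custom-resource) are placeholders, not values taken from this issue, and the same thing can of course be done in the console or with the AWS CLI:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Placeholder ASG name and extended-resource name, used only for illustration.
	_, err := svc.CreateOrUpdateTags(&autoscaling.CreateOrUpdateTagsInput{
		Tags: []*autoscaling.Tag{{
			ResourceId:        aws.String("my-node-group-asg"),
			ResourceType:      aws.String("auto-scaling-group"),
			Key:               aws.String("k8s.io/cluster-autoscaler/node-template/resources/example.com/custom-resource"),
			Value:             aws.String("2"),
			PropagateAtLaunch: aws.Bool(false),
		}},
	})
	if err != nil {
		log.Fatalf("tagging ASG: %v", err)
	}
}
```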

What happened instead?:
The tag is never read, or it is ignored. CA keeps complaining about the insufficient resource and does not scale up.

How to reproduce it (as minimally and precisely as possible):
Assign an extended resource to a pod as described in https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/ (see the sketch below).
Add a k8s.io/cluster-autoscaler/node-template/resources/<resource-name> tag with the same (reasonable) value to the ASG.
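A minimal sketch of such a pod, expressed with the Kubernetes Go API types rather than YAML; the resource name example.com/custom-resource and the image are placeholders:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// "example.com/custom-resource" stands in for whatever extended resource
	// name you advertised on the node and tagged on the ASG.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "extended-resource-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
					},
					Limits: corev1.ResourceList{
						corev1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod)
}
```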

Anything else we need to know?:
I did some tests, and it looks like this is never executed: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L412
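For context, this is roughly the behaviour the code at that line is expected to provide. The helper below is a made-up illustration, not the autoscaler's implementation: it only shows node-template resource tags being turned into an allocatable resource list.

```go
package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resourcesFromASGTags is a hypothetical helper: it collects all tags with the
// node-template resources prefix and parses their values into quantities.
func resourcesFromASGTags(tags map[string]string) apiv1.ResourceList {
	const prefix = "k8s.io/cluster-autoscaler/node-template/resources/"
	out := apiv1.ResourceList{}
	for key, value := range tags {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			continue // ignore malformed values
		}
		out[apiv1.ResourceName(name)] = quantity
	}
	return out
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/resources/example.com/custom-resource": "2",
	}
	fmt.Println(resourcesFromASGTags(tags))
}
```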

@tombokombo tombokombo added the kind/bug Categorizes issue or PR as related to a bug. label Sep 5, 2022
@ZimmSebas

ZimmSebas commented Sep 6, 2022

We're seeing the same result on our end, but for label tags (k8s.io/cluster-autoscaler/node-template/labels/<label-name>). It showed up during an upgrade from 1.22 to 1.23 on EKS: Cluster Autoscaler 1.23 behaves the same way, while on 1.22 we didn't have this problem.

The only change I can see here is PR #4238, which plays around with labels. I've never looked at this code in depth before, but extractAutoscalingOptionsFromTags takes a different approach from the usual one. Maybe that breaks something? Just posting in case it helps whoever picks this up.

@tombokombo
Contributor Author

@ZimmSebas sounds similar, but it's a somewhat different bug; yours could be related to #4238.

The bug I'm describing is more complex.
If you put e.g. custom-resource: 2 into a pod's requests/limits, the scale-up ends up here:

return &status.ScaleUpStatus{

because inside computeExpansionOption() the predicates fail:

option, err := computeExpansionOption(context, podEquivalenceGroups, nodeGroup, nodeInfo, upcomingNodes)

with the predicate-checking error "Insufficient custom-resource", so it never reaches

mainCreatedNodeInfo, err := utils.GetNodeInfoFromTemplate(createNodeGroupResult.MainCreatedNodeGroup, daemonSets, context.PredicateChecker, ignoredTaints)

the function which, in the end, tries to extract the k8s.io/cluster-autoscaler/node-template/resources tags.
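To make this concrete, here is a self-contained toy (not the autoscaler's predicate checker) showing why the option is rejected when the node template is built without reading the resource tag; the resource name and the capacities are placeholders:

```go
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fits is a toy stand-in for the predicate check described above: it only
// verifies that every requested resource is present in allocatable.
func fits(requests, allocatable apiv1.ResourceList) (bool, string) {
	for name, req := range requests {
		alloc, ok := allocatable[name]
		if !ok || alloc.Cmp(req) < 0 {
			return false, fmt.Sprintf("Insufficient %s", name)
		}
	}
	return true, ""
}

func main() {
	// Node template built without reading the ASG resource tag (the situation
	// described in this issue): the extended resource is simply missing.
	templateWithoutTag := apiv1.ResourceList{
		apiv1.ResourceCPU:    resource.MustParse("4"),
		apiv1.ResourceMemory: resource.MustParse("16Gi"),
	}

	// Pod requesting the extended resource.
	requests := apiv1.ResourceList{
		apiv1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
	}

	ok, reason := fits(requests, templateWithoutTag)
	fmt.Println(ok, reason) // false Insufficient example.com/custom-resource
}
```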

@drmorr0
Contributor

drmorr0 commented Oct 17, 2022

I spent a little while trying to track this down and couldn't figure out how to reproduce it. I know we are using k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage on our clusters, and it is working as expected. We're also on a slightly older version of Cluster Autoscaler. Is this a regression in behaviour, or has this always been broken? I'm not sure.

@tombokombo
Contributor Author

@drmorr0 which version and which cloudprovider?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
