Extended resources provided by ASG via tags is not working #5164

Closed
tombokombo opened this issue Sep 5, 2022 · 5 comments · Fixed by #5214 or #5737
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@tombokombo
Contributor

tombokombo commented Sep 5, 2022

Which component are you using?:
autoscaler

What version of the component are you using?:
Both 1.25.0-alpha.0 and 1.23.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-04-13T19:57:43Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
sandbox (AWS)

What did you expect to happen?:
I tried to use an extended resource defined as a tag on the ASG. According to the AWS documentation, the tag key should be k8s.io/cluster-autoscaler/node-template/resources/<resource-name> (https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-1.23.1/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup). This should work at least in node-group-auto-discovery mode. Is anybody using it successfully?
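For reference, a minimal sketch of how such a tag could be applied with the AWS SDK for Go; the ASG name and the resource name (example.com/custom-resource) are placeholders, not values taken from this issue, and the same thing can of course be done in the console or with the AWS CLI:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Placeholder ASG name and extended-resource name, used only for illustration.
	_, err := svc.CreateOrUpdateTags(&autoscaling.CreateOrUpdateTagsInput{
		Tags: []*autoscaling.Tag{{
			ResourceId:        aws.String("my-node-group-asg"),
			ResourceType:      aws.String("auto-scaling-group"),
			Key:               aws.String("k8s.io/cluster-autoscaler/node-template/resources/example.com/custom-resource"),
			Value:             aws.String("2"),
			PropagateAtLaunch: aws.Bool(false),
		}},
	})
	if err != nil {
		log.Fatalf("tagging ASG: %v", err)
	}
}
```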

What happened instead?:
The tag is never read, or it is ignored. CA keeps complaining about the insufficient resource and does not scale up.

How to reproduce it (as minimally and precisely as possible):
Assign an extended resource to a pod as described in https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/ (see the sketch below).
Add a k8s.io/cluster-autoscaler/node-template/resources/<resource-name> tag with the same (reasonable) value to the ASG.
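A minimal sketch of such a pod, expressed with the Kubernetes Go API types rather than YAML; the resource name example.com/custom-resource and the image are placeholders:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// "example.com/custom-resource" stands in for whatever extended resource
	// name you advertised on the node and tagged on the ASG.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "extended-resource-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
					},
					Limits: corev1.ResourceList{
						corev1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod)
}
```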

Anything else we need to know?:
I did some tests, and it looks like this is never executed: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L412
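For context, this is roughly the behaviour the code at that line is expected to provide. The helper below is a made-up illustration, not the autoscaler's implementation: it only shows node-template resource tags being turned into an allocatable resource list.

```go
package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resourcesFromASGTags is a hypothetical helper: it collects all tags with the
// node-template resources prefix and parses their values into quantities.
func resourcesFromASGTags(tags map[string]string) apiv1.ResourceList {
	const prefix = "k8s.io/cluster-autoscaler/node-template/resources/"
	out := apiv1.ResourceList{}
	for key, value := range tags {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			continue // ignore malformed values
		}
		out[apiv1.ResourceName(name)] = quantity
	}
	return out
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/resources/example.com/custom-resource": "2",
	}
	fmt.Println(resourcesFromASGTags(tags))
}
```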

@tombokombo tombokombo added the kind/bug Categorizes issue or PR as related to a bug. label Sep 5, 2022
@ZimmSebas

ZimmSebas commented Sep 6, 2022

We're seeing the same result on our end, but for label tags (k8s.io/cluster-autoscaler/node-template/labels/<label-name>). It showed up during an upgrade from 1.22 to 1.23 on EKS: Cluster Autoscaler 1.23 behaves the same way, while on 1.22 we didn't have this problem.

The only change I can see here is PR #4238, which plays around with labels. I've never looked at this code in depth before, but extractAutoscalingOptionsFromTags takes a different approach from the usual one. Maybe that breaks something? Just posting in case it helps whoever picks this up.

@tombokombo
Contributor Author

@ZimmSebas sounds similar, but it's a somewhat different bug; yours could be related to #4238.

The bug I'm describing is more complex.
If you put e.g. custom-resource: 2 into a pod's requests/limits, the scale-up ends up here:

return &status.ScaleUpStatus{

because inside computeExpansionOption() the predicates fail:

option, err := computeExpansionOption(context, podEquivalenceGroups, nodeGroup, nodeInfo, upcomingNodes)

with the predicate-checking error "Insufficient custom-resource", so it never reaches

mainCreatedNodeInfo, err := utils.GetNodeInfoFromTemplate(createNodeGroupResult.MainCreatedNodeGroup, daemonSets, context.PredicateChecker, ignoredTaints)

the function which, in the end, tries to extract the k8s.io/cluster-autoscaler/node-template/resources tags.
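To make this concrete, here is a self-contained toy (not the autoscaler's predicate checker) showing why the option is rejected when the node template is built without reading the resource tag; the resource name and the capacities are placeholders:

```go
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fits is a toy stand-in for the predicate check described above: it only
// verifies that every requested resource is present in allocatable.
func fits(requests, allocatable apiv1.ResourceList) (bool, string) {
	for name, req := range requests {
		alloc, ok := allocatable[name]
		if !ok || alloc.Cmp(req) < 0 {
			return false, fmt.Sprintf("Insufficient %s", name)
		}
	}
	return true, ""
}

func main() {
	// Node template built without reading the ASG resource tag (the situation
	// described in this issue): the extended resource is simply missing.
	templateWithoutTag := apiv1.ResourceList{
		apiv1.ResourceCPU:    resource.MustParse("4"),
		apiv1.ResourceMemory: resource.MustParse("16Gi"),
	}

	// Pod requesting the extended resource.
	requests := apiv1.ResourceList{
		apiv1.ResourceName("example.com/custom-resource"): resource.MustParse("2"),
	}

	ok, reason := fits(requests, templateWithoutTag)
	fmt.Println(ok, reason) // false Insufficient example.com/custom-resource
}
```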

@drmorr0
Contributor

drmorr0 commented Oct 17, 2022

I spent a little while trying to track this down and couldn't figure out how to reproduce it. I know we are using k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage on our clusters, and it is working as expected. We're also on a slightly older version of Cluster Autoscaler. Is this a regression in behaviour, or has this always been broken? I'm not sure.

@tombokombo
Contributor Author

@drmorr0 which version and which cloudprovider?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
