Fix/asg resource tags #5214

tombokombo · 2022-09-26T23:41:20Z

Which component this PR applies to?

autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

The problem is described in #5164.
TL;DR:
The autoscaler docs say:
From version 1.14, Cluster Autoscaler can also determine the resources provided by each Auto Scaling Group via tags. The tag is of the format k8s.io/cluster-autoscaler/node-template/resources/<resource-name>. <resource-name> is the name of the resource, such as ephemeral-storage. The value of each tag specifies the amount of resource provided. The units are identical to the units used in the resources field of a Pod specification.

But I found no way to make this work. The relevant code has been there for a long time but is never executed. I'm using a kops AWS cluster. So this PR reintroduces this functionality.
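
To make the tag format quoted above concrete, here is a minimal sketch of how such tags could be turned into a resource list. The helper name `extractResourceTags` and the sample tags are hypothetical and are not taken from the autoscaler code. Quantities are kept as raw strings rather than parsed into Kubernetes quantities.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resourceTagPrefix is the tag prefix quoted from the Cluster Autoscaler docs.
const resourceTagPrefix = "k8s.io/cluster-autoscaler/node-template/resources/"

// extractResourceTags returns resource name -> raw quantity string for every
// tag that carries the node-template resources prefix.
// Hypothetical helper for illustration, not the autoscaler's own function.
func extractResourceTags(tags map[string]string) map[string]string {
	resources := make(map[string]string)
	for key, value := range tags {
		if !strings.HasPrefix(key, resourceTagPrefix) {
			continue // unrelated tag
		}
		name := strings.TrimPrefix(key, resourceTagPrefix)
		if name == "" {
			continue // prefix with no resource name
		}
		resources[name] = value
	}
	return resources
}

func main() {
	// Sample ASG tags (assumed values).
	tags := map[string]string{
		"Name":                                  "workers",
		resourceTagPrefix + "ephemeral-storage": "100Gi",
	}
	resources := extractResourceTags(tags)

	// Sort names so the printed order is stable.
	names := make([]string, 0, len(resources))
	for name := range resources {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%s\n", name, resources[name])
	}
}
```

Tags without the prefix are ignored, so ordinary ASG tags such as `Name` do not leak into the resource list.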

Which issue(s) this PR fixes:

Fixes #5164

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Signed-off-by: tombokombo <[email protected]>

tombokombo · 2022-10-14T13:14:52Z

Hi @gjtempleton, could you please look at this PR? thx

tombokombo · 2022-11-10T15:06:40Z

@gjtempleton are you planning to review this PR?

gjtempleton · 2022-12-11T17:21:23Z

Hey @tombokombo, apologies, I largely use assignment to PRs as my filter for which PRs to review hence me missing it in the GH notifications noise before.

/assign @gjtempleton

tombokombo · 2022-12-21T01:20:46Z

@gjtempleton ok, I'm really looking forward to the review :)

tombokombo · 2023-01-13T13:13:59Z

@gjtempleton still nothing?

gjtempleton · 2023-01-17T23:06:58Z

Hey, I've tried to spend some time reproducing the issue this is meant to resolve this evening and haven't been able to reproduce this as long as I updated the existing nodes in the ASG to present the custom resource. Are you able to provide further instructions for reproducing the issue you're seeing?

For reference, I was using a custom resource under the name example.com/customresource.

paleozogt · 2023-01-17T23:09:16Z

@gjtempleton If you have nodes running then the problem won't happen. The issue is what autoscaler does when the ASG is at zero and has never been scaled up before.
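
A simplified model of the distinction described above, as a sketch only: with a running node there is real capacity to copy, while an ASG at zero leaves only its tags to go on. The `group` type and the `templateCapacity` function are hypothetical and much simpler than the autoscaler's actual node-info handling.

```go
package main

import (
	"fmt"
	"strings"
)

const resourceTagPrefix = "k8s.io/cluster-autoscaler/node-template/resources/"

// group is a hypothetical, simplified view of an Auto Scaling Group.
type group struct {
	liveNodeCapacity []map[string]string // one capacity map per running node
	tags             map[string]string   // ASG tags
}

// templateCapacity picks the capacity used to simulate a new node.
// With a running node, its reported capacity is used. At zero nodes,
// the only remaining source in this model is the ASG tags.
func templateCapacity(g group) (capacity map[string]string, source string) {
	if len(g.liveNodeCapacity) > 0 {
		return g.liveNodeCapacity[0], "live node"
	}
	capacity = make(map[string]string)
	for key, value := range g.tags {
		if strings.HasPrefix(key, resourceTagPrefix) {
			capacity[strings.TrimPrefix(key, resourceTagPrefix)] = value
		}
	}
	return capacity, "ASG tags"
}

func main() {
	// An ASG that has never been scaled up: no nodes, only tags.
	empty := group{tags: map[string]string{
		resourceTagPrefix + "example.com/customresource": "1",
	}}
	capacity, source := templateCapacity(empty)
	fmt.Println(source, capacity)
}
```

If the tag branch is never reached, a group at zero advertises no custom resource, and a pod requesting it cannot trigger a scale-up.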

gjtempleton · 2023-01-17T23:26:01Z

OK, understood and now reproduced.

I'd like to get @x13n's take here, as if we merged this, we'd be at the point of having two cloud provider specific processors in core code, which I'm pretty wary about. I'd also like to ensure we have some tests for this functionality as we already have around the processing of other resources.

I'm wondering whether this also might be affecting any other cloud providers supporting specifying information like this for scale from zero?

paleozogt · 2023-01-18T03:35:23Z

@gjtempleton I can't comment on this MR, but I'd like to note that this problem has been reported over and over (e.g., #3802, #5006, #5164, #5278, etc). It's like the feature just wasn't implemented?

Signed-off-by: tombokombo <[email protected]>

tombokombo · 2023-01-24T14:30:53Z

@gjtempleton I've added some tests, but the functionality was already there and tested with ephemeral-storage. Regarding the implementation: auto-scaling group tags are AWS-specific, and the AWS-specific autoscaler documentation says that custom resources should work via tags. That is why I made AWS-specific provider modifications. I could have put it into the generic template-info-provider, but it could break half of the other cloud providers, which I'm not able to test. There is already a specific processor for GCP, so I'm not breaking any pattern. This patch could fix the AWS-specific issues mentioned by @paleozogt.

x13n · 2023-01-27T12:35:15Z

/assign

x13n · 2023-01-27T16:37:26Z

I think the way this is introduced into core is fine - we already are doing roughly the same thing with other cloud providers. I don't like this, but wouldn't block this PR on following an already established pattern. That being said, we should really look into better isolation between different cloud providers. #5394 I proposed some time ago was - in retrospect - too radical, but I think separating each cloud provider to have its own container image (and hence - own main.go) is the direction into which I'd go. I'm planning to update that issue with more details when I find a little bit of extra time.

So - no blockers from me, but leaving the lgtm and approval to people more familiar with aws.

tombokombo · 2023-01-28T16:06:11Z

@x13n thanks for the review. I thought the same when I first dug into the code, that it needs a core library and separate cloud provider code... anyway, @gjtempleton it's your turn now.

gjtempleton · 2023-02-06T22:03:24Z

If you're happy with it as is (and addressing for the long term in the linked issue) then I'm happy to go with this approach.

/approve

k8s-ci-robot · 2023-02-06T22:03:41Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjtempleton, tombokombo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [gjtempleton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tombokombo · 2023-02-07T09:25:59Z

@gjtempleton ok, thx. And who can add the lgtm here?

gjtempleton · 2023-02-07T09:33:52Z

Ah, sorry, my bad...
/lgtm

jbartosik · 2023-02-07T12:24:39Z

Looks like this PR broke presubmit, on #5214 I'm getting the following error (full logs):

testing hack/../cluster-autoscaler
# k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider
Error: processors/nodeinfosprovider/asg_tag_resource_node_info_provider.go:38:67: not enough arguments in call to NewMixedTemplateNodeInfoProvider
	have (*time.Duration)
	want (*time.Duration, bool)
FAIL	k8s.io/autoscaler/cluster-autoscaler [build failed]

@tombokombo @gjtempleton can you please fix or roll back?

tombokombo · 2023-02-07T13:21:31Z

@jbartosik I'm going to provide a fix.

tombokombo · 2023-02-07T13:41:55Z

@jbartosik fix here #5485

tombokombo added 2 commits September 27, 2022 01:25

fix asg resource tags

17a09a0

Signed-off-by: tombokombo <[email protected]>

Merge branch 'kubernetes:master' into fix/asg-resource-tags

5ba9efd

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 26, 2022

k8s-ci-robot requested review from aleksandra-malinowska and feiskyer September 26, 2022 23:41

jbartosik added the area/cluster-autoscaler label Sep 28, 2022

mwielgus requested review from gjtempleton and removed request for aleksandra-malinowska September 29, 2022 22:38

mwielgus added the area/provider/aws Issues or PRs related to aws provider label Sep 29, 2022

k8s-ci-robot assigned gjtempleton Dec 11, 2022

tombokombo mentioned this pull request Jan 23, 2023

Option to ignore pod extended resources #5166

Closed

tombokombo added 2 commits January 23, 2023 22:06

extended test for resource extracting

6189f09

Signed-off-by: tombokombo <[email protected]>

more tests

b0f064a

Signed-off-by: tombokombo <[email protected]>

k8s-ci-robot assigned x13n Jan 27, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2023

k8s-ci-robot merged commit e911e54 into kubernetes:master Feb 7, 2023

paleozogt mentioned this pull request Feb 28, 2023

Scaling from 0 doesn't work for GPU nodes. Reasons: Insufficient nvidia.com/gpu #5278

Closed

This was referenced May 3, 2023

Fix/asg resource tags for 1.26.x #5722

Merged

Fix/asg resource tags for 1.25.x #5736

Merged

Fix/asg resource tags for 1.24.x #5737

Merged

Fix/asg resource tags for 1.23.x #5739

Closed
