Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/asg resource tags #5214

Merged
merged 4 commits into from
Feb 7, 2023

Conversation

tombokombo
Copy link
Contributor

@tombokombo tombokombo commented Sep 26, 2022

Which component this PR applies to?

autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

Problem is described in #5164
TLDR
In autoscaler docs is written
From version 1.14, Cluster Autoscaler can also determine the resources provided by each Auto Scaling Group via tags. The tag is of the format k8s.io/cluster-autoscaler/node-template/resources/<resource-name>. <resource-name> is the name of the resource, such as ephemeral-storage. The value of each tag specifies the amount of resource provided. The units are identical to the units used in the resources field of a Pod specification.

But I found no way how to make this work. There is long time relevant part of code but never executed. I'm using kops AWS cluster. So this PR reintroduce this functionality.

Which issue(s) this PR fixes:

Fixes #5164

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 26, 2022
@mwielgus mwielgus requested review from gjtempleton and removed request for aleksandra-malinowska September 29, 2022 22:38
@mwielgus mwielgus added the area/provider/aws Issues or PRs related to aws provider label Sep 29, 2022
@tombokombo
Copy link
Contributor Author

Hi @gjtempleton, could you please look at this PR? thx

@tombokombo
Copy link
Contributor Author

@gjtempleton are you planning to review this PR?

@gjtempleton
Copy link
Member

Hey @tombokombo, apologies, I largely use assignment to PRs as my filter for which PRs to review hence me missing it in the GH notifications noise before.

/assign @gjtempleton

@tombokombo
Copy link
Contributor Author

@gjtempleton ok, I'm really looking forward for review :)

@tombokombo
Copy link
Contributor Author

@gjtempleton still nothing?

@gjtempleton
Copy link
Member

Hey, I've tried to spend some time reproducing the issue this is meant to resolve this evening and haven't been able to reproduce this as long as I updated the existing nodes in the ASG to present the custom resource. Are you able to provide further instructions for reproducing the issue you're seeing?

For reference, I was using a custom resource under the name example.com/customresource.

@paleozogt
Copy link

@gjtempleton If you have nodes running then the problem won't happen. The issue is what autoscaler does when the ASG is at zero and has never been scaled up before.

@gjtempleton
Copy link
Member

OK, understood and now reproduced.

I'd like to get @x13n's take here, as if we merged this, we'd be at the point of having two cloud provider specific processors in core code, which I'm pretty wary about. I'd also like to ensure we have some tests for this functionality as we already have around the processing of other resources.

I'm wondering whether this also might be affecting any other cloud providers supporting specifying information like this for scale from zero?

@paleozogt
Copy link

@gjtempleton I can't comment on this MR, but I'd like to note that this problem has been reported over and over (e.g., #3802, #5006, #5164, #5278, etc). Its like the feature just wasn't implemented?

@tombokombo
Copy link
Contributor Author

@gjtempleton I've added some tests, but functionality was already there and tested with ephemeral-storage. Regarding implementation, auto-scaling group tags are aws specific, there is autoscaler aws specific documentation that custom-resources should work via tags. That is the reason why I made aws specific provider modifications. I've could put it into generic template-info-provider, but it could brake half of other cloudproviders that I'm not able to test. There is already specific processor for gcp, so I'm not breaking any pattern. This patch could fix aws specific issues mention by @paleozogt

@x13n
Copy link
Member

x13n commented Jan 27, 2023

/assign

@x13n
Copy link
Member

x13n commented Jan 27, 2023

I think the way this is introduced into core is fine - we already are doing roughly the same thing with other cloud providers. I don't like this, but wouldn't block this PR on following an already established pattern. That being said, we should really look into better isolation between different cloud providers. #5394 I proposed some time ago was - in retrospect - too radical, but I think separating each cloud provider to have its own container image (and hence - own main.go) is the direction into which I'd go. I'm planning to update that issue with more details when I find a little bit of extra time.

So - no blockers from me, but leaving the lgtm and approval to people more familiar with aws.

@tombokombo
Copy link
Contributor Author

@x13n thanks for review. I thought the same when I first dig into the code, that it needs core library and separate cloudproviders code...anyway, @gjtempleton its your turn now.

@gjtempleton
Copy link
Member

If you're happy with it as is (and addressing for the long term in the linked issue) then I'm happy to go with this approach.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjtempleton, tombokombo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2023
@tombokombo
Copy link
Contributor Author

@gjtempleton ok, thx and who can put here lgtm?

@gjtempleton
Copy link
Member

Ah, sorry, my bad...
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit e911e54 into kubernetes:master Feb 7, 2023
@jbartosik
Copy link
Collaborator

Looks like this PR broke presubmit, on #5214 I'm getting the following error (full logs):

testing hack/../cluster-autoscaler
# k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider
Error: processors/nodeinfosprovider/asg_tag_resource_node_info_provider.go:38:67: not enough arguments in call to NewMixedTemplateNodeInfoProvider
	have (*time.Duration)
	want (*time.Duration, bool)
FAIL	k8s.io/autoscaler/cluster-autoscaler [build failed]

@tombokombo @gjtempleton can you please fix or roll back?

@tombokombo
Copy link
Contributor Author

@jbartosik i'm going to provide fix.

@tombokombo
Copy link
Contributor Author

@jbartosik fix here #5485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extended resources provided by ASG via tags is not working
7 participants