-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/asg resource tags #5214
Fix/asg resource tags #5214
Conversation
Signed-off-by: tombokombo <[email protected]>
Hi @gjtempleton, could you please look at this PR? thx |
@gjtempleton are you planning to review this PR? |
Hey @tombokombo, apologies, I largely use assignment to PRs as my filter for which PRs to review hence me missing it in the GH notifications noise before. /assign @gjtempleton |
@gjtempleton ok, I'm really looking forward for review :) |
@gjtempleton still nothing? |
Hey, I've tried to spend some time reproducing the issue this is meant to resolve this evening and haven't been able to reproduce this as long as I updated the existing nodes in the ASG to present the custom resource. Are you able to provide further instructions for reproducing the issue you're seeing? For reference, I was using a custom resource under the name |
@gjtempleton If you have nodes running then the problem won't happen. The issue is what autoscaler does when the ASG is at zero and has never been scaled up before. |
OK, understood and now reproduced. I'd like to get @x13n's take here, as if we merged this, we'd be at the point of having two cloud provider specific processors in core code, which I'm pretty wary about. I'd also like to ensure we have some tests for this functionality as we already have around the processing of other resources. I'm wondering whether this also might be affecting any other cloud providers supporting specifying information like this for scale from zero? |
@gjtempleton I can't comment on this MR, but I'd like to note that this problem has been reported over and over (e.g., #3802, #5006, #5164, #5278, etc). Its like the feature just wasn't implemented? |
Signed-off-by: tombokombo <[email protected]>
Signed-off-by: tombokombo <[email protected]>
@gjtempleton I've added some tests, but functionality was already there and tested with ephemeral-storage. Regarding implementation, auto-scaling group tags are aws specific, there is autoscaler aws specific documentation that custom-resources should work via tags. That is the reason why I made aws specific provider modifications. I've could put it into generic template-info-provider, but it could brake half of other cloudproviders that I'm not able to test. There is already specific processor for gcp, so I'm not breaking any pattern. This patch could fix aws specific issues mention by @paleozogt |
/assign |
I think the way this is introduced into core is fine - we already are doing roughly the same thing with other cloud providers. I don't like this, but wouldn't block this PR on following an already established pattern. That being said, we should really look into better isolation between different cloud providers. #5394 I proposed some time ago was - in retrospect - too radical, but I think separating each cloud provider to have its own container image (and hence - own main.go) is the direction into which I'd go. I'm planning to update that issue with more details when I find a little bit of extra time. So - no blockers from me, but leaving the lgtm and approval to people more familiar with aws. |
@x13n thanks for review. I thought the same when I first dig into the code, that it needs core library and separate cloudproviders code...anyway, @gjtempleton its your turn now. |
If you're happy with it as is (and addressing for the long term in the linked issue) then I'm happy to go with this approach. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gjtempleton, tombokombo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@gjtempleton ok, thx and who can put here lgtm? |
Ah, sorry, my bad... |
Looks like this PR broke presubmit, on #5214 I'm getting the following error (full logs):
@tombokombo @gjtempleton can you please fix or roll back? |
@jbartosik i'm going to provide fix. |
@jbartosik fix here #5485 |
Which component this PR applies to?
autoscaler
What type of PR is this?
/kind bug
What this PR does / why we need it:
Problem is described in #5164
TLDR
In autoscaler docs is written
From version 1.14, Cluster Autoscaler can also determine the resources provided by each Auto Scaling Group via tags. The tag is of the format k8s.io/cluster-autoscaler/node-template/resources/<resource-name>. <resource-name> is the name of the resource, such as ephemeral-storage. The value of each tag specifies the amount of resource provided. The units are identical to the units used in the resources field of a Pod specification.
But I found no way how to make this work. There is long time relevant part of code but never executed. I'm using kops AWS cluster. So this PR reintroduce this functionality.
Which issue(s) this PR fixes:
Fixes #5164
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: