-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore topology.ebs.csi.aws.com/zone label for Balance-Similar-NodeGroups #3230
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Any progress or ETA on this? After some initial troubleshooting I think this label is the cause of the balancing issues that we've been seeing in our EKS clusters. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Bump. This is rather inconvenient, but my workaround is to manually provision the first node of an ASG thats set to 0 |
|
@westernspion Yes, that's #2493 that I mentioned in the original request. From Autoscaler 1.19, there is a
If you're using this for multiple labels, and deploying with Helm, I believe you'll hit #3673, but it should work fine with Helm to just add a single label to ignore. |
I've tested the ignore label with v1.19.x and v1.20.0 of the controller and it has no effect on the scheduling (still getting volume affinity errors for my statefulset after it's ASG reduces to 0 and I scale up again). I reviewed the source and this makes sense, as this controls which labels should be ignored on the autoscaling groups considered, not outright omitting affinities/selectors on resources such as PVs in the cluster So the autoscaler is intelligent enough to consider So looking through the code, and I haven't seen documentation for this, but the autoscaler appears to consider anything prefixed with 'k8s.io/cluster-autoscaler/node-template/label/' a custom label to resolve I'm going to test this out as it appears to answer my original question |
It would be nice to have this label automatically generated from the asg zone lst. |
@westernspion Oh, I'm sorry. I totally misread your original question, and also had forgotten the The documentation for the scale-from-zero ASG Tags -> Node Labels mapping is in the AWS-specific documentation, and is also talked about in the scale to 0 FAQ entry. You are correct, that part of this feature request is not user-doable using #2493, and I see now that I had noted that in the original request: "It would still need to be manually added to the ASG tags for the scale-from-zero case." It's been a while, but as best I recall, I was not using managed node groups, but using eksctl 'unmanaged' node groups, and so had manually added that tag to the nodegroup definition, since I was defining a nodegroup per-AZ anyway. It is a little problematic to include in the The complexity here is because for whatever reason, the EBS CSI Daemonset didn't just use the standard k8s region labels, although I guess in non-EKS setups, they might not be appropriate? So adding it to the default-ignored list seems pretty safe, but can also be done by the user as And I'm not sure of a good way to ensure this EBS CSI-specific label is visible to CA on scaled-to-zero setups, without either some significant assumptions being made, or direct administrator action. Edit: I opened kubernetes-sigs/aws-ebs-csi-driver#729 in case we can solve this completely by (optionally) making EBS CSI use a standard node label instead of its own label. |
Good news. Per kubernetes-sigs/aws-ebs-csi-driver#729 (comment), the AWS EBS CSI driver 0.10 release will include changes that fix this use-case, along with the use of Basically, the CSI driver will prefer the well-known zone label to the above specific label, and ensure both are applied identically in its CSI API; this means that although the Once This means that this proposed change (or the local config, if CA maintainers so-prefer) is sufficient to allow correct behaviour with AWS EBS CSI driver, i.e. no overly-ambitious balanced node groups which scale up but cannot schedule a Pod with an EBS volume, and no falsely-distinguished node-groups when an EBS volume is not present. That doesn't close this ticket, because I still think it would be nice if this label was automatically ignored on AWS, as originally suggested. The old label may be around for a while, and the AWS EBS CSI driver is going to become much more widespread once AWS starts deploying it by default on EKS and k8s removes the in-tree driver, targeted at 1.21 in the 1.17 release timeframe. |
In my not-so-humble opinion, both the eks.amazonaws.com/sourceLaunchTemplateId and topology.ebs.csi.aws.com/zone should always be ignored by the AWS CloudProvider portion of the cluster-autoscaler code. The solution of using an addition CLI arg not ideal since it requires adding settings to the command line args for the cluster-autoscaler that should be defaults. AWS customers are very likely to discover the hard way that running the AWS CSI Driver will break the cluster-autoscaler --balance-similar-nodegroups feature. And if any of those AWS customers are running a cluster older than 1.19 they are completely out of luck. Most will probably be like me and adopt the AWS CSI Driver, then discover that the cluster-autoscaler no longer balances similar nodegroups (hopefully without a serious production issues) and then start searching for why. If they are lucky they will end up here. It is far better to just do the right thing at the outset, and have the cluster-autoscaler ignore the labels that break it. Or maybe even use a whitelist of labels instead of blacklisting some. A whitelist would probably solve this issue and the ones that would arise in the future as other software adds labels to nodes. A good start for labels to whitelist would be:
The eks.amazonaws.com/sourceLaunchTemplateId label is added to nodes by the AWS Nodegroup. It too causes issues with the cluster-autoscaler balance-similar-nodegroups if the nodegroups do not all use the exact same launch template, regardless of whether the templates are similar. |
Just so I understand, your suggestion is that instead of having a list of labels to ignore when considering nodegroups for balancing (as we have now, a combination of generic and AWS-specific), there should be a list of labels to consider for differentiation, and other labels should not be considered, unless explicitly added to say "The value of this label must be the same for the nodegroups to be considered the same"? I ask because your example list there seems to mix both "must differentiate" (e.g., I'm somewhat wary of ignoring Adding a label to the default list is basically saying "No value we find here will affect whether this node is the same as some other node"; where "different launch templates produce interchangable nodes" seems more like a cluster-admin commitment than a generically true statement. But I am aware that I may be wrong about this. |
Yes, that is it exactly. Sorry about the What this approach is designed to do is to keep the cluster-autoscaler working even as new labels are added to nodes. The new labels may be added manually, or via controllers, operators, or deployment automation for any number of reasons. I would prefer if that did not break the cluster-autoscaler when it happens. I could live with the |
It's an interesting idea. I don't expect it'll see much traction here, so might be worth opening as a feature request/change suggestion if you want to pursue it further. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale |
When using custom launch template, more labels are added. I guess there should be ignore also for this to work ? I see there is some work done here : https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/aws_nodegroups.go#L30 Is this where we should add the ignore label set. For example here is an EKS managed node group with custom launch template:
I have 3 similar node pool like this, but in addition of CSI, these might differ also:
|
That would be up to the local administrator to use with The defaults should be very conservative, and only include labels that can never be material to the equivalence of the nodes for balancing purposes. Particularly because I don't think the command-line API for this lets you override an existing default, so if |
fwiw, this is something we are running into with the Cluster API provider as AWS is one of the possible infrastructure providers for CAPI. given the discussion that has happened here kubernetes-sigs/aws-ebs-csi-driver#729 , i wonder if we couldn't make a patch to ignore this label as part of the "well-known labels" until the aws-ebs-csi-driver has completed deprecating the old value in favor of the common one ( edit: |
when balancing similar nodes, the autoscaler will look for differences in labels as one of its checks for similarity. by default it will ignore the well known kubernetes zone label (topology.kubernetes.io/zone), but the aws-ebs-csi-driver will add a unique label for its zone, `topology.ebs.csi.aws.com/zone`. this label will eventually be deprecated in favor of the common well known topology label. for more information about this see the references. ref: kubernetes/autoscaler#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
I'm glad to see #4458, and it addresses the title of the issue, but there's still the issue in the second paragraph, which can't be worked-around by CA command-line:
However, unlike the label-ignore change, I'm not as-certain it's the right fix, and it's probably not as-problematic anyway, since it only affects scale-from-zero, and after that, balancing should work correctly now. (And it can be worked around by adding AWS tags to the ASG if needed anyway). Perhaps it needs to be a new issue with more consideration. It's also possible I just misdiagnosed it. I honestly haven't thought about it since #3230 (comment), when it was shown useful. @elmiko I'm not sure if CAPI has the same problem or not. but a quick look suggests CAPI doesn't yet support scale-from-zero anyway. |
@TBBle i think you are correct that there could be an underlying issue with scale from zero, i haven't tested it thoroughly yet. that said...
it does not yet, but i am currently working on a patch for it. i will try to play around with the zone labels when i am testing it, perhaps i can find the issue you are talking about. |
Sounds good. To avoid confusion, that should probably be a separate ticket to isolate just that behaviour. |
agreed |
This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: kubernetes#3230 ref: kubernetes-sigs/aws-ebs-csi-driver#729
The
topology.ebs.csi.aws.com/zone
label is added by the AWS EBS CSI driver as a target for Persistent Volume Node Affinity.If I understand correctly, a Pod that claims such a volume will only be considered by CA for node groups that have this label, but Pods that do not claim such a volume can be scheduled anywhere, so this should not block node-group-similarity in
CreateAwsNodeInfoComparator
.It probably should be added to
buildGenericLabels
likefailure-domain.beta.kubernetes.io/zone
, so that scale-from-zero works correctly, assuming this won't break systems that don't have the AWS EBS CSI driver installed. That would save having to add the label by-hand to the per-AZ ASGs needed when using EBS. I guess it might go odd if only some of the nodegroups run the CSI Daemonset, but I'm not sure that's a rational/expected setup?I just saw #2928 and #2493 (which to be fair, is not listed on the FAQ args-list), so I'd understand if the call is to leave this to user-config, and just document it on the AWS page for AWS EBS CSI users. It would still need to be manually added to the ASG tags for the scale-from-zero case.
I'm on k8s 1.15 so unless this or #2493 is backported a long way, it's not going to affect me immediately anyway.
The text was updated successfully, but these errors were encountered: