cluster-autoscaler does not respect CriticalAddonsOnly taint which is the only taint available to system nodes #2513
Comments
Hi jclangst, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
|
@jclangst are you using the AKS managed autoscaler? There's a change that went in a while ago to drop that one from the sanitized list of taints. Happy to look into it further. |
@marwanad Good question. No we are using |
@jclangst gotcha - let me see if I can make the change to upstream. The only possible way of doing it would be via a flag to disable that sanitization if anything. I'm very curious - what are you missing in the managed autoscaler? |
@marwanad A few stakeholders have worked together to generate a proposed solution which sounds aligned with your proposal for a new flag in upstream: kubernetes/autoscaler#4097. As for the missing features, I don't have the complete list, but I know that the "managed autoscaler" was making it hard to uncover the root cause for issues like this b/c we didn't have complete visibility into how Azure was configuring it so we needed to remove that layer of abstraction to make progress: kubernetes/autoscaler#4099. In other words, we had to introduce custom patches to the autoscaler until there are upstream fixes. |
@jclangst fair enough, we're usually quick when patching and back-porting those upstream issues but totally understand if you're in a happy place with the unmanaged version. The only downside is that certain AKS-specific fixes/improvements are a bit of a hassle to get upstream (for example, the use of the CriticalAddonsOnly taint is different than on other providers). I'll probably resolve this issue (since it relates to upstream CA) and will try to work on an upstream change to accommodate that, or if you're open to it, feel free to PR it there. What are you lacking in terms of visibility for the managed autoscaler? Are the logs not sufficient? You should be able to see most configs there. |
@marwanad Yes, I think that you can close this issue. I wanted to bring visibility to the AKS team in case you had opinions different from mine, which is that a fix is needed in the upstream CA (see originally linked issue). As for visibility, the inability to mess with the code was the biggest limitation given all of the "interesting" behavior that we are uncovering, especially in re-using configuration across clouds; the plug-and-play nature of the CA cloud-plugins hasn't quite lived up to the promise. |
What happened:
The system nodes only have the option to add the CriticalAddonsOnly node taint. The cluster autoscaler ignores the CriticalAddonsOnly node taint in scheduling computation and results in undefined and undesired behavior.
What you expected to happen:
The cluster autoscaler properly respects CriticalAddonsOnly taints on the system nodes during its scheduling calculations OR the AKS team allows other taints to be added to system nodes.
How to reproduce it (as minimally and precisely as possible):
Run a cluster with system nodes with the CriticalAddonsOnly node taint and install the cluster autoscaler.
Anything else we need to know?:
See this issue in the cluster autoscaler repository: kubernetes/autoscaler#4097. There is some commentary that AKS's use of taints is incorrect. I think your implementation is within spec, but it is worth this team weighing in b/c until alignment is reached b/w the AKS team and the cluster autoscaler maintainers, the cluster autoscaler has undefined and undesired behavior in the common scenario described above.
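To make the scenario concrete, here is a minimal Python sketch of taint/toleration matching. It is an illustration only, not the cluster autoscaler's code: the names Taint, Toleration, tolerates, fits, and ignored_keys are hypothetical, and ignored_keys is just a stand-in for whatever mechanism drops the taint during the autoscaler's scheduling simulation. The matching rules follow the documented Kubernetes semantics as I understand them (an empty toleration effect matches any effect; the Exists operator ignores the value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty effect matches every taint effect


def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key with Exists tolerates every taint key.
        return toleration.key == "" or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def fits(tolerations, node_taints, ignored_keys=frozenset()):
    """Return True if a pod with these tolerations may land on the node.

    ignored_keys is a hypothetical knob modelling a simulation that
    drops certain taints before evaluating the node.
    """
    for taint in node_taints:
        if taint.key in ignored_keys:
            continue  # taint is invisible to the simulation
        if taint.effect not in ("NoSchedule", "NoExecute"):
            continue  # PreferNoSchedule is only a soft preference
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


system_node = [Taint("CriticalAddonsOnly", "NoSchedule", "true")]
app_pod = []  # an ordinary workload with no tolerations
addon_pod = [Toleration("CriticalAddonsOnly", operator="Exists")]

# What the real scheduler decides: the workload is rejected.
print(fits(app_pod, system_node))  # False
# What a simulation that ignores the taint believes: the workload fits.
print(fits(app_pod, system_node, ignored_keys={"CriticalAddonsOnly"}))  # True
# A critical addon tolerates the taint either way.
print(fits(addon_pod, system_node))  # True
```

The mismatch between the first two results is the problem reported here: the simulation counts system node capacity as usable for pods that the scheduler will never place there.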
Environment:
Kubernetes version (use kubectl version): 1.20