Add PodTemplates support as NodeInfoProcessor in Cluster-Autoscaler #3964
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: clamoriniere. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What I had in mind during the sig-meeting was to implement a NodeInfoProcessor that would inject those pods directly into template nodes. My reasoning is that NodeInfoProcessor allows all sorts of modifications to template nodes or the pods that run on them. The new processor introduced in this PR seems tailored to this single use case; handling custom DS controllers is the only reason I can think of for modifying the list of DS. I'd argue that this defeats the point of having a processor:
Is there some reason why injecting into the list of daemonsets would work better than injecting directly into template nodes? Or is there some other possible use case where this processor could be useful that I'm missing?
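For reference, this is roughly the interface being discussed. The signature below is recalled from the cluster-autoscaler `processors/nodeinfos` package and may differ slightly in the exact release this PR targets:

```go
// Recalled from cluster-autoscaler/processors/nodeinfos (signature may differ slightly).
type NodeInfoProcessor interface {
	// Process can modify the per-NodeGroup template NodeInfos (and the pods
	// attached to them) before they are used in scale-up simulation.
	Process(ctx *context.AutoscalingContext,
		nodeInfosForNodeGroups map[string]*schedulerframework.NodeInfo) (map[string]*schedulerframework.NodeInfo, error)
	// CleanUp releases internal structures.
	CleanUp()
}
```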
Hi @MaciekPytel, sorry, I didn't fully understand your recommendation to use the `NodeInfoProcessor`.
I focused too much on my use case, which is to recognize other resources as daemonsets. That is why updating the Daemonsets list seemed to be a good approach: it avoids modifying the workflows that use this list, such as deciding whether a Pod belongs to a DaemonSet. But I can definitely work on a solution that implements the `NodeInfoProcessor`.
Thanks. Just for the record - I'm not saying I'm 100% certain that a NodeInfoProcessor-based solution is the best one, just that I'd like to consider it as an option and, if it's not good enough, understand why that is. I think it should cover scale-up (though I may be missing something).
@MaciekPytel I have a question about the processor. For scale-down, I think I have already covered the needs in #2483.
Regarding running multiple processors:
Force-pushed from a277580 to 277c519
Force-pushed from 277c519 to 0356b02
Force-pushed from 7975c64 to 3d0c914
Hi @MaciekPytel, I have updated the PR based on our previous discussion. I added two new commits:
Force-pushed from 228614f to c1b1594
Force-pushed from c1b1594 to 6411d70
i am not overly familiar with the processors code, but this generally makes sense to me. i did have one question though.
for _, podInfo := range baseNodeInfo.Pods {
	pods = append(pods, podInfo.Pod)
}
if err := clusterSnapshot.AddNodeWithPods(node, pods); err != nil {
this seems like a side effect of calling this function, i would expect from the name that it just returns the nodeinfo with the pod templates. why do we add the node/pods to the cluster snapshot here?
that makes sense following the daemonset pattern. i was mainly asking because i don't know this code that well and was curious, i appreciate any further findings you might share =)
Hello @elmiko,
Sorry for the late reply.
I checked why it is used like this in the `GetDaemonSetPodsForNode()` function.
From what I understand, it is because in this specific case the Node doesn't exist yet (we are in the edge case where we scale from 0), and we don't want to add this Node to the `context.AutoscalingContext.ClusterSnapshot` yet.
The other solution could have been to use `context.AutoscalingContext.ClusterSnapshot.Fork()`, but since we are not interested in the other `NodeInfo`s here, creating a new `ClusterSnapshot` with `simulator.NewBasicClusterSnapshot()` limits the memory copied.
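For anyone else following along, here is a rough sketch of the pattern described above. Only `NewBasicClusterSnapshot()`, `AddNodeWithPods()`, and the pod-copy loop come from the code under review; the helper name, parameter list, and import paths are assumptions for illustration:

```go
import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/simulator"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// snapshotForTemplateNode is a hypothetical helper illustrating why a fresh,
// throwaway snapshot is used: the simulated node never touches the live
// context.AutoscalingContext.ClusterSnapshot.
func snapshotForTemplateNode(node *apiv1.Node, baseNodeInfo *schedulerframework.NodeInfo) (simulator.ClusterSnapshot, error) {
	// Fresh snapshot, independent from the real cluster state.
	clusterSnapshot := simulator.NewBasicClusterSnapshot()

	// Copy the pods already attached to the template NodeInfo (same loop as in the PR).
	pods := make([]*apiv1.Pod, 0, len(baseNodeInfo.Pods))
	for _, podInfo := range baseNodeInfo.Pods {
		pods = append(pods, podInfo.Pod)
	}

	// The simulated node (and its pods) exist only in this throwaway snapshot;
	// scheduling predicates can then be evaluated against it.
	if err := clusterSnapshot.AddNodeWithPods(node, pods); err != nil {
		return nil, err
	}
	return clusterSnapshot, nil
}
```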
thanks @clamoriniere ! i appreciate the extra context =)
Using an empty snapshot makes nodes that are already in the cluster invisible to scheduling; in other words, you implicitly assume the pods on those nodes wouldn't impact the result of scheduling. There are some scheduling constraints (podAffinity, podTopologySpreading) that look at multiple nodes. Admittedly, using any of those on a DS or DS-like pod seems like a really bad idea, but technically I think it's more correct to go with the Fork/Revert route.
Also, Fork/Revert is used across the codebase, but I don't think we actually use BasicClusterSnapshot anywhere in CA. So you're exposing yourself to code that is not tested in production.
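A rough sketch of the Fork/Revert variant being suggested, assuming the `ClusterSnapshot` interface exposes error-returning `Fork()`/`Revert()` as it did around this time; the helper name is hypothetical:

```go
// simulateNodeWithFork adds the template node to a forked view of the live
// snapshot, so pods on existing nodes stay visible to podAffinity and
// podTopologySpread checks, and then rolls the change back.
func simulateNodeWithFork(snapshot simulator.ClusterSnapshot, node *apiv1.Node, pods []*apiv1.Pod) error {
	if err := snapshot.Fork(); err != nil {
		return err
	}
	// Revert discards the simulated node once we are done.
	defer func() {
		if err := snapshot.Revert(); err != nil {
			klog.Errorf("failed to revert cluster snapshot: %v", err)
		}
	}()

	// The node only exists inside the forked view of the snapshot.
	if err := snapshot.AddNodeWithPods(node, pods); err != nil {
		return err
	}

	// ... evaluate scheduling predicates against the forked snapshot here ...
	return nil
}
```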
Hi @MaciekPytel,
thanks for the details.
The edge case that I would like to solve is "scale from 0", for which the cluster-autoscaler looks at Daemonsets to simulate a Node that doesn't exist yet. That is why, in this case, knowing about the other existing Nodes doesn't seem useful. This is what I understood from the code used to generate Pods based on Daemonsets:
clusterSnapshot := simulator.NewBasicClusterSnapshot()
But I don't see any reason why I can't change the PodTemplate NodeInfoProcessor to use a Fork.
Fair enough, I was clearly wrong about not using BasicClusterSnapshot anywhere. I don't think the fact that we're scaling from 0 changes anything here, though. Those scheduling constraints look at all nodes in a given topology (in this case the relevant topology being the zone), and the fact that there are no nodes in a given NodeGroup does not imply there are no other nodes in the zone. That being said, it is a bit of a theoretical problem, as I can't think of a reason to use topologySpreading or podAffinity on a DS, and the fact that no one has run into problems with the existing DS logic seems to confirm that.
New changes are detected. LGTM label has been removed.
The PodTemplate Processor is used to simulate DaemonSet resources from PodTemplates that have a specific label. The processor is used during the ScaleUp operation.
`nodeInfoWithPodTemplateProcessor` implements the `NodeInfoProcessor` interface. This NodeInfoProcessor can be used to assign Pods to a NodeInfo during scale-up simulation. The processor looks at PodTemplates and generates Pods from these templates. Only PodTemplates with a specific label are selected.
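A hedged sketch of what such a processor can look like, implementing the interface quoted earlier in the thread; `listMatchingPodTemplates` and `newPodFromTemplate` are hypothetical helpers, and the actual PR additionally simulates scheduling in a throwaway snapshot before attaching the pods, as discussed above:

```go
// nodeInfoWithPodTemplateProcessor: illustrative sketch, not the PR's exact code.
type nodeInfoWithPodTemplateProcessor struct{}

func (p *nodeInfoWithPodTemplateProcessor) Process(
	ctx *context.AutoscalingContext,
	nodeInfos map[string]*schedulerframework.NodeInfo) (map[string]*schedulerframework.NodeInfo, error) {
	// Only PodTemplates carrying the opt-in label are considered.
	templates, err := listMatchingPodTemplates(ctx)
	if err != nil {
		return nodeInfos, err
	}
	for _, nodeInfo := range nodeInfos {
		for _, template := range templates {
			// Attach a pod generated from each selected template to the template node.
			nodeInfo.AddPod(newPodFromTemplate(template, nodeInfo.Node()))
		}
	}
	return nodeInfos, nil
}

func (p *nodeInfoWithPodTemplateProcessor) CleanUp() {}
```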
The `nodeinfos.builder` package contains:
* the `Register()` function, used to register a `nodeinfos.NodeInfoProcessor` implementation.
* the `Build()` function, used to instantiate the requested `nodeinfos.NodeInfoProcessor` implementation from `AutoscalerOptions`.
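The register/build split is a common Go registry idiom. A minimal, self-contained sketch under assumptions (placeholder types; the real package's names and option fields may differ):

```go
package builder

import (
	"fmt"
	"sync"
)

// Placeholder stand-ins for the real cluster-autoscaler types.
type NodeInfoProcessor interface{ CleanUp() }
type AutoscalerOptions struct{ NodeInfoProcessorName string } // field name is hypothetical

// Factory builds a NodeInfoProcessor from the autoscaler options.
type Factory func(opts AutoscalerOptions) (NodeInfoProcessor, error)

var (
	mu        sync.Mutex
	factories = map[string]Factory{}
)

// Register associates a processor name with the factory that builds it.
func Register(name string, factory Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = factory
}

// Build instantiates the processor requested in the options.
func Build(opts AutoscalerOptions) (NodeInfoProcessor, error) {
	mu.Lock()
	defer mu.Unlock()
	factory, ok := factories[opts.NodeInfoProcessorName]
	if !ok {
		return nil, fmt.Errorf("unknown NodeInfoProcessor %q", opts.NodeInfoProcessorName)
	}
	return factory(opts)
}
```

A structure like this lets a build-tag-guarded file call `Register()` from its `init()`, which matches the PR's stated goal of enabling `NodeInfoProcessor` implementations behind build flags.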
Signed-off-by: cedric lamoriniere <[email protected]>
Force-pushed from 672fe3b to aee8776
@clamoriniere: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
No activity, closing. Feel free to reopen if needed.
This Pull request implements feature request #3873.

Solution 1 described in #3873 has been implemented. Based on the feedback and advice received during the last SIG meeting (March 15, 2021), the feature has been wrapped inside a `Processor` and disabled by default. Thanks to this feature, it is possible to provide the cluster-autoscaler with `PodTemplates` that will be considered as `Daemonset`s.

Changes:

* e29a8f1: Initial implementation. Adds a new processor, `PodTemplatesProcessor`, which aims to simulate Daemonset instances from `PodTemplates`. This implementation extended the Daemonset list used during scale-up operations. It introduces a `podTemplates` processor with 2 implementations: `noOpPodTemplateListProcessor` (default) and `activePodTemplateListProcessor`, and updates the `cluster-autoscaler` helm chart with the `corev1.PodTemplate` required RBAC.
* 6294791: Reimplements `PodTemplateProcessor` behind the `NodeInfoProcessor` interface. The new `nodeInfoWithPodTemplateProcessor` updates, for each `NodeInfo`, the `NodeInfo.Pods` with `Pods` generated from the `PodTemplate`s. Thanks to this code structure it will be possible to enable `NodeInfoProcessor` implementations behind build flags.
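As a usage illustration (not taken from the PR): a `corev1.PodTemplate` that a cluster operator might create so the autoscaler treats it like a DaemonSet during scale-from-0 simulation. The opt-in label name below is hypothetical; the actual label is defined by the processor in this PR:

```go
package podtemplates

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildDaemonSetLikePodTemplate returns a PodTemplate carrying the opt-in label.
func buildDaemonSetLikePodTemplate() *corev1.PodTemplate {
	return &corev1.PodTemplate{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "custom-agent",
			Namespace: "kube-system",
			Labels: map[string]string{
				// Hypothetical label; the processor selects only templates that carry it.
				"cluster-autoscaler.kubernetes.io/daemonset-pod": "true",
			},
		},
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				Containers: []corev1.Container{
					{Name: "agent", Image: "registry.example.com/custom-agent:latest"},
				},
			},
		},
	}
}
```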