v1alpha2 planning #24
This data is exposed to make life easier for the scheduler plugin, the sole consumer of this information. Additionally, this field is (planned to be) used to group together nodes using the same TM configuration: the scheduler can verify nodes are properly grouped under the same TM config by checking this value. This verification is not very strong, but it is the only data the scheduler can consume. It's worth stressing that the primary intended use, by and large, is to run the same logic as TM does on the scheduler side. The information about the TM configuration, however, doesn't really belong in NRT: NRT should report only resources, not configuration data. We can very much get the same information using a more Kubernetes-native mechanism: labels.
Pros and cons of the two options: just expose the kubeletconfiguration API, or define a new set of labels derived from the kubelet configuration.
My take: NRT wants to generalize what TM does, so binding ourselves to TM is a step back. For these reasons it seems better not to depend directly on the kubelet configuration: we should define our own labels and our own enum values for those labels. The TM scope concept is generic and forward-looking enough that we can translate it into labels of our own, and a (less mature) proposal for expressing the TM policy constraints could take the same shape (values very much subject to change); see the sketch below.
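For illustration only, a sketch of what such self-defined labels could look like as Go constants. Every key and value here is a hypothetical placeholder, not part of any proposal or API:

```go
package nrtlabels

// Hypothetical label keys for exposing a generalized TM configuration;
// the domain and names are placeholders, not part of any API.
const (
	// ScopeLabel would classify the granularity at which resource
	// alignment is enforced on the node.
	ScopeLabel = "topology.node.x-k8s.io/scope"
	// PolicyLabel would classify how strictly alignment is enforced.
	PolicyLabel = "topology.node.x-k8s.io/policy"
)

// Hypothetical enum values, deliberately decoupled from the kubelet
// TM policy names (values very much subject to change).
const (
	ScopePod       = "pod"
	ScopeContainer = "container"

	PolicyNone       = "none"
	PolicyBestEffort = "best-effort"
	PolicyStrict     = "strict"
)
```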
One thing we're missing here are the TM policy options.
Right, both are good points. Exposing options becomes more hairy. From what you can see at this point in time, would we need to expose all options or just a selected few, from the scheduler perspective?
Right now it's easy because we only have one policy option.
Thanks. This is going to be a balancing act. We're shaping up the label API and the design direction with help and feedback from the NFD maintainers; I'll keep you in the loop, tagging you on GH discussions as needed.
Here's the (very) initial proposal, so we can look at some code and start to get a feeling of how it could look: #25
I'm just wondering, are labels the way we want to advertise this information? Just a thought, but could the recently introduced NodeFeature CRD in NFD be used for this?
I think what we need is a way to categorize and group nodes using common attributes. Labels can be a way to do this, but I am (and I think we are) very open to alternatives. I'm not up to speed with the NodeFeature CRD.
BTW, one unrelated improvement in the types would be to also enumerate the other zone types.
Probably instead of "numa" -> "memory node" or something similar, with emphasis on memory partitioning.
I think this makes sense in general and we should do this, but I would like to point out that the maximum number of supported NUMA zones is 8, as can be seen in the topology manager documentation.
Can you elaborate on why you think using TM policies is a step back? We are running an algorithm in the scheduler plugin to approximate the allocation decision made by Topology Manager in kubelet. I notice that we are trying to decouple from the kubelet configuration, but I am probably not fully understanding why that is a problem; I am open to learning and getting a better understanding of this.

I am biased towards labels with a 1:1 mapping between the configured kubelet policy/policy options and the created labels, rather than abstracted versions of them. It seems more intuitive and saves us from having to translate and maintain additional documentation. Maybe it's just me, but I feel it might cause confusion and would be hard to explain that the labels exposed by NFD are related to the Topology Manager configuration but still don't match the configured values. What if someone independent of the topology-aware scheduling use case wants to use these labels to target a workload at a node that has a specific topology manager policy configured? With these new labels we would be diverging from the well-recognized policies documented in the official Kubernetes documentation.

As for keeping the values synced with kubelet: since there are plans to graduate Topology Manager to GA, I expect the names of the existing policies and policy options to continue to exist in the near future. If new policies or policy options are added, we would have to take those into account regardless of whether we want a 1:1 mapping or an abstracted label representation.

Maybe the abstraction of the policies is motivated by the ongoing discussions in the community about making resource management pluggable in kubelet for more flexibility? Thinking out loud here, but if that is the case, we would have to figure out the fate of Topology Manager and its policies moving forward. I expect resource managers and the existing policies to be supported for the next few releases (especially now that there are plans to graduate to GA); a clear migration path would have to be identified in the community in case responsibility is offloaded to resource plugins and resource managers are deprecated.

Having said that, I recognize we are heading in the right direction by moving topologypolicies out of the NRT API and into NFD, where it is more suitable, so I don't want to slow down the momentum. We can pick the approach that everyone prefers and pivot later if needed.
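To make the trade-off concrete, a hedged illustration of the two competing schemes. Every label key below is made up for illustration; only the values in Option A are real TM policy/scope names:

```go
package nrtlabels

// Option A (1:1 mapping): labels mirror the kubelet configuration
// verbatim, so the values match the documented TM policy names.
var oneToOneLabels = map[string]string{
	"example.com/topology-manager-policy": "single-numa-node",
	"example.com/topology-manager-scope":  "container",
}

// Option B (abstracted): labels use our own enum, translated from the
// kubelet configuration, so they can outlive TM-specific naming.
var abstractedLabels = map[string]string{
	"topology.node.x-k8s.io/policy": "strict",
	"topology.node.x-k8s.io/scope":  "container",
}
```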
Thanks @swatisehgal, yours is a good point. I'm reconsidering the approach to the label representation a bit, also taking into account the previous feedback from @marquiz. I'll take the minimum time needed to properly formalize my thoughts, and then I'll update this thread.
I had an offline chat with others, including @kad, and we agreed that the API should not define the labels after all: it should be a contract between the producer and the consumer, and it is at the very least too early to freeze labels and values in the API, if we should do this at all. We also agreed that a top-level Attributes list, like the one Zones already have, should be added. The combination of these two factors enables a nice upgrade path: NFD (or RTE or any other agent) can add the TM configuration of the node as top-level attributes, which the scheduler plugin (or any other consumer) can read and use.
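A rough sketch of how such a top-level attribute list could look in the Go types, mirroring what Zones already carry. Field names approximate what #25 proposes and should be treated as illustrative:

```go
package v1alpha2

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// AttributeInfo is a generic name/value pair; the meaning of each
// entry is a contract between the producer (e.g. NFD, RTE) and the
// consumer (e.g. the scheduler plugin), not frozen in the API.
type AttributeInfo struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// AttributeList holds the top-level attributes of a NRT object.
type AttributeList []AttributeInfo

// NodeResourceTopology sketch; the existing fields (Zones, the
// deprecated TopologyPolicies, ...) are omitted for brevity.
type NodeResourceTopology struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Attributes sits at the top level, like the list Zones already
	// carry, so agents can publish the node's TM configuration
	// without further API changes.
	Attributes AttributeList `json:"attributes,omitempty"`
}
```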
expose TM configuration as attributes of a special Zone. This is in addition to TopologyPolicies field, which is being deprecated. See: - k8stopologyawareschedwg/noderesourcetopology-api#24 - k8stopologyawareschedwg/noderesourcetopology-api#25 Signed-off-by: Francesco Romani <[email protected]>
Also, I guess the TTL we discussed on the call for cleaning up stale NRT CRs can go in the top-level attributes too.
Yes. We're going to add other extra fields as well.
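Building on the type sketch above, such an extra field could ride along as just another attribute with no API change. The helper and the "ttl" key below are made up for illustration:

```go
package v1alpha2

import "time"

// addTTLAttribute is a hypothetical helper (name and "ttl" key made
// up): it records how long this NRT object should be considered
// fresh as a plain attribute entry, requiring no API change.
func addTTLAttribute(nrt *NodeResourceTopology, ttl time.Duration) {
	nrt.Attributes = append(nrt.Attributes, AttributeInfo{
		Name:  "ttl",
		Value: ttl.String(),
	})
}
```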
Planned for
Closed by #25.
expose TM configuration as attributes. This is in addition to TopologyPolicies field, which is being deprecated. See: - k8stopologyawareschedwg/noderesourcetopology-api#24 - k8stopologyawareschedwg/noderesourcetopology-api#25 Signed-off-by: Francesco Romani <[email protected]>
Since the NRT API is moving towards top-level attributes (see: k8stopologyawareschedwg/noderesourcetopology-api#24) add the preferred Attribute Name alongside the existing annotation Name. Signed-off-by: Francesco Romani <[email protected]>
Proposed changes for `v1alpha2`, to be discussed:

1. Add constants for values we use repeatedly, like `"Node"` (and possibly more). Also add constants for the max supported NUMA zones (64). → These are scheduler/topology manager constraints or constants; the API itself does not have any requirement here, so we are deferring any further evaluation to at least v1alpha3.
2. Add an `Attributes` field, like we already have for `Zones` (expose the TM configuration as labels → the newly added `Attributes`).
3. Find a way to expose TM policy options as labels (extension of point 2 above). → Expose TM policy options as top-level `Attributes` of NRT objects.

Spin off from #21 to enable more incremental evolution. A consumer-side sketch of reading these attributes follows below.
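On the consumer side, the scheduler plugin could then look up the TM configuration from the new top-level attributes roughly as follows. This is a sketch against the types above; the attribute names are illustrative, not agreed-upon keys:

```go
package v1alpha2

// findAttribute sketches a consumer-side lookup: the scheduler
// plugin scans the top-level attributes for the TM configuration
// published by the node agent.
func findAttribute(nrt *NodeResourceTopology, name string) (string, bool) {
	for _, attr := range nrt.Attributes {
		if attr.Name == name {
			return attr.Value, true
		}
	}
	return "", false
}

// Example usage (attribute names illustrative):
//
//	policy, ok := findAttribute(nrt, "topologyManagerPolicy")
//	scope, ok := findAttribute(nrt, "topologyManagerScope")
```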