
CA: refactor ClusterSnapshot methods #7466

Merged: 6 commits into kubernetes:master on Nov 21, 2024

Conversation

@towca (Collaborator) commented Nov 5, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. The ClusterSnapshot interface is cleaned up to facilitate later changes needed for DRA:

  • There were multiple methods for adding Nodes to the snapshot, which caused redundancy in ClusterSnapshot implementations for no clear reason. Instead of adding DRA handling to all of these methods, they're replaced with AddNodeInfo, which is already DRA-aware.
  • RemoveNode is renamed to RemoveNodeInfo for consistency with AddNodeInfo.
  • AddPod and RemovePod are renamed to SchedulePod and UnschedulePod. These names are more in line with the methods' behavior once DRA is considered (a pod is not "removed" from the snapshot altogether, since we have to keep tracking its DRA objects).
  • An Initialize method is added. All other methods are Node- or Pod-specific, while for DRA the snapshot will also need to track DRA objects that are not bound to Nodes or Pods. Initialize() will be used to set these "global" DRA objects in later commits.
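To make the end state easier to picture, here is a rough sketch of the node/state-related part of the interface after this cleanup. It is illustrative only, pieced together from this description and the review below (where Initialize ends up renamed to SetClusterState), and the NodeInfo type is an approximation rather than the literal merged source:

	type ClusterSnapshot interface {
		// SetClusterState clears the snapshot and replaces its contents with the provided
		// Nodes and scheduled Pods (and, in later commits, "global" DRA objects).
		SetClusterState(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error
		// AddNodeInfo adds the given NodeInfo (a Node plus its Pods, and eventually their
		// DRA objects) to the snapshot.
		AddNodeInfo(nodeInfo *framework.NodeInfo) error
		// RemoveNodeInfo removes the given Node and its Pods from the snapshot.
		RemoveNodeInfo(nodeName string) error
		// ... plus the pod-movement methods discussed in the review below.
	}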

Which issue(s) this PR fixes:

The CA/DRA integration is tracked in kubernetes/kubernetes#118612, this is just part of the implementation.

Special notes for your reviewer:

This is intended to be a no-op refactor. It was extracted from #7350 after #7447.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/9de7f62e16fc5c1ea3bd40689487c9edc7fa5057/keps/sig-node/4381-dra-structured-parameters/README.md

/assign @MaciekPytel
/assign @jackfrancis

@k8s-ci-robot added labels Nov 5, 2024: kind/cleanup, cncf-cla: yes, approved, size/XXL, area/cluster-autoscaler, and the area/provider/* labels for every touched cloud provider (alicloud, aws, azure, cluster-api, digitalocean, equinixmetal, externalgrpc, gce, hetzner, ionoscloud, kwok, linode, magnum, oci, rancher).
@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from c7d18df to e0d1e60 Compare November 5, 2024 16:07
@pohly (Contributor) commented Nov 6, 2024

/cc

@k8s-ci-robot k8s-ci-robot requested a review from pohly November 6, 2024 12:08
@k8s-ci-robot added the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) Nov 6, 2024
pods = append(pods, podInfo.Pod)
}
-	err := a.ClusterSnapshot.AddNodeWithPods(upcomingNode.Node(), pods)
+	err := a.ClusterSnapshot.AddNodeInfo(upcomingNode)
Contributor:

this would be a good opportunity to rename vars as follows:

  • upcomingNodes --> upcomingNodeInfos
  • upcomingNode --> upcomingNodeInfo

Collaborator (author):

Done


knownNodes := make(map[string]bool)
for _, node := range nodes {
if err := snapshot.AddNode(node); err != nil {
Contributor:

The only error condition for adding a node is if your []*apiv1.Node set has a duplicate. I wonder if there's a more efficient way of doing that targeted error handling earlier in the execution flow so we don't have to do so much error handling at this point. It would also have the side-benefit of allowing us to ditch this knownNodes accounting overhead in this function.

Collaborator (author):

IMO this is the perfect place to validate this:

  • The alternative is for Initialize() to assume that some validation happened earlier and that its input is correct. This doesn't seem safe, as it relies on every Initialize() user properly validating data first.
  • There are multiple places that call Initialize(), so ideally we'd want to extract the validation logic to remove redundancy anyway. If we're extracting it outside of Initialize(), we essentially have 2 functions that always need to be called in sequence.

Keep in mind that this should be called once per snapshot per loop, so the knownNodes overhead should be trivial compared to the rest of the loop.
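To make the trade-off concrete, here is a minimal standalone sketch of the validation being defended here (simplified; the addNode/addPod parameters stand in for whatever the snapshot uses internally to insert objects, and are not the real helpers):

	func setClusterState(
		nodes []*apiv1.Node, scheduledPods []*apiv1.Pod,
		addNode func(*apiv1.Node) error, addPod func(*apiv1.Pod, string) error,
	) error {
		knownNodes := make(map[string]bool)
		for _, node := range nodes {
			// addNode fails on a duplicate Node, so the input is validated right here
			// instead of relying on every caller to have validated it earlier.
			if err := addNode(node); err != nil {
				return err
			}
			knownNodes[node.Name] = true
		}
		for _, pod := range scheduledPods {
			// Only add pods whose NodeName refers to one of the Nodes added above.
			if knownNodes[pod.Spec.NodeName] {
				if err := addPod(pod, pod.Spec.NodeName); err != nil {
					return err
				}
			}
		}
		return nil
	}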

Contributor:

O.K. I think my initial skepticism is about keeping the source of truth in two places:

  1. in the cluster snapshot data store
  2. in the local function (knownNodes)

I don't have any reason to suspect a race condition risk here, but would this be more "correct"?:

	for _, pod := range scheduledPods {
		if err := snapshot.getInternalData().addPod(pod, pod.Spec.NodeName); err != nil {
			continue
		}
	}

The only error return condition for the addPod() method is

	if _, found := data.nodeInfoMap[nodeName]; !found {
		return ErrNodeNotFound
	}

So rather than keeping a local accounting of "found nodes" (knownNodes) and conditionally calling addPod(), we could simply leverage the lookup in addPod(), which is equivalently expensive to if knownNodes[pod.Spec.NodeName], and in doing so drop the knownNodes local accounting.

towca added a commit to towca/autoscaler that referenced this pull request Nov 14, 2024
Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error

// SchedulePod schedules the given Pod onto the Node with the given nodeName inside the snapshot.
SchedulePod(pod *apiv1.Pod, nodeName string) error
Contributor:

I wonder if in the DRA world we need a separate method to schedule an existing pod (this one) and one to inject a completely new, in-memory pod? Basically - do we want separate methods for "create pod" and "schedule pod"?

Contributor:

To close the loop on this - we discussed it with @towca offline and we both think such a method would be needed (potentially with separate versions: one for injecting a completely new pod, for cases like provisioningrequest, and one for duplicating existing pods, e.g. for replica count optimization - this is TBD).
However, it makes sense to do it in a separate PR, together with the logic that will cover storing DRA objects for unschedulable pods.

Collaborator (author):

Note to other reviewers: I ended up leaving the AddPod method in this PR, just renamed it to ForceAddPod (see the relevant commit message for details why). I'll add the other necessary methods in future PRs.
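To summarize for other reviewers, a rough sketch of how the pod-related methods look after this thread. Doc comments are paraphrased from the commit messages further down; SchedulePod/UnschedulePod are only planned for later PRs, and the exact signatures shown here are approximations:

	// SchedulePod schedules the given Pod onto the Node with the given nodeName inside the
	// snapshot; with DRA, this is also where the Pod's ResourceClaims would get allocated.
	SchedulePod(pod *apiv1.Pod, nodeName string) error
	// UnschedulePod removes the given Pod from its Node while keeping its DRA objects
	// tracked in the snapshot (deallocating the ResourceClaims).
	UnschedulePod(namespace string, podName string, nodeName string) error
	// ForceAddPod adds the given Pod to the given Node, bypassing the DRA bookkeeping above
	// (needed for corner cases such as expendable pods).
	ForceAddPod(pod *apiv1.Pod, nodeName string) error
	// ForceRemovePod is the forceful counterpart of UnschedulePod.
	ForceRemovePod(namespace string, podName string, nodeName string) error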

@MaciekPytel (Contributor) left a comment:

Overall looks quite good, but I left a few comments that I feel are significant.
/approve

I assume I'll be unavailable when the comments are resolved. Please feel free to assume lgtm from me once you resolve the comments.

@@ -286,15 +282,10 @@ func BenchmarkFilterOutSchedulable(b *testing.B) {
assert.NoError(b, err)

clusterSnapshot := snapshotFactory()
-	if err := clusterSnapshot.AddNodes(nodes); err != nil {
+	if err := clusterSnapshot.Initialize(nodes, scheduledPods); err != nil {
Contributor:

It feels somewhat inconsistent to use Initialize here but AddNodeInfo in a loop in TestFilterOutSchedulable above. I assume you do it because here you have a list of all pods and above it's grouped by node, but that leaks out an assumption that Initialize is just adding NodeInfos in a loop. Is that an assumption we want? Or would we tell users that ClusterSnapshot must be initialized to be used?
Maybe this boils down to my other comment about the naming of Initialize and this would be fine if we just renamed the method?

Collaborator (author):

Yeah this is a fair point. I think after the name change to SetClusterState this is okay though:

  • IMO with the name change it's pretty clear that, starting from an empty snapshot, using SetClusterState or AddNodeInfo with the same set of pods and nodes should result in the same snapshot state.
  • In prod code, it only makes sense to use SetClusterState to set the state.
  • The test code can use whichever approach is easier, though any test that takes DRA into account will likely have to use SetClusterState.
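Illustratively, in test code the two setup styles are expected to be interchangeable. This is a hedged fragment with hypothetical names (t, nodes, scheduledPods, nodeInfos), assuming nodeInfos carries exactly the nodes above with their scheduled pods attached:

	// One-shot setup:
	snapshot := snapshotFactory()
	assert.NoError(t, snapshot.SetClusterState(nodes, scheduledPods))

	// Incremental setup that should end in the same snapshot state:
	other := snapshotFactory()
	for _, nodeInfo := range nodeInfos {
		assert.NoError(t, other.AddNodeInfo(nodeInfo))
	}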

// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
IsPVCUsedByPods(key string) bool

// Initialize clears the snapshot and initializes it with real objects from the cluster - Nodes,
Contributor:

nit: "initializes it with real objects from the cluster" - it doesn't, it initializes snapshot with whatever you passed. Based on the wording I'd expect this method to actually pull stuff out of an informer. Maybe "Initialize replaces current contents of the snapshot with the provided objects (nodes, scheduled pods)"?

Collaborator (author):

Fair enough, done.


// Initialize clears the snapshot and initializes it with real objects from the cluster - Nodes,
// scheduled pods.
Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error
Contributor:

I don't like the name "Initialize" here. I think it's misleading:

  • I'd expect "Initialize" to be something that happens one on the start of object's lifetime.
  • I'd expect "Initialize" to be required before using the object in any other way - this isn't the case here, see my comment in one of the tests for example.
  • It's very non-specific, literally anything can happen here. It's fine if it encapsulates a bunch of internal stuff I don't care about as a user, but here it performs a very specific function that is both understandable and important for the user.

Naming is hard obviously, but I'd suggest a name along the lines of "ReplaceContentWith()" or "SetContent" or "SetClusterState" or similar. I think any of these would imply that the current contents of the snapshot are dropped and replaced with whatever is provided - which is exactly what the comment above is saying.

Collaborator (author):

All good points - changed the name to SetClusterState. I think this leaves us with clear semantics:

SetClusterState resets the forks and sets the state to what's provided. No expectation of happening only once or having to be called before other methods, the name reflects what's happening.
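A hypothetical usage sketch of those semantics (Fork shown only to illustrate the "resets the forks" part; error handling omitted, and the variable names are stand-ins):

	_ = snapshot.SetClusterState(nodesA, podsA) // snapshot now holds exactly nodesA + podsA
	snapshot.Fork()
	_ = snapshot.AddNodeInfo(someNodeInfo)      // scratch change on top of the fork
	_ = snapshot.SetClusterState(nodesB, podsB) // forks dropped; snapshot holds exactly nodesB + podsB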

@@ -41,8 +41,6 @@ type ClusterSnapshot interface {
AddPod(pod *apiv1.Pod, nodeName string) error
// RemovePod removes pod from the snapshot.
RemovePod(namespace string, podName string, nodeName string) error
// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
IsPVCUsedByPods(key string) bool
Contributor:

After talking with Kuba - the functionality still exists and the scheduler was accessing it via the lister interface anyway. This makes sense to me.

return caerrors.ToAutoscalerError(caerrors.InternalError, err)
}
}
if err := a.ClusterSnapshot.Initialize(nodes, scheduledPods); err != nil {
Contributor:

  1. Is the Clear() call needed here? Based on the docstring for Initialize(), I'd expect Clear() to be a part of Initialize().
  2. Does it make sense to wrap a single method in a helper function? Feels like at this point we should just inline it back into RunOnce.

Collaborator (author):

Good catch, removed the helper method altogether.

}
// Initialize initializes the snapshot.
func (snapshot *BasicClusterSnapshot) Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error {
snapshot.Clear()
Contributor:

Shouldn't we just remove Clear() from the interface at this point? It feels completely redundant. Initialize(nil, nil) should do exactly the same thing and, after renaming it, should also be a very obvious and natural way of doing that (not to mention the fact that IIRC we never Clear() a snapshot without immediately re-initializing it, so the Clear() functionality is probably never needed).

Collaborator (author):

Yup, full agreement here - removed Clear().
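Illustratively, the equivalence that makes Clear() redundant (snapshot is a stand-in variable):

	snapshot.Clear()                       // before this PR: wipe the snapshot
	_ = snapshot.SetClusterState(nil, nil) // after: "replace the contents with nothing"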

@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch 2 times, most recently from f58acf7 to 00e304b Compare November 18, 2024 20:49
towca added a commit to towca/autoscaler that referenced this pull request Nov 18, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 19, 2024
@towca (Collaborator, author) commented Nov 19, 2024

/assign @BigDarkClown

@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from 00e304b to c4389f6 Compare November 19, 2024 14:16
towca added a commit to towca/autoscaler that referenced this pull request Nov 19, 2024
…dNodeInfo

We need AddNodeInfo in order to propagate DRA objects through the
snapshot, which makes AddNodeWithPods redundant.
AddNodes() is redundant - it was intended for batch-adding nodes,
probably with batch-specific optimizations in mind. However, it
has always been implemented as just iterating over AddNode(), and
is only used in test code.

Most of the uses in the test code were initializing the cluster state.
They are replaced with SetClusterState(), which will later be needed for
handling DRA anyway (we'll have to start tracking things that aren't
node- or pod-scoped). The other uses are replaced with inline loops over
AddNode().
The method is already accessible via StorageInfos(); it's redundant.
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.

Uses in production are replaced with SetClusterState(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
RemoveNode is renamed to RemoveNodeInfo for consistency with other
NodeInfo methods.

For DRA, the snapshot will have to potentially allocate ResourceClaims
when adding a Pod to a Node, and deallocate them when removing a Pod
from a Node. This will happen in new methods added to ClusterSnapshot
in later commits - SchedulePod and UnschedulePod. These new methods
should be the "default" way of moving pods around the snapshot going
forward.

However, we'll still need to be able to add and remove pods from the
snapshot "forcefully" to handle some corner cases (e.g. expendable pods).
AddPod is renamed to ForceAddPod, and RemovePod to ForceRemovePod to
highlight that these are no longer the "default" methods of moving pods
around the snapshot, and are bypassing something important.
It's now redundant - SetClusterState with empty arguments does the same
thing.
@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from c4389f6 to 473a1a8 Compare November 19, 2024 14:28
towca added a commit to towca/autoscaler that referenced this pull request Nov 19, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 20, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 20, 2024
knownNodes[node.Name] = true
}
for _, pod := range scheduledPods {
if knownNodes[pod.Spec.NodeName] {
Contributor:

This is a slight modification from the original usage of AddNode -> AddPod. I think it's confusing that when a pod whose pod.Spec.NodeName is not present in the nodes is provided, we skip it instead of returning an error. The original use cases in the codebase would return an error in such cases.

What would you think about removing the knownNodes from this method and letting snapshot.AddPod return ErrNodeNotFound?

Contributor:

Ok, after additional review I see that it is meant to replace e.g. the static_autoscaler.go usage of AddNode, which ignores pods with unknown nodes. Do you think it's okay that we do it this way? I am worried that some places might depend on the error returned by AddPod.

Collaborator (author):

Yeah your second comment got it right - SetClusterState is intended to match/replace initializeClusterSnapshot from static_autoscaler.go. The direct replacement for AddNode is AddNodeInfo.

There are basically 2 usages in production code - StaticAutoscaler and Actuator. Both first filter the provided pods to only those which have nodeName set. So I'm pretty confident that this change is safe from the perspective of not changing behavior.

It's not ideal that each caller has to process the input pods in the same way, but I'd prefer to tackle this when we get to having unschedulable pods tracked in the snapshot. Then SetClusterState will just take a list of all Pods, and split it between scheduled and unschedulable internally. If a pod doesn't have a nodeName set, it's just unschedulable - no error condition.
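A rough sketch of that future direction (hypothetical, not part of this PR): SetClusterState would take all Pods and split them internally, so callers no longer have to pre-filter:

	func splitPods(pods []*apiv1.Pod) (scheduled, unschedulable []*apiv1.Pod) {
		for _, pod := range pods {
			if pod.Spec.NodeName != "" {
				scheduled = append(scheduled, pod)
			} else {
				// No NodeName set means the pod is simply unschedulable - not an error condition.
				unschedulable = append(unschedulable, pod)
			}
		}
		return scheduled, unschedulable
	}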

Contributor:

That sounds good to me, especially given that we will be able to tackle it for unschedulable pods.

@@ -41,8 +41,6 @@ type ClusterSnapshot interface {
AddPod(pod *apiv1.Pod, nodeName string) error
// RemovePod removes pod from the snapshot.
RemovePod(namespace string, podName string, nodeName string) error
// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
IsPVCUsedByPods(key string) bool
Contributor:

I 100% understand why this method is not needed in the interface, but why have you not removed its implementations? I may be tricked by Github's fantastic code review tools, but I still see the implementation of this method present for BasicClusterSnapshot.

@BigDarkClown (Contributor) left a comment:

/lgtm
/approve

@@ -41,8 +41,6 @@ type ClusterSnapshot interface {
AddPod(pod *apiv1.Pod, nodeName string) error
// RemovePod removes pod from the snapshot.
RemovePod(namespace string, podName string, nodeName string) error
// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
IsPVCUsedByPods(key string) bool
Contributor:

True, I'd missed that StorageInfos() returns self, so this is needed in the implementation. LGTM then.

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Nov 21, 2024
@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BigDarkClown, MaciekPytel, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis (Contributor):

/lgtm

@towca (Collaborator, author) commented Nov 21, 2024

Thanks for the reviews!

/unhold

@k8s-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Nov 21, 2024
@k8s-ci-robot k8s-ci-robot merged commit 30e57c9 into kubernetes:master Nov 21, 2024
6 checks passed