
Extend ClusterSnapshot implementations to include StorageInfos lister #4904

Closed
x13n opened this issue May 20, 2022 · 9 comments · Fixed by #5013
Labels
area/cluster-autoscaler · help wanted · kind/feature

Comments

x13n (Member) commented May 20, 2022

Which component are you using?:

cluster autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Recent scheduler framework changes will cause Cluster Autoscaler code to stop compiling once we update the dependency on scheduler.

Describe the solution you'd like.:

There are three objects implementing the SharedLister interface.

Both BasicClusterSnapshot and DeltaClusterSnapshot can maintain a mapping from namespaced PVC names to sets of namespaced pod names. IsPVCUsedByPods can then just check whether the set is non-empty, giving O(1) complexity for adding and removing pods. In the delta snapshot case, deleted pods also need to be tracked, similarly to how node infos are tracked today, to avoid O(n) computation when forking.
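As a rough illustration of that mapping (a standalone sketch with hypothetical names, not the actual cluster-autoscaler types):

```go
package main

import "fmt"

// pvcUsage tracks, for each namespaced PVC key ("namespace/claim"),
// the set of namespaced pod names that reference it.
type pvcUsage map[string]map[string]bool

// addPod registers a pod's PVC references; O(1) per referenced claim.
func (u pvcUsage) addPod(podKey string, pvcKeys []string) {
	for _, pvc := range pvcKeys {
		if u[pvc] == nil {
			u[pvc] = map[string]bool{}
		}
		u[pvc][podKey] = true
	}
}

// removePod drops a pod's references; empty sets are deleted entirely.
func (u pvcUsage) removePod(podKey string, pvcKeys []string) {
	for _, pvc := range pvcKeys {
		delete(u[pvc], podKey)
		if len(u[pvc]) == 0 {
			delete(u, pvc)
		}
	}
}

// isPVCUsedByPods is just a non-emptiness check on the set, O(1).
func (u pvcUsage) isPVCUsedByPods(key string) bool {
	return len(u[key]) > 0
}

func main() {
	u := pvcUsage{}
	u.addPod("default/pod-a", []string{"default/claim-1"})
	fmt.Println(u.isPVCUsedByPods("default/claim-1")) // true
	u.removePod("default/pod-a", []string{"default/claim-1"})
	fmt.Println(u.isPVCUsedByPods("default/claim-1")) // false
}
```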

Describe any alternative solutions you've considered.:

A no-op implementation would be simpler, but it would incorrectly handle ReadWriteOncePod PVCs, leading to scale-ups that wouldn't actually help some pending pods.

Additional context.:

Scheduler PR changing SharedLister interface: kubernetes/kubernetes#109715

@x13n x13n added the kind/feature Categorizes issue or PR as related to a new feature. label May 20, 2022
x13n (Member, author) commented May 23, 2022

/help

k8s-ci-robot (Contributor)

@x13n:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

> /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 23, 2022
ahaysx (Contributor) commented May 25, 2022

/assign

ahaysx (Contributor) commented May 29, 2022

@x13n there hasn't been a k/k release > v1.25.0-alpha.0 with kubernetes/kubernetes#109715. Besides implementing IsPVCUsedByPods, is it fine in this change to just set up the snapshots and the delegating lister for the eventual vendor update? The update change will then just need to add StorageInfos.

type basicClusterSnapshotStorageInfoLister BasicClusterSnapshot
func (snapshot *basicClusterSnapshotStorageInfoLister) IsPVCUsedByPods(key string) bool
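Filled out, that delegating-lister pattern could look roughly like this (BasicClusterSnapshot is reduced to a stand-in here, and the pvcRefCount field is a hypothetical placeholder for the real pod-tracking state):

```go
package main

import "fmt"

// BasicClusterSnapshot is a stand-in for the real snapshot type; here it
// just holds a per-PVC reference count as its pod-tracking state.
type BasicClusterSnapshot struct {
	pvcRefCount map[string]int // namespaced PVC key -> number of using pods
}

// basicClusterSnapshotStorageInfoLister is a defined type with the same
// underlying type as BasicClusterSnapshot, so the snapshot can hand the
// scheduler a StorageInfoLister-shaped view of itself via a conversion.
type basicClusterSnapshotStorageInfoLister BasicClusterSnapshot

// IsPVCUsedByPods matches the scheduler's StorageInfoLister method shape.
func (s *basicClusterSnapshotStorageInfoLister) IsPVCUsedByPods(key string) bool {
	return s.pvcRefCount[key] > 0
}

// StorageInfos returns the lister view of the snapshot; the conversion is
// free because the two types share an underlying type.
func (snapshot *BasicClusterSnapshot) StorageInfos() *basicClusterSnapshotStorageInfoLister {
	return (*basicClusterSnapshotStorageInfoLister)(snapshot)
}

func main() {
	s := &BasicClusterSnapshot{pvcRefCount: map[string]int{"default/claim-1": 1}}
	fmt.Println(s.StorageInfos().IsPVCUsedByPods("default/claim-1")) // true
	fmt.Println(s.StorageInfos().IsPVCUsedByPods("default/claim-2")) // false
}
```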

Also, I have some questions about the right strategy in the delta snapshot. I don't really see how a PVC cache would be helpful due to the cache invalidation calls in pod add/remove. If the nodeInfoList were a map, I could update it directly in nodeToModify and only need to invalidate the pod caches in pod add/remove.

Also, maybe we should consider changing all the caches to maps, which would allow for updating in place and removing invalidation. It would make them more effective and maybe reduce the caching complexity in this file.

We can chat on slack at some point this week if that's faster?

x13n (Member, author) commented May 30, 2022

Thanks for working on this!

> @x13n there hasn't been a k/k release > v1.25.0-alpha.0 with kubernetes/kubernetes#109715. Besides implementing IsPVCUsedByPods, is it fine in this change to just set up the snapshots and the delegating lister for the eventual vendor update? The update change will then just need to add StorageInfos.

I think everything can be implemented beforehand, including StorageInfos. That way the vendor update should require no extra changes; the function will just be unused until then.

> type basicClusterSnapshotStorageInfoLister BasicClusterSnapshot
> func (snapshot *basicClusterSnapshotStorageInfoLister) IsPVCUsedByPods(key string) bool

> Also, I have some questions about the right strategy in the delta snapshot. I don't really see how a PVC cache would be helpful due to the cache invalidation calls in pod add/remove. If the nodeInfoList were a map, I could update it directly in nodeToModify and only need to invalidate the pod caches in pod add/remove.
>
> Also, maybe we should consider changing all the caches to maps, which would allow for updating in place and removing invalidation. It would make them more effective and maybe reduce the caching complexity in this file.

Good point about cache invalidation. Since the caches are only used for listing, maintaining them during add/remove makes sense; for Revert they are no longer needed and can simply be dropped. Forking, on the other hand, would require building the entire map, which is O(n) instead of the current O(1). This may be problematic when a snapshot is forked, modified, then reverted (and never listed). The current approach keeps the O(n) cost at listing only (unless the cache is invalid). I think some hybrid approach, where we populate the cache on first listing but then keep it updated when needed (if the cache is not empty), would be best from a performance perspective.
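That hybrid approach could be sketched like this (a minimal standalone model under assumed names, not the actual snapshot code: the list cache is built lazily on first listing and then maintained on add while it exists):

```go
package main

import "fmt"

// lazyPodCache models the hybrid approach: the pod list cache is only
// built on the first List() call; after that, Add keeps it up to date
// instead of invalidating it. A fresh fork can start with a nil cache
// at O(1) cost.
type lazyPodCache struct {
	pods  map[string]bool // source of truth
	cache []string        // nil means "not built yet"
}

func (c *lazyPodCache) Add(name string) {
	c.pods[name] = true
	if c.cache != nil { // maintain the cache only if it already exists
		c.cache = append(c.cache, name)
	}
}

func (c *lazyPodCache) Remove(name string) {
	delete(c.pods, name)
	c.cache = nil // simplest correct choice; real code could update in place
}

func (c *lazyPodCache) List() []string {
	if c.cache == nil { // O(n) rebuild, paid only when actually listing
		c.cache = make([]string, 0, len(c.pods))
		for p := range c.pods {
			c.cache = append(c.cache, p)
		}
	}
	return c.cache
}

func main() {
	c := &lazyPodCache{pods: map[string]bool{}}
	c.Add("pod-a") // cache not built yet, nothing to maintain
	c.Add("pod-b")
	fmt.Println(len(c.List())) // 2 (builds the cache lazily)
	c.Add("pod-c")             // cache exists now, appended in place
	fmt.Println(len(c.List())) // 3
}
```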

> We can chat on slack at some point this week if that's faster?

Yup, we can, just note I'm in CEST timezone.

x13n (Member, author) commented Jun 23, 2022

We chatted about this with @ahaysx on Slack. He won't be able to follow up on this, but some conclusions were:

  • We can turn nodeInfoList into a map to allow updating/deleting entries without invalidating the entire cache.
  • It's harder to do the same trick with the PVC cache, since we'd need to iterate over all pods on a node to update it. I thought about benchmarking this, but it is probably still going to be faster than invalidating the entire cache.
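The first point can be illustrated with a minimal sketch (nodeInfo here is a stand-in for the scheduler's node info type, not the real one):

```go
package main

import "fmt"

// nodeInfo is a minimal stand-in for the scheduler's node info type.
type nodeInfo struct {
	name string
	pods int
}

// nodeInfoMap keeps node infos keyed by node name. Replacing one entry
// in place leaves all other entries untouched, so nothing needs to be
// invalidated when a single node is modified.
type nodeInfoMap map[string]*nodeInfo

// modify updates one entry in place, if it exists.
func (m nodeInfoMap) modify(name string, f func(*nodeInfo)) {
	if ni, ok := m[name]; ok {
		f(ni)
	}
}

func main() {
	m := nodeInfoMap{"node-1": {name: "node-1", pods: 2}}
	m.modify("node-1", func(ni *nodeInfo) { ni.pods++ })
	fmt.Println(m["node-1"].pods) // 3
}
```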

x13n (Member, author) commented Jun 23, 2022

/unassign @ahaysx

x13n (Member, author) commented Jun 30, 2022

/assign @jayantjain93

jayantjain93 (Contributor)

I've started looking into this.
