
topology-updater: reactive updates #1031

Merged

Conversation

Tal-or
Contributor

@Tal-or Tal-or commented Jan 12, 2023

Enable reactive updates for nfd-topology-updater
by detecting changes in the Kubelet state/checkpoint files,
and signaling the main loop to update the NodeResourceTopology
objects.

This is especially valuable when scaling is a concern:
multiple pods deployed between two consecutive updates
can leave the NRT CRs with stale resource accounting.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
The time elapsed between t0 and t2 is less than 5 seconds,
IOW the update at t0 is still the most recent one.

At t2 the resource accounting reflected by NRT
is not aligned with the actual accounting, because
the NRT CRs don't reflect the change that happened at t1.

With this reactive update feature we expect an update to be triggered
between t1 and t2, so the NRT objects reflect a more accurate
picture.

There might still be scenarios where the updates
aren't fast enough, but handling those is a
planned future optimization.

The notifier has two event types (a sketch follows below):

  1. Time-based - keeps the old behavior: trigger
    an update once per interval.
  2. FS event - trigger an update when the Kubelet state/checkpoint files are modified.
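
To make the mechanism concrete, here is a minimal sketch of the two-source notifier idea in Go — not the code merged in this PR. The `Event`, `EventType`, and `Run` names are illustrative; only the fsnotify usage (`fsnotify.NewWatcher`, `watcher.Add`, `watcher.Events`) is real library API.

```go
package notifier

import (
	"time"

	"github.com/fsnotify/fsnotify"
)

type EventType int

const (
	IntervalBased EventType = iota // old behavior: one update per interval
	FSUpdate                       // reactive: kubelet state file changed
)

type Event struct {
	Type EventType
}

// Run merges time-based and filesystem-based triggers into dest, so the
// updater's main loop only has to consume a single event channel.
func Run(interval time.Duration, kubeletDir string, dest chan<- Event) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	if err := watcher.Add(kubeletDir); err != nil {
		return err
	}
	ticker := time.NewTicker(interval)
	go func() {
		for {
			select {
			case <-ticker.C:
				dest <- Event{Type: IntervalBased}
			case e := <-watcher.Events:
				// Only writes to the state/checkpoint files matter.
				if e.Op&fsnotify.Write != 0 {
					dest <- Event{Type: FSUpdate}
				}
			}
		}
	}()
	return nil
}
```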

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 12, 2023
@k8s-ci-robot
Contributor

Hi @Tal-or. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 12, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 12, 2023
@netlify

netlify bot commented Jan 12, 2023

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit 5c6be58
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/6412e7165261190008d34209
😎 Deploy Preview https://deploy-preview-1031--kubernetes-sigs-nfd.netlify.app

Contributor

@marquiz marquiz left a comment


Thanks @Tal-or for the PR. I had a quick run-through and some comments.

Also, some of the commit message subject lines are a bit misleading or ambiguous; please adjust, e.g.:

  • main: add kubelet-dir-path flag -> topology-updater: add kubelet-dir-path flag
  • log: log event type -> topology-updater: log event type that triggered update
  • manifests: add mount for kubelet dir -> deployment/topology-updater: add mount for kubelet dir

@Tal-or
Contributor Author

Tal-or commented Jan 12, 2023

Thanks for the quick review. This PR needs more polish, but it's good enough to get the conversation going.

@ffromani
Contributor

Enable reactive updates for nfd-topology-updater by detecting changes in the Kubelet state/checkpoint files, and signaling the main loop to update the NodeResourceTopology objects.

The notifier has two event types:

1. Time-based - keeps the old behavior: trigger
   an update once per interval.

2. FS event - trigger an update when the Kubelet state/checkpoint files are modified.

I know the rationale because I worked on the prototype implementation. I think we should elaborate more in the PR description on what the benefit is for nfd-topology-updater users. It's ok to start minimal because this PR is a conversation starter (#1031 (comment)). Let's just make sure we have a more complete rationale by the time the PR is polished.

@Tal-or
Copy link
Contributor Author

Tal-or commented Jan 12, 2023

Enable reactive updates for nfd-topology-updater by detecting changes in the Kubelet state/checkpoint files, and signaling the main loop to update the NodeResourceTopology objects.
The notifier has two event types:

1. Time-based - keeps the old behavior: trigger
   an update once per interval.

2. FS event - trigger an update when the Kubelet state/checkpoint files are modified.

I know the rationale because I worked on the prototype implementation. I think we should elaborate more in the PR description on what the benefit is for nfd-topology-updater users. It's ok to start minimal because this PR is a conversation starter (#1031 (comment)). Let's just make sure we have a more complete rationale by the time the PR is polished.

Yes, I agree. We should emphasize the benefits of reactive updates and their advantages when scalability is a concern.

for {
select {
case <-crTrigger.C:
klog.Infof("Scanning")
case info := <-w.eventSource:
Contributor


Wouldn't it be simpler and clearer, instead of pushing events from the Notifier into a new channel, to change this select to read from <-timerEvent and <-n.fsEvent directly?

Contributor Author

@Tal-or Tal-or Jan 23, 2023


We need to create and configure the fs watcher, then filter the events received from it, passing only the relevant ones to the topology-updater select loop.

There is quite some work there, hence the reason for moving this logic into a separate package.
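
For illustration, that filtering step could look roughly like the sketch below. This is not the code from this PR: `isRelevant` is a hypothetical helper, and the file names are the usual Kubelet state/checkpoint files (cpu/memory manager state, device manager checkpoint), which should be treated as an assumption here.

```go
package notifier

import (
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// isRelevant reports whether an fsnotify event concerns one of the Kubelet
// state/checkpoint files the updater cares about, so unrelated events in
// the watched directory never reach the main select loop.
func isRelevant(e fsnotify.Event) bool {
	switch filepath.Base(e.Name) {
	case "cpu_manager_state", "memory_manager_state", "kubelet_internal_checkpoint":
		return e.Op&(fsnotify.Write|fsnotify.Create) != 0
	}
	return false
}
```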

Contributor


I was thinking something like this:

+       var crTrigger *time.Ticker
+       if w.resourcemonitorArgs.SleepInterval > 0 {
+               crTrigger = time.NewTicker(w.resourcemonitorArgs.SleepInterval)
+       }
+       ch, err := createFSWatcherEvent([]string{kubeletDirPath})
+       if err != nil {
+               return fmt.Errorf("failed to obtain node resource information: %w", err)
+       }
        for {
                select {
                case <-crTrigger.C:
-                       klog.Infof("Scanning")
-                       podResources, err := resScan.Scan()
-                       utils.KlogDump(1, "podResources are", "  ", podResources)
-                       if err != nil {
-                               klog.Warningf("Scan failed: %v", err)
-                               continue
-                       }
-                       zones = resAggr.Aggregate(podResources)
-                       utils.KlogDump(1, "After aggregating resources identified zones are", "  ", zones)
-                       if !w.args.NoPublish {
-                               if err = w.updateNodeResourceTopology(zones); err != nil {
-                                       return err
-                               }
-                       }
-
+                       w.runUpdater(resScan, resAggr)
                        if w.args.Oneshot {
                                return nil
                        }

+               case e := <-n.fsEvent:
+                       klog.V(5).Infof("fsnotify event from file %q: %q received", e.Name, e.Op)
+                       if e.Name == CPUStateFile ||
+                               e.Name == MemoryStateFile ||
+                               e.Name == DevicesStateFile {
+                               w.runUpdater(resScan, resAggr)
+                       }
                case <-w.stop:
                        klog.Infof("shutting down nfd-topology-updater")
                        return nil
@@ -164,6 +164,28 @@ func (w *nfdTopologyUpdater) Run() error {

 }

+func (w *nfdTopologyUpdater) runUpdater(resScan resourcemonitor.ResourcesScanner, resAggr resourcemonitor.ResourcesAggregator) {
+       zones := w.aggregateZones(resScan, resAggr)
+       utils.KlogDump(1, "After aggregating resources identified zones are", "  ", zones)
+       if !w.args.NoPublish {
+               if err := w.updateNodeResourceTopology(zones); err != nil {
+                       klog.Warningf("cannot update NodeResourceTopology: %s", err.Error())
+               }
+       }
+}
+
+func (w *nfdTopologyUpdater) aggregateZones(resScan resourcemonitor.ResourcesScanner, resAggr resourcemonitor.ResourcesAggregator) v1alpha1.ZoneList {
+       klog.Infof("Scanning")
+       podResources, err := resScan.Scan()
+       utils.KlogDump(1, "podResources are", "  ", podResources)
+       if err != nil {
+               klog.Warningf("Scan failed: %v", err)
+               return nil
+       }
+
+       return resAggr.Aggregate(podResources)
+}
+

We could just move the common logic into new funcs and remove the necessity of having a separate struct.

Contributor Author


I tend to agree that inlining everything might be clearer, but IMVHO decoupling the event-source generator from the main loop has greater benefits:
for example, you can generate your own source of events if needed, without changing the main-loop logic.
This is especially useful for testing, when you want to mock notification events (see the sketch below).
@PiotrProkop WDYT?
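
As a rough illustration of that testing benefit — assuming the illustrative `Event`/`FSUpdate` types from the earlier sketch — a test can act as its own event source, with no real filesystem involved:

```go
package notifier

import (
	"testing"
	"time"
)

// The test injects a synthetic FS event into the channel, exactly as a
// mocked notifier would, so the consuming side can be exercised without
// touching the kubelet state directory.
func TestConsumerReactsToFSEvent(t *testing.T) {
	events := make(chan Event, 1)
	events <- Event{Type: FSUpdate} // mock notification

	select {
	case e := <-events:
		if e.Type != FSUpdate {
			t.Fatalf("expected FSUpdate event, got %v", e.Type)
		}
	case <-time.After(time.Second):
		t.Fatal("timed out waiting for event")
	}
}
```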

Contributor


This makes sense. I have no strong objections against putting the event sources behind this abstraction; I just wanted to point out that it could be simplified. Both approaches work for me 😄

@marquiz
Contributor

marquiz commented Jan 17, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 17, 2023
@Tal-or
Contributor Author

Tal-or commented Jan 23, 2023

Sorry for the long absence; I'll address the comments in the following days.

@Tal-or Tal-or force-pushed the reactive_updates branch 2 times, most recently from 996bfe3 to a5e9b8a Compare January 23, 2023 16:46
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2023
@Tal-or
Contributor Author

Tal-or commented Feb 14, 2023

@marquiz @PiotrProkop I would like to share my thoughts with you about the continuation of this work.
Basically we have #1049, which supersedes this one.
Once topology-updater has the pfp (pod fingerprint), the reserve plugin on the scheduler side will track the gaps until the next update, so we won't need the reactive update.
My question to you is: do you see any benefit in having the reactive-updates feature anyway?
For example, in case topology-updater runs without the scheduler plugin, or in any other constellation?

@PiotrProkop
Contributor

PiotrProkop commented Feb 14, 2023

@marquiz @PiotrProkop I would like to share my thoughts with you about the continuation of this work. Basically we have #1049, which supersedes this one. Once topology-updater has the pfp (pod fingerprint), the reserve plugin on the scheduler side will track the gaps until the next update, so we won't need the reactive update. My question to you is: do you see any benefit in having the reactive-updates feature anyway? For example, in case topology-updater runs without the scheduler plugin, or in any other constellation?

@Tal-or Not everyone is using the reserve plugin (for us it's the DiscardReservedNodes plugin), and we are using custom patches in NFD for reactive updates, so we are very interested in this work. If you don't have the bandwidth to take care of this PR, I would be happy to take over this work and finish it 😄

@Tal-or
Contributor Author

Tal-or commented Feb 14, 2023

@marquiz @PiotrProkop I would like to share my thoughts with you about the continuation of this work. Basically we have #1049, which supersedes this one. Once topology-updater has the pfp (pod fingerprint), the reserve plugin on the scheduler side will track the gaps until the next update, so we won't need the reactive update. My question to you is: do you see any benefit in having the reactive-updates feature anyway? For example, in case topology-updater runs without the scheduler plugin, or in any other constellation?

@Tal-or Not everyone is using the reserve plugin (for us it's the DiscardReservedNodes plugin), and we are using custom patches in NFD for reactive updates, so we are very interested in this work. If you don't have the bandwidth to take care of this PR, I would be happy to take over this work and finish it 😄

@PiotrProkop I'm very happy to hear that you're interested in this work; I'll continue it in the coming days.

@marquiz
Contributor

marquiz commented Feb 15, 2023

@Tal-or what do you think about the timeline for this? I was thinking of having a more predictable/more frequent release cadence and cutting v0.13.0 around the end of March, releasing whatever we've managed to get in by that time 😊 It would be great to get this in and let people drop some of the custom patches.

@Tal-or
Contributor Author

Tal-or commented Feb 15, 2023

@Tal-or what do you think about the timeline for this? I was thinking of having a more predictable/more frequent release cadence and cutting v0.13.0 around the end of March, releasing whatever we've managed to get in by that time 😊 It would be great to get this in and let people drop some of the custom patches.

Hey @marquiz, I have some time today; let me rebase and address the open comments, and let's see how the next review cycle goes.

@marquiz
Contributor

marquiz commented Feb 15, 2023

Hey @marquiz, I have some time today; let me rebase and address the open comments, and let's see how the next review cycle goes.

Not super urgent, just wanted to give you a heads-up.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2023
@Tal-or Tal-or force-pushed the reactive_updates branch 2 times, most recently from b75b46f to 7af3de6 Compare February 15, 2023 13:55
@ffromani
Contributor

/cc

Tal-or added 7 commits March 12, 2023 12:37
Enable reactive updates for nfd-topology-updater
by detecting changes in the Kubelet state/checkpoint files,
and signaling the main loop to update the NodeResourceTopology
objects.

This is especially valuable when scaling is a concern:
multiple pods deployed between two consecutive updates
can leave the NRT CRs with stale resource accounting.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
The time elapsed between t0 and t2 is less than 5 seconds,
IOW the update at t0 is still the most recent one.

At t2 the resource accounting reflected by NRT
is not aligned with the actual accounting, because
the NRT CRs don't reflect the change that happened at t1.

With this reactive update feature we expect an update to be triggered
between t1 and t2, so the NRT objects reflect a more accurate
picture.

There might still be scenarios where the updates
aren't fast enough, but handling those is a
planned future optimization.

The notifier has two event types:
1. Time-based - keeps the old behavior: trigger
an update once per interval.
2. FS event - trigger an update when the Kubelet state/checkpoint files are modified.

Signed-off-by: Talor Itzhak <[email protected]>
On different Kubernetes flavors, like OpenShift for example,
the Kubelet state directory path is different. Make it configurable
for maximum flexibility.

Signed-off-by: Talor Itzhak <[email protected]>
When a message is received via the channel,
the main loop updates the `NodeResourceTopology` objects.

The notifier sends a message via the channel when:
1. The sleep timeout is reached.
2. A change in the Kubelet state files is detected.

Signed-off-by: Talor Itzhak <[email protected]>
Specify the event type as part of the log message.
To reduce the log volume, log it at verbosity level 4 (V(4)).

Signed-off-by: Talor Itzhak <[email protected]>
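
A minimal sketch of what the log line described in the commit above might look like; the function and parameter names are illustrative, not taken from the PR:

```go
package main

import "k8s.io/klog/v2"

func main() {
	logUpdateTrigger("fsUpdate")
}

// logUpdateTrigger records which event type kicked off an update cycle,
// at verbosity 4 so routine triggers don't flood the log.
func logUpdateTrigger(eventType string) {
	klog.V(4).Infof("update triggered by %s event", eventType)
}
```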
This mount is needed for watching the state files

Signed-off-by: Talor Itzhak <[email protected]>
Especially convenient for testing purposes, and
completely harmless.

Signed-off-by: Talor Itzhak <[email protected]>
Signed-off-by: Talor Itzhak <[email protected]>
@Tal-or Tal-or force-pushed the reactive_updates branch from 65882ef to e20cc06 Compare March 12, 2023 11:38
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2023
@Tal-or Tal-or force-pushed the reactive_updates branch 3 times, most recently from f261c80 to 2d7890b Compare March 12, 2023 13:08
Tal-or added 3 commits March 16, 2023 11:51
Adding kubelet state directory mount

Signed-off-by: Talor Itzhak <[email protected]>
Access to the kubelet state directory may raise concerns in some setups, so add an option to disable it.
The feature is enabled by default.

Signed-off-by: Talor Itzhak <[email protected]>
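
A hypothetical sketch of such an opt-out option; the actual flag name and wiring in the PR may differ:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag name. Watching the kubelet state dir is on by
	// default and can be disabled where mounting it raises concerns.
	kubeletStateFiles := flag.Bool("kubelet-state-files", true,
		"enable reactive updates based on the Kubelet state/checkpoint files")
	flag.Parse()

	if *kubeletStateFiles {
		fmt.Println("starting fsnotify-based notifier for the kubelet state dir")
	} else {
		fmt.Println("kubelet state dir watching disabled; interval-based updates only")
	}
}
```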
@Tal-or Tal-or force-pushed the reactive_updates branch from 2d7890b to 5c6be58 Compare March 16, 2023 09:53
Contributor

@marquiz marquiz left a comment


Thanks @Tal-or for the persistence on this. I don't have any more feedback and am ready to get this in.

ping @ffromani @PiotrProkop

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2023
@ffromani
Contributor

Thanks for the ping @marquiz, I'll review shortly.

@PiotrProkop
Contributor

Thanks @Tal-or for the persistence on this. I don't have any more feedback and am ready to get this in.

ping @ffromani @PiotrProkop

LGTM, but I'll leave the final approval to @ffromani. Good work @Tal-or!

@marquiz
Contributor

marquiz commented Mar 16, 2023

LGTM, but I'll leave the final approval to @ffromani.

🥳

Good work @Tal-or!

Yes, really good work on this one!

Contributor

@ffromani ffromani left a comment


/lgtm

Very nice work indeed. Kudos to @Tal-or!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 65e359b51ea6ae20c384f683aae3e0046e570f40

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, marquiz, Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 13f92fa into kubernetes-sigs:master Mar 17, 2023
@marquiz marquiz mentioned this pull request Apr 12, 2023
24 tasks
@PiotrProkop PiotrProkop mentioned this pull request Apr 24, 2023
18 tasks