
Don't send endpoint profile updates from Server updates when opaqueness doesn't change #12013

Merged
merged 5 commits into main from alex/server-madness on Mar 19, 2024

Conversation

@adleong (Member) commented on Jan 30, 2024

When the destination controller receives an update for a Server resource, we recompute opaque ports for all pods. This results in a large number of updates to all endpoint profile watches, even if the opaqueness doesn't change. When there are many Server resources, this can send a large number of updates to the endpoint profile translator and overflow its update queue. This is especially likely to happen during an informer resync, which triggers an informer callback for every Server in the cluster.

We refactor the workload watcher to not send these updates if the opaqueness has not changed.

This seemingly simple change in behavior requires a large code change because:

  • the current opaqueness state is not stored on workload publishers and must be added so that we can determine if the opaqueness has changed
  • storing the opaqueness in addition to the other state we're storing (pod, ip, port, etc.) means that we are now storing all of the data represented by the Address struct
  • workload watcher uses a createAddress func to dynamically create an Address from the state it stores
  • now that we are storing the Address as state, creating Addresses dynamically is no longer necessary and we can operate on the Address state directly
    • this makes the workload watcher more similar to the other watchers, following a common pattern
    • it also fixes some minor correctness issues:
      • pods that did not have the ready status condition were being considered when they should not have been
      • updates to ExternalWorkload labels were not being considered
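
To illustrate the intended behavior, here is a minimal, self-contained Go sketch of the update path described above. The type and method names (workloadPublisher, updateServer, Listener, etc.) are illustrative stand-ins, not the actual linkerd2 identifiers.

package main

import "fmt"

// Address is a trimmed-down stand-in for the watcher's Address struct.
type Address struct {
	IP             string
	Port           uint32
	OpaqueProtocol bool
}

// Listener receives address updates (for example, the endpoint profile translator).
type Listener interface {
	Update(addr *Address)
}

// workloadPublisher stores the Address it last published so a Server
// change can be compared against the previously computed opaqueness.
type workloadPublisher struct {
	addr      Address
	listeners []Listener
}

// updateServer recomputes opaqueness from the Server-derived opaque port
// set and only notifies listeners when the value actually changes.
func (wp *workloadPublisher) updateServer(opaquePorts map[uint32]struct{}) {
	_, opaque := opaquePorts[wp.addr.Port]
	if opaque == wp.addr.OpaqueProtocol {
		// No change: skip the update so informer resyncs don't flood
		// the endpoint profile translator's queue.
		return
	}
	wp.addr.OpaqueProtocol = opaque
	for _, l := range wp.listeners {
		l.Update(&wp.addr)
	}
}

type printListener struct{}

func (printListener) Update(addr *Address) { fmt.Printf("update: %+v\n", *addr) }

func main() {
	wp := &workloadPublisher{
		addr:      Address{IP: "10.0.0.1", Port: 8080},
		listeners: []Listener{printListener{}},
	}
	wp.updateServer(map[uint32]struct{}{8080: {}}) // port becomes opaque: update sent
	wp.updateServer(map[uint32]struct{}{8080: {}}) // unchanged: no update sent
}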

@adleong requested a review from a team as a code owner on January 30, 2024 01:09
@zaharidichev (Member) left a comment

This is a good change. I am a fan of the "pruning" approach you have here. Left a question about resyncs. Also, I think it would be good to add a test (if it is not too hard to wire up) that demonstrates that updates we do not need to process do not end up in the queue.

Comment on lines 301 to 303
if oldServer.ResourceVersion == newServer.ResourceVersion {
	return
}
@zaharidichev (Member) commented:

As far as I understand, you are already minimizing the amount of work dispatched by checking whether there is a change in the "opaqueness" in updateServers. Why go further and ignore resyncs too? If I remember correctly, we discussed ignoring resyncs when it came to the endpoint slices controller, and reached the conclusion that it is fine to skip them only if we have another retry mechanism (a retry queue), which we do not have here. Here are some follow-up questions:

  1. If we are skipping resyncs here, why not skip them for all update callbacks (e.g. pods)?
  2. Do we have any idea how much work needlessly ends up in the update queue because we were not checking for opaqueness state changes, as opposed to not ignoring resyncs? In other words, if you let resyncs go through, would the rest of the changes still solve the problem that we were seeing?
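
For context on the pattern being discussed, here is a minimal self-contained sketch (names are illustrative, not the actual linkerd2 handler) of where a ResourceVersion comparison like the one above typically sits: a shared informer's UpdateFunc also fires on periodic resyncs, where the old and new objects are identical, so comparing resource versions filters those callbacks out before any work is dispatched.

package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// addServerHandler registers an update callback that ignores resyncs by
// comparing resource versions. The function and parameter names are
// hypothetical; only the resync-skip pattern is the point here.
func addServerHandler(informer cache.SharedIndexInformer, onUpdate func(obj interface{})) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldMeta, err := meta.Accessor(oldObj)
			if err != nil {
				return
			}
			newMeta, err := meta.Accessor(newObj)
			if err != nil {
				return
			}
			if oldMeta.GetResourceVersion() == newMeta.GetResourceVersion() {
				// A resync delivered an unchanged object: nothing to do.
				return
			}
			onUpdate(newObj)
		},
	})
}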

Comment on lines +656 to +659
// Fill in ownership.
if wp.addr.ExternalWorkload != nil && len(wp.addr.ExternalWorkload.GetOwnerReferences()) == 1 {
	wp.addr.OwnerKind = wp.addr.ExternalWorkload.GetOwnerReferences()[0].Kind
	wp.addr.OwnerName = wp.addr.ExternalWorkload.GetOwnerReferences()[0].Name
@zaharidichev (Member) commented:

TIOLI (take it or leave it): you could abstract this ownership filling into a separate function to make things more readable, here and in the other place where you are doing it.
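
A minimal sketch of what that extraction might look like; the helper name (setOwnership) and the trimmed-down Address type are hypothetical and not part of the PR.

package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Address is a trimmed-down stand-in for the watcher's Address struct,
// keeping only the fields the ownership logic touches. metav1.Object is
// enough here because only GetOwnerReferences is needed.
type Address struct {
	ExternalWorkload metav1.Object
	OwnerKind        string
	OwnerName        string
}

// setOwnership pulls the inline ownership-filling block into one place
// so both call sites can share it.
func setOwnership(addr *Address) {
	if addr.ExternalWorkload == nil {
		return
	}
	owners := addr.ExternalWorkload.GetOwnerReferences()
	if len(owners) != 1 {
		return
	}
	addr.OwnerKind = owners[0].Kind
	addr.OwnerName = owners[0].Name
}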

@alpeb (Member) left a comment

These are extremely nice simplifications 🤟

Comment on lines +127 to +131
	_, opaqueProtocol := opaquePorts[address.Port]
	profile := &pb.DestinationProfile{
		RetryBudget:    defaultRetryBudget(),
		Endpoint:       endpoint,
-		OpaqueProtocol: address.OpaqueProtocol,
+		OpaqueProtocol: opaqueProtocol || address.OpaqueProtocol,
@alpeb (Member) commented:

Would it be possible to just rely on address.OpaqueProtocol here, and refactor ept.createEndpoint() so it also relies solely on address.OpaqueProtocol?

@adleong (Member, Author) replied:

This would be a good change, but after giving it a try, I think it's a larger refactor that is better separated from this change. Basically, we're not consistent about where we consume opaque protocol annotations: we read them both in the translator and in the watcher. Centralizing this so that opaque protocol annotations are consumed in one place would be a good follow-up refactor.
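
For context, that follow-up refactor would amount to something like the following self-contained sketch. It is hypothetical and not code from this PR: the type names (Address, Profile) and functions (resolveOpaqueness, buildProfile) are stand-ins showing opaqueness being computed in one place and consumers reading only Address.OpaqueProtocol.

package example

// Address is a trimmed-down stand-in for the watcher's Address struct.
type Address struct {
	Port           uint32
	OpaqueProtocol bool
}

// resolveOpaqueness is the single hypothetical place where opaqueness is
// computed, from the Server-derived opaque port set and any
// annotation-derived decision.
func resolveOpaqueness(addr *Address, opaquePorts map[uint32]struct{}, annotatedOpaque bool) {
	_, portOpaque := opaquePorts[addr.Port]
	addr.OpaqueProtocol = portOpaque || annotatedOpaque
}

// Profile is a stand-in for the relevant part of pb.DestinationProfile.
type Profile struct {
	OpaqueProtocol bool
}

// buildProfile shows the translator side relying solely on the Address,
// instead of re-checking opaquePorts itself.
func buildProfile(addr *Address) Profile {
	return Profile{OpaqueProtocol: addr.OpaqueProtocol}
}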

@zaharidichev (Member) left a comment

LGTM

Signed-off-by: Alex Leong <[email protected]>
@adleong merged commit 5915ef5 into main on Mar 19, 2024
35 checks passed
@adleong deleted the alex/server-madness branch on March 19, 2024 17:24