prioritized client updates #17354
Conversation
The allocrunner sends several updates to the server during the early lifecycle of an allocation and its tasks. Clients batch up allocation updates every 200ms, but experiments like the C2M challenge have shown that even with this batching, servers can be overwhelmed with client updates during high-volume deployments. Benchmarking done in #9451 has shown that client updates can easily represent ~70% of all Nomad Raft traffic. Each allocation sends many updates during its lifetime, but only those that change the `ClientStatus` field are critical for progressing a deployment or kicking off a reschedule to recover from failures.

Add a priority to the client allocation sync and update the `syncTicker` receiver so that we only send an update if there's a high-priority update waiting, or on every 5th tick. This means when there are no high-priority updates, the client will send updates at most every 1s instead of every 200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as well as reduce client-to-server RPC traffic.

This changeset also switches from a channel-based collection of updates to a shared buffer, so as to split batching from sending and prevent backpressure onto the allocrunner when the RPC is slow. This doesn't have a major performance benefit in the benchmarks but makes the implementation of the prioritized update simpler.

Fixes: #9451
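For a concrete picture of the ticker change, here is a minimal, self-contained sketch of the throttled sync loop described above. The `pendingUpdates` type and its methods are illustrative stand-ins, not the actual Nomad identifiers.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingUpdates is an illustrative shared buffer of queued alloc updates.
type pendingUpdates struct {
	mu     sync.Mutex
	count  int  // number of queued alloc updates
	urgent bool // set when a queued update changed ClientStatus
}

func (p *pendingUpdates) add(urgent bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.count++
	p.urgent = p.urgent || urgent
}

func (p *pendingUpdates) hasUrgent() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.urgent
}

// flush "sends" whatever is queued and clears the buffer.
func (p *pendingUpdates) flush() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.count == 0 {
		return
	}
	fmt.Printf("sending %d update(s), urgent=%v\n", p.count, p.urgent)
	p.count, p.urgent = 0, false
}

func main() {
	updates := &pendingUpdates{}

	// Simulate one typical update now and an urgent one ~700ms later.
	go func() {
		updates.add(false)
		time.Sleep(700 * time.Millisecond)
		updates.add(true)
	}()

	syncTicker := time.NewTicker(200 * time.Millisecond)
	defer syncTicker.Stop()
	deadline := time.After(2 * time.Second)

	ticks := 0
	for {
		select {
		case <-syncTicker.C:
			ticks++
			// Send right away if an urgent update is waiting; otherwise only
			// flush on every 5th tick, i.e. at most once per second instead
			// of every 200ms.
			if updates.hasUrgent() || ticks%5 == 0 {
				updates.flush()
				ticks = 0
			}
		case <-deadline:
			return
		}
	}
}
```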
@@ -765,6 +765,24 @@ func TestClient_SaveRestoreState(t *testing.T) {
return fmt.Errorf("expected running client status, got %v",
ar.AllocState().ClientStatus)
}
Note for reviewers: this test was actually already racy but it's a very tight race before the changes in this PR. So these fixes for the test remove the race.
LGTM! I added a couple of very minor comments and a question, but nothing blocking.
client/client.go (outdated)
// filteredAcknowledgedUpdates returns a list of client alloc updates with the
// already-acknowledged updates removed, and the highest priority of any update.
If I understand correctly, the caller must hold at least a read lock on `c.allocUpdatesLock` when calling this function? If so, would it be worth adding a note to the function comment about this for future readers?
I'd go a step further and suggest extracting the `allocUpdates` map into its own little data structure and let it handle its own locking. There's quite a bit of implementation detail around what it's used for splattered all over these Client functions.
> If I understand correctly, the caller must hold at least a read lock on `c.allocUpdatesLock` when calling this function? If so, would it be worth adding a note to the function comment about this for future readers?

Well it's not a `RWMutex`, but yes. Really the only reason this is in its own function at all now is that we need to take the `c.allocLock` to read from the allocrunners, and having it in its own function lets us scope down that lock.
> I'd go a step further and suggest extracting the `allocUpdates` map into its own little data structure and let it handle its own locking. There's quite a bit of implementation detail around what it's used for splattered all over these Client functions.

So in other words, make `updatesToSync` and `filterAcknowledgedUpdates` methods on the `allocUpdates` data structure? We'd need to pass the `*Client` as a parameter so that we can access the allocrunners. But yeah that seems reasonable. Let me take a quick try at that.
@shoenig @jrasell I've refactored this by pulling the `allocUpdates` out into their own struct `pendingAllocUpdates` (so as not to conflict with the other struct named `allocUpdates`, which holds the updates received from the server!). I think this makes the whole locking situation a lot cleaner. Let me know what you think.
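As a rough sketch of what a self-locking container along those lines could look like (heavily simplified: a callback stands in for the real `*Client`/allocrunner access, and apart from `pendingAllocUpdates` and `updatesToSync` the names here are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// allocState stands in for the allocation client state in this sketch.
type allocState struct {
	ID           string
	ClientStatus string
}

// pendingAllocUpdates owns both the buffered updates and their lock, so
// callers don't need to know which Client mutex guards the map.
type pendingAllocUpdates struct {
	mu      sync.Mutex
	updates map[string]*allocState // alloc ID -> most recent unsent state
}

func newPendingAllocUpdates() *pendingAllocUpdates {
	return &pendingAllocUpdates{updates: map[string]*allocState{}}
}

// put records the latest state for an allocation, overwriting any unsent one.
func (p *pendingAllocUpdates) put(a *allocState) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.updates[a.ID] = a
}

// updatesToSync drains the buffer, dropping updates the server has already
// acknowledged (as reported by the acked callback).
func (p *pendingAllocUpdates) updatesToSync(acked func(*allocState) bool) []*allocState {
	p.mu.Lock()
	defer p.mu.Unlock()

	var out []*allocState
	for id, a := range p.updates {
		if !acked(a) {
			out = append(out, a)
		}
		delete(p.updates, id)
	}
	return out
}

func main() {
	pending := newPendingAllocUpdates()
	pending.put(&allocState{ID: "alloc-1", ClientStatus: "running"})
	pending.put(&allocState{ID: "alloc-2", ClientStatus: "running"})

	// Pretend the server already acknowledged alloc-2's running state.
	toSend := pending.updatesToSync(func(a *allocState) bool {
		return a.ID == "alloc-2"
	})
	for _, a := range toSend {
		fmt.Println("would send:", a.ID, a.ClientStatus)
	}
}
```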
LGTM; just the one locking thing
case !last.DeploymentStatus.Equal(a.DeploymentStatus):
-	return false
+	return cstructs.AllocUpdatePriorityTypical
Technically deployments are gated by this field, so it could be considered critical since it can cause a scheduling decision...

...but nothing about deployments is concerned with sub-second latencies, so I think it's fine to leave this as `Typical`.

If you're in this code again, maybe add a comment pointing out that while deployment status changes are not urgent, they can affect scheduling, just not in a way where sub-second skew is significant.

If you really want to tidy things up, the PR description misses this too:

> Each allocation sends many updates during its lifetime, but only those that change the `ClientStatus` field are critical for progressing a deployment or kicking off a reschedule to recover from failures.

`DeploymentStatus` is critical for progressing a deployment as well.
> Technically deployments are gated by this field, so it could be considered critical since it can cause a scheduling decision...
>
> ...but nothing about deployments is concerned with sub-second latencies, so I think it's fine to leave this as `Typical`.

Somehow I missed that, so yeah, I would've set it to urgent based on the reasoning I had in the PR. I'll keep it (for now at least) and I'll add some commentary here around the reasoning for things.
ar.stateLock.RLock()
defer ar.stateLock.RUnlock()

last := ar.lastAcknowledgedState
if last == nil {
-	return false
+	return cstructs.AllocUpdatePriorityTypical
If we don't know what it was before, how can we assume the change is typical? Seems worth a comment, especially since all of the other code in this method must check from highest priority to lowest in order to ensure a change to a low-priority field doesn't demote an actually high-priority update.
You're right that we don't know for sure. In practice an allocation will never become healthy quickly enough that the first update we send is that update. That being said, we probably should account for allocations that quickly fail, because there are a bunch of things that can go unrecoverably wrong on the client before we ever hit the task runner, and it'd be nice to be able to send those failure states to the server more quickly.
In #17354 we made client updates prioritized to reduce client-to-server traffic. When the client has no previously-acknowledged update we assume that the update is of typical priority; although we don't know that for sure, in practice an allocation will never become healthy quickly enough that the first update we send is the one saying the alloc is healthy. But that doesn't account for allocations that quickly fail in an unrecoverable way because of allocrunner hook failures, and it'd be nice to be able to send those failure states to the server more quickly. This changeset does so and adds some extra comments on the reasoning behind priority.
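A hedged sketch of how that priority ordering (including the nil last-acknowledged-state case) could look. The struct fields and constants are simplified stand-ins for the real allocrunner code; only `AllocUpdatePriorityTypical` echoes a name from the diff above.

```go
package main

import "fmt"

type updatePriority int

const (
	priorityNone updatePriority = iota
	priorityTypical
	priorityUrgent
)

// allocClientState is a simplified stand-in for the alloc state fields that
// feed the priority decision.
type allocClientState struct {
	ClientStatus     string
	DeploymentStatus string
}

// clientUpdatePriority checks fields from highest to lowest priority so that a
// change to a low-priority field can never demote an urgent update.
func clientUpdatePriority(last, current *allocClientState) updatePriority {
	if last == nil {
		// Nothing acknowledged yet. A freshly failed allocation (e.g. an
		// allocrunner hook failing before the task ever starts) should reach
		// the server quickly so it can be rescheduled.
		if current.ClientStatus == "failed" {
			return priorityUrgent
		}
		return priorityTypical
	}

	switch {
	case last.ClientStatus != current.ClientStatus:
		// ClientStatus changes gate rescheduling and deployment progress.
		return priorityUrgent
	case last.DeploymentStatus != current.DeploymentStatus:
		// Deployment status changes affect scheduling decisions too, but
		// nothing about deployments cares about sub-second skew.
		return priorityTypical
	default:
		return priorityNone
	}
}

func main() {
	// First update ever seen, and the alloc already failed: send it urgently.
	fmt.Println(clientUpdatePriority(nil, &allocClientState{ClientStatus: "failed"}))

	// Only the deployment status changed: typical priority is fine.
	fmt.Println(clientUpdatePriority(
		&allocClientState{ClientStatus: "running"},
		&allocClientState{ClientStatus: "running", DeploymentStatus: "healthy"},
	))
}
```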