
client: de-duplicate alloc updates and gate during restore #17074

Merged
merged 1 commit into from
May 11, 2023

Conversation

tgross
Member

@tgross tgross commented May 3, 2023

When client nodes are restarted, all allocations that have been scheduled on the node have their modify index updated, including terminal allocations. There are several contributing factors:

  • The `allocSync` method that updates the servers isn't gated on first contact with the servers. This means that if a server updates the desired state while the client is down, the `allocSync` races with the `Node.GetClientAllocs` RPC. This will typically result in the client updating the server with "running" and then immediately thereafter "complete".

  • The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even if it's possible to assert that the server has definitely seen the client state. The allocrunner may queue up updates even if we gate sending them. We then end up with a race between the allocrunner updating its internal state to overwrite the previous update and `allocSync` sending the bogus or duplicate update.

This changeset adds tracking of server-acknowledged state to the allocrunner. This state is checked in `allocSync` before adding an update to the batch, and updated when `Node.UpdateAlloc` returns successfully. To implement this we need to be able to equality-check the updates against the last acknowledged state. We also need to persist the last acknowledged state in the client state DB; otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation states, including allocs stopped before shutdown, allocs stopped by the server while the client is down, and allocs that have been completely GC'd on the server while the client is down. I've also bench tested scenarios where the task workload is killed while the client is down, resulting in a failed restore.

Fixes #16381

@tgross tgross changed the title client: de-duplicare alloc updates and gate during restore client: de-duplicate alloc updates and gate during restore May 3, 2023
@tgross tgross force-pushed the gate-allocrunner-on-restart branch from bb1154f to a27cb30 Compare May 3, 2023 20:47
@tgross tgross added this to the 1.6.0 milestone May 3, 2023
@tgross tgross force-pushed the gate-allocrunner-on-restart branch from a27cb30 to 6066a18 Compare May 3, 2023 21:09
@tgross tgross marked this pull request as ready for review May 4, 2023 13:08
@tgross tgross requested review from shoenig and schmichael May 4, 2023 13:08
Member

@shoenig shoenig left a comment

Added some suggestions; will make a second pass later on

Review comments (resolved) on: client/allocrunner/alloc_runner.go, client/client.go, nomad/structs/structs.go
Member

@schmichael schmichael left a comment

Great work fixing a long standing issue!

Due to #15808 I think this will cause a reconnecting Client to remain in initializing until Node.GetClientAllocs returns since it relies on Node.UpdateAlloc being called as a signal that a node is ready again.

I can't imagine how this is an issue, and I don't think there's any opportunity for further simplification or cleanup either, but I wanted to call it out in case somebody else can think of a way this affects behavior.

Review comment (resolved) on client/client.go
```go
}
if len(sync) == 0 {
	// No updates to send
	updates = make(map[string]*structs.Allocation, len(updates))
```
Member

in Go 1.21 this can become clear(updates) which... doesn't really matter because a single malloc and single piece of garbage here isn't going to make any noticeable performance difference 😅

```go
c.allocLock.RLock()
for _, update := range sync {
	if ar, ok := c.allocs[update.ID]; ok {
		ar.AcknowledgeState(&arstate.State{
```
Member
If you wanted to avoid storing a bunch of duplicated client alloc state and implementing a deep equality, I think we could make this a uint64, increment it on every Client.AllocStateUpdated call, and have the Client persist it here.

That relies on ARs/TRs not calling Client.AllocStateUpdated even when nothing has been updated though, so I think your approach is safer.

Member Author

@tgross tgross May 9, 2023

I really wanted to do exactly that sort of thing by retaining an index on both the current client state and the last-acknowledged state. But that doesn't help across a restore because the AR/TR will make repeat updates as their hooks complete.

If the index were a hash it'd require less storage and potentially be less complex, but we wouldn't be able to return early on the comparison checks (whereas with the current check, if e.g. the ClientStatus field changes we can immediately bail out and say "yes, need to update"). That being said, maybe we can cut down on the complexity of the equality check by not making it a full equality check -- a bunch of the ClientState is immutable events, so we can get away with a simple length check on those. (edit: as it turns out, not so much)

> That relies on ARs/TRs not calling `Client.AllocStateUpdated` even when nothing has been updated though, so I think your approach is safer.

I tried pushing the LastAcknowledgedStateIsCurrent check into the AllocStateUpdated method in an earlier draft but then the check and the write+ACK are happening in different goroutines and there's a TOCTOU bug.

@tgross
Member Author

tgross commented May 9, 2023

Somehow in addressing comments I've managed to break the test. Will need to figure out what I broke here 😊 (edit: fixed)

@tgross
Member Author

tgross commented May 10, 2023

I've made one more pass over this and added an additional unit test for the check so that we have a little more coverage of the equality check. Will merge once I've got the ✔️ from CI and this will go out with the 1.6.0 release.

@tgross tgross force-pushed the gate-allocrunner-on-restart branch from c953360 to ee9c41b Compare May 10, 2023 20:46
@tgross tgross merged commit 116f24d into main May 11, 2023
@tgross tgross deleted the gate-allocrunner-on-restart branch May 11, 2023 13:05
Successfully merging this pull request may close these issues:

Node restart causes modify index of all allocs on that node to be bumped