
client: de-duplicate alloc updates and gate during restore #17074

Merged
merged 1 commit into from
May 11, 2023

Conversation

tgross
Member

@tgross tgross commented May 3, 2023

When client nodes are restarted, all allocations that have been scheduled on the node have their modify index updated, including terminal allocations. There are several contributing factors:

  • The `allocSync` method that updates the servers isn't gated on first contact with the servers. This means that if a server updates the desired state while the client is down, the `allocSync` races with the `Node.GetClientAllocs` RPC. This will typically result in the client updating the server with "running" and then immediately thereafter "complete".

  • The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even if it's possible to assert that the server has definitely seen the client state. The allocrunner may queue up updates even if we gate sending them. We then end up with a race between the allocrunner updating its internal state to overwrite the previous update and `allocSync` sending the bogus or duplicate update.

This changeset adds tracking of server-acknowledged state to the allocrunner. This state is checked in `allocSync` before adding an update to the batch, and updated when `Node.UpdateAlloc` returns successfully. To implement this we need to be able to equality-check the updates against the last acknowledged state. We also need to persist the last acknowledged state in the client state DB; otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation states, including allocs stopped before shutdown, allocs stopped by the server while the client is down, and allocs that have been completely GC'd on the server while the client is down. I've also bench tested scenarios where the task workload is killed while the client is down, resulting in a failed restore.

Fixes #16381

@tgross tgross changed the title client: de-duplicare alloc updates and gate during restore client: de-duplicate alloc updates and gate during restore May 3, 2023
@tgross tgross force-pushed the gate-allocrunner-on-restart branch from bb1154f to a27cb30 Compare May 3, 2023 20:47
@tgross tgross added this to the 1.6.0 milestone May 3, 2023
@tgross tgross force-pushed the gate-allocrunner-on-restart branch from a27cb30 to 6066a18 Compare May 3, 2023 21:09
@tgross tgross marked this pull request as ready for review May 4, 2023 13:08
@tgross tgross requested review from shoenig and schmichael May 4, 2023 13:08
Member

@shoenig shoenig left a comment

Added some suggestions; will make a second pass later on

Review comments (resolved) on: client/allocrunner/alloc_runner.go, client/client.go, nomad/structs/structs.go
Member

@schmichael schmichael left a comment

Great work fixing a long standing issue!

Due to #15808 I think this will cause a reconnecting Client to remain in initializing until Node.GetClientAllocs returns since it relies on Node.UpdateAlloc being called as a signal that a node is ready again.

I can't imagine how this is an issue, and I don't think there's any opportunity for further simplification or cleanup either, but I wanted to call it out in case somebody else can think of a way this affects behavior.

Review comment (resolved) on client/client.go
```go
}
if len(sync) == 0 {
	// No updates to send
	updates = make(map[string]*structs.Allocation, len(updates))
```
Member

in Go 1.21 this can become clear(updates) which... doesn't really matter because a single malloc and single piece of garbage here isn't going to make any noticeable performance difference 😅

```go
c.allocLock.RLock()
for _, update := range sync {
	if ar, ok := c.allocs[update.ID]; ok {
		ar.AcknowledgeState(&arstate.State{
```
Member
If you wanted to avoid storing a bunch of duplicated client alloc state and implementing a deep equality, I think we could make this a uint64, increment it on every Client.AllocStateUpdated call, and have the Client persist it here.

That relies on ARs/TRs not calling Client.AllocStateUpdated even when nothing has been updated though, so I think your approach is safer.

Member Author

@tgross tgross May 9, 2023

I really wanted to do exactly that sort of thing by retaining an index on both the current client state and the last-acknowledged state. But that doesn't help across a restore because the AR/TR will make repeat updates as their hooks complete.

If the index were a hash it'd require less storage and potentially be less complex, but we wouldn't be able to return early on the comparison checks (whereas with the current check, if e.g. the ClientStatus field changes we can immediately bail out and say "yes, need to update"). That being said, maybe we can cut down on the complexity of the equality check by not making it a full equality check -- a bunch of the ClientState is immutable events, so we can get away with a simple length check on those. (edit: as it turns out, not so much)

> That relies on ARs/TRs not calling `Client.AllocStateUpdated` even when nothing has been updated though, so I think your approach is safer.

I tried pushing the LastAcknowledgedStateIsCurrent check into the AllocStateUpdated method in an earlier draft but then the check and the write+ACK are happening in different goroutines and there's a TOCTOU bug.

@tgross
Member Author

tgross commented May 9, 2023

Somehow in addressing comments I've managed to break the test. Will need to figure out what I broke here 😊 (edit: fixed)

@tgross
Member Author

tgross commented May 10, 2023

I've made one more pass over this and added an additional unit test for the check so that we have a little more coverage of the equality check. Will merge once I've got the ✔️ from CI and this will go out with the 1.6.0 release.

@tgross tgross force-pushed the gate-allocrunner-on-restart branch from c953360 to ee9c41b Compare May 10, 2023 20:46
@tgross tgross merged commit 116f24d into main May 11, 2023
@tgross tgross deleted the gate-allocrunner-on-restart branch May 11, 2023 13:05
Successfully merging this pull request may close these issues:

Node restart causes modify index of all allocs on that node to be bumped