client: de-duplicate alloc updates and gate during restore #17074
Conversation
Force-pushed from bb1154f to a27cb30
Force-pushed from a27cb30 to 6066a18
Added some suggestions; will make a second pass later on
Great work fixing a long-standing issue!

Due to #15808 I think this will cause a reconnecting Client to remain in initializing until `Node.GetClientAllocs` returns, since it relies on `Node.UpdateAlloc` being called as a signal that a node is ready again.

I can't imagine how this is an issue. I don't think there's any opportunity for further simplification or cleanup either, but I wanted to call it out in case somebody else can think of a way this affects behavior.
```go
}
if len(sync) == 0 {
	// No updates to send
	updates = make(map[string]*structs.Allocation, len(updates))
```
In Go 1.21 this can become `clear(updates)`, which... doesn't really matter, because a single malloc and single piece of garbage here isn't going to make any noticeable performance difference 😅
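For illustration, a minimal standalone sketch of the two variants (the element type is simplified from `*structs.Allocation` for brevity):

```go
package main

import "fmt"

func main() {
	updates := map[string]int{"a": 1, "b": 2}

	// Pre-Go 1.21: allocate a fresh map with the same capacity hint;
	// the old map becomes a single piece of garbage.
	updates = make(map[string]int, len(updates))

	// Go 1.21+: reset the existing map in place with the clear builtin,
	// reusing its allocated buckets instead of reallocating.
	updates["c"] = 3
	clear(updates)

	fmt.Println(len(updates)) // 0
}
```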
```go
c.allocLock.RLock()
for _, update := range sync {
	if ar, ok := c.allocs[update.ID]; ok {
		ar.AcknowledgeState(&arstate.State{
```
If you wanted to avoid storing a bunch of duplicated client alloc state and implementing a deep equality, I think we could make this a `uint64`, increment it on every `Client.AllocStateUpdated` call, and have the Client persist it here.

That relies on ARs/TRs not calling `Client.AllocStateUpdated` even when nothing has been updated though, so I think your approach is safer.
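For what it's worth, a standalone sketch of that counter idea (hypothetical names; this is not what the PR implements, and the caveat above about spurious `Client.AllocStateUpdated` calls still applies):

```go
package main

import "sync"

// ackCounter sketches the alternative approach: a monotonically increasing
// state index per allocrunner instead of a deep copy of the acknowledged
// client state.
type ackCounter struct {
	mu         sync.Mutex
	stateIndex uint64 // bumped on every state-update callback
	ackedIndex uint64 // highest index the servers have acknowledged
}

// bump records that the allocrunner produced new state and returns the index.
func (a *ackCounter) bump() uint64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.stateIndex++
	return a.stateIndex
}

// acknowledge records that the servers have seen state up to index.
func (a *ackCounter) acknowledge(index uint64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if index > a.ackedIndex {
		a.ackedIndex = index
	}
}

// needsSync reports whether there is unacknowledged state to send.
func (a *ackCounter) needsSync() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.stateIndex > a.ackedIndex
}

func main() {
	c := &ackCounter{}
	idx := c.bump()    // allocrunner reports new state
	_ = c.needsSync()  // true: servers haven't seen it yet
	c.acknowledge(idx) // Node.UpdateAlloc succeeded for this index
	_ = c.needsSync()  // false: nothing new to send
}
```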
I really wanted to do exactly that sort of thing by retaining an index on both the current client state and the last-acknowledged state. But that doesn't help across a restore, because the AR/TR will make repeat updates as their hooks complete.

If the index were a hash it'd require less storage and potentially be less complex, but we wouldn't be able to return early on the comparison checks (whereas with the check we've got, if e.g. the `ClientStatus` field changes we can immediately bail out and say "yes, need to update"). That being said, maybe we can cut down on the complexity of the equality check by not making it an equality check -- a bunch of the `ClientState` is immutable events, so we can get away with a simple length check on those (edit: as it turns out, not so much).
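Roughly the shape of that check, as a standalone sketch (the field names are simplified stand-ins for the real client alloc state; the real `LastAcknowledgedStateIsCurrent` compares more fields):

```go
package main

// TaskState and State are minimal stand-ins for the real allocrunner state
// types, just to illustrate the comparison strategy.
type TaskState struct {
	State  string
	Events []string
}

type State struct {
	ClientStatus      string
	ClientDescription string
	TaskStates        map[string]*TaskState
}

// stateIsCurrent bails out as soon as a cheap scalar field differs, and only
// falls back to a length check on the (mostly append-only) event slices
// rather than a full deep equality.
func stateIsCurrent(acked, current *State) bool {
	if acked == nil {
		return false // nothing acknowledged yet; must update
	}
	if acked.ClientStatus != current.ClientStatus ||
		acked.ClientDescription != current.ClientDescription {
		return false // e.g. ClientStatus changed: immediately "yes, need to update"
	}
	for name, cur := range current.TaskStates {
		prev, ok := acked.TaskStates[name]
		if !ok || prev.State != cur.State || len(prev.Events) != len(cur.Events) {
			return false
		}
	}
	return true
}

func main() {
	acked := &State{ClientStatus: "running"}
	current := &State{ClientStatus: "complete"}
	_ = stateIsCurrent(acked, current) // false: status changed, update needed
}
```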
> That relies on ARs/TRs not calling `Client.AllocStateUpdated` even when nothing has been updated though, so I think your approach is safer.
I tried pushing the `LastAcknowledgedStateIsCurrent` check into the `AllocStateUpdated` method in an earlier draft, but then the check and the write+ACK are happening in different goroutines and there's a TOCTOU bug.
Force-pushed from e001fe3 to b81d349
Force-pushed from bcf7f8f to d8d1b4b
I've made one more pass over this and added an additional unit test for the check so that we have a little more coverage of the equality check. Will merge once I've got the ✔️ from CI and this will go out with the 1.6.0 release.
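Purely to illustrate the kind of coverage that adds, a hypothetical table of cases against the `stateIsCurrent` sketch from earlier (the real test exercises the allocrunner's actual check, not these stand-in types):

```go
package main

import "testing"

func TestStateIsCurrent(t *testing.T) {
	base := &State{
		ClientStatus: "running",
		TaskStates:   map[string]*TaskState{"web": {State: "running", Events: []string{"Started"}}},
	}

	cases := []struct {
		name    string
		current *State
		want    bool
	}{
		{"identical state needs no update", base, true},
		{"changed client status forces an update", &State{
			ClientStatus: "complete",
			TaskStates:   base.TaskStates,
		}, false},
		{"new task event forces an update", &State{
			ClientStatus: "running",
			TaskStates: map[string]*TaskState{
				"web": {State: "running", Events: []string{"Started", "Terminated"}},
			},
		}, false},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := stateIsCurrent(base, tc.current); got != tc.want {
				t.Fatalf("got %v, want %v", got, tc.want)
			}
		})
	}
}
```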
Force-pushed from d8d1b4b to c953360
When client nodes are restarted, all allocations that have been scheduled on the node have their modify index updated, including terminal allocations. There are several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact with the servers. This means that if a server updates the desired state while the client is down, the `allocSync` races with the `Node.GetClientAllocs` RPC. This will typically result in the client updating the server with "running" and then immediately thereafter "complete".
* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even if it's possible to assert that the server has definitely seen the client state. The allocrunner may queue up updates even if we gate sending them. So then we end up with a race between the allocrunner updating its internal state to overwrite the previous update and `allocSync` sending the bogus or duplicate update.

This changeset adds tracking of server-acknowledged state to the allocrunner. This state gets checked in the `allocSync` before adding the update to the batch, and updated when `Node.UpdateAlloc` returns successfully. To implement this we need to be able to equality-check the updates against the last acknowledged state. We also need to add the last acknowledged state to the client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation states, including allocs stopped before shutdown, allocs stopped by the server while the client is down, and allocs that have been completely GC'd on the server while the client is down. I've also bench tested scenarios where the task workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
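In rough terms, the de-duplication and acknowledgement flow described above looks something like the following. This is a simplified sketch, not the literal diff: `LastAcknowledgedStateIsCurrent`, `AcknowledgeState`, and the `arstate.State` fields are taken from the excerpts above, while the surrounding batching and RPC plumbing is assumed.

```go
// syncOnce sketches one pass of the allocSync batching loop: drop updates
// the servers have already acknowledged, send the rest, and record the
// newly acknowledged state on success so it survives a client restart.
func (c *Client) syncOnce(updates map[string]*structs.Allocation) {
	sync := make([]*structs.Allocation, 0, len(updates))

	c.allocLock.RLock()
	for _, update := range updates {
		ar, ok := c.allocs[update.ID]
		// Skip updates whose state the servers have definitely seen.
		if ok && ar.LastAcknowledgedStateIsCurrent(update) {
			continue
		}
		sync = append(sync, update)
	}
	c.allocLock.RUnlock()

	if len(sync) == 0 {
		return // no updates to send
	}

	args := &structs.AllocUpdateRequest{Alloc: sync}
	var resp structs.GenericResponse
	if err := c.RPC("Node.UpdateAlloc", args, &resp); err != nil {
		return // keep the updates and retry on the next interval
	}

	// The servers acknowledged this batch. Both the check and this
	// write+ACK happen here, in the allocSync goroutine, avoiding the
	// TOCTOU window discussed above.
	c.allocLock.RLock()
	for _, update := range sync {
		if ar, ok := c.allocs[update.ID]; ok {
			ar.AcknowledgeState(&arstate.State{
				ClientStatus:      update.ClientStatus,
				ClientDescription: update.ClientDescription,
				DeploymentStatus:  update.DeploymentStatus,
				TaskStates:        update.TaskStates,
				NetworkStatus:     update.NetworkStatus,
			})
		}
	}
	c.allocLock.RUnlock()
}
```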
Force-pushed from c953360 to ee9c41b