Backport of core: enforce strict steps for clients reconnect into release/1.4.x #15879
Backport
This PR is auto-generated from #15808 to be assessed for backporting due to the inclusion of the label backport/1.4.x.
The below text is copied from the body of the original PR.
When a Nomad client that is running an allocation with `max_client_disconnect` set misses a heartbeat, the Nomad server will update its status to `disconnected`. Upon reconnecting, the client will make three main RPC calls:

- `Node.UpdateStatus` is used to set the client status to `ready`.
- `Node.UpdateAlloc` is used to update the client-side information about allocations, such as their `ClientStatus`, task states, etc.
- `Node.Register` is used to upsert the entire node information, including its status.

These calls are made concurrently and also run in parallel with the scheduler. Depending on the order in which they run, the scheduler may end up with incomplete data when reconciling allocations.
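To make the race concrete, here is a minimal Go sketch of a reconnecting client issuing the three calls concurrently. The `reconnect` and `arrivals` helpers are hypothetical stand-ins, not Nomad's client code; the simulated "server" only records which RPC arrived, and the arrival order can differ between runs.

```go
package main

import (
	"fmt"
	"sync"
)

// reconnect is a hypothetical stand-in for the client's reconnect path: it
// fires the three RPCs concurrently, so the server may see them in any order.
func reconnect(call func(rpc string)) {
	var wg sync.WaitGroup
	for _, rpc := range []string{"Node.UpdateStatus", "Node.UpdateAlloc", "Node.Register"} {
		wg.Add(1)
		go func(rpc string) {
			defer wg.Done()
			call(rpc)
		}(rpc)
	}
	wg.Wait()
}

// arrivals runs reconnect once and returns the order in which the
// simulated server received the calls.
func arrivals() []string {
	var mu sync.Mutex
	var order []string
	reconnect(func(rpc string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, rpc)
	})
	return order
}

func main() {
	// All three calls always arrive, but their order is not deterministic.
	fmt.Println(arrivals())
}
```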
#15068 already required clients to heartbeat before updating their allocation data, but there are still scenarios that can produce wrong results.
For example, a client disconnects and the replacement for its allocation cannot be placed anywhere else, so there is a pending eval waiting for resources.
When this client comes back, the order of events may be:

1. The client calls `Node.UpdateStatus` and is now `ready`.
2. The scheduler processes the pending eval and places the replacement allocation on the client. The client is now assigned two allocations: the original alloc that is still `unknown` and the replacement that is `pending`.
3. The client calls `Node.UpdateAlloc` and updates the original alloc to `running`.

This creates unnecessary placements or, in a different order of events, may leave the job without any allocations running until the whole state is updated and reconciled.
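The order dependence can be replayed with a toy, single-node model in Go. Everything here is a simplification for illustration: the `cluster` type, the `run` helper and the `schedule` rule (place one replacement only if the node is `ready` and no alloc is known to be `running` or `pending`) are hypothetical, and `Node.UpdateStatus` uses the pre-fix behavior of marking the node `ready` immediately. Only the statuses are taken from the description.

```go
package main

import "fmt"

// Alloc is a simplified allocation; only the fields needed here are modeled.
type Alloc struct {
	Name         string
	ClientStatus string
}

// cluster is a hypothetical, single-node view of the server state.
type cluster struct {
	nodeStatus string
	allocs     []*Alloc
}

// schedule is a toy version of the pending eval: the job wants one alloc, so
// a replacement is placed only if the node is ready and no alloc is known to
// be running or pending.
func (c *cluster) schedule() {
	if c.nodeStatus != "ready" {
		return
	}
	for _, a := range c.allocs {
		if a.ClientStatus == "running" || a.ClientStatus == "pending" {
			return
		}
	}
	c.allocs = append(c.allocs, &Alloc{Name: "replacement", ClientStatus: "pending"})
}

// run replays a sequence of steps against a freshly disconnected node using
// the pre-fix behavior, where Node.UpdateStatus marks the node ready at once.
func run(steps ...string) []Alloc {
	c := &cluster{
		nodeStatus: "disconnected",
		allocs:     []*Alloc{{Name: "original", ClientStatus: "unknown"}},
	}
	for _, step := range steps {
		switch step {
		case "UpdateStatus":
			c.nodeStatus = "ready"
		case "UpdateAlloc":
			c.allocs[0].ClientStatus = "running"
		case "Schedule":
			c.schedule()
		}
	}
	out := make([]Alloc, 0, len(c.allocs))
	for _, a := range c.allocs {
		out = append(out, *a)
	}
	return out
}

func main() {
	// The scheduler runs before the alloc update, as in the list above.
	fmt.Println(run("UpdateStatus", "Schedule", "UpdateAlloc")) // [{original running} {replacement pending}]
	// Same calls, but the allocs are updated before the scheduler runs.
	fmt.Println(run("UpdateStatus", "UpdateAlloc", "Schedule")) // [{original running}]
}
```

With identical calls, only the interleaving decides whether an unnecessary replacement is placed.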
To avoid problems like this, clients must update all of their relevant information before they can be considered `ready` and available for scheduling. To achieve this goal, the RPC endpoints mentioned above have been modified to enforce strict steps for reconnecting nodes:

- `Node.Register` no longer sets the client status.
- `Node.UpdateStatus` sets the reconnecting client to the `initializing` status until it successfully calls `Node.UpdateAlloc`.

These changes are done server-side to avoid the need for additional coordination between clients and servers. Clients are unaware of these changes and keep making these calls as they normally would.
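The `Node.Register` rule can be sketched as an upsert that accepts the node information sent by the client but keeps the status the server already has on record. This is a minimal Go sketch, not Nomad's implementation: the `Node` struct is heavily trimmed, `register` is a hypothetical helper, and starting never-seen nodes as `initializing` is an assumption made only so the sketch is complete.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down node record; Nomad's real struct has
// many more fields.
type Node struct {
	ID         string
	Datacenter string
	Status     string
}

// register upserts the node information sent by the client but keeps the
// status the server already has on record, so it cannot mark a node ready.
func register(store map[string]*Node, incoming Node) {
	existing, ok := store[incoming.ID]
	if !ok {
		// Assumption for this sketch: a node the server has never seen
		// starts out as "initializing".
		incoming.Status = "initializing"
		store[incoming.ID] = &incoming
		return
	}
	incoming.Status = existing.Status
	*existing = incoming
}

func main() {
	store := map[string]*Node{
		"node-1": {ID: "node-1", Datacenter: "dc1", Status: "disconnected"},
	}
	// The reconnecting client re-sends its node info and claims to be ready.
	register(store, Node{ID: "node-1", Datacenter: "dc2", Status: "ready"})
	fmt.Println(store["node-1"].Datacenter, store["node-1"].Status) // dc2 disconnected
}
```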
Whether allocations have been updated is verified by storing and comparing the Raft index of the last time the client missed a heartbeat and the last time it updated its allocations.
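A minimal Go sketch of that index comparison follows. The field names `LastMissedHeartbeatIndex` and `LastAllocUpdateIndex` and the methods are illustrative assumptions, not necessarily Nomad's. The sketch also assumes the node becomes `ready` on the next `Node.UpdateStatus` after a successful `Node.UpdateAlloc`; the description only states that it stays `initializing` until that call succeeds.

```go
package main

import "fmt"

// nodeState holds hypothetical bookkeeping fields; the names are
// illustrative and not necessarily the ones used in Nomad.
type nodeState struct {
	Status                   string
	LastMissedHeartbeatIndex uint64 // Raft index when a heartbeat was last missed
	LastAllocUpdateIndex     uint64 // Raft index of the last alloc update
}

// missHeartbeat marks the node disconnected at the given Raft index.
func (n *nodeState) missHeartbeat(index uint64) {
	n.Status = "disconnected"
	n.LastMissedHeartbeatIndex = index
}

// updateAlloc models a successful Node.UpdateAlloc applied at index.
func (n *nodeState) updateAlloc(index uint64) {
	n.LastAllocUpdateIndex = index
}

// updateStatus models Node.UpdateStatus: a request for "ready" is held at
// "initializing" while the allocs have not been updated since the missed
// heartbeat.
func (n *nodeState) updateStatus(requested string) {
	if requested == "ready" && n.LastAllocUpdateIndex < n.LastMissedHeartbeatIndex {
		n.Status = "initializing"
		return
	}
	n.Status = requested
}

func main() {
	n := &nodeState{Status: "ready"}
	n.missHeartbeat(100)
	n.updateStatus("ready") // heartbeat arrives before the alloc update
	fmt.Println(n.Status)   // initializing
	n.updateAlloc(105)
	n.updateStatus("ready") // next heartbeat after the alloc update
	fmt.Println(n.Status)   // ready
}
```

A node that never missed a heartbeat has both indexes at zero, so the check does not delay it.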
Closes #15483