client: avoid acting on stale data after launch #10907

notnoop · 2021-07-17T21:22:41Z

When the client launches, use a consistent read to fetch its own allocs,
but allow stale reads afterwards as long as those reads don't revert to
older state.
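
The read strategy above can be illustrated with a small sketch. It assumes that each response carries the index of the state it was read from. The types and functions `queryOptions`, `allocResponse`, `allocWatcher`, `nextOptions` and `accept` are hypothetical and are not Nomad's client code. The sketch shows two things: the first fetch after launch is consistent, and later fetches may be stale but use `MinQueryIndex` as a floor and discard any response older than what the client has already acted on.

```go
package main

import "fmt"

// queryOptions is a hypothetical stand-in for the options a client attaches
// to an alloc-fetching RPC; it is not Nomad's actual type.
type queryOptions struct {
	AllowStale    bool   // if true, any server (possibly lagging) may answer
	MinQueryIndex uint64 // ask for state at or beyond this index
}

// allocResponse is a hypothetical reply: the index of the state the
// answering server read from, plus the allocs' desired statuses.
type allocResponse struct {
	Index  uint64
	Allocs map[string]string // alloc ID -> desired status
}

// allocWatcher remembers whether a response has been accepted since launch
// and the highest index the client has acted on.
type allocWatcher struct {
	haveConsistent bool
	lastIndex      uint64
}

// nextOptions returns options for the next fetch. Until the first response
// is accepted, reads are consistent; afterwards they may be stale, with
// MinQueryIndex set to the last index acted on.
func (w *allocWatcher) nextOptions() queryOptions {
	if !w.haveConsistent {
		return queryOptions{AllowStale: false}
	}
	return queryOptions{AllowStale: true, MinQueryIndex: w.lastIndex}
}

// accept reports whether resp may be acted on. The first response is
// assumed to come from the consistent read requested by nextOptions. Any
// later response older than the last applied index is dropped, so the
// client never reverts to older state.
func (w *allocWatcher) accept(resp allocResponse) bool {
	if w.haveConsistent && resp.Index < w.lastIndex {
		return false
	}
	w.haveConsistent = true
	w.lastIndex = resp.Index
	return true
}

func main() {
	w := &allocWatcher{}
	fmt.Println(w.nextOptions())                     // {false 0}: first read is consistent
	fmt.Println(w.accept(allocResponse{Index: 100})) // true
	fmt.Println(w.nextOptions())                     // {true 100}: stale allowed, floor at 100
	fmt.Println(w.accept(allocResponse{Index: 90}))  // false: would revert state
	fmt.Println(w.accept(allocResponse{Index: 120})) // true
}
```

This design pays the cost of a leader read only once per launch. After that, a lagging follower can still serve requests, and the index check keeps the client from acting on an older view than the one it has already seen.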

This change addresses an edge case affecting restarting clients. When a
client restarts, it may fetch stale data about its allocs: allocs that
completed before the client shut down may still show a "run"/"running"
desired/client status, causing the client to attempt to re-run them.

An alternative approach is to track indices so that the client sets
MinQueryIndex to the maximum index it has ever seen, or to compare
received allocs against locally restored client state. Garbage
collection complicates this approach (local knowledge is not complete),
and it still risks starting "dead" allocations (e.g. an allocation may
have been placed while the client was restarting and already been
rescheduled by the time the client started). The approach taken here is
effective against all kinds of staleness problems with small overhead.

Fixes #10901

shoenig

LGTM; nice find!

github-actions · 2022-10-20T02:45:58Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

notnoop added the backport/1.0 label Jul 17, 2021

notnoop added this to the 1.1.3 milestone Jul 17, 2021

notnoop requested review from schmichael, shoenig and isabeldepapel July 17, 2021 21:22

notnoop self-assigned this Jul 17, 2021

shoenig approved these changes Jul 19, 2021

Commit f917f08: changelog

vercel bot temporarily deployed to Preview – nomad July 19, 2021 18:52

vercel bot deployed to Preview – nomad-storybook-and-ui July 19, 2021 18:52

schmichael approved these changes Jul 19, 2021

notnoop merged commit 3165ae8 into main Jul 20, 2021

notnoop deleted the b-client-allocs-consistency branch July 20, 2021 19:13

notnoop added the stage/needs-backporting label Jul 28, 2021

notnoop mentioned this pull request Jul 28, 2021 in "update changelog" #10963 (Merged)

schmichael removed the stage/needs-backporting label Sep 17, 2021

lgfa29 removed the backport/1.0 label Apr 14, 2022

github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022