Backport of Track plan rejection history and automatically mark clients as ineligible into release/1.3.x #13729
Backport
This PR is auto-generated from #13421 to be assessed for backporting due to the inclusion of the label backport/1.3.x.
The below text is copied from the body of the original PR.
Plan rejections occur when the scheduler worker and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.
As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling operation taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.
In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even though it is always
rejected.
While the root cause is still unknown, this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.
Closes #13017
Closes #12920
Note for reviewers: since we can't yet reliably reproduce this bug, the way I tested this was by applying this patch that causes a plan to be rejected if it is evaluated by a server running with the env var `CRASH` set and it's for a client with a name that starts with `crash`.
So, after applying the patch, start a 3 server cluster with one of them having the `CRASH` env var set and make sure this server becomes the leader. Start a client with the name starting with `crash` and run a job.
Monitoring the log you should see the plan rejection messages and, after a few minutes, the client will become ineligible. You can then start a client without `crash` in the name to verify that the job scheduling will proceed to the new client.