Job Evaluations are not correctly adjusting for dead worker nodes #1663
Comments
In your example there are only 3 nodes up but the status says it is running on 6?
Would it be possible to maybe share two node configs and a job file that will expose this behavior?
There are 3 live nodes (think of these as worker-green v2). The 6 running jobs are allocated on 3 worker-green v1 (dead) and 3 worker-blue v1 (dead).
Deploy Job Spec
NOTE: I dropped
Sample Green Worker Config
Sample Blue Worker Config
This problem appears to be limited to It's also worth noting that the nodes remain listed in It's unclear why system jobs remain "running" on a ghost node.
@steve-jansen Thanks for the additional detail. Will get this fixed before releasing 0.5!
Nomad version
Clients and Servers
Operating system and Environment details
Our clients have the following set:
We run Nomad workers using immutable infrastructure.
We have 2 sets of workgroups (blue and green) that allow us to upgrade worker boxes without downtime.
We run Nomad jobs on both workgroups using constraints.
Our job status may look something like this:
Issue
The issue arises when upgrading our nodes.
This particular job is scheduled as a system job, so we expect it to run on every live worker node.
Instead, the job appears to be placed by comparing "total live workers" against "total allocations".
From the above job status, we can see that all 6 allocations are on down worker nodes.
Since a system job should run on every box, we would expect 3 live allocations.
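To make the difference concrete, here is a minimal sketch in Python of the two placement rules described above. It is only a model of the behavior reported in this issue, not Nomad's actual scheduler code. The `Node` and `Alloc` types, both function names, and the node names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    ready: bool  # True for a live worker, False for a dead one


@dataclass(frozen=True)
class Alloc:
    alloc_id: str
    node_id: str


def expected_system_placements(nodes, allocs):
    """Per-node rule: every live node should carry one allocation.

    Returns (node ids that still need an allocation,
             allocations stranded on nodes that are not live).
    """
    ready_ids = {n.node_id for n in nodes if n.ready}
    stale = [a for a in allocs if a.node_id not in ready_ids]
    covered = {a.node_id for a in allocs if a.node_id in ready_ids}
    return sorted(ready_ids - covered), stale


def count_based_placements(nodes, allocs):
    """Count rule (the suspected behavior): compare totals only,
    ignoring which node each allocation lives on."""
    live = sum(1 for n in nodes if n.ready)
    return max(0, live - len(allocs))


# Scenario from this issue: 3 dead green v1, 3 dead blue v1, 3 live green v2,
# with all 6 existing allocations on the dead nodes.
dead = [Node(f"green-v1-{i}", False) for i in range(3)]
dead += [Node(f"blue-v1-{i}", False) for i in range(3)]
live = [Node(f"green-v2-{i}", True) for i in range(3)]
allocs = [Alloc(f"alloc-{i}", n.node_id) for i, n in enumerate(dead)]

to_place, stale = expected_system_placements(dead + live, allocs)
print("per-node rule: place on", to_place, "and retire", len(stale), "allocations")
print("count rule: new placements =", count_based_placements(dead + live, allocs))
```

Under the per-node rule, the three live nodes each get an allocation and the six allocations on dead nodes are flagged for cleanup. Under the count rule, six existing allocations already exceed three live workers, so nothing new is placed, which matches the status reported here.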
Reproduction steps