I'm forced to reschedule all Nomad jobs after a node reboot or init because of #16812.
However, if the job type is "system", I can't reschedule it: Nomad gets stuck in an infinite "Still waiting for allocation to be replaced" loop.
==> 2023-11-09T05:51:48-05:00: Restarting 1 allocation
2023-11-09T05:51:48-05:00: Rescheduling allocation "6fe8d748" for group "brokers"
2023-11-09T05:52:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
2023-11-09T05:53:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
2023-11-09T05:54:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
2023-11-09T05:55:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
2023-11-09T05:56:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
2023-11-09T05:57:48-05:00: Still waiting for allocation "6fe8d748" to be replaced
Thanks for the report. The -reschedule flag does indeed appear to work only for non-system jobs: it assumes the Nomad reconciler will recreate the allocation, which doesn't happen for other types of jobs.
For system jobs I think we can just re-register the job, and that should trigger the reconciler to create the replacements.
But batch and sysbatch jobs don't sound like they should be allowed to restart; they should run to completion unless stopped, so the command should check for them and exit early.
I have a draft PR up in #19043 to fix this; I just need to write some extra tests.
It's far from ideal, but as a workaround for now you can call nomad job eval every time an allocation is rescheduled. This triggers Nomad to create the replacement allocation.
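A rough sketch of how that workaround could be scripted, in case it helps anyone else hitting this. It is not part of Nomad: the function name restart_system_job and the polling interval are made up for illustration. Instead of reacting to each individual reschedule, it simply runs nomad job eval on a timer for as long as the restart command is still waiting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nomad): run the restart in the background
# and keep forcing evaluations until the restart command returns.
restart_system_job() {
  local job="$1"
  local interval="${2:-10}"   # seconds between evaluations; arbitrary default

  # Same command as in the reproduction steps, started in the background.
  nomad job restart -yes -reschedule "$job" &
  local restart_pid=$!

  # While the restart is still waiting for replacements, ask Nomad to
  # re-evaluate the job so the replacement allocation gets created.
  while kill -0 "$restart_pid" 2>/dev/null; do
    nomad job eval "$job" >/dev/null || true
    sleep "$interval"
  done

  # Propagate the exit status of the restart command.
  wait "$restart_pid"
}
```

Usage would be something like restart_system_job mqtt-broker 15. Polling on a timer is cruder than evaluating exactly once per rescheduled allocation, but it avoids having to parse the restart output.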
Nomad version
Nomad v1.6.1
BuildDate 2023-07-21T13:49:42Z
Revision 515895c
Operating system and Environment details
Ubuntu 20.04, arm64, NVIDIA Jetson
Issue
I'm forced to reschedule all Nomad jobs after a node reboot or init because of #16812.
However, if the job type is "system", I can't reschedule it: Nomad gets stuck in an infinite "Still waiting for allocation to be replaced" loop.
Reproduction steps
nomad job restart -yes -reschedule mqtt-broker
Job file (if appropriate)