
Nomad CLI should allow multiple jobs in stop command #2390

Closed
bengaywins opened this issue Mar 2, 2017 · 17 comments · Fixed by #12582
Labels
help-wanted We encourage community PRs for these issues! theme/cli type/enhancement

Comments

@bengaywins

Currently one can only feed a single job ID into the nomad stop <job> command. It would be great if it accepted many job IDs at once, so that one could stop thousands of jobs in a single invocation instead of needing a for loop or similar single execution per job.

@nugend

nugend commented Mar 3, 2017

I see the utility of this, but also question what the implementation should look like. Mainly, if one is stopping thousands of jobs, then it implies that there is some scripting involved (to get the job names in the first place). At that point, what practical difference is there between issuing nomad stop <job> thousands of times in a loop?

My thoughts are that some sort of name or tag filtering would be the most appropriate way to implement it, but maybe I am overlooking something?

@bengaywins
Author

bengaywins commented Mar 3, 2017

I'll use the exact scenario that happened to me. I had nearly 5k jobs that needed to be stopped, and it took a little over 2 hours in a for loop. If one could stop many jobs in a single go, without submitting a separate command for every job and waiting for each one to report status, this could drop down to minutes.
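
For a rough sense of scale, assuming roughly uniform per-job time: 5,000 jobs in about 2 hours works out to 7,200 s / 5,000 ≈ 1.4 s per sequential stop. With, say, 32 stops in flight at a time, the same workload would take roughly 7,200 / 32 ≈ 225 s, i.e. a few minutes.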

@dadgar
Contributor

dadgar commented Mar 3, 2017

@gehzumteufel Did you use nomad stop -detach <job-id>?

@nugend

nugend commented Mar 3, 2017

@gehzumteufel Ah, I see. So not just a nomad stop, but a stop that works in parallel.

I think @dadgar's idea is workable, but solutions that involve external scripting (since you might not want to move on until everything is down) can be failure prone.

@Miserlou

Miserlou commented Mar 15, 2018

We are also finding that stopping a large number of jobs in sequence is very time consuming. We would like to be able to nomad stop --all.

Also, why do all HashiCorp products only use a single - rather than -- for named arguments? So annoying.

@schmichael
Member

Also, why do all HashiCorp products only use a single - rather than -- for named arguments? So annoying.
-- @Miserlou

A Go-ism (from Plan 9 before that?) we chose to keep I'm afraid: https://golang.org/pkg/flag/
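
For illustration only, a minimal standalone stdlib example (not Nomad's actual CLI code): Go's flag package registers flags by bare name and renders them with a single dash in help output, although the parser itself accepts both -flag and --flag.

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Flags are registered by bare name; help output shows them with a
	// single dash, though -purge and --purge both parse identically.
	purge := flag.Bool("purge", false, "purge the job after stopping it")
	detach := flag.Bool("detach", false, "return immediately instead of monitoring")
	flag.Parse()
	fmt.Println("purge:", *purge, "detach:", *detach)
}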

@Miserlou

Miserlou commented Aug 8, 2018

In case anybody is here looking for this basic functionality that Nomad should provide but doesn't, and you have likely made a mistake by choosing this stack, you can at least try this:

echo "Killing dispatch jobs... (This may take a while.)"
if [[ $(nomad status) != "No running jobs" ]]; then
    for job in $(nomad status | awk '{print $1}' | grep /)
    do
        # Skip the header row for jobs.
        if [ "$job" != "ID" ]; then
            nomad stop -purge -detach "$job" > /dev/null
        fi
    done
fi

@Fuco1
Contributor

Fuco1 commented Jun 8, 2019

With GNU parallel you can speed this up significantly:

nomad job status YOURJOB | grep pending | awk '{print $1}' > jobs
cat jobs | parallel -j32 nomad job stop -detach -purge 

Adjust the number of parallel jobs (-j) as you see fit.

@analytically

Or just kill all pending jobs. It's pretty bad for operators that this behaviour is not built-in.

@BirkhoffLee

In case anybody is here looking for this basic functionality that Nomad should provide but doesn't, and you have likely made a mistake by choosing this stack, you can at least try this:
-- @Miserlou

Heads up for those who want to try out Nomad in production: I just spent the last hour trying to kill 700 jobs that are causing the cluster to freeze. Still ongoing.

@Amier3 Amier3 added the help-wanted We encourage community PRs for these issues! label Apr 1, 2022
@danishprakash
Contributor

@schmichael Looked at Run() for JobStopCommand and it seems like we can accept multiple jobs and then concurrently stop them? I can work on this if that seems like the right direction.

@mikenomitch
Contributor

@danishprakash, I just checked with engineering and that sounds good!

If you pick this up and want some guidance, please let us know. And feel free to open a WIP/draft PR too - doesn't have to be perfect before getting feedback. Thank you!

@schmichael
Member

schmichael commented Apr 6, 2022

tl;dr - +1 to @danishprakash

@danishprakash Sounds good to me! Concurrent vs sequential is an interesting question, but I think your choice - concurrent - is the right one. Sequential is easy enough to script already, and a concurrent stop operation could someday call a batch/atomic stop API which in @BirkhoffLee's case could be a significant optimization! 1 command, API call, and Raft commit instead of 700 of each.

Concurrent implies we attempt to stop all jobs even if any of them encounter an error. That means in the case of a missing ACL token we'll be spewing 1 error per job listed, but I think that's ok. I think halting on the first error encountered would be far worse as it would be difficult to know what got stopped successfully and what didn't.

So the design is:

  1. Concurrent stops via goroutines in the CLI making independent HTTP requests
  2. Soft-fail on errors (log and allow other operations to continue)
  3. Future Work: batch/atomic stop support in the API/Raft.
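
A rough sketch of what that design could look like from the client side, using the Go API client (github.com/hashicorp/nomad/api). This is a hypothetical illustration, not the actual change that landed in #12582:

package main

import (
	"fmt"
	"os"
	"sync"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		fmt.Fprintln(os.Stderr, "error creating client:", err)
		os.Exit(1)
	}

	jobs := os.Args[1:] // job IDs to stop, e.g. "job1 job2 job3"

	var wg sync.WaitGroup
	errCh := make(chan error, len(jobs))

	for _, id := range jobs {
		wg.Add(1)
		go func(jobID string) {
			defer wg.Done()
			// Deregister (stop) the job without purging; each stop is an
			// independent HTTP request, so one failure doesn't halt the rest.
			if _, _, err := client.Jobs().Deregister(jobID, false, nil); err != nil {
				errCh <- fmt.Errorf("failed to stop %q: %w", jobID, err)
			}
		}(id)
	}

	wg.Wait()
	close(errCh)

	// Soft-fail: report every error, but exit non-zero if any stop failed.
	exitCode := 0
	for err := range errCh {
		fmt.Fprintln(os.Stderr, err)
		exitCode = 1
	}
	os.Exit(exitCode)
}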

@bengaywins
Author

Funny that I reported this years ago, with the same exact issue that @BirkhoffLee had, because I had 5000 to kill. Appreciate that I wasn't the only one.

@danishprakash
Contributor

@schmichael thanks for the helpful summary. I've started looking into this; just trying to understand the relevant pieces before making any changes. I'll open a draft PR soon.

Concurrent vs sequential is an interesting question, but I think your choice - concurrent - is the right one.

I think this stemmed from seeing how kubectl does this. You can pass multiple entities and the client fires off a request to the server and moves on to the next entity. Of course, it might not be a 1-1 implementation here but that felt pretty intuitive. Error handling in that context becomes quite different and equally important as you mentioned.

Concurrent implies we attempt to stop all jobs even if any of them encounter an error.

Wait, does this mean out of the stop cmd context or did I miss something here?

@schmichael
Member

@danishprakash

Concurrent implies we attempt to stop all jobs even if any of them encounter an error.

Wait, does this mean out of the stop cmd context or did I miss something here?

I meant that if a user runs:

nomad job stop job1 jobDoesNotExist job2

...and jobDoesNotExist doesn't exist: job1 and job2 should still get stopped successfully. We should display an error for jobDoesNotExist but still stop the other 2 jobs.

Basically the same as using bash job control:

$ nomad job stop foo &
[1] 3507196
$ nomad job stop doesNotExist & # <-- this will end up exiting with an error
[2] 3507206
$ nomad job stop bar &
[3] 3507210
$ wait
[1]   Done                    nomad job stop foo
[2]-  Exit 1                  nomad job stop doesNotExist
[3]+  Done                    nomad job stop bar

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 16, 2023