BYUHPC/slurm-auto-array

slurm-auto-array

slurm-auto-array provides users of sbatch an easier way to submit job arrays, especially when their work units are very small and/or very numerous. Rather than manually creating a mapping from SLURM_ARRAY_TASK_ID to the arguments they want to run a command on, they supply said arguments directly over stdin much like one would with parallel or xargs. For example, to run mycmd --infile $FILE on every FILE with a name ending in ".in" in the directory infiles, allowing each work unit 1 GiB of memory and a processor for an hour, one could use:

ls infiles/*.in | slurm-auto-array --time 1:00:00 --ntasks 1 --mem 1G -- mycmd --infile

...or, equivalently:

slurm-auto-array --time 1:00:00 --ntasks 1 --mem 1G -- mycmd --infile :::: <(ls infiles/*.in)
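For contrast, the manual mapping that slurm-auto-array replaces might look roughly like the sketch below. The list file and the helper `nth_line` are hypothetical names introduced for illustration; `SLURM_ARRAY_TASK_ID` is set by Slurm inside each array task and defaults to 1 here so the snippet can be tried outside a job.

```shell
#!/bin/bash
# Hypothetical helper: print line number $1 of file $2.
nth_line() {
    sed -n "${1}p" "$2"
}

# Build a small stand-in for a list of input files, one per line.
filelist="$(mktemp)"
printf '%s\n' infiles/a.in infiles/b.in infiles/c.in > "$filelist"

# Inside a real array task Slurm sets SLURM_ARRAY_TASK_ID; default to 1 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
FILE="$(nth_line "$task_id" "$filelist")"

# Print the command rather than running it, since mycmd is only a placeholder.
echo "mycmd --infile $FILE"
```

A script like this would typically be submitted with something like `sbatch --array=1-3`, and the list file has to be kept in sync with the array range by hand, which is the bookkeeping slurm-auto-array takes over.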

slurm-auto-array aggregates work because job arrays consisting of many jobs are hard on the scheduler. If a user wants to run a command on each of 100,000 files, slurm-auto-array will by default submit at most 1,000 jobs, each in charge of at least 100 work units (which are run with parallel). The parameters that determine the number of work units that each array task runs can be tuned; see the configuration section of the man page. Despite the aggregation, output files can still be made distinct per work unit.
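The arithmetic behind the 100,000-file example can be sketched as a ceiling division. This only illustrates the numbers quoted above; the helper names are hypothetical, and the way slurm-auto-array actually sizes array tasks is governed by the configuration described in the man page.

```shell
#!/bin/bash
# Hypothetical helper: smallest number of work units per task such that
# $1 work units fit into at most $2 array tasks (ceiling division).
units_per_task() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: number of array tasks needed for $1 work units
# when each task handles up to $2 of them.
tasks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

per_task="$(units_per_task 100000 1000)"
echo "work units per task: $per_task"
echo "array tasks: $(tasks_needed 100000 "$per_task")"
```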

Although we've found slurm-auto-array to work well for many users on our system, it's still a rough draft that hasn't been tested elsewhere--treat it as early beta software. slurm-array-submit is another option.

Here are the slides from the 2024 RMACC presentation on slurm-auto-array.

Installation

slurm-auto-array requires Slurm, GNU Parallel, and Python (3.6 or higher) at runtime. Pandoc is required to install from this repository (but not to install from a release), and bats and all the runtime dependencies are required to run make check.

To install in /my/software/slurm-auto-array:

# Only needed if installing from git directly:
aclocal
autoconf
automake --add-missing
# Needed for any installation method:
./configure --prefix=/my/software/slurm-auto-array
make check
make install

You'll need to run parallel --citation; parallel --record-env in a clean environment before slurm-auto-array or make check will work.

make dist will create a release tarball.

Usage

In its simplest form, slurm-auto-array runs a command on each of several user-supplied arguments. For example, to run echo 1, echo 2, and echo 3, you could submit with any of:

slurm-auto-array --output echo-%1.txt -- echo ::: 1 2 3
slurm-auto-array --output echo-%1.txt -- echo :::: <(seq 3)
seq 3 | slurm-auto-array --output echo-%1.txt -- echo

This will result in 3 files, echo-1.txt, echo-2.txt, and echo-3.txt, each containing the number in its name.

If no :::, :::+, ::::, or ::::+ arguments are specified, arguments are taken from stdin; if colon arguments are given, stdin is passed to the command to be run. One could thus use the following to get outputs 1 a b c, 2 a b c, and 3 a b c in the files slurm-auto-array-*.out:

echo a b c | slurm-auto-array -- bash -c 'echo "$0 $(cat)"' :::: <(seq 3)
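To see why the argument shows up as `$0`, it can help to run the same work units locally without Slurm. The loop below is a stand-in, not what slurm-auto-array does internally: it assumes, as the outputs described above imply, that every work unit receives its own copy of stdin. The function name `run_units` is hypothetical.

```shell
#!/bin/bash
# Local stand-in for the three work units: each gets one argument from
# seq 3 (visible to bash -c as $0) and the text "a b c" on stdin.
run_units() {
    for n in $(seq 3); do
        echo a b c | bash -c 'echo "$0 $(cat)"' "$n"
    done
}

run_units
```

The first word after the `bash -c` script becomes `$0`, so appending the argument to the command line is enough to make it available inside the one-liner.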

Multiple sets of arguments can be specified with :::, in which case the arguments will be crossed. To run mycommand $letter $number for every combination of a letter from A to D and a number from 4 to 10, allocating 2 CPUs and 4 GB of memory for 3 hours for each run, use:

slurm-auto-array -n 2 --mem 4G -t 3:00:00 -- mycommand ::: A B C D ::: 4 5 6 7 8 9 10
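Crossing means one work unit per element of the Cartesian product. A plain nested loop (a sketch with a hypothetical function name, printing the commands instead of running them) lists the work units this submission stands for; the order in which they actually run is not implied.

```shell
#!/bin/bash
# Enumerate the crossed argument sets: every letter with every number.
cross_args() {
    for letter in A B C D; do
        for number in 4 5 6 7 8 9 10; do
            echo "mycommand $letter $number"
        done
    done
}

cross_args
echo "total work units: $(cross_args | wc -l | tr -d ' ')"
```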

Arguments can be paired rather than being crossed by using :::+ rather than :::. To run echo 1 X a, echo 2 Y b, and echo 3 Z c, use:

slurm-auto-array -- echo ::: 1 2 3 :::+ X Y Z :::+ a b c

Use 4 colons rather than 3 to specify a file containing arguments rather than the arguments themselves. To run echo $N alpha a and echo $N beta b for each N from 3 to 8, run:

slurm-auto-array -- echo :::: <(seq 3 8) ::: alpha beta ::::+ latin_letters.txt
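The combination of crossing and pairing can be previewed with a short loop. This sketch assumes latin_letters.txt holds the letters a and b, one per line, which is what the commands described above imply; the temporary files and the function name `mixed_args` are hypothetical.

```shell
#!/bin/bash
# Assumed contents of latin_letters.txt: one letter per line.
letters_file="$(mktemp)"
printf '%s\n' a b > "$letters_file"

# The Greek names, one per line, so they can be paired with the letters.
greek_file="$(mktemp)"
printf '%s\n' alpha beta > "$greek_file"

# Pair the two files line by line, then cross the pairs with N = 3..8.
mixed_args() {
    for n in $(seq 3 8); do
        paste -d ' ' "$1" "$2" | while read -r greek latin; do
            echo "echo $n $greek $latin"
        done
    done
}

mixed_args "$greek_file" "$letters_file"
```

Six values of N crossed with two pairs gives twelve work units in total.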

Worked example

Suppose you have many input files scattered about the deep directory infiles, each named *blah*.in. For each of these files, you'd like to run the equivalent of:

mycommand --permutation $N \
          --infile "$INFILE" \
          --outfile "${INFILE%.in}-$N.out" \
          &> "${INFILE%.in}-$N.log"

...for each N in 1 through 4, creating a *blah*-$N.out for each input file with mycommand and capturing the output in *blah*-$N.log. For convenience, you create a script, run-mycommand.sh, that takes two arguments: the permutation, and the filename stripped of its suffix. The script leaves out the log redirection, since slurm-auto-array's -o option takes care of that at submission time. Here it is:

#!/bin/bash

N="$1"
IN="$2.in"
OUT="$2-$N.out"

mycommand --permutation "$N" --infile "$IN" --outfile "$OUT"

To use this with slurm-auto-array, you'll need the input files stripped of their suffix. Because the files are scattered across subdirectories of infiles, keep the directory part of each path so that the script can still find them:

find infiles -name '*blah*.in' | sed 's/\.in$//'

...and the permutations, which can be obtained with seq 4.

To run each instance of mycommand with 8 CPUs and 4 GB of memory for 2 hours using slurm-auto-array, you can use:

slurm-auto-array -n 8 --mem 4G -t 2:00:00 -o %2-%1.log -- \
                 run-mycommand.sh :::: <(seq 4) \
                                  :::: <(find infiles -name '*blah*.in' | sed 's/\.in$//')
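Before submitting, it can be worth previewing which log files the run is meant to produce. The sketch below follows the `${INFILE%.in}-$N.log` naming from the start of this example; `preview_logs` and the throwaway directory tree are hypothetical and only there to make the snippet self-contained.

```shell
#!/bin/bash
# Hypothetical preview helper: for every *blah*.in file under directory $1,
# print the log file that each permutation 1..4 is meant to write.
preview_logs() {
    find "$1" -name '*blah*.in' | sort | while IFS= read -r infile; do
        stem="${infile%.in}"    # strip the .in suffix, keep the path
        for n in $(seq 4); do
            echo "$stem-$n.log"
        done
    done
}

# Try it on a throwaway directory tree with two nested input files.
demo="$(mktemp -d)"
mkdir -p "$demo/infiles/x/y"
touch "$demo/infiles/x/blah1.in" "$demo/infiles/x/y/myblah2.in"
(cd "$demo" && preview_logs infiles)
```

Two input files and four permutations give eight log names, one per work unit.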