
Running jobs on HPC

Mike S Wang edited this page Aug 6, 2024 · 12 revisions

Generic HPC

You should always check the site-specific guidance on running jobs for each HPC system.

Demo: NERSC Perlmutter

The following guide is specific to NERSC Perlmutter.

If you have not yet installed Triumvirate, read through this wiki first.

Runtime module dependencies

The following build dependency also appears to be a runtime dependency:

module load cray-fftw
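If this module is missing at run time, the job will presumably fail only once the program starts and looks for the FFTW libraries. A minimal guard is sketched below. It assumes your module system exports the colon-separated `LOADEDMODULES` variable (Lmod and Environment Modules both do), and the function name `module_loaded` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if a module named $1 appears in LOADEDMODULES,
# either bare ("name") or with a version suffix ("name/1.2.3").
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"* | *":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, placing `module_loaded cray-fftw || { echo "error: cray-fftw is not loaded" >&2; exit 1; }` after the `module load` lines stops the batch script before `srun` is reached.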

OpenMP parallelisation

Specify the number of threads in your batch script:

#SBATCH --cpus-per-task=<num-threads>

To make certain that the requested number of threads is used, also add the following lines:

#SBATCH --ntasks=1

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

--ntasks=1 ensures that only one process, running <num-threads> threads, is launched even when you have been allocated an entire node with more CPU cores than you requested. Slurm automatically sets the environment variable SLURM_CPUS_PER_TASK to the <num-threads> value given in the --cpus-per-task directive.
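Because SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given, the bare `export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}` leaves the thread count empty if the script is run outside such an allocation, e.g. while testing interactively. A defensive variant is sketched below; the function name and the fallback of one thread are assumptions, not part of the original set-up:

```shell
#!/usr/bin/env bash

# Hypothetical helper: take the thread count from the Slurm allocation,
# falling back to 1 thread (an assumed default) when it is unavailable.
set_omp_threads() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
    else
        echo "warning: SLURM_CPUS_PER_TASK is unset; using 1 thread" >&2
        export OMP_NUM_THREADS=1
    fi
}

set_omp_threads
```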

As of November 2023, we recommend a value of <num-threads> no greater than 64 on NERSC Perlmutter for Triumvirate versions >=0.4.0 (no greater than 32 for versions >0.2.2,<0.4.0, and no greater than 16 for versions <=0.2.2).
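The version-dependent recommendation can be encoded as a lookup. The sketch below is hypothetical: it takes the Triumvirate version as an argument (how you obtain it is left to you), relies on GNU `sort -V` for version ordering, and reflects only the Perlmutter guidance stated above:

```shell
#!/usr/bin/env bash

# Succeed if version $1 <= version $2, using GNU `sort -V` ordering.
version_le() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: print the recommended maximum thread count for
# Triumvirate version $1 on NERSC Perlmutter (guidance as of November 2023).
max_threads_for_version() {
    if version_le 0.4.0 "$1"; then
        echo 64    # >=0.4.0
    elif version_le "$1" 0.2.2; then
        echo 16    # <=0.2.2
    else
        echo 32    # >0.2.2,<0.4.0
    fi
}
```

For example, `max_threads_for_version 0.3.1` prints `32`, which you could compare against the value passed to --cpus-per-task.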

In addition, the following thread affinity settings have been found to perform well:

export OMP_PLACES=threads
export OMP_PROC_BIND=spread
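To confirm that the OpenMP runtime actually picked up these settings, you can optionally set the standard `OMP_DISPLAY_ENV` variable (OpenMP 4.0 and later). It makes the runtime report its OpenMP version and the values of its internal control variables, including the thread count, places and binding policy, at start-up. This diagnostic is an addition, not something the guide requires:

```shell
#!/usr/bin/env bash

# Affinity settings recommended above.
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# Optional diagnostic: ask the OpenMP runtime to report its settings at start-up.
# "verbose" instead of "true" also includes implementation-specific variables.
export OMP_DISPLAY_ENV=true
```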

Log messages

Triumvirate emits log messages so that you can track the progress of a run. To make them appear in stdout as soon as possible, add the --unbuffered option to srun for whichever command you execute, e.g.

srun --unbuffered triumvirate[...]

or

srun --unbuffered python[...]
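Since the point of --unbuffered is to see messages while the job is still running, it helps to know where they end up. By default Slurm writes a batch job's stdout to `slurm-<jobid>.out` in the submission directory. The optional directive below, an addition not used elsewhere in this guide, goes alongside the other #SBATCH lines and names the file after the job using the `%x` (job name) and `%j` (job ID) filename patterns; you can then follow the file live:

```shell
#SBATCH --output=%x-%j.out

# From a login node, while the job runs (replace <jobid> with the actual job ID):
#   tail -f triumvirate-<jobid>.out
```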

Sample batch script

As an example (which may require modification for your own use), here is a simplified batch script I use to execute the C++ program:

#!/usr/bin/env bash

#SBATCH --job-name=triumvirate
#SBATCH --qos=shared
#SBATCH --constraint=cpu
#SBATCH --time=06:00:00
#SBATCH --mem=128G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32


# -- Set-up --------------------------------------------------------------

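# Change to the project directory
cd <repo>

# Load the modules needed for build and runtime dependencies
module load gsl
module load cray-fftw

# Build the C++ executable with OpenMP enabled if it does not already exist
exe=build/bin/triumvirate
if [ ! -f ${exe} ]; then
    make clean
    make -j cppappbuild useomp=true
fi

# Set multithreading
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=threads
export OMP_PROC_BIND=spread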


# -- Run -----------------------------------------------------------------

srun --unbuffered ${exe} <parameter-file.ini>
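The script does not use `set -e`, so a failed `make` or a mistyped parameter-file path would only surface once srun starts. A hypothetical pre-flight check that fails fast is sketched below; the function name is an assumption:

```shell
#!/usr/bin/env bash

# Hypothetical pre-flight check: fail fast if the executable ($1) or the
# parameter file ($2) is missing, instead of discovering it inside srun.
check_inputs() {
    local exe=$1 paramfile=$2
    if [ ! -x "${exe}" ]; then
        echo "error: executable '${exe}' is missing or not executable" >&2
        return 1
    fi
    if [ ! -f "${paramfile}" ]; then
        echo "error: parameter file '${paramfile}' not found" >&2
        return 1
    fi
}
```

In the batch script it would sit just before the srun line, e.g. `check_inputs "${exe}" <parameter-file.ini> || exit 1`.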

You can adapt the batch script above to run the installed Python package:

#!/usr/bin/env bash

#SBATCH --job-name=triumvirate
#SBATCH --qos=shared
#SBATCH --constraint=cpu
#SBATCH --time=06:00:00
#SBATCH --mem=128G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12


# -- Set-up --------------------------------------------------------------

# Change to the project directory
cd <repo>

# Load the modules needed for runtime dependencies
module load cray-fftw
module load python

# Set multithreading
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=threads
export OMP_PROC_BIND=spread


# -- Run -----------------------------------------------------------------

srun --unbuffered python <python-script> <parameter-file.yml>

where <python-script> is your Python script that uses Triumvirate and parses the parameter file.

<parameter-file.ini> and <parameter-file.yml> are the parameter files read by the C++ and Python programs respectively. See the official documentation for more details.
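Because the Python route depends on the environment set up by `module load python` and on where Triumvirate was installed, a quick import check before srun can catch a misconfigured environment early. The sketch below assumes the package is importable as `triumvirate`; the function name is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical pre-flight check: confirm that the given Python interpreter
# (default: "python") can import the triumvirate package.
check_triumvirate_import() {
    local py=${1:-python}
    if ! "${py}" -c 'import triumvirate' >/dev/null 2>&1; then
        echo "error: '${py}' cannot import triumvirate; check the loaded modules/environment" >&2
        return 1
    fi
}
```

For example, `check_triumvirate_import python || exit 1` placed after the module lines stops the job before any compute time is spent.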