Building and running with Trilinos

Nicole Slattengren edited this page Aug 14, 2020 · 13 revisions

Vortex

You can learn more about using the machine by running less /opt/VORTEX_INTRO after logging in.

Building

  1. To grab the current selection of modules/Trilinos (with RDC required):
source /projects/empire/installs/vortex/CUDA-10.1.243_GNU-7.3.1_SPMPI-ROLLING-RELEASE-CUDA-STATIC/trilinos/latest/load_matching_env.sh
  2. This is the build script I use for basic builds:
#!/usr/bin/env bash

# Exit on the first failing command and echo each command as it runs.
set -ex

empire=$1

if test $# -eq 0
then
    echo "usage: $0 <empire-dir> [ <trace-enabled=0> ] [ <build-type=Release> ] "
    exit 1
fi


if test $# -gt 1
then
    trace=$2
else
    trace=0
fi

if test $# -gt 2
then
    build_type=$3
else
    build_type=Release
fi

cmake -GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=true -DEMPIRE_ENABLE_WERROR=OFF -DEMPIRE_ENABLE_PIC=ON -Dvt_trace_enabled=${trace} -DCMAKE_BUILD_TYPE=${build_type} ${empire}
ninja EMPIRE_PIC.exe
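
The argument handling in the script above can also be written with shell parameter expansion. The following is a minimal sketch, not the script actually used on Vortex: `build_cmake_cmd` is a hypothetical helper that only prints the cmake line, so the defaults can be checked without configuring anything. One behavioural difference: `${2:-0}` also falls back to the default when the argument is passed as an empty string.

```shell
#!/usr/bin/env bash
# Sketch only: same three positional arguments as the build script above,
# with defaults supplied by ${N:-default} parameter expansion.
# build_cmake_cmd is a hypothetical name, not part of EMPIRE.

# Print the cmake command line instead of running it.
build_cmake_cmd() {
    if test $# -eq 0
    then
        echo "usage: build_cmake_cmd <empire-dir> [ <trace-enabled=0> ] [ <build-type=Release> ]" >&2
        return 1
    fi

    local empire=$1
    local trace=${2:-0}             # default: tracing off
    local build_type=${3:-Release}  # default: Release build

    echo "cmake -GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=true -DEMPIRE_ENABLE_WERROR=OFF -DEMPIRE_ENABLE_PIC=ON -Dvt_trace_enabled=${trace} -DCMAKE_BUILD_TYPE=${build_type} ${empire}"
}

# Dry runs: defaults only, then tracing enabled with a Debug build.
build_cmake_cmd /path/to/empire
build_cmake_cmd /path/to/empire 1 Debug
```

Once the printed line looks right, replacing the `echo` with the command itself (followed by `ninja EMPIRE_PIC.exe`) gives the same behaviour as the original script.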

Running

To run an interactive job on Vortex with a proper shell run:

bsub -nnodes 16 -Is bash

Scheduling

The scheduler is [IBM LSF](https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_users_guide/chap_jobs_lsf.html).

To schedule a batch job:

bsub -N -nnodes 16 -W <time_limit> -C 1000000000 -o <stdout_file> -e <stderr_file> <run_script>
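
To avoid retyping the placeholders, the batch command above can be assembled from a few named values. This is a minimal sketch under stated assumptions: `batch_cmd` is a hypothetical helper, the `<prefix>.out` / `<prefix>.err` naming is an illustrative choice, and the command is only printed, not submitted. The flags themselves are copied unchanged from the line above.

```shell
#!/usr/bin/env bash
# Sketch only: build the bsub line from the command above and print it (dry run).
# batch_cmd and the log-file naming are assumptions made for illustration.

batch_cmd() {
    local nodes=$1        # value for -nnodes
    local time_limit=$2   # value for -W; LSF reads this as [hour:]minute
    local prefix=$3       # stdout/stderr go to <prefix>.out and <prefix>.err
    local run_script=$4   # script that launches the actual run

    echo "bsub -N -nnodes ${nodes} -W ${time_limit} -C 1000000000 -o ${prefix}.out -e ${prefix}.err ${run_script}"
}

# Example: 16 nodes, 30 minutes, logs named pic_test.out / pic_test.err.
batch_cmd 16 30 pic_test ./run.sh
```

Piping the printed line to `bash`, or replacing `echo` with the command itself, would submit the job.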

The output and error files will only appear after the job has terminated. If you want to know what's happening sooner:

bpeek <job_id>

To see my jobs (both running and pending) summarized, I use:

bjobs -o "user: stat: jobid: job_name:25 submit_time: start_time: run_time: time_left: estimated_start_time:"

To see all jobs, add -u all to the end. If you want to know how wide the running jobs are (that is, how many nodes each one occupies), it's simplest to use the default output:

bjobs -u all

If you schedule multiple jobs and decide not to run them in the order they were submitted, you can move a specific job to the top of your list using:

btop <job_id>

To kill a job, running or pending:

bkill <job_id>

To put a job on hold or release it:

bstop <job_id>
bresume <job_id>

Mutrino

You can learn more about using the machine by running less /opt/MUTRINO_INTRO after logging in.

Building

  1. To grab the current selection of modules/Trilinos (with RDC required):
module swap intel/19.0.4 intel/18.0.5
module unload cray-libsci/19.02.1
source /projects/empire/installs/mutrino/INTEL-18.0.5_MPICH-7.7.6-RELEASE-OPENMP-STATIC/trilinos/latest/load_matching_env.sh
module unload cmake/3.9.0
module load cmake/3.14.6
  2. This is the build script I use for basic builds:
#!/usr/bin/env bash

# Exit on the first failing command and echo each command as it runs.
set -ex

empire=$1

if test $# -eq 0
then
    echo "usage: $0 <empire-dir> [ <trace-enabled=0> ] [ <build-type=Release> ] "
    exit 1
fi


if test $# -gt 1
then
    trace=$2
else
    trace=0
fi

if test $# -gt 2
then
    build_type=$3
else
    build_type=Release
fi

srun cmake -DUSE_STANDARD_LINKER=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=true -DEMPIRE_ENABLE_PIC=ON -Dvt_trace_enabled=${trace} -DCMAKE_BUILD_TYPE=${build_type} ${empire}
srun make -j32 EMPIRE_PIC.exe

Note that the srun before make runs the build on a compute node, which has the benefit of letting you schedule execution to start as soon as the build job completes successfully:

sbatch -d afterok:<make_job_id> <run_script>

If the make command fails, the dependent job will stay stuck in the queue, because its afterok condition can never be met. In that case, point it at a new build job using:

scontrol update Dependency=afterok:<new_make_job_id> <run_job_id>

or remove the dependency manually when it's finally built:

scontrol update Dependency=   <run_job_id>

Note the space between the equal sign and the next argument.
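
The dependency chaining described above can be scripted so the build job id does not have to be copied by hand. This is a minimal sketch under a different assumption from the build script: it assumes the build is submitted with sbatch rather than srun, since sbatch prints a `Submitted batch job <id>` line on success. Both function names are hypothetical, and the dependent submission is only printed.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the build was submitted with sbatch, which reports
# "Submitted batch job <id>" on success. Helper names are illustrative.

# Extract the numeric job id from sbatch's confirmation line.
# Returns non-zero if the line does not look like a successful submission.
job_id_from_sbatch() {
    local line=$1
    if [[ $line =~ ^Submitted\ batch\ job\ ([0-9]+) ]]
    then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Print the dependent submission for a build job id and a run script.
dependent_submit_cmd() {
    local build_id=$1
    local run_script=$2
    echo "sbatch -d afterok:${build_id} ${run_script}"
}

# Dry run with a made-up confirmation line.
id=$(job_id_from_sbatch "Submitted batch job 123456")
dependent_submit_cmd "$id" run.sh
```

In real use the confirmation line would come from something like `$(sbatch <build_script>)`, where `<build_script>` is a batch script wrapping the cmake and make steps.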

If you want to build on the head node instead, remove srun from before the make command, but not from the cmake command.

Running

To run an interactive job:

salloc -C haswell -N 64 -t <time_limit> /bin/bash

Stria

Building

source /projects/empire/installs/stria/ARM-20.0_OPENMPI-4.0.2-RELEASE-OPENMP-STATIC/trilinos/latest/load_matching_env.sh