This repository showcases example workflows that use the high-performance computing (HPC) cluster at Florida Atlantic University (FAU) to scale up scientific computing tasks. The workflows serve as a starting point for Max Planck Florida Institute for Neuroscience (MPFI) researchers. For detailed guides on specific topics, please refer to the extensive Knowledge Base provided by the HPC team at FAU. You can also request help from the HPC team by submitting a ticket under Services > Research Computing/HPC > Research Application Assistance. Note that this requires an FAU login (see below).
Specific examples are provided that elaborate on the following topics:
- Running MATLAB
  - submit jobs in a loop (see the sketch right after this list)
  - utilize temporary scratch directories
- Running Python
  - install custom packages
  - use GPU nodes
- Workflow for DeepLabCut
  - restart unfinished jobs
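As a teaser for the loop example, submitting one job per input file can be as simple as the following sketch. The file pattern and the `INPUT` variable are placeholders chosen for illustration; the example in the MATLAB subfolder shows the full workflow.

```bash
# Submit one job per data file; each job can read its input file from the INPUT environment variable
for f in data/*.mat; do
    sbatch --export=ALL,INPUT="$f" submit.sh
done
```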
Please read below for general information on how to access the HPC cluster and submit jobs.
In order to get access to the HPC cluster, you need to have an FAU login. Contact the MPFI IT team to request an account. To log in, you will have to use the Duo Mobile App.
The Open OnDemand web interface provides user-friendly and immediate access to the HPC cluster. You can
- upload/download files
- run interactive sessions, e.g.
  - Jupyter notebooks
  - MATLAB
  - a virtual Linux desktop
- open a terminal on the login node
If you are using the terminal regularly, I highly recommend setting up SSH access and using Zsh. Note that you can also use WSL when working on a Windows machine.
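For reference, a minimal `~/.ssh/config` entry could look like the sketch below; the host alias, hostname, and username are placeholders, since the actual login node address is listed in the Knowledge Base.

```
# ~/.ssh/config -- hypothetical entry; replace the placeholders with the actual values
Host fau-hpc
    HostName <login-node-address>
    User <your-fau-username>
```

With this in place, `ssh fau-hpc` opens a session on the login node.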
There are multiple ways to transfer files between your local machine and the HPC cluster. On Windows, I recommend mounting the HPC cluster as a network drive.
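From a terminal, `scp` or `rsync` also work; here is a quick sketch with placeholder username, hostname, and paths:

```bash
# Copy a local file to your home directory on the cluster (username and hostname are placeholders)
scp data.mat <your-fau-username>@<login-node-address>:~/

# Pull a results folder from the cluster to the local machine
rsync -avz <your-fau-username>@<login-node-address>:~/results/ ./results/
```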
Some resources from the Knowledge Base to get started:
- Computing Storage
- Submitting Jobs using SLURM
- Available queues
- Available nodes
- Which queues provide GPUs?
The HPC cluster uses the SLURM job scheduler.
Jobs are submitted using the `sbatch` command:

```bash
sbatch submit.sh
```

where `submit.sh` is a submit script that holds instructions for the job scheduler.
Example submit scripts for different tasks are provided in the subfolders of this repository.
The lines in the submit script starting with `#SBATCH` are instructions for the job scheduler, such as which queue to use, how many CPU cores to request, and how to name the output and error files.
```bash
#SBATCH --partition=shortq7
#SBATCH --nodes=1
#SBATCH --cpus-per-task=24
#SBATCH --output="%x.%j.out"
#SBATCH --error="%x.%j.err"
```
This is equivalent to adding the following flags to the `sbatch` command:

```bash
sbatch --partition=shortq7 --nodes=1 --cpus-per-task=24 --output="%x.%j.out" --error="%x.%j.err" submit.sh
```
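Flags passed on the command line take precedence over the `#SBATCH` lines in the script, which is handy for one-off overrides, for example:

```bash
# Request only 8 cores for this particular submission, overriding the value in submit.sh
sbatch --cpus-per-task=8 submit.sh
```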
The `--output` and `--error` flags specify the file names for the text output and errors created during the job, where `%x` is the job name and `%j` is the unique job ID assigned by the job scheduler.
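Putting these pieces together, a complete submit script could look like the following minimal sketch. The job name, the module name, and the MATLAB script are placeholders for illustration; the example scripts in the subfolders of this repository show the real thing.

```bash
#!/bin/bash
#SBATCH --partition=shortq7
#SBATCH --nodes=1
#SBATCH --cpus-per-task=24
#SBATCH --job-name=my_analysis
#SBATCH --output="%x.%j.out"
#SBATCH --error="%x.%j.err"

# Load the required software (module name is a placeholder; `module avail` lists what is installed)
module load matlab

# Run the computation (script name is a placeholder); text output lands in my_analysis.<jobid>.out
matlab -batch "my_analysis"
```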
You can monitor the status of your jobs using the `squeue` command:

```bash
squeue -u $USER
```
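To keep an eye on long-running jobs, you can combine this with the standard `watch` utility, which reruns the command at a fixed interval:

```bash
# Refresh the job list every 30 seconds (press Ctrl+C to exit)
watch -n 30 squeue -u $USER
```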
To cancel a job, use the `scancel` command:

```bash
scancel JOB_ID
```
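You can also cancel all of your pending and running jobs at once:

```bash
# Cancel every job belonging to your user account
scancel -u $USER
```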
You can show the available queues and their time limits using the `sinfo` command:

```bash
sinfo
```

Note that by default you only have access to `shortq7`, `shortq7-gpu`, `mediumq7`, and `longq7`.
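To restrict the output to those partitions, you can pass them explicitly:

```bash
# Show only the partitions that MPFI users can access by default
sinfo -p shortq7,shortq7-gpu,mediumq7,longq7
```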
Another useful tool is the `gnodes` script from slurm-utils, which provides a graphical representation of all queues and their utilization. Copy the file to the HPC cluster and run it from the same directory:

```bash
./gnodes
```