Adjust profiles for taurus-tud
Add a getNode alias for the gpu2 partition equipped with K80 GPUs.
Add a comment to V100.tpl for the Taurus ml partition equipped with V100 GPUs,
providing a workaround for issues caused by too little host memory
on the taurusml nodes.
See ComputationalRadiationPhysics#2861.
Add --cpus-per-task to V100_picongpu.profile.example
steindev committed Jun 27, 2019
1 parent 78bb8ed commit 48daeee
Showing 4 changed files with 18 additions and 7 deletions.
6 changes: 5 additions & 1 deletion etc/picongpu/taurus-tud/V100.tpl
@@ -29,7 +29,7 @@
#SBATCH --ntasks=!TBG_tasks
#SBATCH --mincpus=!TBG_mpiTasksPerNode
#SBATCH --cpus-per-task=!TBG_coresPerGPU
#SBATCH --mem-per-cpu=1511
#SBATCH --mem=0
#SBATCH --gres=gpu:!TBG_gpusPerNode
#SBATCH --mail-type=!TBG_mailSettings
#SBATCH --mail-user=!TBG_mailAddress
@@ -49,6 +49,10 @@
.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"}

# 6 gpus per node
# Taurus nodes do not have enough host memory to hold the data of all GPUs during ADIOS output.
# If you experience random crashes or your job gets killed by the batch system's resource watchdog,
# reduce the number of GPUs used per node to three here.
# That is, replace both occurrences of 6 with 3 in the following line.
.TBG_gpusPerNode=`if [ $TBG_tasks -gt 6 ] ; then echo 6; else echo $TBG_tasks; fi`

# number of cores to block per GPU - we got 28 cpus per gpu
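For illustration, the three-GPU variant described in the comment above (both occurrences of 6 replaced by 3) would read:

    .TBG_gpusPerNode=`if [ $TBG_tasks -gt 3 ] ; then echo 3; else echo $TBG_tasks; fi`
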
8 changes: 3 additions & 5 deletions etc/picongpu/taurus-tud/V100_picongpu.profile.example
@@ -15,13 +15,11 @@ export MY_NAME="$(whoami) <$MY_MAIL>"

# Modules #####################################################################
#
module purge
module load modenv/ml
module switch modenv/ml

# load CUDA/9.2.88-GCC-7.3.0-2.30, also loads GCC/7.3.0-2.30, zlib, OpenMPI and others
module load fosscuda/2018b
module load CMake/3.11.4-GCCcore-7.3.0
module load git/2.18.0-GCCcore-6.4.0
module load libpng/1.6.34-GCCcore-7.3.0

printf "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n"
@@ -59,7 +57,7 @@ export CMAKE_PREFIX_PATH=$Splash_DIR:$CMAKE_PREFIX_PATH

export PICSRC=$HOME/src/picongpu
export PIC_EXAMPLES=$PICSRC/share/picongpu/examples
export PIC_BACKEND="cuda:60"
export PIC_BACKEND="cuda:70"

export PATH=$PATH:$PICSRC
export PATH=$PATH:$PICSRC/bin
@@ -78,5 +76,5 @@ export CXXFLAGS="-Dlinux"
export TBG_SUBMIT="sbatch"
export TBG_TPLFILE="etc/picongpu/taurus-tud/V100.tpl"

alias getNode='srun -p ml --gres=gpu:6 -n 6 --pty --mem-per-cpu=10000 -t 2:00:00 bash'
alias getNode='srun -p ml --gres=gpu:6 -n 1 --mem=0 --cpus-per-task=28 --pty -t 2:00:00 bash'

9 changes: 8 additions & 1 deletion etc/picongpu/taurus-tud/V100_restart.tpl
@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Copyright 2013-2019 Axel Huebl, Richard Pausch, Alexander Debus
# Copyright 2013-2019 Axel Huebl, Richard Pausch, Alexander Debus, Klaus Steiniger
#
# This file is part of PIConGPU.
#
@@ -20,6 +20,13 @@


# PIConGPU batch script for taurus' SLURM batch system
# This template for automated restarts is older than the current
# V100.tpl.
# It uses a machine file for parallel job execution,
# which is no longer necessary
# (see the comment below; MPI has been fixed).
# However, it still works and is therefore left unchanged.
# Klaus, June 2019

#SBATCH --partition=!TBG_queue
#SBATCH --time=!TBG_wallTime
2 changes: 2 additions & 0 deletions etc/picongpu/taurus-tud/k80_picongpu.profile.example
@@ -75,3 +75,5 @@ export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH
# - "gpu2" queue
export TBG_SUBMIT="sbatch"
export TBG_TPLFILE="etc/picongpu/taurus-tud/k80.tpl"

alias getNode='srun -p gpu2-interactive --gres=gpu:4 -n 1 --pty --mem=0 -t 2:00:00 bash'
