Multi-node testing involves three steps:
- launch the server (`fabtget`) on one node.
- save the address of `fabtget` to a file in a shared file system.
- launch the client (`fabtput`) on another node using the address file.
To ensure that step 2 finishes before step 3 starts, add a short delay (e.g., `sleep 2`) at the beginning of step 3.
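As an alternative to a fixed delay, step 3 could poll for the address file before starting the client. The following is only a minimal sketch: `wait_for_file` is a hypothetical helper, not part of the provided scripts, and it assumes that a non-empty address file means the server has finished writing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the provided scripts):
# block until FILE exists and is non-empty, or give up after TIMEOUT seconds.
# Returns 0 once the file is ready, 1 on timeout.
wait_for_file() {
    file=$1
    timeout=${2:-30}   # default: wait at most 30 seconds
    elapsed=0
    while [ ! -s "$file" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In step 3, the client launch would then be guarded by something like `wait_for_file "$ADDR_FILE" 30`, where `ADDR_FILE` is an assumed variable holding the path of the shared address file. On some shared file systems a newly written file may take a moment to become visible on other nodes, so a generous timeout is advisable.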
We provide two different options:
On SLURM, the launcher (`srun`) is integrated with the scheduler, so it
knows how to do round-robin placement across invocations.
On PBS Pro, the launcher (`mpiexec`) is decoupled from the scheduler, so you need to
specify explicitly where to run the server and the client using `-host`.