NB: based on `feat/hostport`, as that is required for host networking.

Uses host networking to avoid MPI errors and give better network performance.
Using host networking means the pod's hostname is the K8s node name, e.g. `sbtest-worker-a438ab51-c69dx`. This isn't "Slurm hostlist expression compatible", so e.g. `sinfo` can't contract the node names. This PR uses the downward API to inject the pod name (which is hostlist expression compatible, e.g. `slurmd-0`, since a StatefulSet is used) into the container's environment variables, and explicitly sets the slurmd node name on startup using `slurmd -N <nodename>`.

Example of performance changes on arcus using the `portal-internal` network (i.e. not RoCE), showing the 0-byte, max-bandwidth and max-message-size values from `srun`-launched IMB-MPI1 pingpong using OpenMPI in image:

Default CNI:

Host networking:

(Note: in both cases the K8s node VMs are on the same hypervisor host.)
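To illustrate why the pod name matters for `sinfo` output, here is a simplified Python sketch of Slurm-style hostlist contraction. It is not Slurm's implementation: `contract` is a hypothetical helper, it assumes numeric suffixes are not zero-padded, and the second `sbtest-worker` name in the usage example is made up. Names ending in a sequential number (as StatefulSet pod names do) collapse into a range; random node-name suffixes cannot.

```python
import re
from collections import defaultdict

# Splits a name ending in decimal digits, e.g. "slurmd-12" -> ("slurmd-", "12").
_SUFFIX = re.compile(r"^(.*?)(\d+)$")


def contract(hostnames: list[str]) -> str:
    """Simplified sketch of Slurm-style hostlist contraction.

    Assumption: numeric suffixes have no zero padding. Real Slurm
    handles padding and more cases than this sketch does.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    singles: list[str] = []
    for name in hostnames:
        m = _SUFFIX.match(name)
        if m:
            groups[m.group(1)].append(int(m.group(2)))
        else:
            # No trailing number, so the name cannot join a range.
            singles.append(name)

    parts: list[str] = []
    for prefix, nums in groups.items():
        nums = sorted(set(nums))
        ranges: list[str] = []
        start = prev = nums[0]
        # The trailing None sentinel flushes the final run.
        for n in nums[1:] + [None]:
            if n is not None and n == prev + 1:
                prev = n
                continue
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
            if n is not None:
                start = prev = n
        if len(nums) == 1:
            parts.append(f"{prefix}{nums[0]}")
        else:
            parts.append(f"{prefix}[{','.join(ranges)}]")
    return ",".join(parts + singles)


# StatefulSet-style pod names contract into a single range expression.
print(contract(["slurmd-0", "slurmd-1", "slurmd-2"]))
# K8s node names with random suffixes are left as a plain list
# ("sbtest-worker-a438ab51-k2f7q" is a hypothetical second node).
print(contract(["sbtest-worker-a438ab51-c69dx", "sbtest-worker-a438ab51-k2f7q"]))
```

The first call prints `slurmd-[0-2]`; the second prints both node names unchanged, which is the behaviour that setting the node name via `slurmd -N` avoids.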