MPI: Integrate with HPE's CXI library for allocating VNIs #24
Slingshot requires the use of VNIs (think of VLANs). If you use the same VNI for everything, eventually you exhaust endpoints on the switches. Slurm will be using one VNI per job step. For Flux:
VNI tagging was brought up again recently in a (not public) TOSS issue: https://lc.llnl.gov/jira/browse/TOSS-5932 This statement from the issue seemed like a good description of the problem:
I don't see the context elsewhere, or another issue, so I'll add it here. We need to implement VNI assignment at least local to each node. We don't need to deal with the switch reconfiguration, which is the part that has performance and interface concerns, but without node-local VNI assignment it's also possible to exhaust resources on the NIC. From what I understand, there are two parts to this.
In principle, as a start at least, I think we could actually just do (2) and it would work, but it wouldn't provide any protection against inappropriate cross-job/cross-user RDMAs.
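To make the "one VNI per job" idea concrete, here is a minimal sketch of the bookkeeping side only, written in Python. Everything in it is an assumption for illustration: the `VniPool` name, the job ID strings, and the VNI range are hypothetical, and it makes no calls into the CXI library or the NIC. It only shows how a pool could hand each job a distinct VNI and recycle it when the job ends.

```python
from collections import deque


class VniPool:
    """Hand out one VNI per job from a fixed range (hypothetical sketch)."""

    def __init__(self, first, last):
        if first > last:
            raise ValueError("empty VNI range")
        # FIFO queue: released VNIs go to the back, so reuse is delayed.
        self._free = deque(range(first, last + 1))
        self._by_job = {}

    def allocate(self, job_id):
        """Return the VNI for job_id, assigning a new one if needed."""
        if job_id in self._by_job:
            return self._by_job[job_id]
        if not self._free:
            raise RuntimeError("VNI pool exhausted")
        vni = self._free.popleft()
        self._by_job[job_id] = vni
        return vni

    def release(self, job_id):
        """Return the job's VNI to the pool; None if the job had no VNI."""
        vni = self._by_job.pop(job_id, None)
        if vni is not None:
            self._free.append(vni)
        return vni


# The range below is an arbitrary placeholder, not a recommended value.
pool = VniPool(1024, 1026)
first_vni = pool.allocate("job1")
second_vni = pool.allocate("job2")
```

Putting released VNIs at the back of the queue is a design choice in this sketch, meant to avoid handing a just-freed VNI straight to the next job. Whether any such delay is actually needed is an open question here.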
In the scheduler meeting yesterday we discussed moving this forward:
HPE referred us to their Slurm Slingshot plugin (source pointer was posted in an earlier comment). Possibly also interesting at this early stage are the Slurm config options for Slingshot, documented here. There are some Slingshot-related srun options as well.