{lib,mpi}[GCCcore/13.3.0,NVHPC/24.9] Add NCCL 2.22.3, UCC-CUDA 1.3.0, OpenMPI 5.0.3 w/CUDA 12.6.0 #21546
base: develop
Signed-off-by: Jan André Reuter <[email protected]>
Test report by @Thyre
e5daa93 to e3227f8
Signed-off-by: Jan André Reuter <[email protected]>
e3227f8 to 6d8490d
Test report by @SebastianAchilles
@boegelbot please test @ jsc-zen3-a100
@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 2394983210 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Unfortunately, it's hard to say why that particular test […] I'm also trying to build this on a second system of mine to see if it fails there. This will take some time, as EasyBuild is not set up there.
[1] open-mpi/ompi#10152
Test report by @Thyre
Edit (2024-01-07): I guess the issue might be related to NFS mounts. This system (datenlager) only provides SMB shares, while my main system doesn't mount any network shares by default. I'll check whether anything changes when an NFS share is mounted.
Test report by @sassy-crick
With NFS share & mount: Test report by @Thyre
Edit: I can certainly imagine that NFS shares might be the reason for the observed failure. If the NFS server no longer exists but its share is still mounted, building OpenMPI simply hangs indefinitely in the test step. So this test seems to be fragile when it comes to NFS shares.
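For anyone who wants to rule out the stale-mount case before starting a build, here is a rough sketch of a pre-flight check. It is not part of this PR or of EasyBuild; the function names `parse_nfs_mounts` and `is_responsive` are made up for illustration, and it assumes a Linux host with `/proc/mounts` and a `stat` binary on the `PATH`. Each mount point is probed in a child process, so a dead NFS server should only cost the timeout rather than hanging the check itself.

```python
import subprocess

# Filesystem types treated as NFS (assumption: extend as needed for your site).
NFS_TYPES = {"nfs", "nfs4"}


def parse_nfs_mounts(mounts_text):
    """Return the mount points of NFS filesystems from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field layout: device, mount point, fs type, options, dump, pass.
        # Note: spaces in mount points appear as the escape \040 and are not decoded here.
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points


def is_responsive(mount_point, timeout=5.0):
    """Probe a mount point in a child process and give up after `timeout` seconds."""
    proc = subprocess.Popen(
        ["stat", "-t", mount_point],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait after kill(): a child stuck on a hard NFS mount may never exit.
        proc.kill()
        return False


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for point in parse_nfs_mounts(handle.read()):
            state = "ok" if is_responsive(point) else "STALE or unreachable"
            print(f"{point}: {state}")
```

If this reports a stale mount, unmounting it before the build would be the first thing to try. Whether that actually avoids the hang in the OpenMPI test step is only a guess based on the observations above.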
Add NCCL 2.22.3 & UCC-CUDA 1.3.0 for GCCcore 13.3.0.
Add OpenMPI 5.0.3 for NVHPC 24.9.
NVHPC 24.9 requires some patches to work correctly with OpenMPI 5.0.3.