{lib,mpi}[GCCcore/13.3.0,NVHPC/24.9] Add NCCL 2.22.3, UCC-CUDA 1.3.0, OpenMPI 5.0.3 w/CUDA 12.6.0 #21546
Signed-off-by: Jan André Reuter <[email protected]>
Test report by @Thyre
e3227f8 to 6d8490d
Test report by @SebastianAchilles
@boegelbot please test @ jsc-zen3-a100
@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2394983210 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Unfortunately it's hard to say why that particular test
I'm also trying to build this on a second system of mine to see if it fails there. This will take some time, as EasyBuild is not set up there.
[1] open-mpi/ompi#10152
Test report by @Thyre
Edit (2024-01-07): I guess the issue might be related to NFS mounts. This system (datenlager) only provides SMB shares, while my main system doesn't mount any network shares by default. I'll check if something changes when mounting some NFS share.
Test report by @sassy-crick
With NFS share & mount:
Test report by @Thyre
Edit: I can certainly imagine that NFS shares might be the reason for the observed failure. If the NFS server doesn't exist anymore but is still mounted, building OpenMPI simply hangs indefinitely in the test step. So this test seems to be fragile when it comes to NFS shares.
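For anyone who wants to check whether a build host has NFS shares mounted before running the OpenMPI test step, here is a minimal sketch. It assumes a Linux system that exposes mounts in the usual `/proc/mounts` layout (device, mount point, filesystem type, ...). The function name `nfs_mount_points` is made up for illustration and is not part of EasyBuild or OpenMPI.

```python
def nfs_mount_points(mounts_text):
    """Return mount points whose filesystem type is nfs or nfs4.

    mounts_text is expected to follow the /proc/mounts layout:
    device, mount point, filesystem type, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; field 2 is the filesystem type.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            print(nfs_mount_points(handle.read()))
    except OSError:
        print("could not read /proc/mounts")
```

This only lists NFS mounts; it does not tell you whether the server behind a mount is still reachable, which is the case that made the build hang.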
@boegelbot please test @ jsc-zen3-a100
@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2436116880 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
@boegelbot please test @ generoso
@SebastianAchilles: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2436204654 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Yes, disabling the tests that fail on NFS with a patch is probably the best solution.
Will look into it
6d8490d to c24d884
c24d884 to 3c748e0
Updated software
@boegelbot please test @ jsc-zen3-a100
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2493832493 processed Message to humans: this is just bookkeeping information for me,
Signed-off-by: Jan André Reuter <[email protected]>
3c748e0 to 16242b7
Looks like the batch job 5321 on
@boegelbot please test @ jsc-zen3-a100
@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2495439226 processed Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
lgtm
Going in, thanks @Thyre!
Add NCCL 2.22.3 & UCC-CUDA 1.3.0 for GCCcore 13.3.0.
Add OpenMPI 5.0.3 for NVHPC 24.9.
NVHPC 24.9 requires some patches to work correctly with OpenMPI 5.0.3.