{lib,mpi}[GCCcore/13.3.0,NVHPC/24.9] Add NCCL 2.22.3, UCC-CUDA 1.3.0, OpenMPI 5.0.3 w/CUDA 12.6.0 #21546

Thyre · 2024-10-04T14:51:19Z

Add NCCL 2.22.3 & UCC-CUDA 1.3.0 for GCCcore 13.3.0.
Add OpenMPI 5.0.3 for NVHPC 24.9.

NVHPC 24.9 requires some patches to work correctly with OpenMPI 5.0.3.

Signed-off-by: Jan André Reuter <[email protected]>

Thyre · 2024-10-04T15:04:07Z

Test report by @Thyre
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
Linux - Linux EndeavourOS UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 560.35.03, Python 3.12.7
See https://gist.github.com/Thyre/455444cb24a87d6904c430ee1332c464 for a full test report.

SebastianAchilles · 2024-10-05T08:38:44Z

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
skl-rockylinux-810 - Linux Rocky Linux 8.10, x86_64, Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (skylake), 1 x NVIDIA NVIDIA RTX A4000, 555.42.06, Python 3.6.8
See https://gist.github.com/SebastianAchilles/b85abb42f2523431c1e1acdb99a8c2f0 for a full test report.

SebastianAchilles · 2024-10-05T08:41:17Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

boegelbot · 2024-10-05T08:50:12Z

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 5009

Test results coming soon (I hope)...

- notification for comment with ID 2394983210 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2024-10-05T09:05:51Z

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/b264a6ea5d8fe659647a51a1668af921 for a full test report.

Thyre · 2024-10-06T12:54:13Z

> Test report by @boegelbot
> FAILED
> Build succeeded for 2 out of 3 (3 easyconfigs in total)
> jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
> See https://gist.github.com/boegelbot/b264a6ea5d8fe659647a51a1668af921 for a full test report.

Unfortunately, it's hard to say why that particular test, opal_path_nfs, failed. Looking that test up online, one can find several occurrences where it fails but shouldn't [1][2][3][4].
It might be interesting to have the full log, since that might include the exit code. Trying a second run might also be interesting, just to see if this was a one-time failure or is related to something specific to that system.

I'm also trying to build this on a second system of mine to see if it fails there. This will take some time, as EasyBuild is not set up there.

[1] open-mpi/ompi#10152
[2] open-mpi/ompi#628 (comment)
[3] https://www.mail-archive.com/[email protected]/msg33810.html
[4] https://www.mail-archive.com/[email protected]/msg35301.html

Thyre · 2024-10-06T16:50:06Z

Test report by @Thyre
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
datenlager - Linux Ubuntu 24.04, x86_64, AMD Ryzen 7 3700X 8-Core Processor, Python 3.12.3
See https://gist.github.com/Thyre/0f30df76f9467fd9a84c608721c2614f for a full test report.

Edit (2024-10-07): I guess the issue might be related to NFS mounts. This system (datenlager) only provides SMB shares, while my main system doesn't mount any network shares by default. I'll check if something changes when mounting some NFS share.

sassy-crick · 2024-10-07T08:15:28Z

Test report by @sassy-crick
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
Full report for OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb can be found here
I am building using EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.9 for our L40s, if that helps.
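
For context: EasyBuild accepts each long command-line option as an environment variable too, named `EASYBUILD_` plus the option name in upper case with dashes replaced by underscores. The variable above should therefore behave like passing `--cuda-compute-capabilities=8.9`. A small sketch of that naming rule; the helper name `easybuild_env_var` is made up for illustration and is not part of EasyBuild:

```python
def easybuild_env_var(option):
    """Map a long EasyBuild option to the name of its environment variable."""
    # Drop the leading dashes, swap the remaining dashes for underscores, upper-case.
    name = option.lstrip("-").replace("-", "_").upper()
    return "EASYBUILD_" + name


# Hypothetical usage: look up the variable name for the option used in this build.
print(easybuild_env_var("--cuda-compute-capabilities"))
```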

Thyre · 2024-10-07T09:06:36Z

With NFS share & mount:

Test report by @Thyre
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
Linux - Linux EndeavourOS UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 560.35.03, Python 3.12.7
See https://gist.github.com/Thyre/9305d3751fab2f7cee7c0d436a1dbf1f for a full test report.

Edit: I can certainly imagine that NFS shares might be the reason for the observed failure. If the NFS server doesn't exist anymore but is still mounted, building OpenMPI simply hangs indefinitely in the test step. So this test seems to be fragile when it comes to NFS shares.
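
To check whether a build host has NFS mounts at all, something like the following could be used. This is only a sketch: it assumes a Linux system that exposes `/proc/mounts`, the helper name `nfs_mount_points` is made up for illustration, and treating only `nfs` and `nfs4` as NFS is an assumption. It lists mount points and does not detect whether the server behind a mount is still reachable.

```python
from pathlib import Path

# File system types treated as NFS in this sketch (an assumption).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mount_points(mounts_text):
    """Return the mount points of all NFS entries in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each entry reads: device mount_point fs_type options dump pass
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    # Only attempt to read the file on systems that provide it.
    if proc_mounts.exists():
        for point in nfs_mount_points(proc_mounts.read_text()):
            print(point)
```

An empty result on a host where the test passes, and a non-empty one on a host where it fails, would be consistent with the NFS theory, though it would not prove it.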

SebastianAchilles · 2024-10-24T18:50:44Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--buildpath=/dev/shm"

boegelbot · 2024-10-24T19:00:16Z

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS="--buildpath=/dev/shm" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 5159

Test results coming soon (I hope)...

- notification for comment with ID 2436116880 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2024-10-24T19:15:14Z

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/6f0763dcddb12904ba4a30db4cb29e96 for a full test report.

SebastianAchilles · 2024-10-24T19:41:16Z

@boegelbot please test @ generoso
CORE_CNT=16

boegelbot · 2024-10-24T19:45:10Z

@SebastianAchilles: Request for testing this PR well received on login1

PR test command 'EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

exit code: 0
output:

Submitted batch job 14569

Test results coming soon (I hope)...

- notification for comment with ID 2436204654 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2024-10-24T20:05:40Z

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/1ed350b8f48ecc2c1934322312a2eeeb for a full test report.

Thyre · 2024-11-22T08:28:07Z

jsc-zen3 indeed uses NFS to mount some directories. This might explain the failures, if something goes wrong there during testing. @SebastianAchilles, should I try to disable the test with a patch, so that other users do not run into this issue?

SebastianAchilles · 2024-11-22T12:44:16Z

Yes, disabling the tests that fail on NFS with a patch is probably the best solution.

Thyre · 2024-11-22T12:46:51Z

Will look into it

github-actions · 2024-11-22T13:51:03Z

Updated software `NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb`

Diff against NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index 90634952ad..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,23 +1,23 @@
 name = 'NCCL'
-version = '2.20.5'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '13.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-checksums = ['d11ad65c1df3cbe4447eaddceec71569f5c0497e27b3b8369cf79f18d2b2ad8c']
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.40')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '12.4.0', '', SYSTEM),
-    ('UCX-CUDA', '1.15.0', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)

Diff against NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index ebbd822138..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,33 +1,26 @@
 name = 'NCCL'
-version = '2.16.2'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '12.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-patches = ['NCCL-2.16.2_fix-cpuid.patch']
-checksums = [
-    {'v2.16.2-1.tar.gz': '7f7c738511a8876403fc574d13d48e7c250d934d755598d82e14bab12236fc64'},
-    {'NCCL-2.16.2_fix-cpuid.patch': '0459ecadcd32b2a7a000a2ce4f675afba908b2c0afabafde585330ff4f83e277'},
-]
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.39')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('UCX-CUDA', '1.13.1', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
-prebuildopts = "sed -i 's/NVCUFLAGS  := /NVCUFLAGS  := -allow-unsupported-compiler /' makefiles/common.mk && "
-buildopts = "VERBOSE=1"
-
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)
-cuda_compute_capabilities = ['3.5', '5.0', '6.0', '7.0', '7.5', '8.0', '8.6']
+cuda_compute_capabilities = ['5.0', '6.0', '7.0', '7.5', '8.0', '8.6', '9.0']
 
 moduleclass = 'lib'

Diff against NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index 569d09f985..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,32 +1,23 @@
 name = 'NCCL'
-version = '2.18.3'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-patches = [
-    'NCCL-2.16.2_fix-cpuid.patch',
-    'NCCL-2.18.3_fix-cudaMemcpyAsync.patch',
-]
-checksums = [
-    ('6477d83c9edbb34a0ebce6d751a1b32962bc6415d75d04972b676c6894ceaef9',
-     'b4f5d7d9eea2c12e32e7a06fe138b2cfc75969c6d5c473aa6f819a792db2fc96'),
-    {'NCCL-2.16.2_fix-cpuid.patch': '0459ecadcd32b2a7a000a2ce4f675afba908b2c0afabafde585330ff4f83e277'},
-    {'NCCL-2.18.3_fix-cudaMemcpyAsync.patch': '7dc8d0d1b78e4f8acefbc400860f47432ef67c225b50d73c732999c23483de90'},
-]
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.40')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '12.1.1', '', SYSTEM),
-    ('UCX-CUDA', '1.14.1', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)

Updated software `OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb`

Diff against OpenMPI-5.0.3-GCC-13.3.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 6864e213a9..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -4,19 +4,29 @@ version = '5.0.3'
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'GCC', 'version': '13.3.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
-patches = [('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)]
+patches = [
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
+]
 checksums = [
-    {'openmpi-5.0.3.tar.bz2': '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
     {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
      'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
     ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
     ('Autotools', '20231222'),
 ]
 
@@ -25,14 +35,29 @@ dependencies = [
     ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
     ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
     ('libfabric', '1.21.0'),
     ('PMIx', '5.0.2'),
     ('PRRTE', '3.0.5'),
     ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-preconfigopts = 'gcc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
-configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda --with-show-load-errors=no '
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'

Diff against OpenMPI-4.1.6-GCC-13.2.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 831148339a..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -1,64 +1,63 @@
 name = 'OpenMPI'
-version = '4.1.6'
+version = '5.0.3'
 
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'GCC', 'version': '13.2.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
 patches = [
-    'OpenMPI-4.1.1_build-with-internal-cuda-header.patch',
-    'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch',
-    'OpenMPI-4.1.x_add_atomic_wmb.patch',
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
 ]
 checksums = [
-    {'openmpi-4.1.6.tar.bz2': 'f740994485516deb63b5311af122c265179f5328a0d857a567b85db00b11e415'},
-    {'OpenMPI-4.1.1_build-with-internal-cuda-header.patch':
-     '63eac52736bdf7644c480362440a7f1f0ae7c7cae47b7565f5635c41793f8c83'},
-    {'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch':
-     'b767c7166cf0b32906132d58de5439c735193c9fd09ec3c5c11db8d5fa68750e'},
-    {'OpenMPI-4.1.x_add_atomic_wmb.patch': '9494bbc546d661ba5189e44b4c84a7f8df30a87cdb9d96ce2e73a7c8fecba172'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
+    {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
+     'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
-    ('pkgconf', '2.0.3'),
-    ('Perl', '5.38.0'),
-    ('Autotools', '20220317'),
+    ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('zlib', '1.2.13'),
-    ('hwloc', '2.9.2'),
+    ('zlib', '1.3.1'),
+    ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
-    ('UCX', '1.15.0'),
-    ('libfabric', '1.19.0'),
-    ('PMIx', '4.2.6'),
-    ('UCC', '1.2.0'),
+    ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('libfabric', '1.21.0'),
+    ('PMIx', '5.0.2'),
+    ('PRRTE', '3.0.5'),
+    ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
-# Update configure to include changes from the "internal-cuda" patch
-# by running a subset of autogen.pl sufficient to achieve this
-# without doing the full, long-running regeneration.
-preconfigopts = ' && '.join([
-    'cd config',
-    'autom4te --language=m4sh opal_get_version.m4sh -o opal_get_version.sh',
-    'cd ..',
-    'autoconf',
-    'autoheader',
-    'aclocal',
-    'automake',
-    ''
-])
-
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-configopts = '--with-cuda=internal '
-
-# disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
-
-# to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'

Diff against OpenMPI-4.1.5-intel-compilers-2023.1.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 59780f6df6..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -1,66 +1,63 @@
 name = 'OpenMPI'
-version = '4.1.5'
+version = '5.0.3'
 
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'intel-compilers', 'version': '2023.1.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
 patches = [
-    'OpenMPI-4.1.1_build-with-internal-cuda-header.patch',
-    'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch',
-    'OpenMPI-4.1.5_fix-pmix3x.patch',
-    'OpenMPI-4.1.x_add_atomic_wmb.patch',
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
 ]
 checksums = [
-    {'openmpi-4.1.5.tar.bz2': 'a640986bc257389dd379886fdae6264c8cfa56bc98b71ce3ae3dfbd8ce61dbe3'},
-    {'OpenMPI-4.1.1_build-with-internal-cuda-header.patch':
-     '63eac52736bdf7644c480362440a7f1f0ae7c7cae47b7565f5635c41793f8c83'},
-    {'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch':
-     'b767c7166cf0b32906132d58de5439c735193c9fd09ec3c5c11db8d5fa68750e'},
-    {'OpenMPI-4.1.5_fix-pmix3x.patch': '46edac3dbf32f2a611d45e8a3c8edd3ae2f430eec16a1373b510315272115c40'},
-    {'OpenMPI-4.1.x_add_atomic_wmb.patch': '9494bbc546d661ba5189e44b4c84a7f8df30a87cdb9d96ce2e73a7c8fecba172'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
+    {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
+     'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
-    ('pkgconf', '1.9.5'),
-    ('Perl', '5.36.1'),
-    ('Autotools', '20220317'),
+    ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('zlib', '1.2.13'),
-    ('hwloc', '2.9.1'),
+    ('zlib', '1.3.1'),
+    ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
-    ('UCX', '1.14.1'),
-    ('libfabric', '1.18.0'),
-    ('PMIx', '4.2.4'),
-    ('UCC', '1.2.0'),
+    ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('libfabric', '1.21.0'),
+    ('PMIx', '5.0.2'),
+    ('PRRTE', '3.0.5'),
+    ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
-# Update configure to include changes from the "internal-cuda" patch
-# by running a subset of autogen.pl sufficient to achieve this
-# without doing the full, long-running regeneration.
-preconfigopts = ' && '.join([
-    'cd config',
-    'autom4te --language=m4sh opal_get_version.m4sh -o opal_get_version.sh',
-    'cd ..',
-    'autoconf',
-    'autoheader',
-    'aclocal',
-    'automake',
-    ''
-])
-
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-configopts = '--with-cuda=internal '
-
-# disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
-
-# to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'

Updated software `UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb`

Diff against UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index 8594d50984..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.2.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,7 +12,7 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
@@ -21,21 +21,21 @@ patches = [
     '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    {'v1.2.0.tar.gz': 'c1552797600835c0cf401b82dc89c4d27d5717f4fb805d41daca8e19f65e509d'},
-    {'UCC-CUDA-1.2.0_link_against_existing_UCC_libs.patch':
-     '84157be5eae96d2501df076bcf0598b104adf80abeca028a144c4fb098638207'},
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.40'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
     ('UCC', version),
-    ('CUDA',  '12.1.1', '', SYSTEM),
-    ('UCX-CUDA', '1.14.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.18.3', '-CUDA-%(cudaver)s'),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "

Diff against UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index bfe211063d..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.1.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,32 +12,30 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '12.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
 sources = ['v%(version)s.tar.gz']
 patches = [
-    '%(name)s-1.0.0_link_against_existing_UCC_libs.patch',
-    '%(name)s-%(version)s_cuda_12_mem_ops.patch',
+    '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    {'v1.1.0.tar.gz': '74c8ba75037b5bd88cb703e8c8ae55639af3fecfd4428912a433c010c97b4df7'},
-    {'UCC-CUDA-1.0.0_link_against_existing_UCC_libs.patch':
-     '9fa11cf6779174f4e9048df5812096e4261e1769d465cc7f34a6354398876856'},
-    {'UCC-CUDA-1.1.0_cuda_12_mem_ops.patch': 'fc3ea1487d29dc626db2363ef5a79e7f0906f6a7507a363fa6167a812b143eb6'},
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.39'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('UCC', '1.1.0'),
-    ('CUDA',  '12.0.0', '', SYSTEM),
-    ('UCX-CUDA', '1.13.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.16.2', '-CUDA-%(cudaver)s'),
+    ('UCC', version),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "

Diff against UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index e213c78d3b..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.0.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,30 +12,30 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '11.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
 sources = ['v%(version)s.tar.gz']
 patches = [
-    '%(name)s-1.0.0_link_against_existing_UCC_libs.patch',
+    '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    'd3b4aa7004bf339d35952a1699a6e408064ba578bdc93861f5f07527ad0a5e8c',  # v1.0.0.tar.gz
-    # UCC-CUDA-1.0.0_link_against_existing_UCC_libs.patch
-    '9fa11cf6779174f4e9048df5812096e4261e1769d465cc7f34a6354398876856',
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.38'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('UCC', '1.0.0'),
-    ('CUDA',  '11.7.0', '', SYSTEM),
-    ('UCX-CUDA', '1.12.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.12.12', '-CUDA-%(cudaver)s'),
+    ('UCC', version),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "
@@ -43,10 +43,6 @@ preconfigopts = "./autogen.sh && "
 buildopts = '-C src/components/mc/cuda V=1 && make -C src/components/tl/nccl V=1'
 installopts = '-C src/components/mc/cuda && make -C src/components/tl/nccl install'
 
-# UCC_COMPONENT_PATH completely overrides $EBROOTUCC/lib/ucc so install symbolic links
-# to existing non CUDA related components
-postinstallcmds = ['for i in $EBROOTUCC/lib/ucc/*; do ln -s $i %(installdir)s/lib/ucc; done']
-
 sanity_check_paths = {
     'files': ['lib/ucc/libucc_mc_cuda.%s' % SHLIB_EXT, 'lib/ucc/libucc_tl_nccl.%s' % SHLIB_EXT],
     'dirs': ['lib']
@@ -54,6 +50,6 @@ sanity_check_paths = {
 
 sanity_check_commands = ["ucc_info -c"]
 
-modextravars = {'UCC_COMPONENT_PATH': '%(installdir)s/lib/ucc'}
+modextrapaths = {'EB_UCC_EXTRA_COMPONENT_PATH': 'lib/ucc'}
 
 moduleclass = 'lib'
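
The diff above switches `checksums` from bare SHA256 strings with trailing comments to dicts keyed by filename. A minimal sketch of why that form is convenient, assuming nothing about EasyBuild's own implementation: `sha256_entry` and `verify` are hypothetical helpers, and the file names and contents are placeholders rather than the real sources.

```python
import hashlib


def sha256_entry(filename, data):
    """Build a {filename: sha256-hex} entry in the style of the new checksums list."""
    return {filename: hashlib.sha256(data).hexdigest()}


def verify(filename, data, checksums):
    """Look up the checksum by filename, so list order no longer matters."""
    for entry in checksums:
        if filename in entry:
            return entry[filename] == hashlib.sha256(data).hexdigest()
    raise KeyError(f"no checksum recorded for {filename}")


# Placeholder inputs; a real easyconfig lists the source tarball and the patch.
checksums = [
    sha256_entry("example.tar.gz", b"source bytes"),
    sha256_entry("example.patch", b""),
]

print(verify("example.patch", b"", checksums))
print(verify("example.tar.gz", b"tampered", checksums))
```

Keying each checksum by filename makes the association explicit instead of relying on a comment or on the position in the list.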

Thyre · 2024-11-22T13:59:03Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

boegelbot · 2024-11-22T14:00:09Z

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 5321

Test results coming soon (I hope)...

- notification for comment with ID 2493832493 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

Signed-off-by: Jan André Reuter <[email protected]>

SebastianAchilles · 2024-11-23T10:57:22Z

Looks like the batch job 5321 on jsc-zen3 finished successfully, but it had problems uploading the test report.

SebastianAchilles · 2024-11-23T10:57:29Z

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

boegelbot · 2024-11-23T11:00:09Z

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

exit code: 0
output:

Submitted batch job 5328

Test results coming soon (I hope)...

- notification for comment with ID 2495439226 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2024-11-23T11:18:41Z

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/151b246d610b3e39782e93aa2f8d5e5a for a full test report.

SebastianAchilles

lgtm

SebastianAchilles · 2024-11-24T11:38:55Z

Going in, thanks @Thyre!

Thyre added 2 commits October 4, 2024 16:30

{lib}[GCCcore/13.3.0] Add NCCL 2.22.3

0ad9dd9

Signed-off-by: Jan André Reuter <[email protected]>

{lib}[GCCcore/13.3.0] Add UCC-CUDA 1.3.0

01c1b04

Signed-off-by: Jan André Reuter <[email protected]>

Thyre force-pushed the openmpi-5.0.3-nvhpc-24.9 branch 2 times, most recently from e3227f8 to 6d8490d Compare October 4, 2024 18:39

SebastianAchilles added the update and 2024a (issues & PRs related to 2024a common toolchains) labels Oct 5, 2024

SebastianAchilles added this to the release after 4.9.4 milestone Oct 5, 2024

This was referenced Oct 14, 2024

{toolchain}[system/system] nvofbf v2024.9, nvompi v2024.9 #21639

Open

{lib}[GCCcore/13.3.0] NCCL v2.22.3 w/ CUDA 12.6.0 #21636

Closed

sassy-crick pushed a commit to sassy-crick/easybuild-easyconfigs that referenced this pull request Oct 14, 2024

OpenMPI removed in favour of PR easybuilders#21546

fc0bf6e

sassy-crick mentioned this pull request Oct 14, 2024

{phys}[nvofbf/2024.9] VASP v6.4.3 w/ hdf5 #21641

Open

4 tasks

Thyre force-pushed the openmpi-5.0.3-nvhpc-24.9 branch from 6d8490d to c24d884 Compare November 22, 2024 13:48

Thyre force-pushed the openmpi-5.0.3-nvhpc-24.9 branch from c24d884 to 3c748e0 Compare November 22, 2024 13:49

{mpi}[NVHPC/24.9] Add OpenMPI 5.0.3

16242b7

Signed-off-by: Jan André Reuter <[email protected]>

Thyre force-pushed the openmpi-5.0.3-nvhpc-24.9 branch from 3c748e0 to 16242b7 Compare November 22, 2024 15:19

SebastianAchilles approved these changes Nov 24, 2024


SebastianAchilles merged commit e643818 into easybuilders:develop Nov 24, 2024
10 checks passed

github-actions bot · 2024-11-22

Updated software `NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb`

Updated software `OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb`

Updated software `UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb`
