
{lib,mpi}[GCCcore/13.3.0,NVHPC/24.9] Add NCCL 2.22.3, UCC-CUDA 1.3.0, OpenMPI 5.0.3 w/CUDA 12.6.0 #21546

Merged

Conversation

Thyre
Contributor

@Thyre Thyre commented Oct 4, 2024

Add NCCL 2.22.3 & UCC-CUDA 1.3.0 for GCCcore 13.3.0.
Add OpenMPI 5.0.3 for NVHPC 24.9.

NVHPC 24.9 requires some patches to work correctly with OpenMPI 5.0.3.
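As a rough illustration of what "some patches" amounts to, the sketch below collects the NVHPC-related settings that appear in the easyconfig diff further down in this PR. It is written as plain Python (easyconfigs use Python syntax) so it runs outside EasyBuild. The variable name `nvhpc_patch` and the final `print` are only for illustration and are not part of the easyconfig.

```python
# Sketch only: values copied from the OpenMPI-5.0.3-NVHPC-24.9 diff in this PR.
# 'nvhpc_patch' is a hypothetical name used here for illustration.
nvhpc_patch = 'OpenMPI-5.0.3_fix_hle_make_errors.patch'

# %(start_dir)s is an EasyBuild template; it is resolved at build time,
# so it stays a literal string in this standalone fragment.
configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
# required to prevent an internal compiler error in opal
configopts += '--enable-alt-short-float=no '
# set the PGI compiler names manually, as NVHPC compilers are not correctly detected
configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '

# Show the individual configure arguments that result.
print(configopts.split())
```

Keeping each option on its own `configopts +=` line, with a comment above it, makes it easier to drop a single workaround once a later NVHPC or OpenMPI release no longer needs it.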

@Thyre
Contributor Author

Thyre commented Oct 4, 2024

Test report by @Thyre
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
Linux - Linux EndeavourOS UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 560.35.03, Python 3.12.7
See https://gist.github.com/Thyre/455444cb24a87d6904c430ee1332c464 for a full test report.

@Thyre Thyre force-pushed the openmpi-5.0.3-nvhpc-24.9 branch 2 times, most recently from e3227f8 to 6d8490d Compare October 4, 2024 18:39
@SebastianAchilles SebastianAchilles added update 2024a issues & PRs related to 2024a common toolchains labels Oct 5, 2024
@SebastianAchilles SebastianAchilles added this to the release after 4.9.4 milestone Oct 5, 2024
@SebastianAchilles
Member

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
skl-rockylinux-810 - Linux Rocky Linux 8.10, x86_64, Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (skylake), 1 x NVIDIA NVIDIA RTX A4000, 555.42.06, Python 3.6.8
See https://gist.github.com/SebastianAchilles/b85abb42f2523431c1e1acdb99a8c2f0 for a full test report.

@SebastianAchilles
Member

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5009

Test results coming soon (I hope)...

- notification for comment with ID 2394983210 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/b264a6ea5d8fe659647a51a1668af921 for a full test report.

@Thyre
Contributor Author

Thyre commented Oct 6, 2024

> Test report by @boegelbot
> FAILED
> Build succeeded for 2 out of 3 (3 easyconfigs in total)
> jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
> See https://gist.github.com/boegelbot/b264a6ea5d8fe659647a51a1668af921 for a full test report.

Unfortunately, it's hard to say why that particular test, opal_path_nfs, failed. Searching for it online turns up several reports where this test fails although it shouldn't [1][2][3][4].
It would help to have the full log, since it might include the exit code. A second run would also be useful, to see whether this was a one-time failure or is related to something specific to that system.

I'm also trying to build this on a second system of mine to see if it fails there. This will take some time, as EasyBuild is not set up there.

[1] open-mpi/ompi#10152
[2] open-mpi/ompi#628 (comment)
[3] https://www.mail-archive.com/[email protected]/msg33810.html
[4] https://www.mail-archive.com/[email protected]/msg35301.html

@Thyre
Contributor Author

Thyre commented Oct 6, 2024

Test report by @Thyre
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
datenlager - Linux Ubuntu 24.04, x86_64, AMD Ryzen 7 3700X 8-Core Processor, Python 3.12.3
See https://gist.github.com/Thyre/0f30df76f9467fd9a84c608721c2614f for a full test report.


Edit (2024-10-07): I guess the issue might be related to NFS mounts. This system (datenlager) only provides SMB shares, while my main system doesn't mount any network shares by default. I'll check whether anything changes when an NFS share is mounted.

@sassy-crick
Collaborator

Test report by @sassy-crick
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
Full report for OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb can be found here
I am building using EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.9 for our L40s, if that helps.
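For readers unfamiliar with the variable above: EasyBuild configuration options can be set through environment variables named `EASYBUILD_<OPTION>`, so this setting corresponds to the `--cuda-compute-capabilities` option mentioned in the NCCL easyconfig comment. The helper below is a minimal sketch of that naming rule. The function name `easybuild_env_to_option` is hypothetical and not part of EasyBuild.

```python
def easybuild_env_to_option(var_name):
    """Translate an EASYBUILD_* environment variable name into the matching eb flag.

    Hypothetical helper for illustration; EasyBuild does this mapping internally.
    """
    prefix = 'EASYBUILD_'
    if not var_name.startswith(prefix):
        raise ValueError('not an EasyBuild variable: %s' % var_name)
    # strip the prefix, lowercase, and turn underscores into dashes
    return '--' + var_name[len(prefix):].lower().replace('_', '-')


print(easybuild_env_to_option('EASYBUILD_CUDA_COMPUTE_CAPABILITIES'))
```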

@Thyre
Contributor Author

Thyre commented Oct 7, 2024

With NFS share & mount:

Test report by @Thyre
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
Linux - Linux EndeavourOS UNKNOWN, x86_64, AMD Ryzen 7 7800X3D 8-Core Processor, 1 x NVIDIA NVIDIA GeForce RTX 3070, 560.35.03, Python 3.12.7
See https://gist.github.com/Thyre/9305d3751fab2f7cee7c0d436a1dbf1f for a full test report.


Edit: I can certainly imagine that NFS shares might be the reason for the observed failure. If the NFS server no longer exists but its share is still mounted, building OpenMPI simply hangs indefinitely in the test step. So this test seems to be fragile when it comes to NFS shares.
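To check quickly whether a build host has NFS mounts at all, something like the sketch below could be used. It parses text in the `/proc/mounts` format and is not how the opal_path_nfs test itself works. The function name `nfs_mounts` and the sample mount table are made up for illustration.

```python
def nfs_mounts(mounts_text):
    """Return the mount points whose filesystem type is nfs or nfs4."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] in ('nfs', 'nfs4'):
            found.append(fields[1])
    return found


# Made-up sample data: one local disk, one NFS share, one SMB share.
SAMPLE = """\
/dev/nvme0n1p2 / ext4 rw,relatime 0 0
server:/export/home /home nfs4 rw,relatime,vers=4.2 0 0
//nas/share /mnt/smb cifs rw,relatime 0 0
"""

print(nfs_mounts(SAMPLE))
```

On a Linux system, the same function can be fed the real mount table with `nfs_mounts(open('/proc/mounts').read())`.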

@SebastianAchilles
Member

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--buildpath=/dev/shm"

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS="--buildpath=/dev/shm" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5159

Test results coming soon (I hope)...

- notification for comment with ID 2436116880 processed


@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/6f0763dcddb12904ba4a30db4cb29e96 for a full test report.

@SebastianAchilles
Member

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on login1

PR test command 'EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14569

Test results coming soon (I hope)...

- notification for comment with ID 2436204654 processed


@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/1ed350b8f48ecc2c1934322312a2eeeb for a full test report.

@Thyre
Contributor Author

Thyre commented Nov 22, 2024

jsc-zen3 indeed uses NFS to mount some directories. This might explain the failures, if something goes wrong there during testing. @SebastianAchilles, should I try to disable the test with a patch, so that other users do not run into this issue?

@SebastianAchilles
Member

Yes, disabling the tests that fail on NFS with a patch is probably the best solution.
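In easyconfig terms, this approach might look roughly like the sketch below. The patch names and commands are the ones in the OpenMPI diff later in this PR. The `steps` list and the `print` are additions for illustration only, so that the fragment runs as plain Python outside EasyBuild.

```python
# Sketch only: values copied from the OpenMPI-5.0.3-NVHPC-24.9 diff in this PR.
patches = [
    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
    # (patch file, patch level)
    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1),
]

preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
# update configure to include changes from the "disable_opal_path_nfs_test" patch
preconfigopts += './autogen.pl --force && '

# Illustration only: the '&&'-separated commands that run before ./configure.
steps = [s.strip() for s in preconfigopts.split('&&') if s.strip()]
print(steps)
```

Each entry in `preconfigopts` ends with `&& ` because EasyBuild prepends the string to the configure command, so configure only runs if every earlier step succeeded.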

@Thyre
Contributor Author

Thyre commented Nov 22, 2024

Will look into it


github-actions bot commented Nov 22, 2024

Updated software NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb

Diff against NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index 90634952ad..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.20.5-GCCcore-13.2.0-CUDA-12.4.0.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,23 +1,23 @@
 name = 'NCCL'
-version = '2.20.5'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '13.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-checksums = ['d11ad65c1df3cbe4447eaddceec71569f5c0497e27b3b8369cf79f18d2b2ad8c']
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.40')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '12.4.0', '', SYSTEM),
-    ('UCX-CUDA', '1.15.0', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)
Diff against NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index ebbd822138..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,33 +1,26 @@
 name = 'NCCL'
-version = '2.16.2'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '12.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-patches = ['NCCL-2.16.2_fix-cpuid.patch']
-checksums = [
-    {'v2.16.2-1.tar.gz': '7f7c738511a8876403fc574d13d48e7c250d934d755598d82e14bab12236fc64'},
-    {'NCCL-2.16.2_fix-cpuid.patch': '0459ecadcd32b2a7a000a2ce4f675afba908b2c0afabafde585330ff4f83e277'},
-]
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.39')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('UCX-CUDA', '1.13.1', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
-prebuildopts = "sed -i 's/NVCUFLAGS  := /NVCUFLAGS  := -allow-unsupported-compiler /' makefiles/common.mk && "
-buildopts = "VERBOSE=1"
-
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)
-cuda_compute_capabilities = ['3.5', '5.0', '6.0', '7.0', '7.5', '8.0', '8.6']
+cuda_compute_capabilities = ['5.0', '6.0', '7.0', '7.5', '8.0', '8.6', '9.0']
 
 moduleclass = 'lib'
Diff against NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb

easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
index 569d09f985..0534e538fa 100644
--- a/easybuild/easyconfigs/n/NCCL/NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/n/NCCL/NCCL-2.22.3-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,32 +1,23 @@
 name = 'NCCL'
-version = '2.18.3'
+version = '2.22.3'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://developer.nvidia.com/nccl'
 description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
 communication primitives that are performance optimized for NVIDIA GPUs."""
 
-toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 
 github_account = 'NVIDIA'
 source_urls = [GITHUB_SOURCE]
 sources = ['v%(version)s-1.tar.gz']
-patches = [
-    'NCCL-2.16.2_fix-cpuid.patch',
-    'NCCL-2.18.3_fix-cudaMemcpyAsync.patch',
-]
-checksums = [
-    ('6477d83c9edbb34a0ebce6d751a1b32962bc6415d75d04972b676c6894ceaef9',
-     'b4f5d7d9eea2c12e32e7a06fe138b2cfc75969c6d5c473aa6f819a792db2fc96'),
-    {'NCCL-2.16.2_fix-cpuid.patch': '0459ecadcd32b2a7a000a2ce4f675afba908b2c0afabafde585330ff4f83e277'},
-    {'NCCL-2.18.3_fix-cudaMemcpyAsync.patch': '7dc8d0d1b78e4f8acefbc400860f47432ef67c225b50d73c732999c23483de90'},
-]
+checksums = ['45151629a9494460e73375281e8b0fe379141528879301899ece9b776faca024']
 
-builddependencies = [('binutils', '2.40')]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
-    ('CUDA', '12.1.1', '', SYSTEM),
-    ('UCX-CUDA', '1.14.1', versionsuffix),
+    ('CUDA', '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', versionsuffix),
 ]
 
 # default CUDA compute capabilities to use (override via --cuda-compute-capabilities)

Updated software OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb

Diff against OpenMPI-5.0.3-GCC-13.3.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 6864e213a9..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-GCC-13.3.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -4,19 +4,29 @@ version = '5.0.3'
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'GCC', 'version': '13.3.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
-patches = [('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)]
+patches = [
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
+]
 checksums = [
-    {'openmpi-5.0.3.tar.bz2': '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
     {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
      'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
     ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
     ('Autotools', '20231222'),
 ]
 
@@ -25,14 +35,29 @@ dependencies = [
     ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
     ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
     ('libfabric', '1.21.0'),
     ('PMIx', '5.0.2'),
     ('PRRTE', '3.0.5'),
     ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-preconfigopts = 'gcc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
-configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda --with-show-load-errors=no '
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'
Diff against OpenMPI-4.1.6-GCC-13.2.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 831148339a..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.6-GCC-13.2.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -1,64 +1,63 @@
 name = 'OpenMPI'
-version = '4.1.6'
+version = '5.0.3'
 
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'GCC', 'version': '13.2.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
 patches = [
-    'OpenMPI-4.1.1_build-with-internal-cuda-header.patch',
-    'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch',
-    'OpenMPI-4.1.x_add_atomic_wmb.patch',
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
 ]
 checksums = [
-    {'openmpi-4.1.6.tar.bz2': 'f740994485516deb63b5311af122c265179f5328a0d857a567b85db00b11e415'},
-    {'OpenMPI-4.1.1_build-with-internal-cuda-header.patch':
-     '63eac52736bdf7644c480362440a7f1f0ae7c7cae47b7565f5635c41793f8c83'},
-    {'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch':
-     'b767c7166cf0b32906132d58de5439c735193c9fd09ec3c5c11db8d5fa68750e'},
-    {'OpenMPI-4.1.x_add_atomic_wmb.patch': '9494bbc546d661ba5189e44b4c84a7f8df30a87cdb9d96ce2e73a7c8fecba172'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
+    {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
+     'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
-    ('pkgconf', '2.0.3'),
-    ('Perl', '5.38.0'),
-    ('Autotools', '20220317'),
+    ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('zlib', '1.2.13'),
-    ('hwloc', '2.9.2'),
+    ('zlib', '1.3.1'),
+    ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
-    ('UCX', '1.15.0'),
-    ('libfabric', '1.19.0'),
-    ('PMIx', '4.2.6'),
-    ('UCC', '1.2.0'),
+    ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('libfabric', '1.21.0'),
+    ('PMIx', '5.0.2'),
+    ('PRRTE', '3.0.5'),
+    ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
-# Update configure to include changes from the "internal-cuda" patch
-# by running a subset of autogen.pl sufficient to achieve this
-# without doing the full, long-running regeneration.
-preconfigopts = ' && '.join([
-    'cd config',
-    'autom4te --language=m4sh opal_get_version.m4sh -o opal_get_version.sh',
-    'cd ..',
-    'autoconf',
-    'autoheader',
-    'aclocal',
-    'automake',
-    ''
-])
-
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-configopts = '--with-cuda=internal '
-
-# disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
-
-# to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'
Diff against OpenMPI-4.1.5-intel-compilers-2023.1.0.eb

easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
index 59780f6df6..e6c772bf64 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.5-intel-compilers-2023.1.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-5.0.3-NVHPC-24.9-CUDA-12.6.0.eb
@@ -1,66 +1,63 @@
 name = 'OpenMPI'
-version = '4.1.5'
+version = '5.0.3'
 
 homepage = 'https://www.open-mpi.org/'
 description = """The Open MPI Project is an open source MPI-3 implementation."""
 
-toolchain = {'name': 'intel-compilers', 'version': '2023.1.0'}
+toolchain = {'name': 'NVHPC', 'version': '24.9-CUDA-12.6.0'}
 
 source_urls = ['https://www.open-mpi.org/software/ompi/v%(version_major_minor)s/downloads']
 sources = [SOURCELOWER_TAR_BZ2]
 patches = [
-    'OpenMPI-4.1.1_build-with-internal-cuda-header.patch',
-    'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch',
-    'OpenMPI-4.1.5_fix-pmix3x.patch',
-    'OpenMPI-4.1.x_add_atomic_wmb.patch',
+    'OpenMPI-5.0.3_fix_hle_make_errors.patch',
+    'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch',
+    ('OpenMPI-5.0.2_build-with-internal-cuda-header.patch', 1)
 ]
 checksums = [
-    {'openmpi-4.1.5.tar.bz2': 'a640986bc257389dd379886fdae6264c8cfa56bc98b71ce3ae3dfbd8ce61dbe3'},
-    {'OpenMPI-4.1.1_build-with-internal-cuda-header.patch':
-     '63eac52736bdf7644c480362440a7f1f0ae7c7cae47b7565f5635c41793f8c83'},
-    {'OpenMPI-4.1.1_opal-datatype-cuda-performance.patch':
-     'b767c7166cf0b32906132d58de5439c735193c9fd09ec3c5c11db8d5fa68750e'},
-    {'OpenMPI-4.1.5_fix-pmix3x.patch': '46edac3dbf32f2a611d45e8a3c8edd3ae2f430eec16a1373b510315272115c40'},
-    {'OpenMPI-4.1.x_add_atomic_wmb.patch': '9494bbc546d661ba5189e44b4c84a7f8df30a87cdb9d96ce2e73a7c8fecba172'},
+    {'openmpi-5.0.3.tar.bz2':
+     '990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b'},
+    {'OpenMPI-5.0.3_fix_hle_make_errors.patch':
+     '881c907a9f5901d5d6af41cd33dffdcecba4a67a9e5123e602542aea57a80895'},
+    {'OpenMPI-5.0.3_disable_opal_path_nfs_test.patch':
+     '75d4417e35252ea3a19b2792f1b06e9aeb408c253aa4921d77226d57b71dee45'},
+    {'OpenMPI-5.0.2_build-with-internal-cuda-header.patch':
+     'f52dc470543f35efef10d651dd159c771ae25f8f76a420d20d87abf4dc769ed7'},
 ]
 
 builddependencies = [
-    ('pkgconf', '1.9.5'),
-    ('Perl', '5.36.1'),
-    ('Autotools', '20220317'),
+    ('pkgconf', '2.2.0'),
+    ('Perl', '5.38.2'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('zlib', '1.2.13'),
-    ('hwloc', '2.9.1'),
+    ('zlib', '1.3.1'),
+    ('hwloc', '2.10.0'),
     ('libevent', '2.1.12'),
-    ('UCX', '1.14.1'),
-    ('libfabric', '1.18.0'),
-    ('PMIx', '4.2.4'),
-    ('UCC', '1.2.0'),
+    ('UCX', '1.16.0'),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('libfabric', '1.21.0'),
+    ('PMIx', '5.0.2'),
+    ('PRRTE', '3.0.5'),
+    ('UCC', '1.3.0'),
+    ('UCC-CUDA', '1.3.0', '-CUDA-%(cudaver)s'),
 ]
 
-# Update configure to include changes from the "internal-cuda" patch
-# by running a subset of autogen.pl sufficient to achieve this
-# without doing the full, long-running regeneration.
-preconfigopts = ' && '.join([
-    'cd config',
-    'autom4te --language=m4sh opal_get_version.m4sh -o opal_get_version.sh',
-    'cd ..',
-    'autoconf',
-    'autoheader',
-    'aclocal',
-    'automake',
-    ''
-])
-
 # CUDA related patches and custom configure option can be removed if CUDA support isn't wanted.
-configopts = '--with-cuda=internal '
-
-# disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
-
-# to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
+preconfigopts = 'nvc -Iopal/mca/cuda/include -shared opal/mca/cuda/lib/cuda.c -o opal/mca/cuda/lib/libcuda.so && '
+# Update configure to include changes from the "disable_opal_path_nfs_test" patch
+preconfigopts += './autogen.pl --force && '
+
+configopts = '--with-cuda=%(start_dir)s/opal/mca/cuda '
+# Required to prevent internal compiler error in opal.
+configopts += '--enable-alt-short-float=no '
+# Set PGI compilers manually, as NVHPC compilers are not correctly detected
+configopts += 'CC=pgcc CXX=pgc++ FC=pgfortran '
+
+# site specific options
+# configopts += '--without-psm2 '
+# configopts += '--disable-oshmem '
+# configopts += '--with-gpfs '
+configopts += '--with-slurm '
 
 moduleclass = 'mpi'

Updated software UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb

Diff against UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index 8594d50984..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.2.0-GCCcore-12.3.0-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.2.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,7 +12,7 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
@@ -21,21 +21,21 @@ patches = [
     '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    {'v1.2.0.tar.gz': 'c1552797600835c0cf401b82dc89c4d27d5717f4fb805d41daca8e19f65e509d'},
-    {'UCC-CUDA-1.2.0_link_against_existing_UCC_libs.patch':
-     '84157be5eae96d2501df076bcf0598b104adf80abeca028a144c4fb098638207'},
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.40'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
     ('UCC', version),
-    ('CUDA',  '12.1.1', '', SYSTEM),
-    ('UCX-CUDA', '1.14.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.18.3', '-CUDA-%(cudaver)s'),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "
Diff against UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index bfe211063d..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.1.0-GCCcore-12.2.0-CUDA-12.0.0.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.1.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,32 +12,30 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '12.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
 sources = ['v%(version)s.tar.gz']
 patches = [
-    '%(name)s-1.0.0_link_against_existing_UCC_libs.patch',
-    '%(name)s-%(version)s_cuda_12_mem_ops.patch',
+    '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    {'v1.1.0.tar.gz': '74c8ba75037b5bd88cb703e8c8ae55639af3fecfd4428912a433c010c97b4df7'},
-    {'UCC-CUDA-1.0.0_link_against_existing_UCC_libs.patch':
-     '9fa11cf6779174f4e9048df5812096e4261e1769d465cc7f34a6354398876856'},
-    {'UCC-CUDA-1.1.0_cuda_12_mem_ops.patch': 'fc3ea1487d29dc626db2363ef5a79e7f0906f6a7507a363fa6167a812b143eb6'},
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.39'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('UCC', '1.1.0'),
-    ('CUDA',  '12.0.0', '', SYSTEM),
-    ('UCX-CUDA', '1.13.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.16.2', '-CUDA-%(cudaver)s'),
+    ('UCC', version),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "
Diff against UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb

easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
index e213c78d3b..a0b4865a72 100644
--- a/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.0.0-GCCcore-11.3.0-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/u/UCC-CUDA/UCC-CUDA-1.3.0-GCCcore-13.3.0-CUDA-12.6.0.eb
@@ -1,7 +1,7 @@
 easyblock = 'ConfigureMake'
 
 name = 'UCC-CUDA'
-version = '1.0.0'
+version = '1.3.0'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://www.openucx.org/'
@@ -12,30 +12,30 @@ feature-rich for current and emerging programming models and runtimes.
 This module adds the UCC CUDA support.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '11.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openucx/ucc/archive/refs/tags']
 sources = ['v%(version)s.tar.gz']
 patches = [
-    '%(name)s-1.0.0_link_against_existing_UCC_libs.patch',
+    '%(name)s-%(version)s_link_against_existing_UCC_libs.patch',
 ]
 checksums = [
-    'd3b4aa7004bf339d35952a1699a6e408064ba578bdc93861f5f07527ad0a5e8c',  # v1.0.0.tar.gz
-    # UCC-CUDA-1.0.0_link_against_existing_UCC_libs.patch
-    '9fa11cf6779174f4e9048df5812096e4261e1769d465cc7f34a6354398876856',
+    {'v1.3.0.tar.gz': 'b56379abe5f1c125bfa83be305d78d81a64aa271b7b5fff0ac17b86725ff3acf'},
+    {'UCC-CUDA-1.3.0_link_against_existing_UCC_libs.patch':
+     '758228357ce2a6ae50fb26a0b43e9176feaf379e266365f38205ce679267fc0d'},
 ]
 
 builddependencies = [
-    ('binutils', '2.38'),
-    ('Autotools', '20220317'),
+    ('binutils', '2.42'),
+    ('Autotools', '20231222'),
 ]
 
 dependencies = [
-    ('UCC', '1.0.0'),
-    ('CUDA',  '11.7.0', '', SYSTEM),
-    ('UCX-CUDA', '1.12.1', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.12.12', '-CUDA-%(cudaver)s'),
+    ('UCC', version),
+    ('CUDA',  '12.6.0', '', SYSTEM),
+    ('UCX-CUDA', '1.16.0', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.22.3', '-CUDA-%(cudaver)s'),
 ]
 
 preconfigopts = "./autogen.sh && "
@@ -43,10 +43,6 @@ preconfigopts = "./autogen.sh && "
 buildopts = '-C src/components/mc/cuda V=1 && make -C src/components/tl/nccl V=1'
 installopts = '-C src/components/mc/cuda && make -C src/components/tl/nccl install'
 
-# UCC_COMPONENT_PATH completely overrides $EBROOTUCC/lib/ucc so install symbolic links
-# to existing non CUDA related components
-postinstallcmds = ['for i in $EBROOTUCC/lib/ucc/*; do ln -s $i %(installdir)s/lib/ucc; done']
-
 sanity_check_paths = {
     'files': ['lib/ucc/libucc_mc_cuda.%s' % SHLIB_EXT, 'lib/ucc/libucc_tl_nccl.%s' % SHLIB_EXT],
     'dirs': ['lib']
@@ -54,6 +50,6 @@ sanity_check_paths = {
 
 sanity_check_commands = ["ucc_info -c"]
 
-modextravars = {'UCC_COMPONENT_PATH': '%(installdir)s/lib/ucc'}
+modextrapaths = {'EB_UCC_EXTRA_COMPONENT_PATH': 'lib/ucc'}
 
 moduleclass = 'lib'
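
The last hunk swaps `modextravars` for `modextrapaths`. In a generated module, `modextravars` sets a variable to a fixed value, whereas `modextrapaths` prepends a path relative to the installation directory to a path-like variable. The sketch below is a rough model of that difference in plain Python, not EasyBuild's implementation; the install prefix and the helper names `set_var` and `prepend_path` are hypothetical. How `EB_UCC_EXTRA_COMPONENT_PATH` is consumed is not shown in this diff; presumably the patched UCC reads it.

```python
def set_var(env, var, value):
    """Model of modextravars: overwrite the variable with a fixed value."""
    env[var] = value
    return env


def prepend_path(env, var, installdir, relpath):
    """Model of modextrapaths: prepend installdir/relpath to a ':'-separated variable."""
    new = installdir.rstrip('/') + '/' + relpath
    current = env.get(var)
    env[var] = new if not current else new + ':' + current
    return env


installdir = '/opt/sw/UCC-CUDA'  # hypothetical install prefix

# Old behaviour: the variable is replaced outright.
old = set_var({'UCC_COMPONENT_PATH': '/other/lib/ucc'},
              'UCC_COMPONENT_PATH', installdir + '/lib/ucc')

# New behaviour: existing entries are kept behind the new one.
new = prepend_path({'EB_UCC_EXTRA_COMPONENT_PATH': '/other/lib/ucc'},
                   'EB_UCC_EXTRA_COMPONENT_PATH', installdir, 'lib/ucc')

print(old['UCC_COMPONENT_PATH'])           # /opt/sw/UCC-CUDA/lib/ucc
print(new['EB_UCC_EXTRA_COMPONENT_PATH'])  # /opt/sw/UCC-CUDA/lib/ucc:/other/lib/ucc
```

This is consistent with the removed comment: overriding `UCC_COMPONENT_PATH` hid the components under `$EBROOTUCC/lib/ucc`, which is why the old easyconfig needed the symlinking `postinstallcmds`.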

@Thyre
Contributor Author

Thyre commented Nov 22, 2024

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5321

Test results coming soon (I hope)...

- notification for comment with ID 2493832493 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

Signed-off-by: Jan André Reuter <[email protected]>
@SebastianAchilles
Member

Looks like the batch job 5321 on jsc-zen3 finished successfully, but it had problems uploading the test report.

@SebastianAchilles
Member

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21546 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21546 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5328

Test results coming soon (I hope)...

- notification for comment with ID 2495439226 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/151b246d610b3e39782e93aa2f8d5e5a for a full test report.

Member

@SebastianAchilles SebastianAchilles left a comment

lgtm

@SebastianAchilles
Member

Going in, thanks @Thyre!

@SebastianAchilles SebastianAchilles merged commit e643818 into easybuilders:develop Nov 24, 2024
10 checks passed
Labels
2024a issues & PRs related to 2024a common toolchains update

4 participants