{toolchain} nvompic v2021a #13107

SebastianAchilles
Member

@SebastianAchilles SebastianAchilles commented Jun 10, 2021

(created using eb --new-pr)

Depends on

@SebastianAchilles SebastianAchilles changed the title from {toolchain} nompi v2021a to {toolchain} nvompic v2021a Jun 11, 2021
@SebastianAchilles SebastianAchilles added this to the 4.x milestone Jun 11, 2021
@SebastianAchilles SebastianAchilles marked this pull request as ready for review June 18, 2021 14:16
@SebastianAchilles
Member Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on generoso

PR test command 'EB_PR=13107 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13107 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17561

Test results coming soon (I hope)...

- notification for comment with ID 864102480 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@SebastianAchilles
Member Author

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
rocky8-eb - Linux rocky linux 8.4, x86_64, Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz (broadwell), Python 3.6.8
See https://gist.github.com/0c883508f1f2cd69a3e26171c5ad7e5a for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/821a2dc8a5040f0eb5ec0ffd7f5c359f for a full test report.

@SebastianAchilles
Member Author

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 4 out of 4 (2 easyconfigs in total)
centos8-eb - Linux centos linux 8.3.2011, x86_64, Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (skylake), Python 3.6.8
See https://gist.github.com/c21c48a9300d461c88bddaa92edb15ea for a full test report.

@easybuilders easybuilders deleted a comment from boegelbot Jun 28, 2021
@easybuilders easybuilders deleted a comment from boegelbot Jun 28, 2021
@akesandgren
Contributor

@boegelbot Please test @ generoso

@boegelbot
Collaborator

@akesandgren: Request for testing this PR well received on generoso

PR test command 'EB_PR=13107 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13107 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17633

Test results coming soon (I hope)...

- notification for comment with ID 869590094 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@akesandgren
Contributor

This should eventually use UCX-CUDA once that is fully in place (which should be soon).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/345570077cd9633950e5efd903f214a7 for a full test report.

('zlib', '1.2.11'),
('hwloc', '2.4.1'),
('libevent', '2.1.12'),
('UCX', '1.10.0'),
Contributor

Let's replace this with the now available UCX-CUDA of the same version.
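
A minimal sketch of what the suggested change to the dependency list could look like. Only the swap of UCX for UCX-CUDA at the same version comes from the comment above; whether a CUDA versionsuffix or further entries are needed is not stated in this thread, so none are shown.

```python
# Sketch only: the dependency list quoted above, with UCX swapped for UCX-CUDA.
# Assumption: no versionsuffix is given here; the real easyconfig may need one.
dependencies = [
    ('zlib', '1.2.11'),
    ('hwloc', '2.4.1'),
    ('libevent', '2.1.12'),
    ('UCX-CUDA', '1.10.0'),
]
```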

@akesandgren
Contributor

Also add HPL and OSU-Micro-Benchmarks easyconfigs built on top of this so we can test it.

@akesandgren
Contributor

This is now tripping up in tools/module_naming_scheme/hierarchical_mns.py:det_modpath_extensions line 231

@akesandgren
Contributor

@boegel I think we might need some guidance here...
When it tries to build CUDA-11.3.1-NVHPC-21.5.eb, the HMNS code sees CUDA, and in det_modpath_extensions non_system_cuda is true, so that if statement is entered.
"if ec['name'] in extend_comps:" is true since CUDA appears in at least one of the COMP_NAME_VERSION_TEMPLATES keys, and the same holds for "if ec['name'] in comp_names:", etc. In the end, though, comp_name_ver is still None because there is no 'CUDA,NVHPC' key. But if we're trying to move away from the compiler-CUDA modulepath extension (#12484, option 1), we need to do this differently, either in this PR or in framework.

What is the right way forward here?
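
To make the failure mode concrete, here is a simplified, hypothetical model of the lookup described above. It is not the framework code: the helper name `det_comp_name_ver`, its signature, and the one-entry table are assumptions for illustration, and only the fact that no 'CUDA,NVHPC' key exists is taken from this comment.

```python
# Simplified stand-in for the template table; the real table in
# hierarchical_mns.py has more entries (this single entry is an assumed subset).
COMP_NAME_VERSION_TEMPLATES = {
    'CUDA,GCC': ('GCC-CUDA', '%(GCC)s-%(CUDA)s'),
}


def det_comp_name_ver(name, version, tc_name, tc_version):
    """Hypothetical helper: return (name, version) for the modulepath
    extension, or None when no template matches the component pair."""
    # Keys are the sorted component names joined by a comma.
    key = ','.join(sorted([name, tc_name]))
    if key not in COMP_NAME_VERSION_TEMPLATES:
        return None
    comp_name, ver_tmpl = COMP_NAME_VERSION_TEMPLATES[key]
    return comp_name, ver_tmpl % {name: version, tc_name: tc_version}


# CUDA on top of GCC has a template, so a name/version pair comes back.
gcc_result = det_comp_name_ver('CUDA', '11.1.1', 'GCC', '10.2.0')

# CUDA on top of NVHPC has no 'CUDA,NVHPC' key, so the result is None.
nvhpc_result = det_comp_name_ver('CUDA', '11.3.1', 'NVHPC', '21.5')
```

In this model the NVHPC case falls through to None exactly because the key is missing; adding a key would be one fix, but that is the compiler-CUDA modulepath extension the comment wants to move away from.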

@Micket
Contributor

Micket commented Sep 7, 2021

Considering the recent foss/fosscuda merge changes (including renaming CUDAcore -> CUDA), we need to do similar things here:

  1. We use the system level CUDA package as a dependency. This allows us to use the UCX-CUDA plugin directly.
  2. Presumably, NVHPC shouldn't provide its own CUDA on top then? It currently does so by default.
  3. There won't be a second CUDA package; NVHPC is simply the compiler level, and it already implies CUDA. There is no non-CUDA variant like there used to be with GCC vs GCC-CUDA. I don't think we need to do anything special with modulepaths here. It's just NVHPC + OpenMPI to consider.

Member

@jfgrimm jfgrimm left a comment

As decided in issue #16330, we have deprecated the use of True to signify a system-toolchain dependency (#16384), in favour of the more intuitive SYSTEM template constant. Due to the change in the test suite, please run eb --sync-pr-with-develop 13107 and update the PR to use SYSTEM instead.
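
A rough sketch of the requested change in a dependency entry. The CUDA entry is a hypothetical illustration rather than a line quoted from this PR, and the `SYSTEM` definition is only a stand-in so the fragment runs on its own; in an easyconfig, `SYSTEM` is already available as a template constant.

```python
# Stand-in so this fragment runs outside EasyBuild; inside an easyconfig,
# SYSTEM is predefined (the value used here is an assumption).
SYSTEM = {'name': 'system', 'version': 'system'}

# Deprecated form: True as the 4th element marks a system-toolchain dependency.
old_dependencies = [('CUDA', '11.3.1', '', True)]

# Requested form: the SYSTEM template constant instead of True.
dependencies = [('CUDA', '11.3.1', '', SYSTEM)]
```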

@SebastianAchilles
Copy link
Member Author

Closing, since we added an nvofbf/2022.07 toolchain in #16724.


5 participants