UCX 1.15 upgrade #9824

Merged: abellina merged 2 commits into NVIDIA:branch-23.12 from ucx_1.15 on Nov 22, 2023

@abellina (Collaborator) commented Nov 21, 2023

Closes #8137

This updates JUCX to 1.15 and updates the example and CI Dockerfiles to match.

The Dockerfile.multi is using aarch64 but the rest are using x86_64.

I will test with cuda12 locally.

Note: I am not able to get rocky8 to work with the cuda12 packages published by openucx. There is a cuda11 + centos8 package, but no cuda12 + centos8 package. The supported architectures are also inconsistent across OS versions, so we would need to work with the UCX team to get those releases populated (perhaps in UCX 1.16?).

For now, a user who wants one of the missing combinations on Rocky Linux would have to build UCX from source. Ubuntu 20 and 22 have all combinations: cuda11 and cuda12, aarch64 and x86_64.

I put together a table of the 1.15 release packages listed at https://github.com/openucx/ucx/releases:

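| OS       | CUDA11+ARM | CUDA12+ARM | CUDA11+x86 | CUDA12+x86 |
|----------|------------|------------|------------|------------|
| centos7  |            |            | YES        | YES        |
| centos8  | YES        |            | YES        |            |
| ubuntu16 |            |            | YES        |            |
| ubuntu18 | YES        |            | YES        | YES        |
| ubuntu20 | YES        | YES        | YES        | YES        |
| ubuntu22 | YES        | YES        | YES        | YES        |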

Update: I am able to install the centos7/cuda12 binary on rocky8, so that would be one approach while we get the support matrix figured out.
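
To make the support matrix and the rocky8 workaround easier to reason about, here is a minimal Python sketch of the lookup. It is illustrative only and not part of this PR. The `PREBUILT` data is transcribed from the table above, reading ARM as aarch64 and x86 as x86_64. The `CANDIDATES` mapping and the `pick_ucx_package` helper are hypothetical names. Treating centos8 as the first choice for rocky8 is an assumption; the centos7 fallback reflects the cuda12 install reported in the update.

```python
# Prebuilt UCX 1.15 packages per OS, as (cuda_major, arch) pairs.
# Transcribed from the table in this PR; ARM is read as aarch64, x86 as x86_64.
ALL_COMBOS = {(11, "aarch64"), (12, "aarch64"), (11, "x86_64"), (12, "x86_64")}
PREBUILT = {
    "centos7": {(11, "x86_64"), (12, "x86_64")},
    "centos8": {(11, "aarch64"), (11, "x86_64")},
    "ubuntu16": {(11, "x86_64")},
    "ubuntu18": {(11, "aarch64"), (11, "x86_64"), (12, "x86_64")},
    "ubuntu20": set(ALL_COMBOS),
    "ubuntu22": set(ALL_COMBOS),
}

# Hypothetical mapping: which package flavors to try for an OS that has no
# packages of its own. centos8 first for rocky8 is an assumption; centos7 is
# the fallback that was reported to install on rocky8 with cuda12.
CANDIDATES = {"rocky8": ["centos8", "centos7"]}


def pick_ucx_package(os_name, cuda_major, arch):
    """Return the package flavor to install, or None to build from source."""
    for candidate in CANDIDATES.get(os_name, [os_name]):
        if (cuda_major, arch) in PREBUILT.get(candidate, set()):
            return candidate
    return None


if __name__ == "__main__":
    for combo in [("rocky8", 11, "x86_64"), ("rocky8", 12, "x86_64"),
                  ("rocky8", 12, "aarch64"), ("ubuntu22", 12, "aarch64")]:
        print(combo, "->", pick_ucx_package(*combo) or "build from source")
```

Under these assumptions, rocky8 with cuda12 on x86_64 resolves to the centos7 package, while rocky8 with cuda12 on aarch64 has no prebuilt option and falls through to a source build.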

Signed-off-by: Alessandro Bellina <[email protected]>
@abellina
Collaborator Author

build

@abellina
Collaborator Author

build

@abellina abellina marked this pull request as ready for review November 22, 2023 15:07
@abellina abellina merged commit cd3f85f into NVIDIA:branch-23.12 Nov 22, 2023
37 checks passed
@abellina abellina deleted the ucx_1.15 branch November 22, 2023 17:17
@sameerz sameerz added the performance A performance related task/issue label Nov 26, 2023
Linked issue: [FEA] Upgrade to UCX 1.15
4 participants