Improving detection of CUDA enabled MPI in EasyBuild #14517

Closed
Micket opened this issue Dec 8, 2021 · 3 comments

Comments

@Micket
Contributor

Micket commented Dec 8, 2021

Summarizing a long discussion with Arkadiy Daydov on Slack:

Starting with foss/2021a, we dropped fosscuda and use UCX-CUDA to enable CUDA support for existing UCX+OpenMPI installations.

This worked well for OSUMicroBenchmark, but other applications (PyTorch, LAMMPS) try to be clever and detect whether CUDA support is enabled.
Unfortunately, there doesn't seem to be a definitive way to do this, and for both of these applications the detection fails.
The root cause seems to be that they both check whether the OPAL backend (which, as far as I understand, only means the BTL stuff in OpenMPI, i.e. the old "smcuda" thing) has CUDA support, and, if not, report false.
Since we aren't building smcuda, these checks correctly report false (but that's not what you really want).

OpenMPI provides a header file with

#define MPIX_CUDA_AWARE_SUPPORT 0
OMPI_DECLSPEC int MPIX_Query_cuda_support(void);

Unfortunately, there's not much we can do about the define here, but the function also only checks OPAL, ignoring UCX.
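
For reference, a minimal sketch of the detection pattern applications end up using (assuming Open MPI and its mpi-ext.h extension header; this is an illustration, not the actual PyTorch/LAMMPS code):

/* Sketch of the usual CUDA-aware MPI detection with Open MPI. */
#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* declares MPIX_CUDA_AWARE_SUPPORT and MPIX_Query_cuda_support() */
#endif

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Compile-time check: reflects only whether OPAL (smcuda) was built with CUDA,
     * so it is 0 for our UCX-CUDA based installations. */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("compile time: CUDA-aware\n");
#else
    printf("compile time: not CUDA-aware\n");
#endif

    /* Run-time check: also only consults OPAL, ignoring UCX, so it likewise returns 0
     * even when the UCX PML plus UCX-CUDA would handle GPU buffers just fine. */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("run time: %d\n", MPIX_Query_cuda_support());
#endif

    MPI_Finalize();
    return 0;
}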

This function predates UCX, and what to do with it is discussed here:
open-mpi/ompi#7963
and they actually write that it should report

When UCX is used: whether UCX has CUDA support

This issue was closed after the PR:
open-mpi/ompi#7970
but this function still returns 0 if the old smcuda support isn't built, and this comment worries me:

, it will be 0. because OMPI not compiled with CUDA. you might get limited support UCX/CUDA for pt2pt.

I'm not sure what these limits would be, but in my mind we kind of always use UCX now, and the UCX PML excludes the possibility of using any BTL, so... OPAL is dead now and ucx-cuda is the only thing that matters.
Patching LAMMPS (Kokkos) to just forcibly enable the "gpu-aware" code seems to work fine, so, so far, every application I'm aware of (which isn't many) only needs or expects ucx-cuda.
So, either MPIX_Query_cuda_support is wrong, or it's not fine-grained enough to give applications the information they need?
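
To make it concrete, this is roughly what such a "forcibly enable" override amounts to (a sketch only; the GPU_AWARE_MPI environment variable is a hypothetical name standing in for whatever switch the application exposes):

/* Let a user/site override win over MPIX_Query_cuda_support(), since the latter
 * ignores UCX-provided CUDA support. GPU_AWARE_MPI is hypothetical, for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>
#endif

static int gpu_aware_mpi(void)
{
    const char *env = getenv("GPU_AWARE_MPI");   /* hypothetical override switch */
    if (env != NULL)
        return strcmp(env, "0") != 0;            /* any value other than "0" forces it on */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    return MPIX_Query_cuda_support();            /* only reflects OPAL/smcuda, not UCX */
#else
    return 0;
#endif
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    printf("GPU-aware MPI assumed: %d\n", gpu_aware_mpi());
    MPI_Finalize();
    return 0;
}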

@ocaisa
Member

ocaisa commented Dec 9, 2021

Just to throw another spanner in the works, what if you are not using UCX? There was a recent thread on the mailing list (the link will only work properly once you have acknowledged that you are not a spammer) which said you wouldn't want to use UCX with an Omnipath interconnect. What is the implication here then, no CUDA support possible? (Maybe @bartoldeman has some input here...)

@Micket
Contributor Author

Micket commented Jan 20, 2022

@ocaisa Sorry, I missed that someone replied to this discussion.
Well, in these cases (and perhaps for us UCX users as well, since it might be that the UCX PML can't be used for everything; ugh... I really was hoping that the UCX stuff would clean these things up and make life simpler), I think we need to build the smcuda BTL plugin.

Fortunately, I think there is still hope we can do it with the same design as UCX-CUDA, at runtime, using OMPI_MCA_mca_component_path to point to an external ./lib/openmpi/mca_btl_smcuda.so (based on @bartoldeman's testing in #12484).

@Micket
Contributor Author

Micket commented Mar 31, 2024

We have full support for CUDA everywhere as far as I know. Nothing else to fix here.

@Micket Micket closed this as completed Mar 31, 2024