CUDA support for default Open-MPI implementation #1162

Closed
tbrandvik opened this issue Jun 27, 2019 · 9 comments

Comments

@tbrandvik

Environment:

  • AWS ParallelCluster / CfnCluster version aws-parallelcluster-2.4.0
  • OS: centos7
  • Scheduler: slurm
  • Master instance type: c5n.large
  • Compute instance type: p3.2xlarge

Bug description and how to reproduce:
The default Open-MPI (/opt/amazon/efa) is not compiled with CUDA support.

Steps to reproduce:

  1. Run ompi_info --parsable --all | grep mpi_built_with_cuda_support:value on a compute node with a GPU

The output is as follows, indicating that CUDA support is disabled:

mca:mpi:base:param:mpi_built_with_cuda_support:value:false

Is there a particular reason why CUDA support is not enabled or would it be possible to compile the default Open-MPI version to support CUDA in future releases?
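
For use in job scripts, the check from the steps above can be wrapped so that it yields an exit status instead of a line to read by eye. This is only a minimal sketch: the function name ompi_reports_cuda is made up for illustration, and it assumes the parsable output contains the exact parameter line shown above.

```shell
# Reads `ompi_info --parsable --all` output on stdin and succeeds only if
# the build reports CUDA support. (Function name is hypothetical.)
ompi_reports_cuda() {
    grep -q '^mca:mpi:base:param:mpi_built_with_cuda_support:value:true$'
}

# Demonstration using the line from the report above, so no GPU or
# Open MPI install is needed to try it:
if printf '%s\n' 'mca:mpi:base:param:mpi_built_with_cuda_support:value:false' | ompi_reports_cuda; then
    echo "CUDA-aware"
else
    echo "not CUDA-aware"
fi
```

On a real node the input would come from the tool itself, for example `ompi_info --parsable --all | ompi_reports_cuda`.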

@sean-smith
Contributor

Thanks for pointing this out. We install Open MPI from the EFA installers, which in turn build the RPMs. We can add CUDA support in a future release; until then, you'll need to compile from source:

See https://www.open-mpi.org/faq/?category=buildcuda
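
As a rough sketch of what such a build might look like: Open MPI's configure script accepts a --with-cuda option pointing at the CUDA toolkit. The CUDA location, the install prefix, and the helper name ompi_configure_cmd below are all assumptions for illustration, not values taken from this thread. The snippet only prints the command so it can be reviewed before anything is run.

```shell
# Assumed locations; adjust both to match your cluster.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
PREFIX="${PREFIX:-$HOME/opt/openmpi-cuda}"

# Print the configure command line instead of running it.
# (Helper name is hypothetical.)
ompi_configure_cmd() {
    printf './configure --prefix=%s --with-cuda=%s\n' "$PREFIX" "$CUDA_HOME"
}

ompi_configure_cmd
# Inside an unpacked Open MPI source tree, the printed command would
# typically be followed by:
#   make -j "$(nproc)" && make install
```

Installing under a prefix in the home directory avoids touching the packaged build in /opt/amazon/efa.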

@tbrandvik
Author

Perfect - thanks for the quick response!

@tbrandvik
Author

I just wanted to check: is CUDA support enabled in the recent 2.4.1 release?

@enrico-usai
Contributor

Hi @tbrandvik,

You can see the details of each release in the Release Notes.

Unfortunately, CUDA support has not been added yet.

@tbrandvik
Author

Hi @enrico-usai,

Are there any plans to add this in upcoming releases?

Thanks,
Tobias

@oplatek

oplatek commented Mar 8, 2021

Hi, is there any update?

@Srijan1214

Hi,
I am fairly new to AWS and was hoping to get some help compiling Open MPI with CUDA support from source. As far as I know, I can only build on the head node using AWS Batch. How can I build this for all of the compute instances?

@wzamazon

> As far as I know, I can only build on the head node using AWS Batch. How can I build this for all of the compute instances?

The compute instances and the head node share a disk (such as /home), so if you compile Open MPI on the head node under your home directory, the compute nodes should have access to it.
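
To make a job actually pick up a build that lives on the shared disk, its bin and lib directories need to come first on the search paths. A minimal sketch, assuming the build was installed under $HOME/opt/openmpi-cuda (a hypothetical prefix) and that libraries land in lib rather than lib64; the function name use_shared_ompi is also made up:

```shell
# Prepend a shared Open MPI install to the search paths of the current shell.
# $1 is the install prefix. (Function name and prefix are hypothetical.)
use_shared_ompi() {
    export PATH="$1/bin:$PATH"
    # Avoid a trailing ':' when LD_LIBRARY_PATH was previously empty.
    export LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

use_shared_ompi "$HOME/opt/openmpi-cuda"
echo "PATH now starts with: ${PATH%%:*}"
```

Calling this at the top of a Slurm batch script, before mpirun or srun, should be enough for the compute nodes to use the shared build, provided /home is mounted there as described above.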

@enrico-usai
Contributor

Resolving this: as part of the 3.9.0 release we included EFA installer 1.30.0, which contains both openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11.

Open MPI 4 from the EFA installer is not CUDA-aware; Open MPI 5 is.
