Add automatic CUDA support for ELPA #2673
base: develop
Conversation
@@ -52,6 +53,7 @@ def extra_options():
        """Custom easyconfig parameters for ELPA."""
        extra_vars = {
            'auto_detect_cpu_features': [True, "Auto-detect available CPU features, and configure accordingly", CUSTOM],
            'cuda': [None, "Enable CUDA build if CUDA is among the dependencies", CUSTOM],
Why not set this to True by default? That would simplify the logic below a bit to:
if cuda_is_dep and self.cfg['cuda']:
...
elif not self.cfg['cuda']:
...
else:
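Fleshed out as a runnable sketch (with a hypothetical helper name, not actual PR code), assuming the `cuda` parameter defaults to True:

```python
def decide_cuda(cuda_is_dep, cuda_opt):
    """Sketch of the proposed branching with 'cuda' defaulting to True.

    cuda_is_dep: whether CUDA appears among the declared dependencies
    cuda_opt: value of the (hypothetical) 'cuda' easyconfig parameter
    """
    if cuda_is_dep and cuda_opt:
        # CUDA is a dependency and not explicitly disabled -> GPU build
        return 'enable-gpu'
    elif not cuda_opt:
        # explicitly disabled via cuda=False
        return 'cpu-only'
    else:
        # cuda=True (the default) but CUDA is not a dependency
        return 'cpu-only'

print(decide_cuda(True, True))    # enable-gpu
print(decide_cuda(True, False))   # cpu-only
print(decide_cuda(False, True))   # cpu-only
```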
@@ -159,6 +161,24 @@ def run_all_steps(self, *args, **kwargs):
            self.cfg.update('configopts', '--with-mpi=no')
            self.cfg.update('configopts', 'LIBS="$LIBLAPACK"')

        # Add CUDA features
        cuda_is_dep = 'CUDA' in [i['name'] for i in self.cfg.dependencies()]
All this, including the "cuda" extra_var, can be simplified to:
if get_software_root('CUDA'):
...
That way you can drop the elif statements and the cuda extra_var.
That's actually not fully true... If the CUDA module was loaded manually, then $EBROOTCUDA is also defined (and that's all that get_software_root does: it grabs the corresponding $EBROOT* environment variable). So this is actually a better approach...
We should have proper support for this in framework though, so you can just do self.cfg.is_dep('CUDA').
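For context, a minimal sketch of what get_software_root boils down to (simplified; the real EasyBuild function does more name normalization, and the path below is just an example):

```python
import os

def get_software_root_sketch(name):
    # Simplified stand-in for EasyBuild's get_software_root: the module file
    # for <name> sets $EBROOT<NAME>, and that's what gets looked up here.
    return os.environ.get('EBROOT' + name.upper())

# A manually loaded CUDA module sets $EBROOTCUDA too, so this check cannot
# tell a declared dependency apart from a module that merely happens to be
# loaded in the build environment:
os.environ['EBROOTCUDA'] = '/opt/software/CUDA/11.4.1'
print(get_software_root_sketch('CUDA'))  # /opt/software/CUDA/11.4.1
```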
I'd say that if you do that (load CUDA manually before building), then you're doing it wrong, and almost all our easyconfigs are at risk.
Using just get_software_root is what we do everywhere else, so why not here too?
For JSC, CUDA is a dep of MPI by default (and that MPI works regardless of whether or not there actually is a GPU, through the use of UCX variables). So get_software_root('CUDA') is not enough: you need to know whether it is actually a dep to decide whether to trigger a GPU build.
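The dependency-based check from the PR, as a standalone sketch (the plain dicts below stand in for what self.cfg.dependencies() returns):

```python
def cuda_is_dep(dependencies):
    # Mirrors the PR's check: CUDA must be an actual declared dependency,
    # not merely visible via $EBROOTCUDA in the build environment.
    return 'CUDA' in [dep['name'] for dep in dependencies]

# CUDA pulled in as a dependency -> GPU build is triggered
print(cuda_is_dep([{'name': 'OpenMPI'}, {'name': 'CUDA'}]))  # True
# CUDA module loaded but not declared -> no GPU build
print(cuda_is_dep([{'name': 'OpenMPI'}]))                    # False
```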
@ocaisa if CUDA is a dep of MPI regardless of whether or not there actually is a GPU, don’t you want to trigger a GPU build also regardless of whether or not there actually is a GPU?
@smoors No, because then that GPU-enabled build will (typically) not work on a system without a GPU (whereas OpenMPI will work just fine on that system, as long as you set the appropriate btl). So at JSC, the GPU-supported build is usually explicit.
        if cuda_is_dep and (self.cfg['cuda'] is None or self.cfg['cuda']):
            self.cfg.update('configopts', '--enable-nvidia-gpu')
            cuda_cc_space_sep = self.cfg.template_values['cuda_cc_space_sep'].replace('.', '').split()
            # Just one is supported, so pick the highest one (but prioritize sm_80)
Picking the highest one is likely a bad idea, since then the code won't run on lower-end GPUs in case a site's default settings list all their models.
In this case it's best to raise an error if there is more than one.
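The suggestion above, sketched as a hypothetical helper (per the PR discussion, ELPA's configure accepts only a single compute capability):

```python
def pick_single_cuda_cc(cuda_cc_list):
    # Fail loudly if more than one compute capability is configured,
    # instead of silently picking one that may not run on other GPUs.
    if len(cuda_cc_list) != 1:
        raise ValueError("ELPA can only be built for a single CUDA compute "
                         "capability, got: %s" % ', '.join(cuda_cc_list))
    # convert e.g. '8.0' to the 'sm_80' form expected by nvcc-style flags
    return 'sm_' + cuda_cc_list[0].replace('.', '')

print(pick_single_cuda_cc(['8.0']))  # sm_80
```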
Or pick the first listed (or the minimum rather than the maximum) and print a clear warning?
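That alternative as a sketch (hypothetical helper, not actual easyblock code): pick the lowest compute capability, so the binary still runs on the least capable GPU listed, and warn about what was dropped.

```python
import warnings

def pick_lowest_cuda_cc(cuda_cc_list):
    # Sort numerically ('7.5' -> (7, 5)) rather than lexically, then take
    # the minimum so the build works on every GPU model listed.
    chosen = min(cuda_cc_list, key=lambda cc: tuple(int(p) for p in cc.split('.')))
    if len(cuda_cc_list) > 1:
        warnings.warn("ELPA supports a single CUDA compute capability; "
                      "picked lowest (%s) out of %s" % (chosen, cuda_cc_list))
    return 'sm_' + chosen.replace('.', '')

print(pick_lowest_cuda_cc(['8.0', '7.0', '7.5']))  # sm_70
```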
The LAMMPS easyblock selects the largest: https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/l/lammps.py#L307
Having a method with a clear rationale for what is picked would be good for cases where the software supports building for one, but not all, of the CUDA compute capabilities.
No description provided.