
Optional cudatoolkit dependency #14

Closed · jaimergp opened this issue Oct 4, 2019 · 9 comments

jaimergp commented Oct 4, 2019

Hello! First, thanks for all the effort made towards having GPU-enabled builds in the conda-forge ecosystem. We are very excited about being able to provide our packages here now!

Currently, we are building packages for several CUDA versions, using a label for each version. Users are expected to select the label that matches their CUDA installation.

Moving to conda-forge will mean that users won't need to worry about having a CUDA installation to begin with or selecting the appropriate version because cudatoolkit is listed as a dependency. This is nice and a great step forward in usability.

However, there might be some users that would like to stick to the old behavior: "just give me the package and I will handle CUDA". This might be the case in, for example, HPC sysadmins that would like to manage a single system-wide CUDA installation because that's what works best for them. Some people might not want to download several hundreds of MBs if they already have CUDA in their systems, too.

My question is: how can we provide a GPU-enabled package (that is, one that still needs nvcc and cudatoolkit at build time) that does not list cudatoolkit as a runtime dependency? Is there any way to override the run_exports.strong configuration?

Pinging @jchodera and @peastman so they can follow this as well.

@jakirkham

So one option would be to add cudatoolkit to the ignore_run_exports in the build section of your recipe. This would remove the dependency from the run requirements. There probably needs to be some selector or Jinja logic to restrict this to one build.
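As a rough sketch, this suggestion might look like the following meta.yaml fragment. The `external_cuda` variant variable and selector are purely illustrative assumptions; only `ignore_run_exports` itself is a standard conda-build key.

```yaml
build:
  number: 0
  # Drop the cudatoolkit dependency injected by the compiler's
  # strong run_exports, but only for one build variant.
  # "external_cuda" is a hypothetical variant flag, not a real key.
  ignore_run_exports:
    - cudatoolkit  # [external_cuda]
```

With a setup like this, the `external_cuda` variant would produce a package that links against CUDA at build time but leaves locating the libraries at runtime entirely to the user.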

Though there are some interesting questions that come out of this:

1. What CUDA version should that package use?
2. How does one ensure it picks up the CUDA libraries once installed?
3. How do you warn/error when a CUDA version mismatch between the package and the system occurs?
4. As a community, we have decided to require cudatoolkit to handle these and other problems; so where should packages that don't follow this standard be published?

It's also worth thinking of cudatoolkit as a set of compiler runtime libraries (like those shipped with gcc); that framing makes it easier to compare this situation to how we handle the same problem in other use cases.

Hope that helps 🙂

peastman commented Oct 4, 2019

Let me describe how our current build system works, and how we handle these issues, and the potential problems that might arise with the approach currently used in conda-forge.

Right now we create builds for many different versions of CUDA: 7.5, 8.0, 9.0, 9.1, 9.2, 10.0, and 10.1. Users are expected to have installed the toolkit and driver already, and made sure they're in the library path for runtime linking. On HPC clusters the administrator will have taken care of this, so users just execute module load cuda-9.2 or something similar and everything gets set up automatically. On people's own machines they'll usually add the toolkit location to LD_LIBRARY_PATH (Linux) or PATH (Windows).
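The manual setup described above typically looks something like this (module name and installation path are illustrative and vary by system):

```shell
# On an HPC cluster: the admin-provided module sets up paths for you
module load cuda-9.2

# On a personal Linux machine: point the runtime linker at the toolkit
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH
```

On Windows, the equivalent step is adding the toolkit's `bin` directory to `PATH`.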

Users can select which build they want using a label. For example, conda install -c omnia/label/cuda92 openmm gives them the CUDA 9.2 build. If they don't include a label, we pick one that will be the default. Choosing that default involves tradeoffs. We want it to be recent (newer toolkits support more GPUs and usually have better performance) but not too new, or a lot of people will have older drivers that don't support it. Cluster administrators are sometimes very conservative about upgrading any system software, since they don't want to risk breaking things. Our current release defaults to CUDA 10.1, but people can easily override that if they need something older.

Now let's consider the conda-forge approach. For this we just pick one CUDA version, build against it, and have the toolkit installed automatically. For end user computers that's very convenient, since they don't have to worry about downloading a toolkit and setting up paths. For cluster users it's less clearly a benefit. It just saves them putting one extra module load command in their script. Then it forces them to download an extra 800 MB, and it might give them a toolkit that isn't compatible with the driver (if we chose too new a version) or GPU (if we chose too old a version) on the cluster.

One other complication. We have multiple computational backends, including CUDA, OpenCL, and CPU. All backends are normally included in all packages, and the library figures out at runtime which ones are actually available based on the installed software and hardware. So we certainly don't want to make people download a large CUDA toolkit if they have an AMD GPU. But I'm also nervous about creating a package that doesn't include the CUDA libraries, since it gives people an extra way to make a mistake and end up with a package that has inferior support for their hardware.

So here are the goals we want to achieve.

  1. Have multiple builds for different CUDA versions, and allow users to choose which one to install.
  2. Save them an unnecessary large download.

The first one is essential. The second is a "nice to have".

@jakirkham

@peastman, I think some things got lost here. The way conda-forge proposes to do the builds does support multiple CUDA versions. So 1 is already handled. 2 is not.

peastman commented Oct 7, 2019

How does the user specify which version they want?

@jakirkham

How do you mean? In a recipe performing the build? Or when installing the packages?

@jakirkham

If you mean at the recipe level, @jaimergp's PR ( conda-forge/openmm-feedstock#1 ) should give you multiple builds against different CUDA versions. Alternatively if you mean installing the packages, the user should run conda install cudatoolkit=<some CUDA version>.

peastman commented Oct 7, 2019

So if the user manually installs a particular toolkit version, then installs OpenMM, it will get the build for that CUDA version? It won't just download a new toolkit and then install the default OpenMM build?

@jakirkham

Right. Since the cudatoolkit package (with the specified version) will be part of the environment spec, any later install commands will be constrained by it during Conda's solve.
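The sequence being described might look like the following (the CUDA version and channel are illustrative):

```shell
# Pin the toolkit version first; this pin becomes part of
# the environment's spec
conda install -c conda-forge cudatoolkit=9.2

# A later install is constrained by that pin, so the solver picks
# the OpenMM build compiled against CUDA 9.2
conda install -c conda-forge openmm
```

The key point is that the pin persists in the environment, so users never need labels to select the matching build.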

@jakirkham

Did you have any more questions about this @peastman?
