GPU Support? #63

Closed · alexbw opened this issue Mar 27, 2016 · 78 comments · Fixed by #2038

Comments

@alexbw

alexbw commented Mar 27, 2016

I've got a couple packages I'm preparing for upload that rely on GPUs. I'm not up to speed on what open-source CI solutions offer, but would building against a VM w/ GPUs be supported? If it would require pitching in or donating to the project, I'm pretty sure I can figure some way to help.

@pelson
Member

pelson commented Mar 27, 2016

I'm completely out of my depth on this one. @msarahan any knowledge on the subject?

@jakirkham
Member

I'm also interested in this. The trick is that most CIs do not provide GPUs. However, if you are willing to work with OpenCL, which works with both CPUs and GPUs, then we can work together on getting that support on CIs.

@alexbw
Author

alexbw commented Mar 27, 2016

Unfortunately, CUDA really is the de facto standard for machine learning. I think NVIDIA is generally interested in helping out the open-source community, so it might be worth starting a conversation with them about helping out. I'll test out conda-forge with non-GPU packages, and if things seem to work smoothly, then I can start talking with them.

Question -- are conda and conda-build updated regularly on this system? My packages are in Lua, and support for them was only added recently.

@jakirkham
Member

I thought you might say that. Unfortunately, without the CI support, we are kind of in a bind on this one. If NVIDIA is willing to work on CI support with GPUs, that would be great.

To be completely honest with you, I don't think we should be holding our breath on this. The real problem is that most CI services are leasing time on other infrastructure, primarily Google's and Amazon's. Unless someone has GPU infrastructure that they are willing to lease to a CI service for this purpose, we are kind of stuck. I think we can all imagine what they would prefer to do with that infrastructure, right? However, if you figure out something on this, please let us know and we can work on something.

I'm guessing you are using Torch then? At a bare minimum, let's work on getting Torch's dependencies in here. At least, that will make your job a little easier, right? For that matter any run-of-the-mill Lua packages that you have would be good to try to get in, as well. It should help you and others looking for more Lua support in conda. How does that sound?

@jakirkham
Member

This repo seems to use NVIDIA components for its CI.

@jakirkham
Member

Did you see the link above, @alexbw?

This might not be totally impossible after all, but I think we should do some research on how this works. What platforms are you trying to support? Just Linux? Mac also? I'm totally out of my depth on Windows, so we may need someone else to show us the ropes there.

@alexbw
Author

alexbw commented Mar 30, 2016

Saw the link. Looking more into this, but on the timescale of a few weeks.

@jakirkham
Member

So, I looked a little more closely at this, and it looks like one could add GPU libs to CentOS 6 (what we are currently using). There is Mac and Windows support too, but IMHO this is secondary to getting Linux up and working. However, I am not seeing any support for CentOS 5 (a platform we were debating switching to), which is something to keep in mind.

@msarahan
Member

msarahan commented Apr 5, 2016

Good to know. We are collecting data points on whether continuing with CentOS 5 is a good idea. If anyone knows of definitive reasons to stay with CentOS 5, please share them; it is currently preventing:

  • Qt5
  • complete LLVM
  • now, GPU libs

@jakirkham
Member

Glad you saw this, @msarahan. I was debating cross-referencing, but didn't want to make a mess of links. Are there still many customers using a CentOS 5-equivalent Linux? Could we maybe build the compiler on CentOS 5 and somehow add it to CentOS 6?

@msarahan
Member

msarahan commented Apr 5, 2016

Building the compiler on the older architecture doesn't help. What matters is the glibc version present on the build system when packages are built.

We don't have hard data on customers, outside of HTTP headers for downloads of packages. We're digging through that to see how many people have kernels older than the one corresponding to CentOS 6.
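
(For illustration, a minimal Python sketch of the kind of compatibility check this glibc constraint implies; the check itself is hypothetical, not something conda performs, though the version numbers, glibc 2.5 on CentOS 5 and 2.12 on CentOS 6, are real.)

```python
# Hypothetical check: report the glibc and kernel versions that determine
# whether binaries built on CentOS 6 will run on this machine.
import platform

libc_name, libc_version = platform.libc_ver()  # e.g. ("glibc", "2.12")
print(f"libc:   {libc_name} {libc_version}")
print(f"kernel: {platform.release()}")  # e.g. "2.6.32-..." on CentOS 6

# Binaries built on CentOS 6 need glibc >= 2.12 at runtime (CentOS 5 has 2.5).
if libc_name == "glibc":
    have = tuple(int(part) for part in libc_version.split("."))
    if have < (2, 12):
        raise SystemExit(f"glibc {libc_version} is older than CentOS 6's 2.12")
```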

@jakirkham
Member

Right. I was just hoping there was a way we could somehow have both. I guess in the worst case some things could be CentOS 6 as needed. Will that have any consequences if we mix the two? Is that already done to some extent (noting that Qt5 was mentioned)?

Yeah, it seems like it would be good to run a survey. We might need an incentive to make sure it actually gets filled out.

@jakirkham
Member

Also, an interesting footnote (though I would appreciate it if other people check and make sure I am reading this right, as IANAL): it appears that at least some of the CUDA libraries can be shipped. This means we could create a CUDA package that simply makes sure CUDA is installed on the CI and moves the libraries into the conda build prefix for packaging. The resulting package could then be added as a dependency of anything that requires them (e.g. Torch, Caffe, etc.). This would avoid us having to add these hacks in multiple places and risk losing them when we re-render. Furthermore, we would be able to guarantee that the libraries we used to build would be installed on the user's system.
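
(To sketch what such a package's build script might do, assuming a build.sh-style environment where conda-build sets $PREFIX, a toolkit at the conventional /usr/local/cuda, and an illustrative, not vetted, library list:)

```python
# Illustrative only: copy a hypothetical redistributable subset of the CUDA
# runtime libraries from a system install into the conda build prefix.
import os
import shutil

CUDA_HOME = os.environ.get("CUDA_HOME", "/usr/local/cuda")
PREFIX = os.environ["PREFIX"]  # set by conda-build during the build

# Placeholder list; the real set must come from NVIDIA's EULA attachment.
RUNTIME_LIBS = ("libcudart.so", "libcublas.so", "libcurand.so")

src_dir = os.path.join(CUDA_HOME, "lib64")
dst_dir = os.path.join(PREFIX, "lib")
os.makedirs(dst_dir, exist_ok=True)

for entry in os.listdir(src_dir):
    # Pick up each base name plus its versioned variants (libcudart.so.7.5, ...).
    if entry.startswith(RUNTIME_LIBS):
        shutil.copy2(os.path.join(src_dir, entry), dst_dir)
```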

@jakirkham
Member

We should verify whether one version of CUDA for one Linux distro and version can be used on other Linux distros and versions easily or if we need to have multiple flavors. This survey will have to be extended to other OSes at some point, but starting with Linux makes the most sense to me.

@jakirkham
Member

So, I am trying something out in this PR ( conda-forge/caffe-feedstock#1 ). In it, I am installing CUDA libraries into the Docker container and attempting to have Caffe build against them. It is possible some tests will fail if we don't have access to an NVIDIA GPU, so we will have to play with that. Also, we don't have cuDNN, as that appears to require some registration process that I have not looked into yet and may be a pain to download in batch mode.

In the long run, I expect the CUDA libraries will be wrapped in their own package for installation, and packages needing them will simply install that package. We may need features to differentiate the GPU variants (CUDA/OpenCL). However, that CUDA package will probably need to hack the CI script in a similar way.

I am definitely interested in feedback, so please feel free to share.

@jakirkham
Member

Another thought might be that we don't ship the CUDA libraries at all. Instead we have a package that merely checks that they are installed, via a pre- or post-link step. If it fails to find them, the install fails. This would avoid figuring out where the libraries can or cannot be distributed safely. Hopefully, since we are linking against the CUDA API, all that will matter is that an acceptable version of the CUDA libraries is present at runtime, regardless of what Linux distribution the package was initially built on.
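
(A minimal sketch of such a check as a post-link script; hypothetical, not an existing package, assuming conda treats a nonzero exit from a link script as a failed install:)

```python
# Hypothetical post-link check: fail the install if the system CUDA driver
# library cannot be found on the loader path.
import ctypes.util
import sys

if ctypes.util.find_library("cuda") is None:  # searches for libcuda.so*
    sys.exit("libcuda not found; install the NVIDIA driver before this package")
```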

@jakirkham
Member

It appears Circle CI does provide GPU support, or at least that is what my testing suggests.

@jakirkham
Member

Also, as an FYI in case you didn't already know, @msarahan: CentOS 5 maintenance support ends March 2017, in other words, less than a year from now. That sounds like a pretty big negative to me. Given how many recipes have been added from conda-recipes and how many remain to be added at this point, trying to switch to CentOS 5 before then sounds challenging. Not to mention, we may find ourselves needing to migrate back to CentOS 6 by that point. Maybe it is just me, but I'm starting to feel a lot of friction around switching to CentOS 5. Is it reasonable to consider just accepting CentOS 6 as part of this transition?

@kyleabeauchamp

FWIW, we have GPU support on Omnia. Might be worth reading over.

https://github.com/omnia-md/conda-recipes

https://github.com/omnia-md/omnia-build-box

@jakirkham
Member

Thanks for these links @kyleabeauchamp. I'll certainly try to brush up on this.

Do you have any thoughts on this PR ( conda-forge/caffe-feedstock#1 )? Also, how do you guys handle the GPU lib dependency? Is it packaged somehow, used from the system (possibly with some sort of check), or handled some other way?

@kyleabeauchamp

So AFAIK our main use of GPUs was building the simulation engine OpenMM (openmm.org). OpenMM is a C++ library and can dynamically detect the presence of CUDA support (via shared libraries) at runtime. This means that we did not package or ship anything related to CUDA. We basically just needed CUDA on the build box to build the conda package, then let OpenMM handle things dynamically later.
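
(OpenMM does this detection in C++ via dlopen; a rough Python equivalent of the pattern, not OpenMM's actual code, would be:)

```python
# Sketch of dlopen-style detection: try to load the CUDA driver library and
# fall back to the CPU when it is absent.
import ctypes

def detect_backend():
    for soname in ("libcuda.so.1", "libcuda.so"):
        try:
            ctypes.CDLL(soname)  # dlopen(3) under the hood on Linux
            return "cuda"
        except OSError:
            continue
    return "cpu"

print(detect_backend())  # "cuda" on a machine with the NVIDIA driver installed
```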

@kyleabeauchamp

Looks like our Dockerfile is somewhat similar to your CUDA handling:

https://github.com/omnia-md/omnia-build-box/blob/master/Dockerfile#L25

@jakirkham
Member

Ah, ok, thanks for clarifying.

The trick with Caffe, in particular, is that it can use the CPU, CUDA, or OpenCL. CPU support is always present; however, a BLAS is required, which involves a CPU choice (OpenBLAS, ATLAS, MKL, or possibly some hack to add other options) and a GPU choice (if any) of cuBLAS or ViennaCL. Thus, having this dynamically determined ends up not really being as nice as it could be. Allowing actual selection will require feature support and possibly multiple rebuilds of Caffe.

One simple route might be to just always use ViennaCL, which can abstract away the difference between the OpenCL and CUDA options. Also, it can always fall back to the CPU if no GPU support is present. Though I expect this layer of abstraction comes at some penalty, the question is how severe that penalty is. Would a solution like this work with OpenMM? I don't know whether its GPU support proceeds primarily through a GPU BLAS or some other mechanism. For instance, is it using FFTs?

If you have deep learning interests, this may be relevant. In the case of Caffe, it can optionally support cuDNN, and researchers will want this support out of the box. Not only is this tricky because it may not be available due to hardware or software reasons, it is tricky because downloading cuDNN requires a registration step with unclear licensing restrictions. One way we might resolve this is to request that cuDNN be loaded on an appropriate Docker container. NVIDIA does do this with Ubuntu 14. However, I don't see a similar container for CentOS 6 and am unclear on whether it would be a supported platform. Ultimately, this will require us to communicate with NVIDIA at some point to see what we need to do here to stay above board while providing users state-of-the-art support.

Fortunately, NVIDIA is very clear, down to the file level, about which parts of the CUDA libraries can and cannot be distributed. So, the concerns with cuDNN do not affect this.

@jakirkham
Member

Another thought for more versatile support would be to use the clMath libraries.

@kyleabeauchamp

OpenMM dynamically chooses the best platform at runtime, with options including CPU (SSE), CUDA, OpenCL, and CPU (no SSE / reference). It does use FFTs. The idea with OpenMM is to build and ship binaries that support all possible platforms, then select at runtime.
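
(For reference, a short sketch of that runtime selection using OpenMM's Python API, as I read it, with the era's simtk.openmm import path; newer releases use `import openmm`:)

```python
# Enumerate the platforms OpenMM was built with and pick the fastest available.
from simtk.openmm import Platform

platforms = [Platform.getPlatform(i) for i in range(Platform.getNumPlatforms())]
for p in platforms:
    print(p.getName(), p.getSpeed())  # e.g. Reference, CPU, OpenCL, CUDA

# OpenMM defaults to the fastest platform; callers can also pin one explicitly,
# e.g. Platform.getPlatformByName("CUDA").
fastest = max(platforms, key=lambda p: p.getSpeed())
print("fastest:", fastest.getName())
```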

@hmaarrfk
Contributor

hmaarrfk commented Mar 22, 2019

@scopatz, did the legal counsel state what clause makes the CUDA toolkit non-redistributable?

Are we allowed to link to CUDA stuff as long as we pull in the dependency from defaults?

Ref: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
xref: https://github.com/conda-forge/pytorch-cpu-feedstock

@scopatz
Member

scopatz commented Mar 22, 2019

Basically, counsel has said that the EULA - even attachment A - only refers to applications (it does), and conda-forge cannot be considered an application under any reasonable definition.

Also, it looks like they have added the following to attachment A, which we don't meet:

The NVIDIA CUDA Driver Libraries are only distributable in applications that meet this criteria:

1. The application was developed starting from a NVIDIA CUDA container obtained from Docker Hub or the NVIDIA GPU Cloud, and
2. The resulting application is packaged as a Docker container and distributed to users on Docker Hub or the NVIDIA GPU Cloud only.

@hmaarrfk
Contributor

I guess we can't distribute the CUDA driver. Many people are OK with installing that themselves :/.

Second question would be:
Is linking to the libraries on defaults OK by conda-forge standards?

@scopatz
Member

scopatz commented Mar 22, 2019

Linking to the defaults libraries is fine for us.

@hadim
Member

hadim commented Mar 22, 2019

Then we can say goodbye to reproducible workflows involving CUDA. It really makes no sense that we can't ship it in conda-forge.

I understand the legal aspect of it. What I don't understand is why they took that decision... (unless it's a misunderstanding and we can actually ship it in conda-forge?)

@scopatz
Member

scopatz commented Mar 22, 2019

Then we can say goodbye to reproducible workflows involving CUDA. It really makes no sense that we can't ship it in conda-forge.

I understand the legal aspect of it. What I don't understand is why they took that decision.

I agree on all counts.

(unless it's a misunderstanding and we can actually ship it in conda-forge?)

From speaking with NVIDIA, I am reasonably sure that this is not a misunderstanding on their part. The folks who work on ML, CUDA, and PyData at NVIDIA are different from those in their legal department. I don't know the justification for the license as it stands, but we have to live with it until:

  1. they change the EULA
  2. they give us a written exception, which I have asked for but they haven't responded yes or no

@hadim
Member

hadim commented Mar 22, 2019

Thank you @scopatz for working on all those things.

In the meantime, for those who want a reproducible CUDA installation (Linux only, unfortunately), see the following gist: https://gist.github.com/hadim/a4fe638587ef0d7baea01e005523e425

@ericpre
Member

ericpre commented Apr 28, 2020

Now that building packages with CUDA works fine on Linux (in the case of https://github.com/conda-forge/prismatic_split-feedstock, it went smoothly), how would it be possible to add Windows support? Is it worth trying to add a cudatoolkit-dev package for Windows in order to build GPU packages on Windows?

@isuruf
Member

isuruf commented Apr 28, 2020

PRs welcome. We could use cudatoolkit-dev on Linux too, to avoid using /usr/include, because the NVIDIA images have been polluting that folder with their includes.

@ericpre
Member

ericpre commented Apr 28, 2020

Ok, I will give it a try.

@jakirkham
Member

@ericpre, for an example of how to build GPU packages, take a look at nvidia-apex as a simple example. Sorry, there are no docs atm.

@isuruf
Member

isuruf commented Apr 28, 2020

@jakirkham, does nvidia-apex-feedstock build windows cuda packages?

@znmeb

znmeb commented Oct 18, 2020

I've got an NVIDIA Jetson AGX Xavier (8-core Tegra aarch64 CPU plus a 512-core Volta GPU) that I can test on. I just discovered Miniforge a couple of days ago; it appears to work out of the box on the Tegra CPU.

I also have a laptop with a GTX 1050 Ti GPU and an x86_64 CPU, but there are plenty of other ways to get at the GPU there.

@tyler274

Why isn't OpenCL included in OpenCV already? CI support and license issues aren't a problem, so why is it blocked on CUDA?
