The Open-CE build scripts and feedstocks support CUDA from NVIDIA. In order for the build scripts to work, CUDA must be installed on the system on which the builds are taking place. This can be accomplished in two ways (see below).
In all cases, the `CUDA_HOME` environment variable must be set to the base directory where CUDA has been installed.
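For example, in a shell the variable can be exported before invoking the build scripts; the path below is a common default and may differ on your system:

```bash
# Common default install location; adjust to wherever CUDA is actually installed.
export CUDA_HOME=/usr/local/cuda
```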
If you are using Open-CE on a bare metal system, install the version of CUDA on the server that matches the version of Open-CE you are using. There is a reference table below.
Open-CE version | CUDA version |
---|---|
1.0 (marmotini) | 10.2 |
It is important to install CUDA correctly; the procedure depends on the system type and the CUDA version. It is advised to follow NVIDIA's official installation documentation. Note that NVLINK2-enabled POWER9 systems require a few extra steps.
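Once CUDA is installed, a quick sanity check with `nvcc` confirms the toolkit is present and on the expected version:

```bash
# nvcc ships with the CUDA toolkit; the release it reports should match the table above.
$CUDA_HOME/bin/nvcc --version
```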
It is recommended to run the Open-CE build scripts within a container. NVIDIA provides official CUDA base containers on Docker Hub, which are ideal for ensuring a properly installed CUDA environment.
Open-CE provides a working set of images for running CUDA-enabled builds. These images use the official CUDA images referenced above as a base. Dockerfiles for these images are located in the `images` directory. To build these images manually, use the command below.
```bash
docker build open-ce/images/build-cuda-x86_64
```
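If you plan to work with the image by hand, it can help to tag it at build time; the tag name here is illustrative, not something the Open-CE scripts require:

```bash
# Tag the image for easy reference, then open a shell inside it.
docker build -t open-ce-build-cuda open-ce/images/build-cuda-x86_64
docker run --rm -it open-ce-build-cuda bash
```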
The Open-CE build image presets the `CUDA_HOME` environment variable to the appropriate location.
The `open-ce build env` command supports the `--docker_build` command line argument. This argument automatically builds the Open-CE CUDA-enabled build image and, when combined with the `--build_types=cuda` command line argument, builds CUDA support into all of the recipes included in the Open-CE environment file that are enabled to do so.
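For example, a single invocation along these lines builds the image and performs a CUDA-enabled build of an environment file inside it (the environment file path is illustrative):

```bash
open-ce build env --docker_build --build_types=cuda envs/opence-env.yaml
```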
Both the `open-ce build env` and `open-ce build feedstock` scripts support the `--build_types=cuda` command line argument, which is required when CUDA support is desired in the build.
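For a single feedstock the flag is passed the same way; a minimal sketch, assuming the command is run from within the feedstock's checkout (that working-directory detail is an assumption here):

```bash
# Build one feedstock with CUDA support enabled.
open-ce build feedstock --build_types=cuda
```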
The `--build_types=cuda` argument sets a parameter in the build environment which can then be referenced in the collected feedstocks and Open-CE environment files using a tag. For example, a PyTorch Open-CE environment file could use the tag as:
```yaml
{% if build_type == 'cuda' %}
imported_envs:
  - tensorrt-env.yaml
{% endif %}
packages:
  - feedstock : pytorch
  - feedstock : numactl
{% if build_type == 'cuda' %}
  - feedstock : nccl
  - feedstock : magma
  - feedstock : cudnn
{% endif %}
  - feedstock : torchtext
  - feedstock : onnx
  - feedstock : av
  - feedstock : torchvision
```
The tag can also be used per line, as shown in this example from the xgboost `meta.yaml`:
```yaml
- cudatoolkit {{ cudatoolkit }}  #[build_type == 'cuda']
- nccl {{ nccl }}                #[build_type == 'cuda']
```
A system-level CUDA installation is not required at runtime. Once packages are built, CUDA is pulled at the appropriate level from Anaconda's defaults channel. Systems with NVIDIA GPUs installed do still need the NVIDIA device driver; drivers can be found on NVIDIA's Driver Downloads website.
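To confirm the driver side is in place on a GPU machine, `nvidia-smi`, which ships with the driver rather than with the CUDA toolkit, should list the installed GPUs:

```bash
# Lists GPUs and the driver version; failure here usually means the
# device driver is missing or not loaded.
nvidia-smi
```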
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. Both PyTorch and TensorFlow make use of cuDNN, and it is required for those frameworks' CUDA support. Open-CE includes a build recipe for a cuDNN package and will automatically fetch the version appropriate for the package being built.
NVIDIA TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. PyTorch, TensorFlow, and TensorFlow Serving include optional TensorRT support. Open-CE is unable to automatically fetch TensorRT at build time. The appropriate version of TensorRT must be downloaded from NVIDIA's TensorRT website and saved in a directory called `local_files` adjacent to the `open-ce` repository (a layout sketch follows the table below). The version of TensorRT must match the version of Open-CE.
Open-CE version | TensorRT version | file type |
---|---|---|
1.0 (marmotini) | 7.0.0.1 | tar.gz |
Note that the Open-CE recipe for TensorRT also includes some open source samples and parsers. These are fetched automatically to match the version of TensorRT included in the version of Open-CE being used.
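The expected layout can be sketched as follows; the tarball name is a placeholder for the exact file downloaded from NVIDIA:

```bash
# Run from the directory that contains the open-ce checkout. The tarball
# name is a placeholder; use the actual file from NVIDIA's TensorRT site.
mkdir -p local_files
mv ~/Downloads/TensorRT-7.0.0.1.*.tar.gz local_files/
ls
# local_files  open-ce
```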
NCCL (pronounced "Nickel") is a stand-alone library of standard collective communication routines for NVIDIA GPUs, implementing all-reduce, all-gather, reduce, broadcast, and reduce-scatter. PyTorch and TensorFlow utilize NCCL when CUDA support is enabled. NCCL is now open source and is fetched automatically at build time.