Skip to content

Commit

Permalink
cudaPackages: add docs
Browse files Browse the repository at this point in the history
  • Loading branch information
Connor Baker committed Dec 7, 2023
1 parent 8e800ce commit bfaefd0
Show file tree
Hide file tree
Showing 3 changed files with 97 additions and 9 deletions.
47 changes: 38 additions & 9 deletions doc/languages-frameworks/cuda.section.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,16 +68,45 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
### Updating CUDA redistributables {#updating-cuda-redistributables}

1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Copy the `redistrib_*.json` corresponding to the release to `pkgs/development/compilers/cudatoolkit/redist/manifests`.
3. Generate the `redistrib_features_*.json` file by running:
2. Make a note of the new version of CUDA available.
3. Run

```bash
nix run github:ConnorBaker/cuda-redist-find-features -- <path to manifest>
```
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
download-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```

This will download a copy of the manifest for the new version of CUDA.
4. Run

```bash
nix run github:connorbaker/cuda-redist-find-features -- \
process-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```

This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.

### Updating cuTensor {#updating-cutensor}

1. Repeat the steps present in [Updating CUDA redistributables](#updating-cuda-redistributables) with the following changes:
- Use the index of cuTensor redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
- Use the newest version of cuTensor available instead of the newest version of CUDA.
- Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
- Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.

That command will generate the `redistrib_features_*.json` file in the same directory as the manifest.
### Updating supported compilers and GPUs {#updating-supported-compilers-and-gpus}

4. Include the path to the new manifest in `pkgs/development/compilers/cudatoolkit/redist/extension.nix`.
1. Update `nvcc-compatibilities.nix` in `pkgs/development/cuda-modules/` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `gpus.nix` in `pkgs/development/cuda-modules/` to include any new GPUs supported by the new release of CUDA.

### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}

Expand All @@ -99,15 +128,15 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
nix store prefetch-file --hash-type sha256 <link>
```

4. Update `pkgs/development/compilers/cudatoolkit/versions.toml` to include the release.
4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.

### Updating the CUDA package set {#updating-the-cuda-package-set}

1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.

- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.

2. Successfully build the closure of the new package set, updating `pkgs/development/compilers/cudatoolkit/redist/overrides.nix` as needed. Below are some common failures:
2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:

| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
Expand Down
32 changes: 32 additions & 0 deletions pkgs/development/cuda-modules/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# cuda-modules

> [!NOTE]
> This document is meant to help CUDA maintainers understand the structure of the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).
The files in this directory are added (in some way) to the `cudaPackages` package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).

## Top-level files

Top-level nix files are included in the initial creation of the `cudaPackages` scope. These are typically required for the creation of the finalized `cudaPackages` scope:

- `backend-stdenv.nix`: Standard environment for CUDA packages.
- `flags.nix`: Flags set, or consumed by, NVCC in order to build packages.
- `gpus.nix`: A list of supported NVIDIA GPUs.
- `nvcc-compatibilities.nix`: NVCC releases and the version range of GCC/Clang they support.

## Top-level directories

- `cuda`: CUDA redistributables! Provides extension to `cudaPackages` scope.
- `cudatoolkit`: monolothic CUDA Toolkit run-file installer. Provides extension to `cudaPackages` scope.
- `cudnn`: NVIDIA cuDNN library.
- `cutensor`: NVIDIA cuTENSOR library.
- `generic-builders`:
- Contains a builder `manifest.nix` which operates on the `Manifest` type defined in `modules/generic/manifests`. Most packages are built using this builder.
- Contains a builder `multiplex.nix` which leverages the Manifest builder. In short, the Multiplex builder adds multiple versions of a single package to single instance of the CUDA Packages package set. It is used primarily for packages like `cudnn` and `cutensor`.
- `modules`: Nixpkgs modules to check the shape and content of CUDA redistributable and feature manifests. These modules additionally use shims provided by some CUDA packages to allow them to re-use the `genericManifestBuilder`, even if they don't have manifest files of their own. `cudnn` and `tensorrt` are examples of packages which provide such shims. These modules are further described in the [Modules](./modules/README.md) documentation.
- `nccl`: NVIDIA NCCL library.
- `nccl-tests`: NVIDIA NCCL tests.
- `saxpy`: Example CMake project that uses CUDA.
- `setup-hooks`: Nixpkgs setup hooks for CUDA.
- `tensorrt`: NVIDIA TensorRT library.
27 changes: 27 additions & 0 deletions pkgs/development/cuda-modules/modules/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Modules

Modules as they are used in `modules` exist primarily to check the shape and content of CUDA redistributable and feature manifests. They are ultimately meant to reduce the repetitive nature of repackaging CUDA redistributables.

Building most redistributables follows a pattern of a manifest indicating which packages are available at a location, their versions, and their hashes. To avoid creating builders for each and every derivation, modules serve as a way for us to use a single `genericManifestBuilder` to build all redistributables.

## `generic`

The modules in `generic` are reusable components meant to check the shape and content of NVIDIA's CUDA redistributable manifests, our feature manifests (which are derived from NVIDIA's manifests), or hand-crafted Nix expressions describing available packages. They are used by the `genericManifestBuilder` to build CUDA redistributables.

Generally, each package which relies on manifests or Nix release expressions will create an alias to the relevant generic module. For example, the [module for CUDNN](./cudnn/default.nix) aliases the generic module for release expressions, while the [module for CUDA redistributables](./cuda/default.nix) aliases the generic module for manifests.

Alternatively, additional fields or values may need to be configured to account for the particulars of a package. For example, while the release expressions for [CUDNN](./cudnn/releases.nix) and [TensorRT](./tensorrt/releases.nix) are very close, they differ slightly in the fields they have. The [module for CUDNN](./modules/cudnn/default.nix) is able to use the generic module for release expressions, while the [module for TensorRT](./modules/tensorrt/default.nix) must add additional fields to the generic module.

### `manifests`

The modules in `generic/manifests` define the structure of NVIDIA's CUDA redistributable manifests and our feature manifests.

NVIDIA's redistributable manifests are retrieved from their web server, while the feature manifests are produced by [`cuda-redist-find-features`](https://github.com/connorbaker/cuda-redist-find-features).

### `releases`

The modules in `generic/releases` define the structure of our hand-crafted Nix expressions containing information necessary to download and repackage CUDA redistributables. These expressions are created when NVIDIA-provided manifests are unavailable or otherwise unusable. For example, though CUDNN has manifests, a bug in NVIDIA's CI/CD causes manifests for different versions of CUDA to use the same name, which leads to the manifests overwriting each other.

### `types`

The modules in `generic/types` define reusable types used in both `generic/manifests` and `generic/releases`.

0 comments on commit bfaefd0

Please sign in to comment.