Document NFD for GPU Labeling
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
ArangoGutierrez committed Jan 27, 2024
1 parent 54ab2e8 commit 4032470
Showing 1 changed file with 35 additions and 7 deletions.
42 changes: 35 additions & 7 deletions content/en/docs/tasks/manage-gpus/scheduling-gpus.md
@@ -64,7 +64,7 @@ spec:
gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```
## Clusters containing different types of GPUs
## Manage clusters with different types of GPUs
If different nodes in your cluster have different types of GPUs, then you
can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
@@ -81,12 +81,40 @@ kubectl label nodes node2 accelerator=other-gpu-k915
That label key `accelerator` is just an example; you can use
a different label key if you prefer.
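
As a minimal sketch of how a Pod could target one of the labeled nodes, the fragment below pairs the `accelerator` label from the `kubectl label` command above with a `nodeSelector`. The Pod name is hypothetical, and the image and resource name are the placeholder values used elsewhere on this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-accelerator-pod   # hypothetical name, for illustration only
spec:
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
  nodeSelector:
    accelerator: other-gpu-k915   # must match the label applied to the node
```

With this selector, the scheduler only considers nodes that carry the label `accelerator=other-gpu-k915`.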

## Automatic node labelling {#node-labeller}
### Automatically labeling nodes with Node Feature Discovery {#node-feature-discovery}

If you're using AMD GPU devices, you can deploy
As an administrator, you can automatically discover and label all your GPU-enabled nodes
by deploying the Kubernetes SIG project [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD).
NFD detects the hardware features that are available on each node in a Kubernetes cluster and advertises those features.
Typically, NFD adds node labels to advertise the features, but NFD can also add extended resources, annotations, and node taints.
NFD is compatible with any recent version of Kubernetes (v1.21+).
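
The labels that NFD applies can be customized with rules. The following is a rough sketch, not a vendor-supplied configuration, of an NFD `NodeFeatureRule` that labels nodes containing a PCI device from a given vendor. The rule name, the label key (borrowed from the placeholder used on this page), and the PCI vendor ID `abcd` are all assumptions for illustration. Depending on the NFD version and its configuration, labels outside the default `feature.node.kubernetes.io` namespace may need to be explicitly allowed.

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: example-gpu-rule              # hypothetical rule name
spec:
  rules:
    - name: "example gpu label"
      labels:
        # Placeholder label key, matching the example nodeSelector on this page
        "gpu-vendor.example/example-gpu": "true"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            # "abcd" is a placeholder; substitute the real PCI vendor ID
            vendor: {op: In, value: ["abcd"]}
```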

Administrators can also use NFD to taint nodes that have specific features, so that only Pods that request those features can be scheduled on those nodes.
After the nodes in a cluster are labeled with the GPU feature, you can schedule Pods on GPU nodes by adding the following to your Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
  nodeSelector:
    gpu-vendor.example/example-gpu: "true"
```
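
If an administrator has also tainted the GPU nodes, a Pod additionally needs a matching toleration before it can be scheduled there. The fragment below is a sketch that assumes a hypothetical taint with the key `gpu-vendor.example/example-gpu` and the effect `NoSchedule`; the actual key and effect depend on how the taint was configured in your cluster.

```yaml
spec:
  tolerations:
    - key: "gpu-vendor.example/example-gpu"   # hypothetical taint key
      operator: "Exists"                      # tolerate the taint regardless of its value
      effect: "NoSchedule"
  nodeSelector:
    gpu-vendor.example/example-gpu: "true"
```

The toleration only permits scheduling onto tainted nodes; the `nodeSelector` is still what steers the Pod toward them.
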
NFD exposes an API that vendors can use to plug into its automatic labeling functionality.
NVIDIA implements this API in [GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
### Using custom labellers
For AMD GPUs, you can use the
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.

Similar functionality for NVIDIA is provided by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
labels nodes in a Kubernetes cluster with AMD GPU device properties.
