Document NFD for GPU Labeling
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
ArangoGutierrez committed Jan 27, 2024
1 parent 54ab2e8 commit b9cabb2
Showing 1 changed file with 28 additions and 17 deletions.
45 changes: 28 additions & 17 deletions content/en/docs/tasks/manage-gpus/scheduling-gpus.md
Expand Up @@ -64,29 +64,40 @@ spec:
gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```
## Clusters containing different types of GPUs
## Automatically labeling nodes with Node Feature Discovery {#node-feature-discovery}
If different nodes in your cluster have different types of GPUs, then you
can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
to schedule pods to appropriate nodes.
As an administrator, you can automatically discover and label all your GPU-enabled nodes
by deploying [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD), a Kubernetes SIG project.
NFD enables node feature discovery for Kubernetes.
It detects hardware features available on each node in a Kubernetes cluster, and advertises those features using node labels and optionally node extended resources, annotations and node taints.
Node Feature Discovery is compatible with any recent version of Kubernetes (v1.21+).
For example:
Administrators can also use NFD to taint nodes that have specific features, so that only pods that request those features can be scheduled on those nodes.
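As a minimal sketch of the Pod side of such tainting, the manifest below adds a toleration so the Pod can be scheduled onto a tainted GPU node. The taint key, value, and effect are hypothetical and chosen only for illustration; the actual taint depends on how NFD is configured in your cluster.

```yaml
# Sketch only: the taint key, value, and effect below are assumptions,
# not values that NFD sets by default.
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add-tolerating   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
  tolerations:
    - key: "gpu-vendor.example/example-gpu" # hypothetical taint key
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod to it, so it is usually combined with a node selector or node affinity.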
Once a cluster is labeled with the GPU feature, you can schedule pods on GPU nodes by adding the following to your pod spec:
```shell
# Label your nodes with the accelerator type they have.
kubectl label nodes node1 accelerator=example-gpu-x100
kubectl label nodes node2 accelerator=other-gpu-k915
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
  nodeSelector:
    gpu-vendor.example/example-gpu: "true"
```
That label key `accelerator` is just an example; you can use
a different label key if you prefer.
NFD exposes an API to allow vendors to leverage the automatic labeling functionality.
NVIDIA has implemented this API in its [GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md) project.
## Automatic node labelling {#node-labeller}
### Using custom labellers
If you're using AMD GPU devices, you can deploy
For AMD GPUs, you can use the
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.

Similar functionality for NVIDIA is provided by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
labels nodes in a Kubernetes cluster with AMD GPU device properties.
