Commit

Document NFD for GPU Labeling
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
ArangoGutierrez committed Jan 30, 2024
1 parent 54ab2e8 commit c8d9eed
Showing 1 changed file with 42 additions and 7 deletions.
49 changes: 42 additions & 7 deletions content/en/docs/tasks/manage-gpus/scheduling-gpus.md
@@ -64,7 +64,7 @@ spec:
gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```
## Clusters containing different types of GPUs
## Manage clusters with different types of GPUs
If different nodes in your cluster have different types of GPUs, then you
can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
@@ -83,10 +83,45 @@ a different label key if you prefer.

## Automatic node labelling {#node-labeller}

If you're using AMD GPU devices, you can deploy
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.
As an administrator, you can automatically discover and label all your GPU-enabled nodes
by deploying Kubernetes [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD).
NFD detects the hardware features that are available on each node in a Kubernetes cluster.
Typically, NFD is configured to advertise those features as node labels, but NFD can also add extended resources, annotations, and node taints.
NFD is compatible with all [supported versions](/releases/version-skew-policy/#supported-versions) of Kubernetes.
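
For illustration, a node that NFD has labelled might carry metadata along these lines. This is only a sketch: the label key is the one used in the Pod example in this change, the node name is hypothetical, and the actual set of labels depends on the node's hardware and on how NFD is configured.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1 # hypothetical node name
  labels:
    # Set by NFD when it detects the corresponding hardware feature
    feature.node.kubernetes.io/pci-10.present: "true"
```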

Similar functionality for NVIDIA is provided by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
Administrators can also use NFD to taint nodes with specific features, so that only Pods that request those features can be scheduled on those nodes.
By default, NFD creates [feature labels](https://kubernetes-sigs.github.io/node-feature-discovery/master/usage/features.html) for the detected features.
Additionally, NFD exposes an API that allows vendors to write plugins that advertise features of their hardware.
After NFD has labeled the cluster and a vendor plugin is deployed, you can schedule Pods onto nodes with GPUs by adding node affinity such as the following to your Pod spec:

{{< highlight yaml "linenos=false,hl_lines=6-20" >}}
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  # You can use Kubernetes node affinity to schedule this Pod onto a node
  # that provides the kind of GPU that its container needs in order to work
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "gpu.gpu-vendor.example/installed-memory"
                operator: Gt # (greater than)
                values: ["40535"]
              - key: "feature.node.kubernetes.io/pci-10.present" # NFD Feature label
                operator: In # every match expression needs an operator
                values: ["true"] # (optional) only schedule on nodes with PCI device 10
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
{{< /highlight >}}
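
If the GPU nodes are also tainted, as mentioned above, a Pod needs a matching toleration in addition to any node affinity. The following is a minimal sketch under stated assumptions: the taint key `gpu-vendor.example/gpu`, its value, its effect, and the Pod name are all hypothetical placeholders, not something NFD or a vendor plugin is known to set by default.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add-tolerating # hypothetical name
spec:
  restartPolicy: Never
  tolerations:
    # Hypothetical taint; replace key, value, and effect with
    # whatever your GPU nodes actually carry
    - key: "gpu-vendor.example/gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```

A toleration only permits scheduling onto tainted nodes; it does not steer the Pod toward them, so it would typically be combined with the node affinity shown above.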

### GPU vendor implementations

- [Intel](https://intel.github.io/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/README.html).
- [NVIDIA](https://github.com/NVIDIA/gpu-feature-discovery/#readme).
