KEP4009: Add CDI Devices to device plugin API #4011
Merged
k8s-ci-robot
merged 2 commits into
kubernetes:master
from
elezar:KEP-4009/Add-CDI-devices-to-device-plugin-API
Jun 16, 2023
+386
−0
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
kep-number: 4009 | ||
alpha: | ||
approver: "@johnbelamaric" |
340 changes: 340 additions & 0 deletions
keps/sig-node/4009-add-cdi-devices-to-device-plugin-api/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,340 @@ | ||
# KEP-4009: Add CDI devices to device plugin API | ||
|
||
<!-- toc --> | ||
- [Release Signoff Checklist](#release-signoff-checklist) | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Design Details](#design-details) | ||
- [Test Plan](#test-plan) | ||
- [Prerequisite testing updates](#prerequisite-testing-updates) | ||
- [Unit tests](#unit-tests) | ||
- [Integration tests](#integration-tests) | ||
- [e2e tests](#e2e-tests) | ||
- [Graduation Criteria](#graduation-criteria) | ||
- [Alpha](#alpha) | ||
- [Alpha to Beta Graduation](#alpha-to-beta-graduation) | ||
- [Beta to G.A Graduation](#beta-to-ga-graduation) | ||
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||
- [Version Skew Strategy](#version-skew-strategy) | ||
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) | ||
- [Feature Enablement and Rollback](#feature-enablement-and-rollback) | ||
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) | ||
- [Monitoring Requirements](#monitoring-requirements) | ||
- [Dependencies](#dependencies) | ||
- [Scalability](#scalability) | ||
- [Troubleshooting](#troubleshooting) | ||
- [Implementation History](#implementation-history) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
<!-- /toc --> | ||
|
||
## Release Signoff Checklist | ||
|
||
Items marked with (R) are required *prior to targeting to a milestone / release*. | ||
|
||
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) | ||
- [ ] (R) KEP approvers have approved the KEP status as `implementable` | ||
- [ ] (R) Design details are appropriately documented | ||
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) | ||
- [ ] e2e Tests for all Beta API Operations (endpoints) | ||
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free | ||
- [ ] (R) Graduation criteria is in place | ||
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [ ] (R) Production readiness review completed | ||
- [ ] (R) Production readiness review approved | ||
- [ ] "Implementation History" section is up-to-date for milestone | ||
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] | ||
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes | ||
|
||
[kubernetes.io]: https://kubernetes.io/ | ||
[kubernetes/enhancements]: https://git.k8s.io/enhancements | ||
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes | ||
[kubernetes/website]: https://git.k8s.io/website | ||
|
||
## Summary | ||
|
||
This KEP proposes extending the Device Plugin API, adding a field to specify | ||
Container Device Interface (CDI) device IDs in the `AllocateResponse`. This | ||
supplements the existing fields such as annotations and allows device plugin | ||
implementations to uniquely specify devices using their fully-qualified CDI | ||
device names. | |
|
||
The recent addition of CDI device IDs to the CRI structures in [#3731](https://github.com/kubernetes/enhancements/pull/3731) allows these IDs to be forwarded to the CRI runtimes in a secure manner. Although | |
these changes were motivated by [KEP-3063](https://github.com/kubernetes/enhancements/issues/3063), adding support for these fields to the | ||
existing device plugin API allows this mechanism to also be used for devices | ||
supported by these plugins. | ||
|
||
## Motivation | ||
|
||
The Container Device Interface (CDI) provides a standard mechanism for device | |
vendors to describe what is required to provide access to a specific resource | ||
such as a GPU. These resources can be uniquely identified using a | ||
fully-qualified CDI device name. | ||
|
||
The changes proposed in [#3731](https://github.com/kubernetes/enhancements/pull/3731) extend the CRI to provide a well-defined mechanism for forwarding such | |
requests to CRI runtimes such as containerd and CRI-O. These have already | |
been extended to accept CDI device requests, and to use the associated CDI | ||
specifications to ensure that the required | ||
modifications are made to the OCI runtime specification for a container being | ||
launched. | ||
|
||
The addition of an explicit field for specifying CDI device names to the Device | ||
Plugin API allows this CRI field to be used to indicate which devices should be | ||
injected. This removes the need to use workarounds such as container annotations | ||
to pass this information to the runtimes and allows Device Plugin authors to | ||
adopt CDI to inject devices without requiring that users move to a Dynamic | ||
Resource Allocation (DRA) based implementation. | ||
|
||
### Goals | ||
|
||
* Allow Device Plugin authors to forward device requests to CRI runtimes as a CRI field. | ||
* Allow Device Plugin authors to use CDI to define the modifications required for containerised environments. | ||
|
||
## Design Details | ||
|
||
This adds a repeated `CDIDevice` field to the existing `ContainerAllocateResponse` returned as part of the | |
`AllocateResponse` in the Device Plugin API. This matches the modifications made to the Dynamic Resource Allocation API in [#3731](https://github.com/kubernetes/enhancements/pull/3731). | ||
|
||
The values contained in this field are then used to populate the corresponding field in the CRI | ||
which is passed to the container runtimes. In addition, annotations with a `cdi.k8s.io` prefix will be | ||
added to the CRI to allow for consumption in container runtimes that do not yet support the | ||
CRI field directly, but do support device requests through annotations. | ||
|
||
```protobuf | ||
// CDIDevice specifies CDI device information. | |
message CDIDevice { | ||
// Fully qualified CDI device name | ||
// for example: vendor.com/gpu=gpudevice1 | ||
// see more details in the CDI specification: | ||
// https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md | ||
string name = 1; | ||
} | ||
|
||
message ContainerAllocateResponse { | ||
// List of environment variables to be set in the container to access one or more devices. | |
map<string, string> envs = 1; | ||
// Mounts for the container. | ||
repeated Mount mounts = 2; | ||
// Devices for the container. | ||
repeated DeviceSpec devices = 3; | ||
// Container annotations to pass to the container runtime | ||
map<string, string> annotations = 4; | ||
// CDI devices for the container. | ||
repeated CDIDevice cdi_devices = 5; | ||
} | ||
``` | ||
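As a rough illustration of the plugin side, the following Go sketch builds a response that requests devices by their fully-qualified CDI names. The `CDIDevice` and `ContainerAllocateResponse` structs here are local stand-ins that mirror only a few fields of the messages above; they are not the generated API types. The helper names `qualifiedName` and `allocate` and the simplified name check are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// CDIDevice is a local stand-in for the proposed message; it is not the
// generated device plugin API type.
type CDIDevice struct {
	Name string // fully-qualified CDI device name, e.g. vendor.com/gpu=gpudevice1
}

// ContainerAllocateResponse mirrors only the fields relevant to this sketch.
type ContainerAllocateResponse struct {
	Envs        map[string]string
	Annotations map[string]string
	CDIDevices  []*CDIDevice
}

// qualifiedName builds "<vendor>/<class>=<device>". The check is deliberately
// simplified and is not the full validation described in the CDI specification.
func qualifiedName(vendor, class, device string) (string, error) {
	for _, part := range []string{vendor, class, device} {
		if part == "" || strings.ContainsAny(part, "/= ") {
			return "", fmt.Errorf("invalid CDI name component %q", part)
		}
	}
	return vendor + "/" + class + "=" + device, nil
}

// allocate (hypothetical helper) returns a response that requests the given
// devices by CDI name rather than through annotations.
func allocate(vendor, class string, deviceIDs []string) (*ContainerAllocateResponse, error) {
	resp := &ContainerAllocateResponse{}
	for _, id := range deviceIDs {
		name, err := qualifiedName(vendor, class, id)
		if err != nil {
			return nil, err
		}
		resp.CDIDevices = append(resp.CDIDevices, &CDIDevice{Name: name})
	}
	return resp, nil
}

func main() {
	resp, err := allocate("vendor.com", "gpu", []string{"gpudevice1", "gpudevice2"})
	if err != nil {
		panic(err)
	}
	for _, d := range resp.CDIDevices {
		fmt.Println(d.Name)
	}
}
```

A real plugin would return such a response from its `Allocate` handler, one `ContainerAllocateResponse` per container request.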
|
||
### Test Plan | ||
|
||
[x] I/we understand the owners of the involved components may require updates to | ||
existing tests to make this code solid enough prior to committing the changes necessary | ||
to implement this enhancement. | ||
|
||
##### Prerequisite testing updates | ||
|
||
##### Unit tests | ||
|
||
- `devicemanager`: `2023-06-15` - `85.1%` | ||
|
||
##### Integration tests | ||
|
||
There are currently no integration tests for device plugins. | ||
We do not plan to add any for this feature. | ||
|
||
However, these cases will be added in the existing integration tests: | ||
- Feature gate enable/disable tests | ||
|
||
##### e2e tests | ||
|
||
These cases will be added in the existing `e2e_node` tests: | ||
- Device Plugin works with CDI devices | ||
|
||
### Graduation Criteria | ||
|
||
#### Alpha | ||
- [X] Add the CDIDevices field to the device plugin API | ||
- [X] Implement the logic to pass the CDIDevices into the CRI | ||
- [X] Add proper `e2e_node` tests | ||
|
||
#### Alpha to Beta Graduation | ||
- [X] No major bugs reported in the previous cycle | ||
|
||
#### Beta to G.A Graduation | ||
- [X] Gather feedback from at least 2 device plugin vendors that CDI support works for them | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
We expect no impact on upgrades. | ||
On downgrades, we expect no impact to Kubernetes and minimal impact to device | ||
plugin developers. | ||
|
||
We are not bumping the device plugin API version, but simply adding a field to | ||
its protobuf. On upgrades this means that older device plugins will simply | ||
continue to work as they always have, since they will need to opt in to using | |
this new field. | ||
|
||
For downgrades, if a plugin has not opted to use the new field, there will be | |
no impact since a downgraded kubelet won't support it anyway. If a device | |
plugin has opted in to using the new field, a downgraded kubelet will simply | |
silently ignore it. This has no impact on Kubernetes itself, but plugin | |
developers need to be aware of it, since it explains why their new CDI support | |
would suddenly stop working after a downgrade. | |
|
||
### Version Skew Strategy | ||
|
||
The kubelet will always be backwards compatible, so going forward existing | ||
plugins are not expected to break. | ||
|
||
## Production Readiness Review Questionnaire | ||
|
||
### Feature Enablement and Rollback | ||
|
||
###### How can this feature be enabled / disabled in a live cluster? | ||
|
||
- [x] Feature gate (also fill in values in `kep.yaml`) | ||
- Feature gate names: | ||
- `DevicePluginCDIDevices` | ||
- Components depending on the feature gate: kubelet | ||
- [x] Pass CDI devices to the kubelet over the new field in the device plugin API | ||
- Will enabling / disabling the feature require downtime of the control | ||
plane? | ||
No. | ||
- Will enabling / disabling the feature require downtime or reprovisioning | ||
of a node? | ||
No. | ||
|
||
|
||
###### Does enabling the feature change any default behavior? | ||
|
||
No. Device Plugins need to be updated to make use of the new field. | ||
|
||
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? | ||
|
||
- Yes, disabling the `DevicePluginCDIDevices` feature gate shuts down the feature completely. | ||
- Yes, by not sending CDI devices over the device plugin API (and falling back to the old way of passing device info). | ||
|
||
###### What happens if we reenable the feature if it was previously rolled back? | ||
|
||
Nothing bad will happen; new containers can simply be started with | |
CDI devices again. | |
|
||
###### Are there any tests for feature enablement/disablement? | ||
|
||
There will be e2e tests demonstrating that CDI devices are attached as expected | ||
when the feature is enabled, and silently ignored if the feature is disabled. | ||
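The enabled/disabled behaviour described above can be pictured with a small Go sketch. It is not the kubelet implementation: `containerConfig` is a hypothetical stand-in for the CRI container configuration, `applyCDIDevices` is an invented helper, and both the annotation key suffix and the comma-separated value format are assumptions (the KEP only specifies the `cdi.k8s.io` prefix).

```go
package main

import (
	"fmt"
	"strings"
)

// containerConfig is a hypothetical stand-in for the CRI container config;
// only the two fields relevant to CDI are modelled.
type containerConfig struct {
	CDIDevices  []string
	Annotations map[string]string
}

// applyCDIDevices copies plugin-provided CDI names into the container config
// only when the feature gate is enabled; otherwise they are dropped silently.
func applyCDIDevices(gateEnabled bool, names []string, cfg *containerConfig) {
	if !gateEnabled || len(names) == 0 {
		return
	}
	cfg.CDIDevices = append(cfg.CDIDevices, names...)
	if cfg.Annotations == nil {
		cfg.Annotations = map[string]string{}
	}
	// Assumed key suffix and value format, for runtimes that read annotations.
	cfg.Annotations["cdi.k8s.io/example-plugin"] = strings.Join(names, ",")
}

func main() {
	names := []string{"vendor.com/gpu=gpudevice1"}

	off := &containerConfig{}
	applyCDIDevices(false, names, off)
	fmt.Printf("gate off: devices=%v annotations=%v\n", off.CDIDevices, off.Annotations)

	on := &containerConfig{}
	applyCDIDevices(true, names, on)
	fmt.Printf("gate on:  devices=%v annotations=%v\n", on.CDIDevices, on.Annotations)
}
```

An e2e test for the gate would assert the same two outcomes against a real container rather than an in-memory struct.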
|
||
### Rollout, Upgrade and Rollback Planning | ||
|
||
###### How can a rollout or rollback fail? Can it impact already running workloads? | ||
|
||
The failure of the kubelet would mean that fields from new device allocations | ||
will not be processed. | ||
|
||
However, CDI devices themselves are only interpreted at container start. | |
Existing containers that were started with support for CDI devices will not be | ||
impacted if the feature gate is enabled or disabled during the lifetime of a | ||
running container. Only new containers will be impacted by the presence or | ||
absence of the feature gate. | ||
|
||
###### What specific metrics should inform a rollback? | ||
|
||
N/A | ||
|
||
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? | ||
|
||
N/A | ||
|
||
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? | ||
|
||
No | ||
|
||
### Monitoring Requirements | ||
|
||
###### How can an operator determine if the feature is in use by workloads? | ||
|
||
This depends on Device Plugin vendor implementations making use of the required | ||
field and cannot be directly determined. | ||
|
||
###### How can someone using this feature know that it is working for their instance? | ||
|
||
End-users are not aware that this feature exists. Device plugin developers can | ||
ensure that this feature is working by passing CDI devices to workloads | ||
requesting them, and ensuring that the workloads come up successfully with | ||
access to the devices they asked for. | ||
|
||
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? | ||
|
||
N/A | ||
|
||
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? | ||
|
||
N/A | ||
|
||
###### Are there any missing metrics that would be useful to have to improve observability of this feature? | ||
|
||
N/A | ||
|
||
### Dependencies | ||
|
||
###### Does this feature depend on any specific services running in the cluster? | ||
|
||
- The container runtime (e.g. containerd, CRI-O, etc.) must support CDI. | |
- A Device Plugin must be implemented to use the field. | ||
|
||
### Scalability | ||
|
||
###### Will enabling / using this feature result in any new API calls? | ||
|
||
No | ||
|
||
###### Will enabling / using this feature result in introducing new API types? | ||
|
||
No | ||
|
||
###### Will enabling / using this feature result in any new calls to the cloud provider? | ||
|
||
No | ||
|
||
###### Will enabling / using this feature result in increasing size or count of the existing API objects? | ||
|
||
No | ||
|
||
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? | ||
|
||
No | ||
|
||
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? | ||
|
||
No. Where the new field is used, it replaces existing mechanisms rather than adding to them. | |
|
||
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? | ||
|
||
No | ||
|
||
### Troubleshooting | ||
|
||
N/A | ||
|
||
###### What are other known failure modes? | ||
|
||
TBD | ||
|
||
###### What steps should be taken if SLOs are not being met to determine the problem? | ||
|
||
N/A | ||
|
||
## Implementation History | ||
|
||
- 2023-05-15: KEP created | ||
|
||
## Drawbacks | ||
|
||
There is no reason this KEP should not be implemented. CDI is the new standard | ||
for device support in containerized environments, and this enhancement now | ||
makes this possible through a simple addition to the device plugin API. | ||
|
||
## Alternatives | ||
|
||
None |
43 changes: 43 additions & 0 deletions
keps/sig-node/4009-add-cdi-devices-to-device-plugin-api/kep.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
title: Add CDI devices to device plugin API | ||
kep-number: 4009 | ||
authors: | ||
- "@elezar" | ||
owning-sig: sig-node | ||
participating-sigs: [] | ||
status: implementable | ||
creation-date: 2023-05-16 | ||
reviewers: | ||
- swatisehgal | ||
approvers: | ||
- mrunalp | ||
|
||
see-also: | ||
- "/keps/sig-node/3063-dynamic-resource-allocation" | ||
- "/keps/sig-node/3573-device-plugin" | ||
replaces: [] | ||
|
||
# The target maturity stage in the current dev cycle for this KEP. | ||
stage: alpha | ||
|
||
# The most recent milestone for which work toward delivery of this KEP has been | ||
# done. This can be the current (upcoming) milestone, if it is being actively | ||
# worked on. | ||
latest-milestone: "v1.28" | ||
|
||
# The milestone at which this feature was, or is targeted to be, at each stage. | ||
milestone: | ||
alpha: "v1.28" | ||
beta: "v1.29" | ||
stable: "v1.30" | ||
|
||
# The following PRR answers are required at alpha release | ||
# List the feature gate name and the components for which it must be enabled | ||
feature-gates: | ||
- name: "DevicePluginCDIDevices" | ||
components: | ||
- kubelet | ||
|
||
disable-supported: true | ||
|
||
# The following PRR answers are required at beta release | ||
metrics: [] |
I'd still suggest not emphasising usage of the CRI field. This is a protocol detail between the Kubelet and a CRI runtime. This KEP is about a Device Plugin API change.
In fact, you may end up passing CDI devices from the Kubelet to a CRI runtime using annotations to support old runtime versions.
Although I don't disagree that the kubelet COULD use annotations to forward device IDs to runtimes, the intent of this KEP is to use the new field in the CRI to allow Device Plugins to request devices. This keeps these device IDs opaque and does not require any interaction by the Kubelet. At present the Kubelet (at least in my cursory inspection of the code) doesn't perform any such processing for the other fields present.
Forwarding the field directly also means that the Kubelet does not need to be aware of which runtime version, or even which specific configuration, is being used. I mention configurations here because for containerd, for example, CDI needs to be enabled and the CDI annotations need to be allowed before these function. The proposal as it stands shifts the responsibility onto device plugin vendors to decide which mechanism they will use to support device injection and when to transition from one mechanism to another, as well as to educate users on the required configuration. This would extend the existing node-level configurations or requirements that already need to be communicated to users of a plugin.
If we do at some point decide that the kubelet should process this field and perform operations on it, such as converting these to annotations, that should be a new KEP where that can be discussed with specific use cases in mind.
Would it be better if the Kubelet took this responsibility? At least it would be done in one place.
Can you give a use case that is not covered by the current model and that would justify extending the Kubelet with additional functionality?