From b746f3720d52b072abfb401910558610a28a1018 Mon Sep 17 00:00:00 2001
From: Markus Lehtonen
Date: Fri, 2 Feb 2024 11:08:06 +0200
Subject: [PATCH] KEP-4112: update

- update user story 2
- re-order sections about CRI API changes (describe RunPodSandbox before
  the container specific requests)
- update the description of PodSandboxConfig
- add a note to CreateContainer request that container creation may fail
  if resources do not match with PodSandbox request.
- fill in test plan
- fill in graduation criteria
- fill in upgrade / downgrade strategy
- fill in version skew strategy
---
 .../4112-passdown-resources-to-cri/README.md | 173 ++++++++++++------
 1 file changed, 113 insertions(+), 60 deletions(-)

diff --git a/keps/sig-node/4112-passdown-resources-to-cri/README.md b/keps/sig-node/4112-passdown-resources-to-cri/README.md
index 290b20fe06d..c5007daa265 100644
--- a/keps/sig-node/4112-passdown-resources-to-cri/README.md
+++ b/keps/sig-node/4112-passdown-resources-to-cri/README.md
@@ -75,9 +75,9 @@ SIG Architecture for cross-cutting KEPs).
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
   - [CRI API](#cri-api)
-    - [ContainerConfig](#containerconfig)
-    - [UpdateContainerResourcesRequest](#updatecontainerresourcesrequest)
     - [PodSandboxConfig](#podsandboxconfig)
+    - [CreateContainer](#createcontainer)
+    - [UpdateContainerResourcesRequest](#updatecontainerresourcesrequest)
   - [kubelet](#kubelet)
   - [Test Plan](#test-plan)
       - [Prerequisite testing updates](#prerequisite-testing-updates)
@@ -85,6 +85,9 @@ SIG Architecture for cross-cutting KEPs).
       - [Integration tests](#integration-tests)
       - [e2e tests](#e2e-tests)
   - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
   - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
   - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -263,10 +266,10 @@ guaranteed to get the resources they require.
 
 #### Story 2
 
-As a platform-optimized CRI runtime developer, I want to know detailed
-container resource requests to be able to make optimal, platform specific,
-resource allocations. Some of the resources may be handled outside cgroups and
-be container runtime specific details.
+As a developer of a non-runc / non-Linux CRI runtime, I want to know detailed
+container resource requests to be able to make correct resource allocations for
+the applications. I cannot rely on cgroup parameters for this but need to know
+what the user requested to fairly allocate resources between applications.
 
 #### Story 3
 
@@ -326,7 +329,60 @@ interoperability between containers inside the Pod.
 
 ### CRI API
 
-#### ContainerConfig
+#### PodSandboxConfig
+
+The PodSandboxConfig message (part of the RunPodSandbox request) will be
+extended to contain information about resources of all its containers known at
+the pod creation time. The container runtime may use this information to make
+preparations for all upcoming containers of the pod. E.g. set up all needed
+resources for a VM-based pod or prepare for optimal allocation of resources of
+all the containers of the Pod. However, the container runtime may continue to
+operate as it did (before this enhancement). That is, it can safely ignore
+the per-container resource information and allocate resources for each
+container separately, one at a time, with the `CreateContainer` request.
+
+```diff
+ message PodSandboxConfig {
+
+ ...
+
+     // Optional configurations specific to Linux hosts.
+     LinuxPodSandboxConfig linux = 8;
+     // Optional configurations specific to Windows hosts.
+     WindowsPodSandboxConfig windows = 9;
++
++    // Kubernetes resource spec of the containers in the pod.
++    PodResourceConfig pod_resources = 10;
+ }
+
++// PodResourceConfig contains information of all resource requirements of
++// the containers of a pod.
++message PodResourceConfig {
++    repeated ContainerResourceConfig init_containers = 1;
++    repeated ContainerResourceConfig containers = 2;
++}
+
++// ContainerResourceConfig contains information of all resource requirements of
++// one container.
++message ContainerResourceConfig {
++    // Name of the container
++    string name = 1;
++
++    // Kubernetes resource spec of the container
++    KubernetesResources kubernetes_resources = 2;
++
++    // Mounts for the container.
++    repeated Mount mounts = 3;
++
++    // Devices for the container.
++    repeated Device devices = 4;
++
++    // CDI devices for the container.
++    repeated CDIDevice CDI_devices = 5;
++}
+```
+
+#### CreateContainer
 
 The ContainerConfig message (used in CreateContainer request) is extended to
 contain unmodified resource requests from the PodSpec.
@@ -357,6 +413,12 @@ contain unmodified resource requests from the PodSpec.
 +}
 ```
 
+The resources (mounts, devices, CDI devices, Kubernetes resources) in the
+CreateContainer request should be identical to what was (pre-)informed in the
+RunPodSandbox request. If they are different, the CRI runtime may fail the
+container creation, for example because changes cannot be applied after a
+VM-based Pod has been created.
+
 #### UpdateContainerResourcesRequest
 
 The UpdateContainerResourcesRequest message is extended to pass down unmodified
@@ -380,55 +442,6 @@ resource requests from the PodSpec.
  }
 ```
 
-#### PodSandboxConfig
-
-The PodSandboxConfig message (part of the RunPodSandbox request) will be
-extended to contain information about resources of all its containers known at
-the pod creation time. The container resources here are non-binding and only
-informational, e.g. for the runtime to prepare for optimal allocation of
-resources of all the containers of the Pod.
-
-```diff
- message PodSandboxConfig {
-
- ...
-
-     // Optional configurations specific to Linux hosts.
-     LinuxPodSandboxConfig linux = 8;
-     // Optional configurations specific to Windows hosts.
-     WindowsPodSandboxConfig windows = 9;
-+
-+    // Kubernetes resource spec of the containers in the pod.
-+    PodResourceConfig pod_resources = 10;
- }
-
-+// PodResourceConfig contains information of all resources requirements of
-+// the containers of a pod.
-+message PodResourceConfig {
-+    repeated ContainerResourceConfig init_containers = 1;
-+    repeated ContainerResourceConfig containers = 2;
-+}
-
-+// ContainerResourceConfig contains information of all resource requirements of
-+// one container.
-+message ContainerResourceConfig {
-+    // Name of the container
-+    string name= 1;
-+
-+    // Kubernetes resource spec of the container
-+    KubernetesResources kubernetes_resources = 2;
-+
-+    // Mounts for the container.
-+    repeated Mount mounts = 3;
-+
-+    // Devices for the container.
-+    repeated Device devices = 4;
-+
-+    // CDI devices for the container.
-+    repeated CDIDevice CDI_devices = 5;
-+}
-```
-
 ### kubelet
 
 Kubelet code is refactored/modified so that all container resources are known
@@ -516,7 +529,7 @@ when drafting this test plan.
 
 [testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
 -->
 
-[ ] I/we understand the owners of the involved components may require updates to
+[x] I/we understand the owners of the involved components may require updates to
 existing tests to make this code solid enough prior to committing the changes necessary
 to implement this enhancement.
 
@@ -527,6 +540,8 @@ Based on reviewers feedback describe what additional tests need to be added prio
 implementing this enhancement to ensure the enhancements have also solid foundations.
 -->
 
+No prerequisite testing updates have been identified.
+
 ##### Unit tests
 
-- ``: `` - ``
+- `k8s.io/kubernetes/pkg/kubelet/kuberuntime`: `2024-02-02` - `68.3%`
+
+The
+[fake_runtime](https://github.com/kubernetes/cri-api/blob/master/pkg/apis/testing/fake_runtime_service.go)
+will be used in unit tests to verify that the Kubelet correctly passes down the
+resource information to the CRI runtime.
 
 ##### Integration tests
 
@@ -560,7 +580,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
-- :
+For alpha, no new integration tests are planned.
 
 ##### e2e tests
 
@@ -574,7 +594,7 @@ https://storage.googleapis.com/k8s-triage/index.html
 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->
 
-- :
+For alpha, no new e2e tests are planned.
 
 ### Graduation Criteria
 
@@ -640,6 +660,25 @@ in back-to-back releases.
 - Deprecate the flag
 -->
 
+#### Alpha
+
+- Feature implemented behind a feature flag
+- Initial unit tests completed and enabled
+
+#### Beta
+
+- Gather feedback from developers and surveys
+- Feature gate enabled by default
+- containerd and CRI-O runtimes have released versions that have adopted the
+  new CRI API changes
+- The [NRI API](https://github.com/containerd/nri) has adopted the feature
+
+#### GA
+
+- No bugs reported in the previous cycle
+- N examples of real-world usage
+- N installs
+
 ### Upgrade / Downgrade Strategy
+The feature gate (in kubelet) controls the feature enablement. Existing runtime
+implementations will continue to work as before, even if the feature is
+enabled.
+
 ### Version Skew Strategy
+The feature is node-local (kubelet-only) so there are no dependencies on or
+effects on other Kubernetes components.
+
+The behavior is unchanged if either kubelet or the CRI runtime running on a
+node does not support the feature. If kubelet has the feature enabled but the
+CRI runtime does not support it, the CRI runtime will ignore the new fields in
+the CRI API and function as before. Similarly, if the CRI runtime supports
+the feature but the kubelet does not, the runtime will resort to the previous
+behavior.
+
 ## Production Readiness Review Questionnaire