From b3703c614a65a8e042d6d438af815807b15c4c0b Mon Sep 17 00:00:00 2001
From: Markus Lehtonen
Date: Mon, 5 Feb 2024 16:59:40 +0200
Subject: [PATCH] KEP-4112: update

- support sidecar containers: instead of separate lists for init and
  regular containers, have one list and include the type of container
  (init, sidecar, regular)
- add notes about mounts and devices when describing changes to
  CreateContainer and UpdateContainerResources requests
- update description of kubelet: more accurate description of what
  information is included in each CRI request
- fix typos
- kep.yaml: update milestone
---
 .../4112-passdown-resources-to-cri/README.md | 50 +++++++++++++------
 .../4112-passdown-resources-to-cri/kep.yaml  |  4 +-
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/keps/sig-node/4112-passdown-resources-to-cri/README.md b/keps/sig-node/4112-passdown-resources-to-cri/README.md
index c5007daa265..2d3977b7f13 100644
--- a/keps/sig-node/4112-passdown-resources-to-cri/README.md
+++ b/keps/sig-node/4112-passdown-resources-to-cri/README.md
@@ -186,7 +186,7 @@ hot-plugged to the PCIe root-port or switch-port. If the number of pre-allocated
 pluggable ports is too low, the attachment will fail (container devices >
 pre-allocated hot-pluggable ports).
 
-In the case of Confidential Containers (uses Kata unter the hood with additional
+In the case of Confidential Containers (uses Kata under the hood with additional
 software components for attestation) the CRI needs to consider the cold-plug
 aka direct attachment use-case. At sandbox creation time the hypervisor needs
 to know the exact number of pass-through devices and its properties
@@ -335,11 +335,12 @@ The PodSandboxConfig message (part of the RunPodSandbox request) will be
 extended to contain information about resources of all its containers known at
 the pod creation time. The container runtime may use this information to make
 preparations for all upcoming containers of the pod. E.g. setup all needed
-resources for a VM-based pod or prepare for optimal allocation of resources of 
+resources for a VM-based pod or prepare for optimal allocation of resources of
 all the containers of the Pod. However, the container runtime may continue to
-operate as they did (before this enhancement). That is, it can safely ignore
-the per-container resource information and allocate resources for each
-container separately, one at a time, with the `CreateContainer`.
+operate as they did (before this enhancement). That is, it can ignore
+the resource information presented here and allocate resources for each
+container separately at container creation time with the `CreateContainer`
+request.
 
 ```diff
  message PodSandboxConfig {
@@ -358,8 +359,7 @@ container separately, one at a time, with the `CreateContainer`.
 +// PodResourceConfig contains information of all resources requirements of
 +// the containers of a pod.
 +message PodResourceConfig {
-+  repeated ContainerResourceConfig init_containers = 1;
-+  repeated ContainerResourceConfig containers = 2;
++  repeated ContainerResourceConfig containers = 1;
 +}
 
 +// ContainerResourceConfig contains information of all resource requirements of
@@ -368,17 +368,26 @@ container separately, one at a time, with the `CreateContainer`.
 +  // Name of the container
 +  string name= 1;
 +
++  // Type of the container
++  ContainerType type= 2;
++
 +  // Kubernetes resource spec of the container
-+  KubernetesResources kubernetes_resources = 2;
++  KubernetesResources kubernetes_resources = 3;
 +
 +  // Mounts for the container.
-+  repeated Mount mounts = 3;
++  repeated Mount mounts = 4;
 +
 +  // Devices for the container.
-+  repeated Device devices = 4;
++  repeated Device devices = 5;
 +
 +  // CDI devices for the container.
-+  repeated CDIDevice CDI_devices = 5;
++  repeated CDIDevice CDI_devices = 6;
++}
+
++enum ContainerType {
++  INIT_CONTAINER = 0;
++  SIDECAR_CONTAINER = 1;
++  REGULAR_CONTAINER = 2;
 +}
 ```
 
@@ -413,6 +422,9 @@ contain unmodified resource requests from the PodSpec.
 +}
 ```
 
+Note that mounts, devices, CDI devices are part of the ContainerConfig message
+but are left out of the diff snippet above.
+
 The resources (mounts, devices, CDI devices, Kubernetes resources) in the
 CreateContainer request should be identical to what was (pre-)informed in the
 RunPodSandbox request. If they are different, the CRI runtime may fail the
@@ -442,16 +454,24 @@ resource requests from the PodSpec.
  }
 ```
 
+Note that mounts, devices, CDI devices are not part of the
+UpdateContainerResourcesRequest message and this proposal does not suggest
+adding them.
+
 ### kubelet
 
 Kubelet code is refactored/modified so that all container resources are known
 before sandbox creation. This mainly consists of preparing all mounts (of all
 containers) early.
 
-Kubelet will be be extended to pass down all mounts, devices, CDI devices and
-the unmodified resource requests and limits to the container runtime in all
-related CRI requests, i.e. RunPodSandbox, CreateContainer and
-UpdateContainerResources.
+Kubelet will be extended to pass down all resources of containers in all
+related CRI requests (as described in the [CRI API](#cri-api) section). That
+is:
+
+- adding mounts, devices, CDI devices and the unmodified resource requests and
+  limits of all containers into RunPodSandbox request
+- adding unmodified resource requests and limits into CreateContainer and
+  UpdateContainerResources requests
 
 For example, take a PodSpec:
 
diff --git a/keps/sig-node/4112-passdown-resources-to-cri/kep.yaml b/keps/sig-node/4112-passdown-resources-to-cri/kep.yaml
index 628676bc15f..273e90be64c 100644
--- a/keps/sig-node/4112-passdown-resources-to-cri/kep.yaml
+++ b/keps/sig-node/4112-passdown-resources-to-cri/kep.yaml
@@ -21,11 +21,11 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.30"
+latest-milestone: "v1.31"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
-  alpha: "v1.30"
+  alpha: "v1.31"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
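The patch replaces the separate init and regular container lists in `PodResourceConfig` with a single list in which every entry carries a `ContainerType`. One way a runtime might use that list is to estimate a pod's peak resource needs up front, for example when sizing a VM-based sandbox. The Go sketch below is illustrative only, and several parts of it are assumptions. The Go types are hypothetical stand-ins for the proposed protobuf messages, not generated CRI code. Only CPU and memory requests are modeled, and pod overhead is ignored. The sketch also assumes that init and sidecar entries appear in the same order as `initContainers` in the PodSpec. The accounting follows the Kubernetes rule for sidecar containers: a regular init container runs alongside the sidecars started before it, and regular containers run alongside all sidecars.

```go
package main

import "fmt"

// ContainerType is a hypothetical Go rendering of the proposed enum
// (INIT_CONTAINER, SIDECAR_CONTAINER, REGULAR_CONTAINER).
type ContainerType int

const (
	InitContainer ContainerType = iota
	SidecarContainer
	RegularContainer
)

// Requests is a simplified, hypothetical stand-in for KubernetesResources:
// only CPU (millicores) and memory (bytes) requests are modeled.
type Requests struct {
	MilliCPU int64
	Memory   int64
}

// Add returns the element-wise sum of two request sets.
func (r Requests) Add(o Requests) Requests {
	return Requests{MilliCPU: r.MilliCPU + o.MilliCPU, Memory: r.Memory + o.Memory}
}

// Max returns the element-wise maximum of two request sets.
func (r Requests) Max(o Requests) Requests {
	if o.MilliCPU > r.MilliCPU {
		r.MilliCPU = o.MilliCPU
	}
	if o.Memory > r.Memory {
		r.Memory = o.Memory
	}
	return r
}

// ContainerResourceConfig is a trimmed-down, hypothetical mirror of the
// proposed message: name, type and requests only.
type ContainerResourceConfig struct {
	Name     string
	Type     ContainerType
	Requests Requests
}

// PeakRequests estimates the pod's peak requests from the single container
// list. It assumes init and sidecar entries are in PodSpec initContainers order.
func PeakRequests(containers []ContainerResourceConfig) Requests {
	var sidecars, regular, initPeak Requests
	for _, c := range containers {
		switch c.Type {
		case InitContainer:
			// Runs together with the sidecars that were started before it.
			initPeak = initPeak.Max(c.Requests.Add(sidecars))
		case SidecarContainer:
			// Sidecars keep running, so their requests accumulate.
			sidecars = sidecars.Add(c.Requests)
			initPeak = initPeak.Max(sidecars)
		case RegularContainer:
			regular = regular.Add(c.Requests)
		}
	}
	// Regular containers run together with all sidecars.
	return initPeak.Max(regular.Add(sidecars))
}

func main() {
	const Mi = int64(1) << 20
	pod := []ContainerResourceConfig{
		{Name: "setup", Type: InitContainer, Requests: Requests{MilliCPU: 200, Memory: 1024 * Mi}},
		{Name: "proxy", Type: SidecarContainer, Requests: Requests{MilliCPU: 100, Memory: 64 * Mi}},
		{Name: "app", Type: RegularContainer, Requests: Requests{MilliCPU: 1000, Memory: 512 * Mi}},
	}
	peak := PeakRequests(pod)
	fmt.Printf("cpu=%dm memory=%dMi\n", peak.MilliCPU, peak.Memory/Mi)
}
```

Running it prints `cpu=1100m memory=1024Mi`. CPU peaks while `app` runs alongside the `proxy` sidecar, and memory peaks during the `setup` init container. Because the result depends on where sidecars sit relative to init containers, the sketch relies on list order rather than on separate per-type lists.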