In-Place Vertical Pod Scaling KEP to implementable, and mini-KEP for CRI extensions #1342

Merged: 16 commits, Jan 28, 2020
22 changes: 20 additions & 2 deletions keps/sig-autoscaling/20181106-in-place-update-of-pod-resources.md
@@ -23,8 +23,8 @@ approvers:
- "@mwielgus"
editor: TBD
creation-date: 2018-11-06
-last-updated: 2018-11-06
-status: provisional
+last-updated: 2019-10-25
+status: implementable
see-also:
replaces:
superseded-by:
@@ -48,6 +48,7 @@ superseded-by:
- [Scheduler and API Server Interaction](#scheduler-and-api-server-interaction)
- [Flow Control](#flow-control)
- [Container resource limit update ordering](#container-resource-limit-update-ordering)
- [Container resource limit update failure handling](#container-resource-limit-update-failure-handling)
- [Notes](#notes)
- [Affected Components](#affected-components)
- [Future Enhancements](#future-enhancements)
@@ -167,6 +168,12 @@ Kubelet calls UpdateContainerResources CRI API which currently takes
but not for Windows. This parameter changes to *runtimeapi.ContainerResources*,
that is runtime agnostic, and will contain platform-specific information.

Additionally, GetContainerResources CRI API is introduced that allows Kubelet
to query currently configured CPU and memory limits for a container.

These CRI changes are a separate effort that does not affect the design
proposed in this KEP.

### Kubelet and API Server Interaction

When a new Pod is created, Scheduler is responsible for selecting a suitable
@@ -283,6 +290,16 @@ updates resource limit for the Pod and its Containers in the following manner:
In all the above cases, Kubelet applies Container resource limit decreases
before applying limit increases.
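
The ordering rule above can be sketched as a stable partition over pending per-container updates. This is only an illustration: the `limitUpdate` type and the `orderUpdates` helper are hypothetical names, not Kubelet code, and a single integer stands in for whichever limit (CPU or memory) is being changed.

```go
package main

import (
	"fmt"
	"sort"
)

// limitUpdate is a hypothetical record of one container's limit change.
type limitUpdate struct {
	Container string
	Current   int64 // limit currently applied to the container
	Desired   int64 // limit requested in the Pod spec
}

// orderUpdates returns a copy of updates with all decreases placed before
// all non-decreases, keeping the original relative order within each group.
func orderUpdates(updates []limitUpdate) []limitUpdate {
	ordered := append([]limitUpdate(nil), updates...)
	sort.SliceStable(ordered, func(i, j int) bool {
		iDecrease := ordered[i].Desired < ordered[i].Current
		jDecrease := ordered[j].Desired < ordered[j].Current
		return iDecrease && !jDecrease
	})
	return ordered
}

func main() {
	updates := []limitUpdate{
		{Container: "a", Current: 100, Desired: 200}, // increase
		{Container: "b", Current: 300, Desired: 150}, // decrease
	}
	for _, u := range orderUpdates(updates) {
		fmt.Println(u.Container, u.Current, "->", u.Desired)
	}
}
```

Applying decreases first frees headroom before any increase consumes it, which is why the sketch only needs to separate the two groups rather than sort by size.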

#### Container resource limit update failure handling

If multiple Containers in a Pod are being updated, and the UpdateContainerResources
CRI API fails for any of the containers, Kubelet will back off and retry at a
later time. Kubelet does not attempt to update limits for containers that are
lined up for update after the failing container. This ensures that the sum of the
container limits does not exceed the Pod-level cgroup limit at any point. Once all
the container limits have been successfully updated, Kubelet updates the Pod's
Status.ContainerStatuses[i].Resources to match the desired limit values.

#### Notes

* If CPU Manager policy for a Node is set to 'static', then only integral
@@ -373,3 +390,4 @@ TODO
- 2019-01-18 - implementation proposal extended
- 2019-03-07 - changes to flow control, updates per review feedback
- 2019-08-29 - updated design proposal
- 2019-10-25 - update key open items and move KEP to implementable
241 changes: 241 additions & 0 deletions keps/sig-node/20191025-kubelet-container-resources-cri-api-changes.md
@@ -0,0 +1,241 @@
---
title: Container Resources CRI API Changes for Pod Vertical Scaling
authors:
- "@vinaykul"
- "@quinton-hoole"
owning-sig: sig-node
participating-sigs:
reviewers:
- TBD
approvers:
- TBD
editor: TBD
creation-date: 2019-10-25
last-updated: 2019-10-25
status: provisional
see-also:
- "/keps/sig-autoscaling/20181106-in-place-update-of-pod-resources.md"
replaces:
superseded-by:
---

# Container Resources CRI API Changes for Pod Vertical Scaling

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Design Details](#design-details)
- [Expected Behavior of CRI Runtime](#expected-behavior-of-cri-runtime)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Risks and Mitigations](#risks-and-mitigations)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

This proposal aims to improve the Container Runtime Interface (CRI) APIs for
managing a Container's CPU and memory resource configurations on the runtime.
It seeks to extend UpdateContainerResources CRI API such that it works for
Windows, and other future runtimes besides Linux. It also seeks to extend
ContainerStatus CRI API to allow Kubelet to discover the current resources
configured on a Container.

## Motivation

The In-Place Pod Vertical Scaling feature relies on the Container Runtime
Interface (CRI) to update the CPU and/or memory limits for Container(s) in a Pod.

The current CRI API set has a few drawbacks that need to be addressed:
1. UpdateContainerResources CRI API takes a parameter that describes Container
resources to update for Linux Containers, and this may not work for Windows
Containers or other potential non-Linux runtimes in the future.
1. There is no CRI mechanism that lets Kubelet query and discover the CPU and
memory limits configured on a Container from the Container runtime.
1. The expected behavior from a runtime that handles UpdateContainerResources
CRI API is not very well defined or documented.

### Goals

This proposal has two primary goals:
- Modify UpdateContainerResources to allow it to work for Windows Containers,
as well as Containers managed by other runtimes besides Linux,
- Provide CRI API mechanism to query the Container runtime for CPU and memory
resource configurations that are currently applied to a Container.

An additional goal of this proposal is to better define and document the
expected behavior of a Container runtime when handling resource updates.

### Non-Goals

Definition of expected behavior of a Container runtime when it handles CRI APIs
related to a Container's resources is intended to be a high level guide. It is
a non-goal of this proposal to define a detailed or specific way to implement
these functions. Implementation specifics are left to the runtime, within the
bounds of expected behavior.

## Proposal

One key change is to make the UpdateContainerResources API work for Windows
and any future non-Linux runtimes by making the resources parameter passed to
the API specific to the target runtime.

Another change in this proposal is to extend ContainerStatus CRI API such that
Kubelet can query and discover the CPU and memory resources that are presently
applied to a Container.

To accomplish the aforementioned goals:

* A new protobuf message object named *ContainerResources* that encapsulates
LinuxContainerResources and WindowsContainerResources is introduced as below.
- This message can easily be extended for future runtimes by simply adding a
new runtime-specific resources struct to the ContainerResources message.
```
// ContainerResources holds resource configuration for a container.
message ContainerResources {
    oneof r {
        // Resource configuration specific to Linux container.
        LinuxContainerResources linux = 1;
        // Resource configuration specific to Windows container.
        WindowsContainerResources windows = 2;
    }
}
```

* UpdateContainerResourcesRequest message is extended to carry
ContainerResources field as below.
- For Linux runtimes, Kubelet fills UpdateContainerResourcesRequest.Linux in
addition to UpdateContainerResourcesRequest.Resources.Linux fields.
- This keeps backward compatibility by letting runtimes that rely on the
current LinuxContainerResources continue to work, while enabling newer
runtime versions to use UpdateContainerResourcesRequest.Resources.Linux.
- It enables deprecation of UpdateContainerResourcesRequest.Linux field.
```
message UpdateContainerResourcesRequest {
    // ID of the container to update.
    string container_id = 1;
    // Resource configuration specific to Linux container.
    LinuxContainerResources linux = 2;
    // Resource configuration for the container.
    ContainerResources resources = 3;
}
```

* ContainerStatus message is extended to return ContainerResources as below.
- This enables Kubelet to query the runtime and discover resources currently
applied to a Container using ContainerStatus CRI API.
```
@@ -914,6 +912,8 @@ message ContainerStatus {
repeated Mount mounts = 14;
// Log path of container.
string log_path = 15;
+ // Resource configuration of the container.
+ ContainerResources resources = 16;
}
```

* ContainerManager CRI API service interface is modified as below.
- UpdateContainerResources takes ContainerResources parameter instead of
LinuxContainerResources.
```
--- a/staging/src/k8s.io/cri-api/pkg/apis/services.go
+++ b/staging/src/k8s.io/cri-api/pkg/apis/services.go
@@ -43,8 +43,10 @@ type ContainerManager interface {
ListContainers(filter *runtimeapi.ContainerFilter) ([]*runtimeapi.Container, error)
// ContainerStatus returns the status of the container.
ContainerStatus(containerID string) (*runtimeapi.ContainerStatus, error)
- // UpdateContainerResources updates the cgroup resources for the container.
- UpdateContainerResources(containerID string, resources *runtimeapi.LinuxContainerResources) error
+ // UpdateContainerResources updates resource configuration for the container.
+ UpdateContainerResources(containerID string, resources *runtimeapi.ContainerResources) error
// ExecSync executes a command in the container, and returns the stdout output.
// If command exits with a non-zero exit code, an error is returned.
ExecSync(containerID string, cmd []string, timeout time.Duration) (stdout []byte, stderr []byte, err error)
```

* Kubelet code is modified to leverage these changes.
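
The message changes listed above can be sketched in Go as follows. The structs are hand-written stand-ins, not the generated CRI types, and they carry only a memory limit to keep the example short. The fallback from the new `Resources` field to the existing `Linux` field in `memoryLimit` is an assumption about how a newer runtime might read the request; the proposal itself only states that Kubelet fills both fields for Linux.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the CRI messages (illustrative only).
type LinuxContainerResources struct {
	MemoryLimitInBytes int64
}

type WindowsContainerResources struct {
	MemoryLimitInBytes int64
}

// ContainerResources mimics the proposed oneof: one member should be set.
type ContainerResources struct {
	Linux   *LinuxContainerResources
	Windows *WindowsContainerResources
}

type UpdateContainerResourcesRequest struct {
	ContainerID string
	Linux       *LinuxContainerResources // existing field, kept for older runtimes
	Resources   *ContainerResources      // proposed field
}

// newLinuxRequest fills both the existing and the proposed field, as the
// proposal describes for Linux runtimes.
func newLinuxRequest(id string, lr *LinuxContainerResources) *UpdateContainerResourcesRequest {
	return &UpdateContainerResourcesRequest{
		ContainerID: id,
		Linux:       lr,
		Resources:   &ContainerResources{Linux: lr},
	}
}

// memoryLimit shows one way a runtime could read the request: use Resources
// when it carries a platform member, otherwise fall back to the Linux field.
func memoryLimit(req *UpdateContainerResourcesRequest) (int64, error) {
	if r := req.Resources; r != nil {
		switch {
		case r.Linux != nil && r.Windows != nil:
			return 0, errors.New("more than one platform set")
		case r.Linux != nil:
			return r.Linux.MemoryLimitInBytes, nil
		case r.Windows != nil:
			return r.Windows.MemoryLimitInBytes, nil
		}
	}
	if req.Linux != nil {
		return req.Linux.MemoryLimitInBytes, nil
	}
	return 0, errors.New("request carries no resources")
}

func main() {
	req := newLinuxRequest("c1", &LinuxContainerResources{MemoryLimitInBytes: 512 << 20})
	limit, err := memoryLimit(req)
	fmt.Println(limit, err)

	// A request shaped the way an older Kubelet would send it.
	legacy := &UpdateContainerResourcesRequest{
		ContainerID: "c2",
		Linux:       &LinuxContainerResources{MemoryLimitInBytes: 1 << 30},
	}
	limit, err = memoryLimit(legacy)
	fmt.Println(limit, err)
}
```

Supporting a further platform in this sketch would mean adding one more pointer field to `ContainerResources` and one more `case`, which mirrors the extensibility argument made for the oneof.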

## Design Details

The diagram below is an overview of Kubelet using the UpdateContainerResources
and ContainerStatus CRI APIs to set new container resource limits and to update
the Pod Status in response to the user changing the desired resources in the Pod Spec.

```
+-----------+                   +-----------+                  +-----------+
|           |                   |           |                  |           |
| apiserver |                   |  kubelet  |                  |  runtime  |
|           |                   |           |                  |           |
+-----+-----+                   +-----+-----+                  +-----+-----+
      |                               |                              |
      |       watch (pod update)      |                              |
      |------------------------------>|                              |
      |     [Containers.Resources]    |                              |
      |                               |                              |
      |                            (admit)                           |
      |                               |                              |
      |                               |  UpdateContainerResources()  |
      |                               |----------------------------->|
      |                               |                        (set limits)
      |                               |<- - - - - - - - - - - - - - -|
      |                               |                              |
      |                               |      ContainerStatus()       |
      |                               |----------------------------->|
      |                               |                              |
      |                               |     [ContainerResources]     |
      |                               |<- - - - - - - - - - - - - - -|
      |                               |                              |
      |      update (pod status)      |                              |
      |<------------------------------|                              |
      | [ContainerStatuses.Resources] |                              |
      |                               |                              |

```

* Kubelet invokes UpdateContainerResources() CRI API in ContainerManager
interface to configure new CPU and memory limits for a Container by
specifying those values in ContainerResources parameter to the API. Kubelet
sets ContainerResources parameter specific to the target runtime platform
when calling this CRI API.

* Kubelet calls ContainerStatus() CRI API in ContainerManager interface to get
the CPU and memory limits applied to a Container. It uses the values returned
in ContainerStatus.Resources to update ContainerStatuses[i].Resources.Limits
for that Container in the Pod's Status.
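
The two calls described above can be sketched end to end against a fake runtime. Everything here is a simplified assumption for illustration: `runtimeService` is a trimmed view of the ContainerManager interface, `ContainerResources` is flattened to two plain fields instead of the platform-specific oneof, and `fakeRuntime` and `resizeAndReport` are hypothetical names.

```go
package main

import "fmt"

// ContainerResources is a flattened stand-in for the proposed message.
type ContainerResources struct {
	CPULimitMilli    int64
	MemoryLimitBytes int64
}

// ContainerStatus carries only the proposed Resources field in this sketch.
type ContainerStatus struct {
	Resources *ContainerResources
}

// runtimeService is a trimmed, hypothetical view of ContainerManager.
type runtimeService interface {
	UpdateContainerResources(containerID string, resources *ContainerResources) error
	ContainerStatus(containerID string) (*ContainerStatus, error)
}

// fakeRuntime remembers the last resources applied to each container.
type fakeRuntime struct {
	applied map[string]ContainerResources
}

func (f *fakeRuntime) UpdateContainerResources(id string, r *ContainerResources) error {
	f.applied[id] = *r
	return nil
}

func (f *fakeRuntime) ContainerStatus(id string) (*ContainerStatus, error) {
	r, ok := f.applied[id]
	if !ok {
		return nil, fmt.Errorf("container %q not found", id)
	}
	return &ContainerStatus{Resources: &r}, nil
}

// resizeAndReport sets the desired limits, then asks the runtime what is
// actually configured. The returned value is what a caller would copy into
// ContainerStatuses[i].Resources.Limits in the Pod status.
func resizeAndReport(rt runtimeService, id string, desired ContainerResources) (ContainerResources, error) {
	if err := rt.UpdateContainerResources(id, &desired); err != nil {
		return ContainerResources{}, err
	}
	status, err := rt.ContainerStatus(id)
	if err != nil {
		return ContainerResources{}, err
	}
	if status.Resources == nil {
		return ContainerResources{}, fmt.Errorf("runtime reported no resources for %q", id)
	}
	return *status.Resources, nil
}

func main() {
	rt := &fakeRuntime{applied: map[string]ContainerResources{}}
	desired := ContainerResources{CPULimitMilli: 500, MemoryLimitBytes: 256 << 20}
	reported, err := resizeAndReport(rt, "c1", desired)
	fmt.Println(reported, err)
}
```

Reporting the values read back from the runtime, rather than echoing the desired values, keeps the Pod status tied to what the runtime says it has configured.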

### Expected Behavior of CRI Runtime

TBD

### Test Plan

Unit tests: TBD
E2E tests: TBD

### Graduation Criteria

TBD

### Upgrade / Downgrade Strategy

Is this applicable? - TBD

### Version Skew Strategy

Is this applicable? - TBD

### Risks and Mitigations

TBD

## Implementation History

- 2019-10-25 - Initial KEP draft created