blog: Enhancing Kubernetes API Server Efficiency with API Streaming
p0lyn0mial committed Nov 21, 2024
1 parent 404cf3e commit 5f68e84
Showing 2 changed files with 88 additions and 0 deletions.
@@ -0,0 +1,88 @@
---
layout: blog
title: 'Enhancing Kubernetes API Server Efficiency with API Streaming'
date: 2024-11-21
slug: kube-apiserver-api-streaming
author: >
Stefan Schimanski (Upbound),
Wojciech Tyczynski (Google),
Lukasz Szaszkiewicz (Red Hat)
---

Managing Kubernetes clusters efficiently is critical, especially as they grow in size.
A significant challenge with large clusters is the memory overhead caused by LIST requests.

In the current implementation, kube-apiserver processes LIST requests by assembling the entire response in-memory before transmitting any data to the client.
But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple LIST requests flood in simultaneously, perhaps after a brief network outage.
API Priority and Fairness has proven to be of limited use in protecting the apiserver, because it does not know in advance the size or number of the returned objects.
This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to OOM. To better visualize the issue, consider the graph below.

![kube-apiserver memory usage](./kube-apiserver-memory_usage.png "kube-apiserver memory usage")

The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details).
The results clearly show that increasing the number of informers significantly increases the server's memory consumption.
Notably, at approximately 16:40, the server crashed when serving only 16 informers.

## Why does kube-apiserver allocate so much memory for list requests?

Our investigation revealed that this substantial memory allocation occurs because the server must:
* fetch data from the database,
* deserialize the data from its stored format,
* and finally construct the response by converting and serializing the data into the client-requested format

before sending the first byte to the client. This sequence results in significant temporary memory consumption.
The actual usage depends on many factors like the page size, applied filters (e.g. label selectors), query parameters, and sizes of individual objects.
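
To make the sequence above concrete, here is a minimal sketch of a fully buffered list handler. It is not the kube-apiserver implementation: the `item` type, `bufferedList`, and `makeStore` are hypothetical names, and JSON stands in for both the storage and the wire format. The point is only that the response buffer grows with the number and size of objects before a single byte can be sent.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// item is a hypothetical stand-in for a stored Kubernetes object.
type item struct {
	Name string `json:"name"`
	Data string `json:"data"`
}

// bufferedList mimics the three steps: it receives the raw stored bytes
// ("fetch"), decodes every object, and serializes the whole list into a
// single buffer. Nothing can be sent until the returned slice is complete.
func bufferedList(stored [][]byte) ([]byte, error) {
	items := make([]item, 0, len(stored))
	for _, raw := range stored {
		var it item
		if err := json.Unmarshal(raw, &it); err != nil {
			return nil, fmt.Errorf("decode: %w", err)
		}
		items = append(items, it)
	}
	return json.Marshal(struct {
		Items []item `json:"items"`
	}{Items: items})
}

// makeStore returns n encoded items, each carrying payloadBytes of data.
func makeStore(n, payloadBytes int) [][]byte {
	stored := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		raw, err := json.Marshal(item{
			Name: fmt.Sprintf("secret-%d", i),
			Data: strings.Repeat("x", payloadBytes),
		})
		if err != nil {
			panic(err) // cannot happen for this simple struct
		}
		stored = append(stored, raw)
	}
	return stored
}

func main() {
	for _, n := range []int{10, 100} {
		resp, err := bufferedList(makeStore(n, 1024))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d items -> %d byte response held in memory\n", n, len(resp))
	}
}
```

In this toy model the decoded objects and the serialized response are alive at the same time, which is the kind of temporary duplication the list above describes.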

Unfortunately, neither [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) nor Go's garbage collection or memory limits can prevent the system from exhausting memory under these conditions.
The memory is allocated suddenly and rapidly, and just a few requests can quickly deplete the available memory, leading to resource exhaustion.

Depending on how the apiserver is run on the node, it might be OOM-killed by the kernel when it exceeds its configured memory limits during these uncontrolled spikes, or, if no limits are configured, it might have an even worse impact on the control plane node.
Worse still, after the first apiserver dies, the same requests will likely hit another control plane node in an HA setup, probably with the same impact.
The result can be a situation that is hard to diagnose and hard to recover from.

## Streaming LIST requests

Today, we're excited to announce a major improvement.
With the graduation of the WatchList feature to beta in Kubernetes 1.32, client-go users can opt in (after explicitly enabling the WatchListClient feature gate)
to streaming lists by switching from LIST to (a special kind of) WATCH requests.
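
For readers curious what this "special kind of" WATCH looks like on the wire, the sketch below builds the query string as we understand it from the Kubernetes API documentation: a watch with `sendInitialEvents=true`, `resourceVersionMatch=NotOlderThan`, and `allowWatchBookmarks=true`. The helper name `streamingListPath` is hypothetical, and client-go sets these options for you, so treat this as an illustration rather than something you need to write yourself.

```go
package main

import (
	"fmt"
	"net/url"
)

// streamingListPath builds the path and query for a streaming list of the
// given collection path (for example "/api/v1/secrets").
// The parameter names follow our reading of the API docs; verify them
// against the version of Kubernetes you run.
func streamingListPath(collection string) string {
	q := url.Values{}
	q.Set("watch", "true")
	q.Set("sendInitialEvents", "true")
	q.Set("resourceVersionMatch", "NotOlderThan")
	q.Set("allowWatchBookmarks", "true")
	// Encode sorts the parameters by key.
	return collection + "?" + q.Encode()
}

func main() {
	fmt.Println(streamingListPath("/api/v1/secrets"))
}
```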

WATCH requests are served from the watch-cache, an in-memory cache designed to improve scalability of read operations.
By streaming each item individually instead of returning the entire list, this method maintains constant memory overhead.
The server's memory overhead is bounded by the maximum allowed size of an object in etcd plus a few additional allocations.
This approach drastically reduces the temporary memory usage compared to traditional LIST requests, ensuring a more efficient and stable system,
especially in clusters with a large number of objects of a given type or with large average object sizes, where memory consumption used to be high despite paging.
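
The constant-overhead property can be illustrated with a small sketch that encodes and writes one item at a time through a reused buffer. This is a toy model, not the watch-cache code: `item`, `streamItems`, and `peakBuffer` are hypothetical names, and the in-memory slice of items merely plays the role of the cache. The peak buffer size depends on the largest single item, not on how many items are streamed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// item is a hypothetical stand-in for a cached Kubernetes object.
type item struct {
	Name string `json:"name"`
	Data string `json:"data"`
}

// streamItems encodes and writes one item at a time, reusing a single
// buffer. It returns the largest number of bytes the buffer held at once.
func streamItems(w io.Writer, items []item) (int, error) {
	var buf bytes.Buffer
	peak := 0
	for _, it := range items {
		buf.Reset()
		if err := json.NewEncoder(&buf).Encode(it); err != nil {
			return peak, fmt.Errorf("encode %s: %w", it.Name, err)
		}
		if buf.Len() > peak {
			peak = buf.Len()
		}
		if _, err := w.Write(buf.Bytes()); err != nil {
			return peak, fmt.Errorf("write %s: %w", it.Name, err)
		}
	}
	return peak, nil
}

// peakBuffer streams n items of payloadBytes each to a discarding writer
// and reports the peak size of the per-item buffer.
func peakBuffer(n, payloadBytes int) (int, error) {
	items := make([]item, 0, n)
	for i := 0; i < n; i++ {
		items = append(items, item{
			Name: fmt.Sprintf("secret-%d", i),
			Data: strings.Repeat("x", payloadBytes),
		})
	}
	return streamItems(io.Discard, items)
}

func main() {
	for _, n := range []int{10, 1000} {
		peak, err := peakBuffer(n, 1024)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d items -> peak per-item buffer %d bytes\n", n, peak)
	}
}
```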

Building on the insight gained from the synthetic test (see the synthetic test section for more details), we developed an automated performance test to systematically evaluate the impact of the WatchList feature.
This test replicates the same scenario, generating a large number of secrets with a large payload and scaling the number of informers to simulate heavy LIST request patterns.
The automated test is executed periodically to monitor memory usage of the server with the feature enabled and disabled.

The results showed significant improvements with the WatchList feature enabled.
With the feature turned on, the kube-apiserver’s memory consumption stabilized at approximately **2 GB**.
In contrast, with the feature disabled, memory usage increased to approximately **20 GB**, a **10x** increase!
These results confirm the effectiveness of the new streaming API, which reduces the temporary memory footprint.

## Enabling API Streaming for your component

Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd version 3.4.31+ or 3.5.13+.
Enable the WatchListClient feature gate for client-go. For details on enabling feature gates in client-go, please visit https://kubernetes.io/blog/2024/08/12/feature-gates-in-client-go.
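
As a rough aid, the prerequisites above can be expressed as a small preflight check. The sketch below is not part of client-go; `parseVersion`, `etcdMeetsMinimum`, and `gateEnabled` are hypothetical helpers. It assumes that etcd releases newer than the 3.5 series also qualify, and that the gate is enabled through the `KUBE_FEATURE_WatchListClient` environment variable, which is our understanding of the mechanism described in the linked post.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVersion splits "3.5.13" (optionally prefixed with "v") into
// major, minor, and patch numbers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// etcdMeetsMinimum reports whether v satisfies 3.4.31+ or 3.5.13+.
func etcdMeetsMinimum(v string) (bool, error) {
	ver, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	major, minor, patch := ver[0], ver[1], ver[2]
	switch {
	case major != 3:
		return major > 3, nil // assumption: a later major is new enough
	case minor == 4:
		return patch >= 31, nil
	case minor == 5:
		return patch >= 13, nil
	default:
		return minor > 5, nil // assumption: later 3.x minors are new enough
	}
}

// gateEnabled interprets the raw value of the feature gate variable.
// An unset or unparsable value counts as disabled.
func gateEnabled(raw string) bool {
	on, err := strconv.ParseBool(raw)
	return err == nil && on
}

func main() {
	ok, err := etcdMeetsMinimum("3.5.13")
	fmt.Println("etcd new enough:", ok, "error:", err)
	// Assumed variable name; check the linked post for the exact mechanism.
	fmt.Println("gate enabled:", gateEnabled(os.Getenv("KUBE_FEATURE_WatchListClient")))
}
```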

## What's Next?
In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
This will eventually be expanded to other core components such as kube-scheduler or kubelet, at the latest when the feature is promoted to GA.
Other third-party components are encouraged to opt in to the feature during the beta phase, especially when they are at risk of accessing a large number of resources or kinds with potentially large object sizes.

For the time being, [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control) assigns a reasonably small cost to LIST requests.
This is necessary to allow enough parallelism for the average case, where LISTs are cheap enough.
But it does not match the spiky, exceptional situation of many large objects.
Once WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
thereby increasing the protection against requests of this kind that can still hit the apiserver in the future.


## The synthetic test

To reproduce the issue, we conducted a manual test to understand the impact of LIST requests on kube-apiserver memory usage.
In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.

The results were alarming: only 16 informers were needed to cause the test server to run out of memory and crash, demonstrating how quickly memory consumption can grow under such conditions.
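
A back-of-the-envelope calculation shows why so few informers are enough. The sketch below assumes that each informer's LIST returns all 400 secrets, that all responses are assembled at the same time, and that "1 MB" means 1 MiB. It counts only the object payloads, so it is a lower bound that ignores the intermediate copies made while decoding and serializing. `responsePayloadBytes` is a hypothetical helper.

```go
package main

import "fmt"

const mib = 1 << 20

// responsePayloadBytes returns the bytes of object data that must be held
// when `informers` full LIST responses are assembled at the same time.
func responsePayloadBytes(objects, objectBytes, informers int64) int64 {
	return objects * objectBytes * informers
}

func main() {
	perList := responsePayloadBytes(400, mib, 1)
	total := responsePayloadBytes(400, mib, 16)
	fmt.Printf("one LIST: %d MiB, 16 concurrent LISTs: %d MiB\n", perList/mib, total/mib)
}
```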

Special thanks to [@deads2k](https://github.com/deads2k) for his help in shaping this feature.