Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming #48513

p0lyn0mial · 2024-10-23T11:27:43Z

A blog post about the API Streaming feature tracked by kubernetes/enhancements#3157

netlify · 2024-10-23T11:36:14Z

✅ Pull request preview available for checking

Built without sensitive environment variables

| Name | Link |
|---|---|
| 🔨 Latest commit | `d483496` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/674d81a7de42290008d02e63 |
| 😎 Deploy Preview | https://deploy-preview-48513--kubernetes-io-main-staging.netlify.app |

p0lyn0mial · 2024-11-21T13:17:28Z

/cc @sttts @wojtek-t

mbianchidev · 2024-11-21T14:42:52Z

Hey hey,
it's Matteo here, Comms Lead for 1.32

Is this piece ready for review?
Reminder that the review deadline is on the 25th of November.

Let us know if you need something from us, otherwise we will just mark it as ready for review and have sig-docs-blogs review it for content and the assigned SIG to chime in for a tech review.

Thank you!

p0lyn0mial · 2024-11-21T17:19:54Z

Hey, hey @mbianchidev, yes, the PR is ready for review.

p0lyn0mial · 2024-11-21T17:21:10Z

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

+In the current implementation, kube-apiserver processes LIST requests by assembling the entire response in-memory before transmitting any data to the client. 
+But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple LIST requests flood in simultaneously, perhaps after a brief network outage. 
+While [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control) has proven to reasonably protect kube-apiserver from CPU overload, its impact is visibly smaller for memory protection. 
+This can be explained by the different nature of resource consumption by a single API request - the amount of cpu used by a single request is capped by a constant, whereas the memory is proportional to the number of processed objects, which is unbounded. 


I think we might need to change this sentence a bit.

p0lyn0mial · 2024-11-21T17:21:26Z

/hold for #48513 (comment)

p0lyn0mial · 2024-11-22T07:58:00Z

/hold cancel

sttts · 2024-11-22T09:39:31Z

/lgtm

k8s-ci-robot · 2024-11-25T08:47:23Z

LGTM label has been added.

Git tree hash: fa520c6d21b777eed2d9a7dc34afb423ee3d8c66

mbianchidev · 2024-11-25T20:02:55Z

/lgtm

I'll seek SIG-Docs approval.
The current date assigned still works, awesome work folks!

sftim

Thanks. I recommend, if we can, making a bunch of changes to align with the style guide.

Any feedback that ends with a ? is of course a question, and not a demand.

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

sftim · 2024-11-25T21:17:31Z

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

+This is necessary to allow enough parallelism for the average case where LISTs are cheap enough. 
+But it does not match the spiky exceptional situation of many and large objects. 
+When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
+and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.


Do you agree with:

Suggested change

-This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
-But it does not match the spiky exceptional situation of many and large objects.
-When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
-and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
+API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;
+these controls happen before the size of the response is known.
+The current small cost assigned for **list** requests allows enough parallelism for the average case
+where **list** requests are cheap enough, but it does not match the spiky exceptional situation of many
+and / or large objects.
+Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed to larger values without risking degraded performance in the average case,
+and with that increasing the protection against this kind of requests that can still hit the API server in the future.

?

> API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;

No, this isn't true. APF does consider the size of the collection, because we take the number of objects of a given type (per namespace or globally) into account when estimating the cost of the request.

> The current small cost assigned for list requests allows enough parallelism for the average case

That's also not true: the cost reflects the CPU cost pretty well. It doesn't reflect the cost of RAM.

OK. @p0lyn0mial please fix the style issues, and if you want to explain the CPU vs. RAM detail, you're welcome to.

Revised suggestion

Suggested change

-This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
-But it does not match the spiky exceptional situation of many and large objects.
-When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
-and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
+API Priority and Fairness does not consider the size of objects in collections when setting priorities
+or rate limits; these controls happen before the size of the response is known.
+Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed to use a larger multiplier without risking degraded performance in the average case,
+and with that increasing the protection against this kind of requests that can still hit the API server in the future.

?

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

sftim · 2024-11-25T21:18:12Z

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

+In order to reproduce the issue, we conducted a manual test to understand the impact of LIST request on kube-apiserver memory usage. 
+In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.


Suggested change

-In order to reproduce the issue, we conducted a manual test to understand the impact of LIST request on kube-apiserver memory usage.
-In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.
+In order to reproduce the issue, we conducted a manual test to understand the impact of **list** requests on kube-apiserver memory usage.
+In the test, we created 400 Secrets, each containing 1 MB of data, and used informers to retrieve all Secrets.
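
For a sense of scale, the test setup can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration: it assumes 1 MB means 1 MiB, counts a single in-memory copy of each payload, and uses a hypothetical number of concurrent informers (the thread does not state how many were used). The function name `estimateListBytes` is made up for this example.

```go
package main

import "fmt"

// estimateListBytes returns a rough lower bound on the bytes a server holds
// when `concurrent` list responses are each fully assembled in memory.
// It counts one copy of every payload; decoding and re-encoding add more.
func estimateListBytes(objects, bytesPerObject, concurrent int64) int64 {
	return objects * bytesPerObject * concurrent
}

func main() {
	const mib = 1 << 20

	// 400 Secrets of 1 MB each, as in the test described above.
	one := estimateListBytes(400, mib, 1)

	// Ten concurrent informers is a hypothetical value, not from the test.
	ten := estimateListBytes(400, mib, 10)

	fmt.Printf("one list response: %d MiB\n", one/mib)
	fmt.Printf("ten concurrent list responses: %d MiB\n", ten/mib)
}
```

The point of the estimate is that the total grows with both the collection size and the number of simultaneous requests, which matches the memory behaviour the post describes.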

content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md

p0lyn0mial · 2024-11-28T07:25:08Z

@sftim fixed the style issues, ptal.

sftim · 2024-12-01T14:53:32Z

content/en/blog/_posts/2024-11-21-api-streaming/2024-11-21-api-streaming.md

+---
+layout: blog
+title: 'Enhancing Kubernetes API Server Efficiency with API Streaming'
+date: 2024-11-21


Suggested change

-date: 2024-11-21
+date: 2024-12-11

Remember to change the filename as well (the main article should be content/en/blog/_posts/2024-12-11-api-streaming/index.md; note that this change has TWO fixes as the filename we usually use for this case is index.md)

sftim · 2024-12-01T13:46:31Z

Oh, hold on. Is this a post-release blog article? If so then we'll need to assign a date after the release. That's pencilled in for the 17th, but actually you should do something slightly different.

See #48513 (comment) for the fuller detail but broadly:

* you set the publication date to the K8s v1.32 release date
* you set `draft: true` in the front matter

sftim · 2024-12-01T13:48:37Z

@mbianchidev the date in #48513 (comment) wasn't what I'd expect (I'd expected to see the v1.32 release date there), but otherwise I agree with that.

Maybe we could use a label for PRs that are part of the release blog. Something for a retro maybe.

@p0lyn0mial I apologise for the noise. This may well be good to merge; let me do a final check.

sftim · 2024-12-01T14:43:04Z

/hold cancel

mbianchidev · 2024-12-01T14:43:18Z

@sftim Agreed on having a label like release-feature-blog - and a release-blog one for the mid-cycle and final release tbh

IMHO there's no need for the author to set the Kubernetes release as date, having draft: true in there is enough.

I communicated the assigned publication date just to let the author know when to expect the blog to be published - maybe they wanna share it with their network or someone interested.

Operationally all the feature blogs are tracked on the board with an assigned date and that already makes it easy for Comms to open that separate PR to set all the current dates at once 👀

sftim · 2024-12-01T14:53:34Z

content/en/blog/_posts/2024-11-21-api-streaming/2024-11-21-api-streaming.md

+This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.
+
+
+![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")


This doesn't render. Try:

Suggested change

-![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
+{{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}

sftim · 2024-12-01T14:55:58Z

content/en/blog/_posts/2024-11-21-api-streaming/2024-11-21-api-streaming.md

+## Enabling API Streaming for your component
+
+Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+. 
+Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).


Suggested change

-Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
+Change your client software to use watch lists. If your client code is written in Go, you'll want to
+enable the `WatchListClient` feature gate for client-go. For details on enabling that feature gate,
+read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).

We should avoid implying that Kubernetes clients are always written in Go.
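
As a rough illustration of the enabling step for a Go client, the sketch below prepares an environment for a client-go based process. It assumes that client-go reads feature gate overrides from environment variables named `KUBE_FEATURE_<GateName>`; that variable name does not appear in this thread and should be checked against the linked article for the client-go version in use. The helper `withWatchList` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// watchListEnvVar is the environment variable ASSUMED to control the
// WatchListClient feature gate in client-go. Verify the name against the
// feature gates article before relying on it.
const watchListEnvVar = "KUBE_FEATURE_WatchListClient"

// withWatchList returns a copy of env in which the feature gate variable is
// set to "true", replacing any existing setting for the same variable.
func withWatchList(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, watchListEnvVar+"=") {
			out = append(out, kv)
		}
	}
	return append(out, watchListEnvVar+"=true")
}

func main() {
	env := withWatchList(os.Environ())
	// The override is always the last entry of the returned slice.
	fmt.Println(env[len(env)-1])
}
```

The resulting slice could be assigned to the `Env` field of an `exec.Cmd` that launches the client, or the same variable could simply be set in the workload's manifest.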

sftim · 2024-12-01T14:56:13Z

content/en/blog/_posts/2024-11-21-api-streaming/2024-11-21-api-streaming.md

+Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+. 
+Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
+
+## What's Next?


(nit)

Suggested change

-## What's Next?
+## What's next?

sftim · 2024-12-01T14:57:24Z

content/en/blog/_posts/2024-11-21-api-streaming/2024-11-21-api-streaming.md

+
+![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
+
+The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details). 


This is the web; we like hyperlinks.

Suggested change

-The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details).
+The graph shows the memory usage of a kube-apiserver during a synthetic test
+(see the [synthetic test](#the-synthetic-test) section for more details).

p0lyn0mial

Okay, I think I've captured the latest set of comments, PTAL.

p0lyn0mial · 2024-12-02T09:48:01Z

content/en/blog/_posts/2024-12-11-api-streaming/index.md

+This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.
+
+
+{{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}


ok, make sure the image is rendered after the change

sttts · 2024-12-02T14:01:25Z

content/en/blog/_posts/2024-12-11-api-streaming/index.md

+Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
+* fetch data from the database,
+* deserialize the data from its stored format,
+* and finally construct the final response by converting and serializing the data into a client requested format 


Suggested change

-* and finally construct the final response by converting and serializing the data into a client requested format
+* and finally construct the final response by converting and serializing the data into a client requested format.

sttts · 2024-12-02T14:01:28Z

content/en/blog/_posts/2024-12-11-api-streaming/index.md

+
+## Why does kube-apiserver allocate so much memory for list requests?
+
+Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:


Suggested change

-Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
+Our investigation revealed that this substantial memory allocation occurs because the server – before sending the first byte to the client – must:

? Am no native speaker. Delegating to @sftim.

sure, this makes sense to add (could come in a follow-up PR)

sttts · 2024-12-02T14:07:31Z

content/en/blog/_posts/2024-12-11-api-streaming/index.md

+For details on enabling that feature, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
+
+## What's next?
+In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state. 


Suggested change

-In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
+In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.

sftim · 2024-12-02T18:12:51Z

Looks good to merge as a draft. We can take fixup PRs ahead of publication.

/lgtm
/approve

k8s-ci-robot · 2024-12-02T18:12:59Z

LGTM label has been added.

Git tree hash: 39eb8e33bc7f825efb390b5a73d28ad291a1c655

k8s-ci-robot · 2024-12-02T18:13:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~content/en/blog/OWNERS~~ [sftim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/blog Issues or PRs related to the Kubernetes Blog subproject labels Oct 23, 2024

k8s-ci-robot requested review from mrbobbytables and nate-double-u October 23, 2024 11:27

k8s-ci-robot added language/en Issues or PRs related to English language size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 23, 2024

p0lyn0mial mentioned this pull request Oct 23, 2024

Allow informers for getting a stream of data instead of chunking kubernetes/enhancements#3157

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from 5350e56 to 5f68e84 Compare November 21, 2024 12:31

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 21, 2024

p0lyn0mial changed the title ~~[WIP] Blog post: API Streaming~~ [WIP] Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming Nov 21, 2024

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from 5f68e84 to a103da3 Compare November 21, 2024 13:16

p0lyn0mial changed the title ~~[WIP] Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming~~ Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming Nov 21, 2024

p0lyn0mial marked this pull request as ready for review November 21, 2024 13:17

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2024

k8s-ci-robot requested a review from sftim November 21, 2024 13:17

k8s-ci-robot requested review from sttts and wojtek-t November 21, 2024 13:17

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2024

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from a103da3 to 3b9d849 Compare November 22, 2024 07:56

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 25, 2024

k8s-ci-robot assigned mbianchidev Nov 25, 2024

sftim reviewed Nov 25, 2024

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from 84bd4f2 to 98217e2 Compare November 27, 2024 09:08

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2024

k8s-ci-robot requested a review from mbianchidev November 27, 2024 09:08

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from 98217e2 to a944ee0 Compare November 27, 2024 09:27

sftim reviewed Dec 1, 2024

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2024

sftim reviewed Dec 1, 2024

blog: Enhancing Kubernetes API Server Efficiency with API Streaming

d483496

p0lyn0mial force-pushed the blog-watchlist-1-32 branch from a944ee0 to d483496 Compare December 2, 2024 09:45

sttts reviewed Dec 2, 2024

k8s-ci-robot assigned sftim Dec 2, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2024

k8s-ci-robot merged commit 4d29764 into kubernetes:main Dec 2, 2024
6 checks passed
