Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming #48513
Conversation
Hey hey, is this piece ready for review? Let us know if you need anything from us; otherwise we will mark it as ready for review and have sig-docs-blogs review it for content, with the assigned SIG chiming in for a tech review. Thank you!
Hey, hey @mbianchidev, yes, the PR is ready for review.
> In the current implementation, kube-apiserver processes LIST requests by assembling the entire response in-memory before transmitting any data to the client.
> But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple LIST requests flood in simultaneously, perhaps after a brief network outage.
> While [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control) has proven to reasonably protect kube-apiserver from CPU overload, its impact is visibly smaller for memory protection.
> This can be explained by the different nature of resource consumption by a single API request: the amount of CPU used by a single request is capped by a constant, whereas the memory is proportional to the number of processed objects, which is unbounded.
I think we might need to change this sentence a bit.
done
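The CPU-versus-memory distinction discussed above can be observed in practice. As an illustrative sketch (not part of the authors' test setup, and assuming kubectl access to the cluster), one can sample the kube-apiserver's resident memory from its metrics endpoint while LIST traffic is in flight:

```shell
# Illustrative sketch: sample kube-apiserver resident memory while LIST
# requests are being served. process_resident_memory_bytes is a standard
# Prometheus process metric that kube-apiserver exposes on /metrics.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get --raw /metrics | grep '^process_resident_memory_bytes'
else
  echo "kubectl not available; skipping live sample"
fi
```

Sampling this metric repeatedly while issuing large LIST requests makes the memory spike visible, whereas CPU usage per request stays bounded.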
/hold for #48513 (comment)
/hold cancel

/lgtm

LGTM label has been added. Git tree hash: fa520c6d21b777eed2d9a7dc34afb423ee3d8c66

/lgtm I'll seek SIG-Docs approval.
Thanks. If we can, I recommend making a bunch of changes to align with the style guide.
Any feedback that ends with a ? is of course a question, and not a demand.
content/en/blog/_posts/2024-11-21-API-Streaming/2024-11-21-API-Streaming.md
> This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
> But it does not match the spiky exceptional situation of many and large objects.
> When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
> and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
Do you agree with:
Suggested change:

> API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;
> these controls happen before the size of the response is known.
> The current small cost assigned for **list** requests allows enough parallelism for the average case
> where **list** requests are cheap enough, but it does not match the spiky exceptional situation of many
> and / or large objects.
> Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed to larger values without risking degraded performance in the average case,
> and with that increasing the protection against this kind of requests that can still hit the API server in the future.
?
> API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;

No, this isn't true. APF does consider the size of a collection, because we take the number of objects (per namespace or globally) of a given type into account when estimating the cost of the request.

> The current small cost assigned for list requests allows enough parallelism for the average case

That's also not true: the cost reflects the CPU cost pretty well. It doesn't reflect the cost of RAM.
OK. @p0lyn0mial please fix the style issues, and if you want to explain the CPU vs. RAM detail, you're welcome to.
Revised suggestion
Suggested change:

> API Priority and Fairness does not consider the size of objects in collections when setting priorities
> or rate limits; these controls happen before the size of the response is known.
> Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed to a larger multiplier without risking degraded performance in the average case,
> and with that increasing the protection against this kind of requests that can still hit the API server in the future.
?
> In order to reproduce the issue, we conducted a manual test to understand the impact of LIST request on kube-apiserver memory usage.
> In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.
Suggested change:

> In order to reproduce the issue, we conducted a manual test to understand the impact of **list** requests on kube-apiserver memory usage.
> In the test, we created 400 Secrets, each containing 1 MB of data, and used informers to retrieve all Secrets.
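As a rough sketch of how a synthetic test like the one quoted above could be set up (all names and sizes here are illustrative, not the authors' actual harness), one could generate manifests for 400 Secrets carrying roughly 1 MiB of payload each, then apply them against a cluster:

```shell
# Sketch only: generate manifests for 400 Secrets, each carrying ~1 MiB
# of base64-encoded payload. Apply later with: kubectl apply -f ./secrets/
mkdir -p secrets
# 786432 random bytes become exactly 1 MiB (1048576 chars) once base64-encoded.
payload=$(head -c 786432 /dev/urandom | base64 | tr -d '\n')
for i in $(seq 1 400); do
  cat > "secrets/secret-$i.yaml" <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: test-secret-$i
type: Opaque
data:
  payload: $payload
EOF
done
```

Listing all of these through informers then forces the API server to materialize roughly 400 MB of object data per LIST response, which is what makes the memory spike reproducible.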
@sftim fixed the style issues, PTAL.
> ---
> layout: blog
> title: 'Enhancing Kubernetes API Server Efficiency with API Streaming'
> date: 2024-11-21
Suggested change: `date: 2024-11-21` → `date: 2024-12-11`

Remember to change the filename as well (the main article should be `content/en/blog/_posts/2024-12-11-api-streaming/index.md`; note that this change has TWO fixes, as the filename we usually use for this case is `index.md`).
Oh, hold on. Is this a post-release blog article? If so then we'll need to assign a date after the release. That's pencilled in for the 17th, but actually you should do something slightly different. See #48513 (comment) for the fuller detail but broadly:
@mbianchidev the date in #48513 (comment) wasn't what I'd expect (I'd expected to see the v1.32 release date there), but otherwise I agree with that. Maybe we could use a label for PRs that are part of the release blog. Something for a retro maybe. @p0lyn0mial I apologise for the noise. This may well be good to merge; let me do a final check.

/hold cancel

@sftim Agreed on having a label like release-feature-blog, and a release-blog one for the mid-cycle and final release. TBH, IMHO there's no need for the author to set the Kubernetes release date; I communicated the assigned publication date just to let the author know when to expect the blog to be published, in case they want to share it with their network or with someone interested. Operationally, all the feature blogs are tracked on the board with an assigned date, and that already makes it easy for Comms to open that separate PR to set all the current dates at once 👀
> This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.
>
> ![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
This doesn't render. Try:
Suggested change:

> {{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}
> ## Enabling API Streaming for your component
>
> Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
> Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
Suggested change:

> Change your client software to use watch lists. If your client code is written in Golang, you'll want to
> enable the `WatchListClient` for client-go. For details on enabling that feature gate,
> read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
We should avoid implying that Kubernetes clients are always written in Go.
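For Go clients specifically, the client-go feature gate mentioned above is controlled through an environment variable, following the `KUBE_FEATURE_<name>` override convention described in the linked blog post. A minimal sketch (the binary name below is hypothetical):

```shell
# Enable the WatchListClient feature gate for a component built on
# client-go by exporting the override before the process starts.
export KUBE_FEATURE_WatchListClient=true
# Then start the component in the same environment, e.g.:
#   ./my-controller    (hypothetical binary name)
```

Clients in other languages would need their own mechanism for issuing watch-list requests; the environment variable only affects client-go.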
> Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
> Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
>
> ## What's Next?
(nit)
Suggested change: `## What's Next?` → `## What's next?`
> ![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
>
> The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details).
This is the web; we like hyperlinks.
Suggested change:

> The graph shows the memory usage of a kube-apiserver during a synthetic test
> (see the [synthetic test](#the-synthetic-test) section for more details).
Okay, I think I've captured the latest set of comments, PTAL.
> This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.
>
> {{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}
OK, make sure the image is rendered after the change.
> Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
> * fetch data from the database,
> * deserialize the data from its stored format,
> * and finally construct the final response by converting and serializing the data into a client requested format
Suggested change:

> * and finally construct the final response by converting and serializing the data into a client requested format.
> ## Why does kube-apiserver allocate so much memory for list requests?
>
> Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
Suggested change:

> Our investigation revealed that this substantial memory allocation occurs because the server – before sending the first byte to the client – must:
? I'm not a native speaker; delegating to @sftim.
sure, this makes sense to add (could come in a follow-up PR)
> For details on enabling that feature, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
>
> ## What's next?
> In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
Suggested change (whitespace only):

> In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
Looks good to merge as a draft. We can take fixup PRs ahead of publication.

/lgtm
LGTM label has been added. Git tree hash: 39eb8e33bc7f825efb390b5a73d28ad291a1c655
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sftim. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
A blog post about the API Streaming feature tracked by kubernetes/enhancements#3157