Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming #48513

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Oct 23, 2024

A blog post about the API Streaming feature tracked by kubernetes/enhancements#3157

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/blog Issues or PRs related to the Kubernetes Blog subproject labels Oct 23, 2024
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 23, 2024
Copy link

netlify bot commented Oct 23, 2024

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit d483496
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/674d81a7de42290008d02e63
😎 Deploy Preview https://deploy-preview-48513--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 21, 2024
@p0lyn0mial p0lyn0mial changed the title [WIP] Blog post: API Streaming [WIP] Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming Nov 21, 2024
@p0lyn0mial p0lyn0mial changed the title [WIP] Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming Blog post: Enhancing Kubernetes API Server Efficiency with API Streaming Nov 21, 2024
@p0lyn0mial p0lyn0mial marked this pull request as ready for review November 21, 2024 13:17
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2024
@k8s-ci-robot k8s-ci-robot requested a review from sftim November 21, 2024 13:17
@p0lyn0mial
Copy link
Contributor Author

/cc @sttts @wojtek-t

@mbianchidev
Copy link
Member

Hey hey,
it's Matteo here, Comms Lead for 1.32

Is this piece ready for review?
Reminder that the review deadline is on the 25th of November.

Let us know if you need something from us, otherwise we will just mark it as ready for review and have sig-docs-blogs review it for content and the assigned SIG to chime in for a tech review.

Thank you!

@p0lyn0mial
Copy link
Contributor Author

Hey, hey @mbianchidev, yes, the PR is ready for review.

In the current implementation, kube-apiserver processes LIST requests by assembling the entire response in-memory before transmitting any data to the client.
But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple LIST requests flood in simultaneously, perhaps after a brief network outage.
While [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control) has proven to reasonably protect kube-apiserver from CPU overload, its impact is visibly smaller for memory protection.
This can be explained by the different nature of resource consumption by a single API request - the amount of cpu used by a single request is capped by a constant, whereas the memory is proportional to the number of processed objects, which is unbounded.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to change this sentence a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@p0lyn0mial
Copy link
Contributor Author

/hold for #48513 (comment)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2024
@p0lyn0mial
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
@sttts
Copy link
Contributor

sttts commented Nov 22, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 25, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: fa520c6d21b777eed2d9a7dc34afb423ee3d8c66

@mbianchidev
Copy link
Member

/lgtm

I'll seek SIG-Docs approval.
The current date assigned still works, awesome work folks!

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I recommend if we can making a bunch of changes to align with the style guide.

Any feedback that ends with a ? is of course a question, and not a demand.

Comment on lines 78 to 81
This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
But it does not match the spiky exceptional situation of many and large objects.
When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you agree with:

Suggested change
This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
But it does not match the spiky exceptional situation of many and large objects.
When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;
these controls happen before the size of the response is known.
The current small cost assigned for **list** requests allows enough parallelism for the average case
where **list** requests are cheap enough, nut it does not match the spiky exceptional situation of many
and / or large objects.
Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed to larger values without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the API server in the future.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API Priority and Fairness does not consider the size of collections when setting priorities or rate limits;

No - this isn't true. APF does consider size of collection, because we take number of object (per namespace or globally) of a given type when estimating cost of the request.

The current small cost assigned for list requests allows enough parallelism for the average case

That's also not true - the cost pretty well reflects the CPU cost. It doesn't reflect cost of RAM.

Copy link
Contributor

@sftim sftim Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. @p0lyn0mial please fix the style issues, and if you want to explain the CPU vs. RAM detail, you're welcome to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised suggestion

Suggested change
This is necessary to allow enough parallelism for the average case where LISTs are cheap enough.
But it does not match the spiky exceptional situation of many and large objects.
When WatchList is used by the majority of the ecosystem, the LIST cost estimation can be changed to larger values without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the apiserver in the future.
API Priority and Fairness does not consider the size of objects in collections when setting priorities
or rate limits; these controls happen before the size of the response is known.
Once the majority of the Kubernetes ecosystem has switched to watch lists, the **list** cost estimation can be changed a larger multiplier without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the API server in the future.

?

Comment on lines 86 to 87
In order to reproduce the issue, we conducted a manual test to understand the impact of LIST request on kube-apiserver memory usage.
In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to reproduce the issue, we conducted a manual test to understand the impact of LIST request on kube-apiserver memory usage.
In the test, we created 400 secrets, each containing 1 MB of data, and used informers to retrieve all secrets.
In order to reproduce the issue, we conducted a manual test to understand the impact of **list** requests on kube-apiserver memory usage.
In the test, we created 400 Secrets, each containing 1 MB of data, and used informers to retrieve all Secrets.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2024
@p0lyn0mial
Copy link
Contributor Author

@sftim fixed the style issues, ptal.

---
layout: blog
title: 'Enhancing Kubernetes API Server Efficiency with API Streaming'
date: 2024-11-21

This comment was marked as outdated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
date: 2024-11-21
date: 2024-12-11

Remember to change the filename as well (the main article should be content/en/blog/_posts/2024-12-11-api-streaming/index.md; note that this change has TWO fixes as the filename we usually use for this case is index.md)

@sftim
Copy link
Contributor

sftim commented Dec 1, 2024

Oh, hold on. Is this a post-release blog article? If so then we'll need to assign a date after the release. That's pencilled in for the 17th, but actually you should do something slightly different.

See #48513 (comment) for the fuller detail but broadly:

  • you set the publication date to the K8s v1.32 release date
  • you set draft: true in the front matter

@sftim
Copy link
Contributor

sftim commented Dec 1, 2024

@mbianchidev the date in #48513 (comment) wasn't what I'd expect (I'd expected to see the v1.32 release date there), but otherwise I agree with that.

Maybe we could use a label for PRs that are part of the release blog. Something for a retro maybe.

@p0lyn0mial I apologise for the noise. This may well be good to merge; let me do a final check.

@sftim
Copy link
Contributor

sftim commented Dec 1, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2024
@mbianchidev
Copy link
Member

@sftim Agreed on having a label like release-feature-blog - and a release-blog one for the mid-cycle and final release tbh

IMHO there's no need for the author to set the Kubernetes release as date, having draft: true in there is enough.

I communicated the assigned publication date just to let the author know when to expect the blog to be published - maybe they wanna share it with their network or someone interested.

Operationally all the feature blogs are tracked on the board with an assigned date and that already makes it easy for Comms to open that separate PR to set all the current dates at once 👀

---
layout: blog
title: 'Enhancing Kubernetes API Server Efficiency with API Streaming'
date: 2024-11-21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
date: 2024-11-21
date: 2024-12-11

Remember to change the filename as well (the main article should be content/en/blog/_posts/2024-12-11-api-streaming/index.md; note that this change has TWO fixes as the filename we usually use for this case is index.md)

This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.


![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't render. Try:

Suggested change
![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")
{{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}

## Enabling API Streaming for your component

Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
Change your client software to use watch lists. If your client code is written in Golang, you'll want to
enable the `WatchListClient` for client-go. For details on enabling that feature gate.
read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).

We should avoid implying that Kubernetes clients are always written in Go.

Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
Enable `WatchListClient` for client-go. For details on enabling the feature gate in client-go, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).

## What's Next?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
## What's Next?
## What's next?


![kube-apiserver memory usage](kube-apiserver-memory_usage.png "[kube-apiserver memory usage")

The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the web; we like hyperlinks.

Suggested change
The graph shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details).
The graph shows the memory usage of a kube-apiserver during a synthetic test
(see the [synthetic test](#the-synthetic-test) section for more details).

Copy link
Contributor Author

@p0lyn0mial p0lyn0mial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I've captured the latest set of comments, PTAL.

This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let's consider the below graph.


{{< figure src="kube-apiserver-memory_usage.png" alt="Monitoring graph showing kube-apiserver memory usage" >}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, make sure the image is rendered after the change

Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
* fetch data from the database,
* deserialize the data from its stored format,
* and finally construct the final response by converting and serializing the data into a client requested format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* and finally construct the final response by converting and serializing the data into a client requested format
* and finally construct the final response by converting and serializing the data into a client requested format.


## Why does kube-apiserver allocate so much memory for list requests?

Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:

? Am no native speaker. Delegating to @sftim.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, this makes sense to add (could come in a follow-up PR)

For details on enabling that feature, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).

## What's next?
In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.
In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state.

@sftim
Copy link
Contributor

sftim commented Dec 2, 2024

Looks good to merge as a draft. We can take fixup PRs ahead of publication.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 39eb8e33bc7f825efb390b5a73d28ad291a1c655

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit 4d29764 into kubernetes:main Dec 2, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/blog Issues or PRs related to the Kubernetes Blog subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Published
Development

Successfully merging this pull request may close these issues.

6 participants