This issue was moved to a discussion.

Strimzi makes repeated LIST requests causing instability of Kubernetes Control Plane #7793

Closed
marseel opened this issue Dec 13, 2022 · 1 comment
marseel commented Dec 13, 2022

Describe the bug
strimzi-cluster-operator/0.31.1 does not use the Kubernetes API server cache when listing resources. All LIST calls go directly to etcd, which puts significant load on etcd and makes the Kubernetes control plane unstable.

Example logs from Kubernetes API Server:

"HTTP" verb="LIST" URI="/api/v1/namespaces/<PII>/secrets?labelSelector=..." latency="1.492406837s" userAgent="strimzi-cluster-operator/0.31.1" <PII>  apf_pl="workload-low" apf_fs="service-accounts" resp=200

Strimzi issues similar LIST requests for configmaps/pods/persistentvolumeclaims/services/...

Short term mitigation:
Set resourceVersion=0 on each LIST/GET request so that it is served from the Kubernetes API server cache without any interaction with etcd.
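The mitigation above can be sketched as a small helper that builds the LIST path with resourceVersion=0 appended. This is only an illustration of the query parameter, not Strimzi's or Fabric8's actual code: the function name `list_url` and the `use_cache` flag are hypothetical, and the namespace and selector values are placeholders. Note that resourceVersion=0 asks for data at "any" version, so the response may be slightly stale.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def list_url(namespace, resource, label_selector=None, use_cache=True):
    """Build a namespaced core-API LIST path (hypothetical helper)."""
    params = {}
    if label_selector:
        params["labelSelector"] = label_selector
    if use_cache:
        # resourceVersion=0 lets the API server answer from its cache
        # instead of reading from etcd; the data may be slightly stale.
        params["resourceVersion"] = "0"
    path = f"/api/v1/namespaces/{namespace}/{resource}"
    return f"{path}?{urlencode(params)}" if params else path


# Placeholder namespace and selector, mirroring the request shape in the logs.
url = list_url("namespace-name", "pods", "strimzi.io/kind=Kafka")
query = parse_qs(urlsplit(url).query)
print(url)
```

The selector is percent-encoded by `urlencode`; decoding the query string gives back the original selector and the added resourceVersion parameter.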

Long term solution:
Migrate to the List and Watch pattern.
Relevant documentation: https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
https://cloud.google.com/kubernetes-engine/docs/concepts/planning-scalability#use_list_and_watch_pattern_instead_of_periodic_listing
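As a rough sketch of the List and Watch pattern, the client performs one initial LIST, then keeps a local cache up to date from watch events and answers its own queries from that cache instead of re-listing. The example below simulates this in memory with hard-coded data. The `LocalCache` class and the flattened object shape (`name`, `labels`, `resourceVersion` at the top level) are assumptions for illustration; real Kubernetes objects keep these fields under `metadata`, and a real client would receive the events from a watch stream.

```python
class LocalCache:
    """Hypothetical in-memory cache fed by one LIST plus watch events."""

    def __init__(self):
        self.objects = {}
        self.resource_version = None

    def replace(self, items, resource_version):
        # Initial LIST: take a full snapshot and remember its version,
        # which is where the subsequent watch would start from.
        self.objects = {obj["name"]: obj for obj in items}
        self.resource_version = resource_version

    def apply(self, event):
        # Watch event: update only the object that changed.
        obj = event["object"]
        if event["type"] == "DELETED":
            self.objects.pop(obj["name"], None)
        else:  # ADDED or MODIFIED
            self.objects[obj["name"]] = obj
        self.resource_version = obj["resourceVersion"]

    def select(self, labels):
        # Label queries are answered locally, with no API server call.
        return [
            obj for obj in self.objects.values()
            if all(obj["labels"].get(k) == v for k, v in labels.items())
        ]


kafka = {"strimzi.io/kind": "Kafka"}
cache = LocalCache()
cache.replace(
    [
        {"name": "kafka-0", "labels": kafka, "resourceVersion": "100"},
        {"name": "other-0", "labels": {"app": "other"}, "resourceVersion": "101"},
    ],
    resource_version="101",
)

# Simulated watch stream (hard-coded for the sketch).
events = [
    {"type": "ADDED",
     "object": {"name": "kafka-1", "labels": kafka, "resourceVersion": "102"}},
    {"type": "DELETED",
     "object": {"name": "kafka-0", "labels": kafka, "resourceVersion": "103"}},
]
for event in events:
    cache.apply(event)

kafka_pods = [obj["name"] for obj in cache.select(kafka)]
```

After the initial snapshot, each change costs one small event rather than a full LIST, which is the point of the pattern.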

Expected behavior
By default, Strimzi should use the Kubernetes API server cache, and ideally the List and Watch pattern instead of repeated LIST calls.

Environment (please complete the following information):

  • Strimzi version: 0.31.1
  • Installation method: N/A
  • Kubernetes cluster: 1.23
  • Infrastructure: GKE

Additional context
Related issue that I've opened in Fabric8: fabric8io/kubernetes-client#4670

@marseel marseel added the bug label Dec 13, 2022
@marseel marseel changed the title from "Strimzi makes repeatable causing instsability of Kubernetes Control Plane" to "Strimzi makes repeatable LIST requests causing instsability of Kubernetes Control Plane" Dec 13, 2022

marseel commented Dec 13, 2022

More precisely, LIST requests for pods, for example, have the following format:

/api/v1/namespaces/namespace-name/pods?labelSelector=strimzi.io/cluster=cluster-name,strimzi.io/name=some-name,strimzi.io/kind=Kafka

Requests of this type make a full LIST call to etcd, and the Kubernetes API server then filters the results. I've observed 10 QPS, which is quite significant for such expensive calls (not including LISTs for other resources like secrets etc.), especially when there are tens of thousands of pods in the cluster.
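To illustrate why a label selector does not make such a call cheap, here is a minimal simulation of "read everything, then filter". It is not the API server's implementation: the function names are hypothetical, the parser handles only equality terms joined by commas (real selectors also support `!=` and set-based terms), and the pod counts are made-up numbers.

```python
def parse_equality_selector(selector):
    """Parse 'k1=v1,k2=v2' into a dict (equality terms only)."""
    return dict(term.split("=", 1) for term in selector.split(","))


def list_with_selector(stored_pods, selector):
    """Scan every stored pod, then keep the ones matching the selector."""
    wanted = parse_equality_selector(selector)
    scanned = 0
    matched = []
    for pod in stored_pods:
        scanned += 1  # every object is read before it can be filtered
        if all(pod["labels"].get(k) == v for k, v in wanted.items()):
            matched.append(pod)
    return matched, scanned


# Hypothetical data set: 1000 pods, of which 3 belong to one Kafka cluster.
pods = [{"name": f"pod-{i}", "labels": {"app": "other"}} for i in range(997)]
pods += [
    {"name": f"kafka-{i}",
     "labels": {"strimzi.io/cluster": "cluster-name", "strimzi.io/kind": "Kafka"}}
    for i in range(3)
]

matched, scanned = list_with_selector(
    pods, "strimzi.io/cluster=cluster-name,strimzi.io/kind=Kafka"
)
```

In this sketch only 3 pods are returned, but all 1000 are read to find them, so the cost of each request follows the number of stored objects rather than the size of the result.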

@strimzi strimzi locked and limited conversation to collaborators Dec 13, 2022
@scholzj scholzj converted this issue into discussion #7794 Dec 13, 2022
