This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Strimzi makes repeated LIST requests, causing instability of the Kubernetes Control Plane #7793
Describe the bug
strimzi-cluster-operator/0.31.1 does not use the Kubernetes API server cache when listing resources. Every LIST call goes directly to etcd, which puts significant load on etcd and destabilizes the Kubernetes control plane.
Example logs from Kubernetes API Server:
Similarly, Strimzi is also listing configmaps/pods/persistentvolumeclaims/services/...
Short term mitigation:
Set resourceVersion=0 on each LIST/GET request. The request is then served from the Kubernetes API server cache without any interaction with etcd.
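To illustrate the mitigation at the HTTP level, here is a minimal sketch that builds a LIST URL carrying `resourceVersion=0`. It is not Strimzi or Fabric8 code: the helper name `list_url`, the host `https://kube.example`, and the namespace `kafka` are hypothetical, and only core/v1 namespaced resources are handled. Note that a response served from the cache may be slightly stale, which is the trade-off this parameter accepts.

```python
from urllib.parse import urlencode


def list_url(api_server, namespace, resource, resource_version="0", label_selector=None):
    """Build a LIST URL for a namespaced core/v1 resource (hypothetical helper).

    With resource_version="0" the API server may answer from its cache.
    With resource_version=None the parameter is omitted, and the request
    is served with a consistent read from etcd.
    """
    params = {}
    if label_selector:
        params["labelSelector"] = label_selector
    if resource_version is not None:
        params["resourceVersion"] = resource_version

    url = f"{api_server}/api/v1/namespaces/{namespace}/{resource}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Cache-friendly LIST of pods in the (assumed) "kafka" namespace.
cached = list_url("https://kube.example", "kafka", "pods")

# The same LIST without resourceVersion, as the operator issues it today.
uncached = list_url("https://kube.example", "kafka", "pods", resource_version=None)
```

The only difference between the two requests is the query parameter, so the change is small on the client side as long as the client library lets the caller set it.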
Long term solution:
Migrate to the List and Watch pattern.
Relevant documentation: https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
https://cloud.google.com/kubernetes-engine/docs/concepts/planning-scalability#use_list_and_watch_pattern_instead_of_periodic_listing
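The List and Watch pattern described in the documents above can be sketched as a local cache that is seeded by one LIST and then kept current by watch events. This is a simplified illustration under stated assumptions, not how Strimzi or Fabric8 implement it: the class name `ResourceCache` is hypothetical, and the events are plain dictionaries shaped like the API server's watch events (`type` plus `object`), fed in by hand rather than read from a live connection.

```python
class ResourceCache:
    """In-memory view of one resource type, kept current by watch events."""

    def __init__(self):
        self.objects = {}             # (namespace, name) -> object
        self.resource_version = None  # where the next watch should resume

    @staticmethod
    def _key(obj):
        meta = obj["metadata"]
        return (meta.get("namespace"), meta["name"])

    def replace(self, list_response):
        """Seed the cache from a single LIST response."""
        self.objects = {self._key(o): o for o in list_response["items"]}
        self.resource_version = list_response["metadata"]["resourceVersion"]

    def apply(self, event):
        """Apply one watch event and remember its resourceVersion."""
        kind, obj = event["type"], event["object"]
        if kind == "ERROR":
            # For example 410 Gone: the caller has to LIST again and re-seed.
            raise RuntimeError("watch failed, re-list required")
        if kind in ("ADDED", "MODIFIED"):
            self.objects[self._key(obj)] = obj
        elif kind == "DELETED":
            self.objects.pop(self._key(obj), None)
        # BOOKMARK events change no object; they only advance the version.
        self.resource_version = obj["metadata"]["resourceVersion"]

    def watch_query(self):
        """Query string for (re)starting the watch after the last seen version."""
        return f"watch=1&resourceVersion={self.resource_version}"


def pod(name, version):
    """Hypothetical helper that builds a minimal pod-shaped dictionary."""
    return {"metadata": {"namespace": "kafka", "name": name, "resourceVersion": version}}


cache = ResourceCache()
cache.replace({"metadata": {"resourceVersion": "100"}, "items": [pod("a", "90")]})
cache.apply({"type": "ADDED", "object": pod("b", "101")})
cache.apply({"type": "DELETED", "object": pod("a", "102")})
```

After the initial LIST, the client reads from `cache.objects` instead of issuing further LIST calls, and reconnects with `watch_query()` if the watch is interrupted.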
Expected behavior
By default, Strimzi should use the Kubernetes API server cache, and ideally the List and Watch pattern instead of repeated LIST calls.
Environment (please complete the following information):
Additional context
Related issue that I've opened in Fabric8: fabric8io/kubernetes-client#4670