Context
Prometheus is a widely adopted and still-growing technology for cloud-native users to process system metrics. Its growing adoption makes Prometheus an essential cloud-native technology for meeting users where they are.
The Prometheus Server integration has been promoted to GA.
Diagnosis
There have been reports of Kubernetes system-component data scraped by Prometheus causing problems due to the large number of documents created when ingested into Elasticsearch. This resource utilisation grew with cluster size until customers eventually faced out-of-memory (OOM) errors.
The same OOM issue has been reported for Prometheus instances when using PromQL to aggregate Kubernetes data flowing through the Prometheus server into Elasticsearch.
Problem definition
Customers using the Prometheus integration alongside Elastic Observability to monitor Kubernetes infrastructure need to understand any limitations they might encounter at scale, for all three modes of ingesting Kubernetes infrastructure data from the Prometheus server into Elasticsearch.
Prometheus Server <> Elasticsearch ingestion modes
Prometheus Exporters (Collectors)
Prometheus Server Remote-Write
Prometheus Queries (PromQL)
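To illustrate the first mode: an exporter exposes metrics in the Prometheus text exposition format, and each scraped series sample typically becomes one document when ingested into Elasticsearch. A minimal stdlib-only sketch (the sample exposition text below is an illustrative assumption, not output from a real exporter):

```python
# Count time series samples in Prometheus text exposition format.
# Each non-comment line is one sample; when ingested into Elasticsearch,
# each sample typically maps to one document.

SAMPLE_EXPOSITION = """\
# HELP container_cpu_usage_seconds_total Cumulative CPU time consumed.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{pod="web-0"} 12.5
container_cpu_usage_seconds_total{pod="web-1"} 9.1
# HELP container_memory_working_set_bytes Current working set.
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{pod="web-0"} 104857600
"""

def count_series(exposition_text: str) -> int:
    """Count sample lines, skipping comments (# HELP / # TYPE) and blanks."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.startswith("#")
    )

print(count_series(SAMPLE_EXPOSITION))  # → 3
```

This per-scrape sample count is the quantity that, multiplied across pods and scrape intervals, drives the document volume discussed above.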
Action
We need to test the scaling capabilities of the Prometheus integration on a real Kubernetes cluster of a given size, and establish a framework for ingestion performance that includes clear recommended limits.
Deliverables
For each of the three modes of getting Kubernetes system metrics to Elasticsearch via the Prometheus Server integration:
We need to establish a benchmark: given certain fixed variables (determined by the test creator), is there an upper limit on the number of pods beyond which ingesting the available Kubernetes system metrics becomes a problem?
If the answer cannot be expressed in Kubernetes resources (e.g. pods, nodes, or containers), then the total number of metrics ingested, or any other unit that helps establish a common benchmark users can relate to, is suitable.
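To make such a benchmark concrete in metric counts rather than pods alone, the expected document volume can be estimated from cluster size, per-pod series count, and scrape interval. A back-of-the-envelope sketch (all parameter values are illustrative assumptions):

```python
def estimated_documents_per_hour(pods: int,
                                 series_per_pod: int,
                                 scrape_interval_s: int) -> int:
    """Rough hourly document volume, assuming each scraped series
    sample becomes roughly one Elasticsearch document."""
    scrapes_per_hour = 3600 // scrape_interval_s
    return pods * series_per_pod * scrapes_per_hour

# Illustrative example: 500 pods, ~200 series each, 30 s scrape interval.
print(estimated_documents_per_hour(500, 200, 30))  # → 12000000
```

Expressing the benchmark in documents per hour gives users a unit they can relate to regardless of how their pods, nodes, and containers are proportioned.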
Tasks
Load Test Prometheus Exporters (Collectors)
Define fixed configuration settings for Kubernetes, Prometheus, and the Prometheus integration, and the scaling mechanism for this test (e.g. scale the number of pods horizontally while keeping nodes constant)
Scale clusters to different sizes as defined above
Load Test Prometheus Server Remote-Write
Define fixed configuration settings for Kubernetes, Prometheus, and the Prometheus integration, and the scaling mechanism for this test (e.g. scale the number of pods horizontally while keeping nodes constant)
Scale clusters to different sizes as defined above
Load Test Prometheus Queries (PromQL)
Define Specific PromQL queries that are going to be evaluated
Define fixed configuration settings for Kubernetes, Prometheus, and the Prometheus integration, and the scaling mechanism for this test (e.g. scale the number of pods horizontally while keeping nodes constant)
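For the PromQL load test, the queries under evaluation can be fixed up front. A sketch of a candidate query set (the specific queries and 5m windows are assumptions chosen for illustration, not a confirmed test plan):

```python
# Candidate PromQL queries for the load test. The metric names come from
# standard Kubernetes exporters (cAdvisor / kube-state-metrics), but the
# exact selection and window sizes are illustrative assumptions.
CANDIDATE_QUERIES = {
    "cpu_by_pod": 'sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)',
    "memory_by_node": 'sum(container_memory_working_set_bytes) by (node)',
    "pod_restarts": 'sum(kube_pod_container_status_restarts_total) by (namespace)',
}

for name, query in CANDIDATE_QUERIES.items():
    print(f"{name}: {query}")
```

Aggregations like these are the kind of query reported to trigger OOM in Prometheus instances, so they are natural candidates for the evaluation set.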