Establish Prometheus Server Integration 'at scale' ingestion limits #3857

Closed
gizas opened this issue Jul 27, 2022 · 2 comments

gizas commented Jul 27, 2022

Context

Prometheus is a widely adopted and still growing technology for cloud-native users to process system metrics. This growth in adoption makes Prometheus an essential cloud-native technology to support in order to meet users where they are.

The Prometheus Server integration has been promoted to GA.

Diagnosis

There have been reports of Kubernetes system-component data scraped by Prometheus causing problems due to the large number of documents created when ingested into Elasticsearch. This resource utilisation grew steeply with cluster size, until customers eventually faced OOMs.

The same OOM issue has been reported for Prometheus instances when PromQL is used to aggregate Kubernetes data flowing through the Prometheus server into Elasticsearch.

Problem definition

Customers using the Prometheus integration alongside Elastic Observability to monitor Kubernetes infrastructure need to understand the limitations they might encounter at scale for each of the three modes of ingesting Prometheus Server Kubernetes infrastructure data into Elasticsearch, listed and sketched below.

Prometheus Server <> Elasticsearch ingestion modes

  • Prometheus Exporters (Collectors)
  • Prometheus Server Remote-Write
  • Prometheus Queries (PromQL)
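For orientation, here is a minimal sketch of the three modes in Metricbeat prometheus-module form (the Elastic Agent integration exposes equivalent settings through its policy). The hosts, ports, and the example query are illustrative placeholders, not values taken from this issue.

```yaml
# Sketch of the three ingestion modes. All addresses below are placeholders.

# 1. Prometheus Exporters (Collectors): Elastic scrapes an exporter or the
#    Prometheus server's own /metrics endpoint.
- module: prometheus
  metricsets: ["collector"]
  period: 10s
  hosts: ["localhost:9090"]        # exporter or Prometheus server address
  metrics_path: /metrics

# 2. Prometheus Server Remote-Write: Elastic opens a listener and the
#    Prometheus server pushes samples to it via remote_write.
- module: prometheus
  metricsets: ["remote_write"]
  host: "localhost"
  port: "9201"                     # port the remote_write listener binds to

# 3. Prometheus Queries (PromQL): Elastic periodically runs PromQL queries
#    against the Prometheus HTTP API and ingests the results.
- module: prometheus
  metricsets: ["query"]
  period: 10s
  hosts: ["localhost:9090"]        # Prometheus server address
  queries:
    - name: "http_requests_rate"   # hypothetical example query
      path: "/api/v1/query"
      params:
        query: "sum(rate(prometheus_http_requests_total[5m]))"
```

For the remote-write mode, the Prometheus server additionally needs a matching remote_write entry pointing at that listener; a sketch of that wiring appears under the remote-write load-test task below.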

Action

We need to test the scaling capabilities of the Prometheus integration on a real Kubernetes cluster of a given size and establish an ingestion performance framework that includes a clear recommendation of limits.

Deliverables

For each of the three modes of getting Kubernetes system metrics into Elasticsearch via the Prometheus Server integration:

We need to establish a benchmark of the form: given that certain variables are fixed (determined by the test creator), is there an upper limit on the number of pods at which ingesting the available Kubernetes system metrics becomes a problem?

If the answer cannot be expressed in Kubernetes resources (e.g. pods, nodes, or containers), then the total number of metrics ingested, or any other unit that helps establish a common benchmark users can relate to, is suitable.
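If the benchmark ends up being expressed in metric terms rather than pods or nodes, Prometheus's own metadata can report the size of the test environment. A minimal sketch using the query mode, assuming a reachable Prometheus server (placeholder address) and the standard scrape/TSDB metrics:

```yaml
# Illustrative only: record the "size" of the environment in metric terms so
# results can be reported per active series / samples per scrape.
- module: prometheus
  metricsets: ["query"]
  period: 60s
  hosts: ["localhost:9090"]            # Prometheus server address (placeholder)
  queries:
    - name: "active_series"
      path: "/api/v1/query"
      params:
        query: "prometheus_tsdb_head_series"     # series currently held in the TSDB head
    - name: "samples_per_scrape"
      path: "/api/v1/query"
      params:
        query: "sum(scrape_samples_scraped)"     # samples collected across all targets in the latest scrapes
```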

Tasks

Load Test Prometheus Exporters (Collectors)

  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant); a possible fixed collector configuration is sketched after this list
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
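One possible shape for the fixed part of this test, assuming kube-state-metrics is the exporter being scraped (the service address is an assumption); the scaling dimension is then the number of workload pods, which grows the number of series this single scrape returns:

```yaml
# Fixed collector configuration for the exporters test. The exporter address
# is an assumed in-cluster kube-state-metrics service; only the number of
# workload pods changes between runs.
- module: prometheus
  metricsets: ["collector"]
  period: 10s
  hosts: ["kube-state-metrics.kube-system.svc:8080"]   # assumed exporter endpoint
  metrics_path: /metrics
```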

Load Test Prometheus Server Remote-Write

  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant); the remote-write wiring is sketched after this list
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
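A sketch of the remote-write wiring for this test, with assumed in-cluster addresses; the Prometheus server pushes everything it scrapes to the Elastic side, so the scaled quantity is again the number of pods feeding the server:

```yaml
# Prometheus side (prometheus.yml): forward scraped samples to the Elastic
# remote_write listener. The URL is an assumed in-cluster address.
remote_write:
  - url: "http://elastic-agent.default.svc:9201/write"

# Elastic side (Metricbeat form; the Agent integration exposes the same
# host/port settings): the listener that receives the pushed samples.
# - module: prometheus
#   metricsets: ["remote_write"]
#   host: "0.0.0.0"
#   port: "9201"
```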

Load Test Prometheus Queries (PromQL)

  • Define the specific PromQL queries that are going to be evaluated (illustrative candidates are sketched after this list)
  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant)
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
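Illustrative candidates for the query set, assuming the standard cAdvisor/kubelet metrics are scraped by the Prometheus server (the server address is a placeholder); the actual queries are to be defined as the first task above:

```yaml
# Illustrative PromQL queries for the query-mode load test.
- module: prometheus
  metricsets: ["query"]
  period: 60s
  hosts: ["prometheus-server.monitoring.svc:9090"]   # assumed Prometheus address
  queries:
    - name: "cpu_by_namespace"
      path: "/api/v1/query"
      params:
        query: "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"
    - name: "memory_by_namespace"
      path: "/api/v1/query"
      params:
        query: "sum by (namespace) (container_memory_working_set_bytes)"
```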
@mlunadia changed the title from "Testing Prometheus Integration under Load" to "Establish Prometheus Integration at scale limits" on Nov 9, 2022
@mlunadia changed the title from "Establish Prometheus Integration at scale limits" to "Establish Prometheus Integration 'at scale' ingestion limits" on Nov 9, 2022
@mlunadia changed the title from "Establish Prometheus Integration 'at scale' ingestion limits" to "Establish Prometheus Server Integration 'at scale' ingestion limits" on Nov 9, 2022
@ChrsMark (Member) commented:

@gizas for remote_write benchmarking I played around with https://github.com/ChrsMark/remote-write-bench/tree/master in the past. Sharing it just in case it is useful here too.


gizas commented Nov 28, 2022

@gizas closed this as completed on Nov 28, 2022