Establish Prometheus Server Integration 'at scale' ingestion limits #3857

Closed
gizas opened this issue Jul 27, 2022 · 2 comments

gizas commented Jul 27, 2022

Context

Prometheus is a widely adopted and still growing technology for cloud-native users to process system metrics. This growth in adoption makes Prometheus an essential cloud-native technology to support in order to meet users where they are.

The Prometheus Server integration has been promoted to GA.

Diagnosis

There have been reports of Kubernetes system-component data scraped by Prometheus causing problems due to the large number of documents created when ingested into Elasticsearch. This resource utilisation grew steeply with cluster size, until customers eventually faced OOMs.

The same OOM issue has been reported for Prometheus instances when PromQL is used to aggregate Kubernetes data flowing through the Prometheus server into Elasticsearch.

Problem definition

Customers using the Prometheus integration alongside Elastic Observability to monitor Kubernetes infrastructure need to understand the limitations they might encounter at scale for each of the three modes of ingesting Prometheus Server Kubernetes infrastructure data into Elasticsearch, listed and sketched below.

Prometheus Server <> Elasticsearch ingestion modes

  • Prometheus Exporters (Collectors)
  • Prometheus Server Remote-Write
  • Prometheus Queries (PromQL)
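For orientation, here is a minimal sketch of the three modes in Metricbeat prometheus-module form (the Elastic Agent integration exposes equivalent settings through its policy). The hosts, ports, and the example query are illustrative placeholders, not values taken from this issue.

```yaml
# Sketch of the three ingestion modes. All addresses below are placeholders.

# 1. Prometheus Exporters (Collectors): Elastic scrapes an exporter or the
#    Prometheus server's own /metrics endpoint.
- module: prometheus
  metricsets: ["collector"]
  period: 10s
  hosts: ["localhost:9090"]        # exporter or Prometheus server address
  metrics_path: /metrics

# 2. Prometheus Server Remote-Write: Elastic opens a listener and the
#    Prometheus server pushes samples to it via remote_write.
- module: prometheus
  metricsets: ["remote_write"]
  host: "localhost"
  port: "9201"                     # port the remote_write listener binds to

# 3. Prometheus Queries (PromQL): Elastic periodically runs PromQL queries
#    against the Prometheus HTTP API and ingests the results.
- module: prometheus
  metricsets: ["query"]
  period: 10s
  hosts: ["localhost:9090"]        # Prometheus server address
  queries:
    - name: "http_requests_rate"   # hypothetical example query
      path: "/api/v1/query"
      params:
        query: "sum(rate(prometheus_http_requests_total[5m]))"
```

For the remote-write mode, the Prometheus server additionally needs a matching remote_write entry pointing at that listener; a sketch of that wiring appears under the remote-write load-test task below.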

Action

We need to test the scaling capabilities of the Prometheus integration on a real Kubernetes cluster of a given size and establish an ingestion performance framework that includes a clear recommendation of limits.

Deliverables

For each of the three modes of getting Kubernetes system metrics into Elasticsearch via the Prometheus Server integration:

We need to establish a benchmark of the form: given that certain variables are fixed (determined by the test creator), is there an upper limit on the number of pods at which ingesting the available Kubernetes system metrics becomes a problem?

If the answer cannot be expressed in Kubernetes resources (e.g. pods, nodes, or containers), then the total number of metrics ingested, or any other unit that helps establish a common benchmark users can relate to, is suitable.
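If the benchmark ends up being expressed in metric terms rather than pods or nodes, Prometheus's own metadata can report the size of the test environment. A minimal sketch using the query mode, assuming a reachable Prometheus server (placeholder address) and the standard scrape/TSDB metrics:

```yaml
# Illustrative only: record the "size" of the environment in metric terms so
# results can be reported per active series / samples per scrape.
- module: prometheus
  metricsets: ["query"]
  period: 60s
  hosts: ["localhost:9090"]            # Prometheus server address (placeholder)
  queries:
    - name: "active_series"
      path: "/api/v1/query"
      params:
        query: "prometheus_tsdb_head_series"     # series currently held in the TSDB head
    - name: "samples_per_scrape"
      path: "/api/v1/query"
      params:
        query: "sum(scrape_samples_scraped)"     # samples collected across all targets in the latest scrapes
```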

Tasks

Load Test Prometheus Exporters (Collectors)

  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant); a possible fixed collector configuration is sketched after this list
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
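One possible shape for the fixed part of this test, assuming kube-state-metrics is the exporter being scraped (the service address is an assumption); the scaling dimension is then the number of workload pods, which grows the number of series this single scrape returns:

```yaml
# Fixed collector configuration for the exporters test. The exporter address
# is an assumed in-cluster kube-state-metrics service; only the number of
# workload pods changes between runs.
- module: prometheus
  metricsets: ["collector"]
  period: 10s
  hosts: ["kube-state-metrics.kube-system.svc:8080"]   # assumed exporter endpoint
  metrics_path: /metrics
```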

Load Test Prometheus Server Remote-Write

  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant); the remote-write wiring is sketched after this list
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
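A sketch of the remote-write wiring for this test, with assumed in-cluster addresses; the Prometheus server pushes everything it scrapes to the Elastic side, so the scaled quantity is again the number of pods feeding the server:

```yaml
# Prometheus side (prometheus.yml): forward scraped samples to the Elastic
# remote_write listener. The URL is an assumed in-cluster address.
remote_write:
  - url: "http://elastic-agent.default.svc:9201/write"

# Elastic side (Metricbeat form; the Agent integration exposes the same
# host/port settings): the listener that receives the pushed samples.
# - module: prometheus
#   metricsets: ["remote_write"]
#   host: "0.0.0.0"
#   port: "9201"
```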

Load Test Prometheus Queries (PromQL)

  • Define the specific PromQL queries that are going to be evaluated (illustrative candidates are sketched after this list)
  • Define the fixed Kubernetes, Prometheus, and Prometheus integration configuration settings and the scaling mechanism for this test (e.g. scale the number of pods horizontally while the number of nodes stays constant)
  • Create a new K8s cluster (see TF here)
  • Scale clusters to different sizes as defined above
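Illustrative candidates for the query set, assuming the standard cAdvisor/kubelet metrics are scraped by the Prometheus server (the server address is a placeholder); the actual queries are to be defined as the first task above:

```yaml
# Illustrative PromQL queries for the query-mode load test.
- module: prometheus
  metricsets: ["query"]
  period: 60s
  hosts: ["prometheus-server.monitoring.svc:9090"]   # assumed Prometheus address
  queries:
    - name: "cpu_by_namespace"
      path: "/api/v1/query"
      params:
        query: "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"
    - name: "memory_by_namespace"
      path: "/api/v1/query"
      params:
        query: "sum by (namespace) (container_memory_working_set_bytes)"
```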
@mlunadia changed the title from "Testing Prometheus Integration under Load" to "Establish Prometheus Integration at scale limits" on Nov 9, 2022
@mlunadia changed the title from "Establish Prometheus Integration at scale limits" to "Establish Prometheus Integration 'at scale' ingestion limits" on Nov 9, 2022
@mlunadia changed the title from "Establish Prometheus Integration 'at scale' ingestion limits" to "Establish Prometheus Server Integration 'at scale' ingestion limits" on Nov 9, 2022
@ChrsMark (Member) commented:

@gizas for remote_write benchmarking I played around with https://github.com/ChrsMark/remote-write-bench/tree/master in the past. Sharing it just in case it is useful here too.


gizas commented Nov 28, 2022

@gizas closed this as completed on Nov 28, 2022