Hi, colleagues!

What is missing?

At the moment there is no target sharding support in the Prometheus Operator. It would be great to add it.
Why do we need it?
Currently we have thousands of targets in each Prometheus instance, and it seems we're approaching its performance limit on a single node.
Possible solutions that I see are:
- Prometheus per namespace
- Use sharding
Both solutions have their own advantages.
At the moment the sharding solution seems a bit better, for the following reasons:
- All Prometheus targets are our microservices, and all of them follow the same observability standards, so we need to keep the same aggregation/alerting rules on all Prometheus instances. This means it's better to keep these Prometheus instances in one logical group.
- We have a lot of namespaces (currently ~100); some namespaces are much bigger than others, which is normal, and we don't want to customize resource limits for each namespace.
- The Prometheus Operator could provide failover logic when one of the shards is down: it could reconfigure the Prometheus instances and rebalance targets by setting modulus to the number of Prometheus instances.
My proposal is to add a sharding attribute to ServiceMonitor, e.g. shard_by: <label>. This label (or list of labels) would be used as the source label for sharding with action: hashmod. The modulus could be configured automatically based on the number of Prometheus instances.
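A rough sketch of what such a ServiceMonitor might look like (the shard_by field is only my proposal and does not exist in the API today; the example-app names are just for illustration, and the commented relabeling shows what the operator could generate from it):

```yaml
# Hypothetical ServiceMonitor using the proposed shard_by attribute
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical name, for illustration only
spec:
  shard_by: __address__        # proposed field: label(s) whose hash decides the shard
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web
# Relabeling the operator could generate for shard N out of M Prometheus instances:
#   - sourceLabels: [__address__]
#     action: hashmod
#     modulus: M               # set automatically to the Prometheus instance count
#     targetLabel: __tmp_shard
#   - sourceLabels: [__tmp_shard]
#     action: keep
#     regex: "N"
```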
What do you think?
If you need a solution quickly, you can already use additional relabeling rules on your ServiceMonitor via the hashmod action and create multiple ServiceMonitors, one per "shard". Your use case makes a lot of sense; I'd like to think it through a little bit further and arrive at a solution that would allow us to eventually autoscale sharding based on metric ingestion (I'm thinking of a general-purpose way, where a Prometheus object would become a shard, and maybe a ShardedPrometheus object that orchestrates these and can be autoscaled via the HPA). What I'm saying is, maybe the sharding decision should ultimately be configured in the Prometheus object instead of the ServiceMonitor (where it's already possible, albeit a little manual, today).
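For reference, a minimal sketch of that manual approach, assuming two shards and a hypothetical example-app service; each shard gets its own ServiceMonitor whose relabelings keep only the targets that hash to its shard number:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app-shard-0    # hypothetical name; one ServiceMonitor per shard
  labels:
    shard: "0"
spec:
  selector:
    matchLabels:
      app: example-app         # hypothetical target service
  endpoints:
    - port: web                # hypothetical port name
      relabelings:
        # Hash the target address into one of 2 buckets (modulus = number of shards).
        - sourceLabels: [__address__]
          action: hashmod
          modulus: 2
          targetLabel: __tmp_hash
        # Keep only the targets whose hash matches this shard's number.
        - sourceLabels: [__tmp_hash]
          action: keep
          regex: "0"
```

A second ServiceMonitor with regex: "1" (and a Prometheus instance that selects it) would scrape the other half of the targets.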