Support targets sharding #2590

Closed
d-ulyanov opened this issue May 8, 2019 · 2 comments

Comments

@d-ulyanov

Hi, colleagues!

What is missing?

At the moment there is no targets sharding support in the Prometheus Operator.
It would be great to add it.

Why do we need it?

Currently we have thousands of targets in each Prometheus, and it seems we're approaching the performance limit of a single node.

Possible solutions that I see are:

  1. Prometheus per namespace
  2. Use sharding

Both solutions have their own advantages.

At the moment the sharding solution looks a bit better, for these reasons:

  • All Prometheus targets are our microservices, and all of them follow the same observability standards, so we should keep the same aggregation/alerting rules on all Prometheus instances. This means it's better to keep these Prometheuses in one logical group.
  • We have a lot of namespaces (currently ~100), and some namespaces are much bigger than others; this is normal, and we don't want to customize resource limits for each namespace.
  • The Prometheus Operator could provide failover logic when one of the shards is down: it could reconfigure the Prometheus instances and rebalance targets by setting modulus = number of Prometheus instances.

My proposal is to add a sharding attribute to ServiceMonitor, e.g. shard_by: <label>. This label (or list of labels) would be used as the source label for sharding with action: hashmod. The modulus could be configured automatically based on the number of Prometheus instances. A rough sketch of the relabel config this could generate is below.
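For illustration only, here is a minimal sketch of the Prometheus relabel config the operator could generate from such a field, assuming a hypothetical `shard_by: pod_name` and 3 Prometheus instances (the label name and shard count are made up, and this is not an existing API):

```yaml
# Hypothetical generated config for shard 0 of 3.
relabel_configs:
  - source_labels: [pod_name]   # label chosen via the proposed shard_by
    modulus: 3                  # set automatically to the Prometheus instance count
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: "0"                  # each shard keeps only its own hash bucket (0, 1 or 2)
    action: keep
```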

What do you think?

@brancz
Contributor

brancz commented May 8, 2019

If you need a solution quickly, you can already use additional relabeling rules on your ServiceMonitor via the hashmod action, and create multiple ServiceMonitors, one per "shard". Your use case makes a lot of sense. I'd like to think it through a little further and arrive at a solution that would allow us to eventually autoscale sharding based on metric ingestion (I'm thinking of a general-purpose way, where a Prometheus object would become a shard, and maybe a ShardedPrometheus object that orchestrates these and can be autoscaled via the HPA). What I'm saying is: maybe the sharding decision should ultimately be configured in the Prometheus object instead of the ServiceMonitor (where it's already possible today, albeit a little manual).
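A minimal sketch of that manual workaround, assuming two shards; the resource names, label selectors and port are illustrative, not taken from this issue:

```yaml
# One ServiceMonitor per shard; each keeps only the targets whose address
# hashes to its shard index. A second ServiceMonitor would use regex: "1".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-shard-0
  labels:
    shard: "0"
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      relabelings:
        - sourceLabels: [__address__]
          modulus: 2              # total number of shards
          targetLabel: __tmp_hash
          action: hashmod
        - sourceLabels: [__tmp_hash]
          regex: "0"              # this shard's index
          action: keep
```

Each shard's Prometheus object would then select only its own ServiceMonitors, e.g. via a serviceMonitorSelector on the shard label.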

@paulfantom
Member

This seems to be implemented with #3241
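Assuming the linked PR refers to the shards field on the Prometheus resource, a hedged sketch of how the built-in sharding is enabled:

```yaml
# Sketch, under the assumption that #3241 added spec.shards: the operator
# splits targets across this many shards via generated hashmod relabeling.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  shards: 2        # number of target shards
  replicas: 2      # replicas per shard, for HA
  serviceMonitorSelector:
    matchLabels:
      team: my-team
```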
