Setup prometheus-rules in Mimir #2521

Closed
Tracked by #1817
TheoBrigitte opened this issue Jun 1, 2023 · 11 comments


TheoBrigitte commented Jun 1, 2023

Mimir currently runs on gauss and ingests metrics from other Prometheus servers (#2266, #2265).

To replace the current Prometheus setup, we need to move Prometheus rule evaluation from the Prometheus server to Mimir.


whites11 commented Jun 1, 2023

prometheus-operator does not support Mimir, so we need to check grafana-operator.

@QuentinBisson

The Grafana Agent supports that out of the box: https://gigantic.slack.com/archives/C05AUBA75Q9/p1685709315899309

@whites11 whites11 removed their assignment Jun 5, 2023
@QuentinBisson QuentinBisson self-assigned this Jun 7, 2023
@TheoBrigitte

We know grafana-agent supports this, and we now have an app for it.
What's the next step here?

@QuentinBisson

Review giantswarm/grafana-agent-app#3, then release it and deploy it on gauss.

@TheoBrigitte

Also related: giantswarm/prometheus-meta-operator#1309


QuentinBisson commented Jun 21, 2023

Prometheus rules are now set up in Mimir thanks to the Grafana Agent in flow mode (currently deployed by hand) using the following command:

helm install grafana-agent grafana/grafana-agent -n monitoring -f configmap.yaml

with configmap.yaml being:

agent:
  configMap:
    content: |
      mimir.rules.kubernetes "local" {
        address = "http://mimir-ruler.mimir.svc:8080/"
      }
controller:
  type: deployment # defaults to daemonset, which is not useful for our use case

That way, the Grafana Agent converts PrometheusRule CRs into Prometheus rules and sends them to the Mimir ruler.
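
As an illustration of what the agent picks up, here is a minimal sketch of a PrometheusRule resource. The resource name, namespace, group name, alert name, expression, and labels are all hypothetical and not taken from our rule set; only the overall shape (`monitoring.coreos.com/v1`, `spec.groups[].rules[]`) is the standard CRD layout.

```yaml
# Hypothetical PrometheusRule; every name and value below is made up for illustration.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules          # hypothetical resource name
  namespace: monitoring        # assumption: same namespace as the agent
spec:
  groups:
    - name: example.rules      # hypothetical rule group
      rules:
        - alert: ExampleTargetDown
          expr: up{job="example"} == 0   # placeholder expression
          for: 10m
          labels:
            severity: notify
          annotations:
            description: Target {{ $labels.instance }} has been down for 10 minutes.
```

Once a resource like this is applied, the corresponding rule group should appear in the Mimir ruler after the agent's next sync.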

In addition, we deployed the Mimir Alertmanager and a Grafana datasource for it so that we can see the alerts.

image

What's left:

Questions:

  • Do inhibition rules work in Mimir? They look supported.
  • How do we load the Alertmanager config per tenant? We need to use mimirtool.
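
To make both questions concrete, here is a minimal sketch of a per-tenant Alertmanager configuration that includes one inhibition rule. The receiver name, the severity values, and the labels in `equal` are assumptions chosen for illustration, not our actual routing.

```yaml
# Hypothetical per-tenant Alertmanager config; names and label values are placeholders.
route:
  receiver: default
  group_by: [alertname, cluster_id]
receivers:
  - name: default              # no integrations configured; real receivers would go here
inhibit_rules:
  # While a "page" alert fires, mute "notify" alerts with the same alertname and cluster_id.
  - source_matchers:
      - 'severity="page"'
    target_matchers:
      - 'severity="notify"'
    equal: [alertname, cluster_id]
```

Assuming the file is saved as `alertmanager.yaml`, it could be uploaded for a single tenant with something like `mimirtool alertmanager load alertmanager.yaml --address=<mimir-url> --id=<tenant-id>`, where the address and tenant ID are placeholders.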


QuentinBisson commented Jun 21, 2023

Rules evaluated in Mimir are missing the labels that Prometheus adds as external labels when using remote write.

All SLO alerts should be grouped by those labels:

(sum(rate(operatorkit_controller_errors_total{container="aws-operator"}[5m])) by (app, cluster_id, installation, cluster_type, region, customer, provider, pipeline))
/
(sum(rate(operatorkit_controller_operation_total{container="aws-operator"}[5m])) by (app, cluster_id, installation, cluster_type, region, customer, provider, pipeline))

instead of

(sum(rate(operatorkit_controller_errors_total{container="aws-operator"}[5m])) by (app))
/
(sum(rate(operatorkit_controller_operation_total{container="aws-operator"}[5m])) by (app))
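
For context, labels such as `cluster_id` and `installation` are the kind a Prometheus server attaches through `external_labels` when it remote-writes. The sketch below shows such a server config; the label values and the remote-write URL are placeholders, not our actual settings. Prometheus adds external labels only when data leaves the server, so locally evaluated rules never had to mention them. In Mimir they are ordinary series labels, and an aggregation like `by (app)` drops them.

```yaml
# Sketch of a Prometheus server config; all values are placeholders.
global:
  external_labels:
    cluster_id: example-cluster
    installation: example-installation
    provider: example-provider
remote_write:
  - url: http://mimir-gateway.mimir.svc/api/v1/push   # hypothetical Mimir push endpoint
```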

@QuentinBisson

@giantswarm/team-atlas could you take a look at this issue? :)

@QuentinBisson

Sloth alerts are fixed; the others will be handled on a case-by-case basis:

Image

@QuentinBisson

Done


4 participants