Integration tests keep failing because prometheus fails to scrape grafana #91

Open
sed-i opened this issue Jan 16, 2024 · 1 comment

@sed-i (Contributor) commented Jan 16, 2024

Description

Integration tests keep failing on this assertion:

```python
# All jobs should be up
health = {target["labels"]["job"]: target["health"] for target in as_dict["activeTargets"]}
assert set(health.values()) == {"up"}
```

```
AssertionError: assert {'up', 'down'} == {'up'}
  Extra items in the left set:
  'down'
  Full diff:
  - {'up'}
  + {'up', 'down'}
```

because Prometheus fails to scrape Grafana:

```json
{
  "discoveredLabels": {
    "__address__": "grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000",
    "__metrics_path__": "/metrics",
    "__scheme__": "https",
    "__scrape_interval__": "1m",
    "__scrape_timeout__": "10s",
    "job": "juju_test-bundle-8n1r_e252fd59_grafana_prometheus_scrape",
    "juju_application": "grafana",
    "juju_charm": "grafana-k8s",
    "juju_model": "test-bundle-8n1r",
    "juju_model_uuid": "e252fd59-3737-4887-80af-f8a9c426125a"
  },
  "labels": {
    "instance": "test-bundle-8n1r_e252fd59-3737-4887-80af-f8a9c426125a_grafana",
    "job": "juju_test-bundle-8n1r_e252fd59_grafana_prometheus_scrape",
    "juju_application": "grafana",
    "juju_charm": "grafana-k8s",
    "juju_model": "test-bundle-8n1r",
    "juju_model_uuid": "e252fd59-3737-4887-80af-f8a9c426125a"
  },
  "scrapePool": "juju_test-bundle-8n1r_e252fd59_grafana_prometheus_scrape",
  "scrapeUrl": "https://grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000/metrics",
  "globalUrl": "https://grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000/metrics",
  "lastError": "Get \"https://10.43.8.206/test-bundle-8n1r-grafana/metrics\": tls: failed to verify certificate: x509: certificate signed by unknown authority",
  "lastScrape": "2024-01-16T18:37:16.119998201Z",
  "lastScrapeDuration": 0.005085648,
  "health": "down",
  "scrapeInterval": "1m",
  "scrapeTimeout": "10s"
}
```
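The assertion above only reports `{'up', 'down'}`, so finding the failing target means digging through the targets API by hand. One option is a variant that names each failing target and its `lastError`. This is a minimal sketch. It assumes `as_dict` is the `data` object of the Prometheus `/api/v1/targets` response, as in the test. `assert_all_targets_up` is a hypothetical helper, not an existing function in this repo. Keying by `scrapeUrl` instead of `job` also prevents one target's health from silently overwriting another's when several targets share a job name.

```python
from typing import Any


def assert_all_targets_up(as_dict: dict[str, Any]) -> None:
    """Fail with the scrape URL and last error of every target that is not up.

    Assumption: `as_dict` is the "data" object of the /api/v1/targets
    response, i.e. it has an "activeTargets" list.
    """
    # Any health other than "up" (e.g. "down", or "unknown" before the
    # first scrape) counts as a failure.
    down = {
        target["scrapeUrl"]: target.get("lastError", "")
        for target in as_dict["activeTargets"]
        if target["health"] != "up"
    }
    # The message is only built when the assertion fails.
    assert not down, "Targets not up:\n" + "\n".join(
        f"  {url}: {error}" for url, error in down.items()
    )
```

With the targets in this issue, the failure message would name the Grafana scrape URL together with the x509 error, rather than just `'down'`.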

Potential issue

The scrape fails with a TLS error: `tls: failed to verify certificate: x509: certificate signed by unknown authority`.
All scrape targets in the test are served over TLS, but only Grafana fails:

```console
$ curl -sk https://10.1.166.115:9090/api/v1/targets | jq | grep https
          "__scheme__": "https",
        "scrapeUrl": "https://alertmanager-0.alertmanager-endpoints.test-bundle-8n1r.svc.cluster.local:9093/metrics",
        "globalUrl": "https://alertmanager-0.alertmanager-endpoints.test-bundle-8n1r.svc.cluster.local:9093/metrics",
          "__scheme__": "https",
        "scrapeUrl": "https://grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000/metrics",
        "globalUrl": "https://grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000/metrics",
        "lastError": "Get \"https://10.43.8.206/test-bundle-8n1r-grafana/metrics\": tls: failed to verify certificate: x509: certificate signed by unknown authority",
          "__scheme__": "https",
        "scrapeUrl": "https://loki-0.loki-endpoints.test-bundle-8n1r.svc.cluster.local:3100/metrics",
        "globalUrl": "https://loki-0.loki-endpoints.test-bundle-8n1r.svc.cluster.local:3100/metrics",
          "__scheme__": "https",
        "scrapeUrl": "https://prometheus-0.prometheus-endpoints.test-bundle-8n1r.svc.cluster.local:9090/metrics",
        "globalUrl": "https://prometheus-0.prometheus-endpoints.test-bundle-8n1r.svc.cluster.local:9090/metrics",
$ curl -sk https://10.1.166.115:9090/api/v1/targets | jq | grep health
        "health": "up",
        "health": "up",
        "health": "up",
        "health": "down",
        "health": "up",
        "health": "up",
        "health": "up",
```
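The `grep health` output above lists health values without saying which target each one belongs to, so the failing target has to be inferred from ordering. A small script can keep each scrape URL paired with its health. This is a minimal sketch. It assumes the standard Prometheus `/api/v1/targets` response shape (`{"status": ..., "data": {"activeTargets": [...]}}`). `fetch_targets` and `health_by_url` are hypothetical helpers. Like `curl -k`, the script skips TLS verification, so it only suits a throwaway test environment.

```python
import json
import ssl
import urllib.request
from typing import Any


def fetch_targets(prometheus_url: str) -> dict[str, Any]:
    """GET <prometheus_url>/api/v1/targets without TLS verification (like curl -k).

    Returns the "data" object of the response (assumed standard API shape).
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return json.load(resp)["data"]


def health_by_url(data: dict[str, Any]) -> list[tuple[str, str]]:
    """Pair each active target's scrape URL with its health, sorted by URL."""
    return sorted((t["scrapeUrl"], t["health"]) for t in data["activeTargets"])
```

For example, `health_by_url(fetch_targets("https://10.1.166.115:9090"))` would return `(scrapeUrl, health)` pairs for the deployment in this issue, so the `down` entry is tied directly to the Grafana URL.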

This may be related to the Grafana 9 vs. Grafana 10 ingress+redirect issue. We could retry once the Grafana 9.5.3 rock has been published by oci-factory and the grafana charm's metadata has been updated to point to it.
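One detail in the targets output fits the redirect hypothesis. Grafana's `lastError` names `https://10.43.8.206/test-bundle-8n1r-grafana/metrics`, not its configured `scrapeUrl` (`https://grafana-0.grafana-endpoints.test-bundle-8n1r.svc.cluster.local:3000/metrics`). Prometheus follows redirects by default, so the mismatch is consistent with Grafana redirecting the scrape to its ingress URL, whose certificate Prometheus does not trust. This is unconfirmed. The sketch below flags such mismatches. It assumes each target is a dict shaped like the `/api/v1/targets` entries above and that `lastError` uses Go's `Get "<url>": <cause>` error format. `redirect_target` is a hypothetical helper.

```python
import re
from typing import Any, Optional
from urllib.parse import urlsplit

# Assumption: lastError follows Go's url.Error format, e.g.
#   Get "https://host/path": <cause>
_GET_ERROR = re.compile(r'^Get "(?P<url>[^"]+)":')


def redirect_target(target: dict[str, Any]) -> Optional[str]:
    """Return the URL named in lastError if it differs from scrapeUrl, else None.

    A mismatch hints that the scrape was redirected and that the error
    occurred at the redirect destination rather than at scrapeUrl.
    """
    match = _GET_ERROR.match(target.get("lastError", ""))
    if match is None:
        return None  # no error, or an error format this sketch does not parse
    failed_url = match.group("url")
    scrape, failed = urlsplit(target["scrapeUrl"]), urlsplit(failed_url)
    # Compare host:port and path; a difference in either suggests a redirect.
    if (scrape.netloc, scrape.path) == (failed.netloc, failed.path):
        return None
    return failed_url
```

Run over every active target, this would single out Grafana's entry and return the ingress URL, while targets whose errors (if any) refer to their own scrape URL return `None`.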

@lucabello (Contributor)

We're not sure whether this still happens; we should examine the latest runs to verify.
