Improve out-of-the-box experience with Grafana dashboards in development/mimir-microservices-mode stack #4898

Open
charleskorn opened this issue May 3, 2023 · 6 comments

Comments

@charleskorn
Contributor

Is your feature request related to a problem? Please describe.

When working on Mimir, the development/mimir-microservices-mode Docker Compose stack is useful for testing and debugging Mimir. This includes a Grafana instance that uses the dashboards from operations/mimir-mixin-compiled.

However, there are some issues with these dashboards and the data behind them:

  • many targets are scraped three times (by Prometheus, the Grafana agent, and the OTel agent), which means data displayed in dashboards either has three times as many series or, in the case of aggregated data, can be three times the true value
  • the ... Resources dashboards (e.g. Writes Resources) use metrics that are not available outside Kubernetes, such as container_cpu_usage_seconds_total and container_memory_working_set_bytes
  • scraped metrics are missing the container label, which breaks the many dashboard panels that expect this label to be present
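
To illustrate the missing container label, here is a minimal sketch of a Prometheus scrape config that attaches the label statically to each target. The job name, target host names, and port are hypothetical and are not taken from the development stack, and this is not necessarily the approach used to fix the problem; it only shows one way a single scraper could supply the label the dashboards expect.

```yaml
# Sketch only: job name, host names and port below are hypothetical.
scrape_configs:
  - job_name: mimir-microservices
    static_configs:
      # One entry per component, so each series carries a container label
      # similar to the one the dashboards would see in Kubernetes.
      - targets: ['distributor-1:8000']
        labels:
          container: distributor
      - targets: ['ingester-1:8000']
        labels:
          container: ingester
```

If only one of the three scrapers collected these targets, aggregated panels would also stop showing three times the true value.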

Describe the solution you'd like

All dashboards Just Work™ (with the exception of those that only make sense in the context of a Kubernetes installation, such as autoscaling-related dashboards)

Describe alternatives you've considered

Using an instance of Mimir deployed to a Kubernetes environment: this works for some scenarios, but for others this can be a slow feedback loop relative to a local environment.

@charleskorn
Contributor Author

Two of the issues described above (the triple scraping and missing container label) will be fixed by #4900.

@charleskorn
Contributor Author

Another feature request: it would be good if the recording rules were set up in Mimir's ruler rather than in Prometheus. As it stands, turning off Prometheus (e.g. to test the Grafana Agent) also stops the evaluation of recording rules.

@jhalterman
Member

jhalterman commented May 19, 2023

Some of the read and write dashboards also don't work because cortex_request_duration_seconds_count and similar metrics aren't populated. Edit: it appears this was caused by a switch to native histograms in #4987.

For container_cpu_usage_seconds_total and similar, could we just re-create these using grafana agent and some recording rules?

@charleskorn
Contributor Author

> For container_cpu_usage_seconds_total and similar, could we just re-create these using grafana agent and some recording rules?

Probably - when I ran into this issue, I modified the dashboards to use process_cpu_seconds_total and that seemed to work fine, so perhaps a recording rule that records container_cpu_usage_seconds_total from process_cpu_seconds_total would work?

(4563731 is the commit where I did this)
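
A minimal sketch of what such a recording rule could look like, in the standard Prometheus rule file format. The group name is hypothetical, and the rule assumes the scraped process_cpu_seconds_total series already carry whatever labels the dashboards select on (such as container); it has not been checked against the dashboards.

```yaml
# Sketch only: the group name is hypothetical and the rule is untested
# against the mimir-mixin dashboards.
groups:
  - name: docker_compose_container_metrics
    rules:
      # Re-expose per-process CPU usage under the cAdvisor metric name
      # that the Resources dashboards query. Labels are copied as-is.
      - record: container_cpu_usage_seconds_total
        expr: process_cpu_seconds_total
```

The recorded series keeps the labels of the source series, so any label the dashboards expect but process_cpu_seconds_total lacks would still need to be added separately.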

@pstibrany
Member

Many dashboards use metrics from Kubernetes (from cAdvisor), which may be hard to get from inside Docker Compose.

@jhalterman
Member

I tried replacing container_cpu_usage_seconds_total with process_cpu_seconds_total and it worked in a few places but not in others, since the labels available on the two metrics are a bit different. We'd also need a replacement for container_spec_cpu_period.
