This tool analyzes static files, such as dashboards and Prometheus alert rules, to track where and how Prometheus metrics are used.
It is especially helpful for identifying whether metrics are actively used. Ideally, Prometheus should not scrape unused metrics, since they only add unnecessary load.
The tool provides an API endpoint, /api/v1/metrics, which returns the usage data for each collected metric, as shown below:
{
  "node_cpu_seconds_total": {
    "usage": {
      "dashboards": [
        "https://demo.perses.dev/api/v1/projects/myinsight/dashboards/first_demo",
        "https://demo.perses.dev/api/v1/projects/myworkshopproject/dashboards/myfirstdashboard",
        "https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull",
        "https://demo.perses.dev/api/v1/projects/showcase/dashboards/statchartpanel"
      ],
      "recordingRules": [
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "node-exporter.rules",
          "name": "instance:node_num_cpu:sum",
          "expression": "count without (cpu, mode) (node_cpu_seconds_total{job=\"node\",mode=\"idle\"})"
        },
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "node-exporter.rules",
          "name": "instance:node_cpu_utilisation:rate5m",
          "expression": "1 - avg without (cpu) (sum without (mode) (rate(node_cpu_seconds_total{job=\"node\",mode=~\"idle|iowait|steal\"}[5m])))"
        }
      ],
      "alertRules": [
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "node-exporter",
          "name": "NodeCPUHighUsage",
          "expression": "sum without (mode) (avg without (cpu) (rate(node_cpu_seconds_total{job=\"node\",mode!=\"idle\"}[2m]))) * 100 > 90"
        },
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "node-exporter",
          "name": "NodeSystemSaturation",
          "expression": "node_load1{job=\"node\"} / count without (cpu, mode) (node_cpu_seconds_total{job=\"node\",mode=\"idle\"}) > 2"
        }
      ]
    }
  },
  "node_cpu_utilization_percent_threshold": {
    "usage": {
      "alertRules": [
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "ansible managed alert rules",
          "name": "NodeCPUUtilizationHigh",
          "expression": "instance:node_cpu_utilisation:rate5m * 100 > ignoring (severity) node_cpu_utilization_percent_threshold{severity=\"critical\"}"
        }
      ]
    }
  },
  "node_disk_discard_time_seconds_total": {
    "usage": {
      "dashboards": [
        "https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull"
      ]
    }
  }
}
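To consume this response from Go, a client can decode it into a map keyed by metric name. The sketch below is an assumption based only on the sample above: the type and function names are hypothetical and are not taken from the Metrics Usage code base. Sections absent for a given metric (for example, dashboards on a metric used only in alert rules) simply decode to empty slices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// RuleUsage mirrors one entry of "recordingRules" or "alertRules" in the sample response.
// Hypothetical type, inferred from the sample JSON only.
type RuleUsage struct {
	PromLink   string `json:"prom_link"`
	GroupName  string `json:"group_name"`
	Name       string `json:"name"`
	Expression string `json:"expression"`
}

// Usage lists where a metric is referenced. Missing keys decode to nil slices.
type Usage struct {
	Dashboards     []string    `json:"dashboards,omitempty"`
	RecordingRules []RuleUsage `json:"recordingRules,omitempty"`
	AlertRules     []RuleUsage `json:"alertRules,omitempty"`
}

// MetricUsage is the value stored under each metric name.
type MetricUsage struct {
	Usage Usage `json:"usage"`
}

// decodeMetrics parses a response body keyed by metric name.
func decodeMetrics(body []byte) (map[string]MetricUsage, error) {
	var metrics map[string]MetricUsage
	if err := json.Unmarshal(body, &metrics); err != nil {
		return nil, err
	}
	return metrics, nil
}

func main() {
	body := []byte(`{
  "node_cpu_utilization_percent_threshold": {"usage": {"alertRules": [{"name": "NodeCPUUtilizationHigh"}]}},
  "node_disk_discard_time_seconds_total": {"usage": {"dashboards": ["https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull"]}}
}`)
	metrics, err := decodeMetrics(body)
	if err != nil {
		panic(err)
	}
	// Map iteration order is random, so sort the names before printing.
	names := make([]string, 0, len(metrics))
	for name := range metrics {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		u := metrics[name].Usage
		fmt.Printf("%s: %d dashboards, %d recording rules, %d alert rules\n",
			name, len(u.Dashboards), len(u.RecordingRules), len(u.AlertRules))
	}
}
```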
You can use the following query parameters to filter the returned list:
- metric_name: performs a fuzzy search on metric names using the provided pattern.
- used: a boolean. Set it to true to return only used metrics, or to false to return only unused metrics. Leave it empty to return both.
- merge_partial_metrics: when set, merges the data from /api/v1/partial_metrics into the result.
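A request combining these filters can be built with Go's net/url package. This is a minimal sketch under stated assumptions: the base URL http://localhost:8080 comes from the Docker port mapping used in the installation instructions, booleans are sent as true/false, and merge_partial_metrics is assumed to take a boolean value; the helper name is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// metricsQueryURL builds a /api/v1/metrics URL with the optional filters.
// Hypothetical helper. used is a *bool so that nil means "leave empty"
// (return both used and unused metrics). merge_partial_metrics is assumed
// to accept "true".
func metricsQueryURL(base, metricName string, used *bool, mergePartial bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/metrics"
	q := url.Values{}
	if metricName != "" {
		q.Set("metric_name", metricName)
	}
	if used != nil {
		q.Set("used", strconv.FormatBool(*used))
	}
	if mergePartial {
		q.Set("merge_partial_metrics", "true")
	}
	// Encode sorts the parameters by key.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	unused := false
	link, err := metricsQueryURL("http://localhost:8080", "node_cpu", &unused, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```

Passing nil for used leaves the parameter out, which corresponds to leaving it empty.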
The API endpoint /api/v1/partial_metrics exposes the usage of metrics whose names contain a variable or a regular expression:
{
  "node_disk_discard_time_.+": {
    "usage": {
      "alertRules": [
        {
          "prom_link": "https://prometheus.demo.do.prometheus.io",
          "group_name": "ansible managed alert rules",
          "name": "NodeCPUUtilizationHigh",
          "expression": "instance:node_cpu_utilisation:rate5m * 100 > ignoring (severity) node_cpu_utilization_percent_threshold{severity=\"critical\"}"
        }
      ]
    }
  },
  "node_cpu_utilization_${instance}": {
    "usage": {
      "dashboards": [
        "https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull"
      ]
    }
  }
}
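One way to picture how a partial name relates to concrete metric names is to turn it into an anchored regular expression. This is a hypothetical illustration, not the Metrics Usage implementation: it assumes dashboard variables use the ${name} or $name syntax and replaces each one with .+ before matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// variablePattern matches dashboard variables such as ${instance} or $instance (assumed syntax).
var variablePattern = regexp.MustCompile(`\$\{\w+\}|\$\w+`)

// partialToRegexp converts a partial metric name into an anchored regular expression.
// Variables are replaced by ".+" so that they can match any non-empty substring.
func partialToRegexp(partial string) (*regexp.Regexp, error) {
	expr := variablePattern.ReplaceAllString(partial, ".+")
	return regexp.Compile("^(?:" + expr + ")$")
}

// matchMetrics returns the metric names matched by the given partial name.
func matchMetrics(partial string, metrics []string) ([]string, error) {
	re, err := partialToRegexp(partial)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, m := range metrics {
		if re.MatchString(m) {
			matched = append(matched, m)
		}
	}
	return matched, nil
}

func main() {
	metrics := []string{
		"node_cpu_seconds_total",
		"node_disk_discard_time_seconds_total",
		"node_cpu_utilization_percent_threshold",
	}
	for _, partial := range []string{"node_disk_discard_time_.+", "node_cpu_utilization_${instance}"} {
		matched, err := matchMetrics(partial, metrics)
		if err != nil {
			panic(err)
		}
		fmt.Println(partial, "->", matched)
	}
}
```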
The API endpoint /api/v1/pending_usages exposes usage that has not yet been associated with any metric available on the endpoint /api/v1/metrics.
Some of this usage may never be associated, for example when the metric no longer exists.
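Conceptually, the split between /api/v1/metrics and /api/v1/pending_usages can be sketched as a lookup against the set of metric names already collected. This is a simplified, hypothetical model: usage is reduced to a list of dashboard URLs per metric name, the function names are invented for illustration, and the second metric name in the example is made up to represent a metric that no longer exists.

```go
package main

import (
	"fmt"
	"sort"
)

// splitUsage sorts usage entries into two groups: those whose metric name is
// already known (associated) and those still waiting for a match (pending).
// Hypothetical model: usage is a list of dashboard URLs per metric name.
func splitUsage(known map[string]bool, usage map[string][]string) (associated, pending map[string][]string) {
	associated = make(map[string][]string)
	pending = make(map[string][]string)
	for name, dashboards := range usage {
		if known[name] {
			associated[name] = dashboards
		} else {
			pending[name] = dashboards
		}
	}
	return associated, pending
}

// sortedKeys returns the map keys in a stable order for printing.
func sortedKeys(m map[string][]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Metric names returned by the metric collector.
	known := map[string]bool{"node_cpu_seconds_total": true}
	// Usage found by the other collectors; "node_legacy_metric_total" is hypothetical.
	usage := map[string][]string{
		"node_cpu_seconds_total":   {"https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull"},
		"node_legacy_metric_total": {"https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull"},
	}
	associated, pending := splitUsage(known, usage)
	fmt.Println("associated:", sortedKeys(associated))
	fmt.Println("pending:", sortedKeys(pending))
}
```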
Metrics Usage can be configured as a central instance, which collects data from multiple sources in a stateful setup.
In setups with numerous rules, central data collection may become impractical due to the volume. Instead, you can deploy Metrics Usage as a sidecar container, configured to push data to a central instance.
Metrics Usage offers various collectors for obtaining metric usage data:
The metric collector retrieves a list of metrics over a specified period and stores them so that usage data from the other collectors can be associated with them.
Refer to the complete configuration here
Example:
metric_collector:
  enable: true
  prometheus_client:
    url: "https://prometheus.demo.do.prometheus.io"
The rules collector retrieves Prometheus rule groups through the HTTP API and extracts metrics from alerting and recording rules.
Multiple rule collectors can be configured for different Prometheus/Thanos instances.
Refer to the complete configuration here
Example:
rules_collectors:
  - enable: true
    prometheus_client:
      url: "https://prometheus.demo.do.prometheus.io"
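The Prometheus GET /api/v1/rules endpoint returns rule groups whose rules carry a name, a query and a type (recording or alerting). The sketch below decodes that response into records shaped like the recordingRules and alertRules entries in the /api/v1/metrics sample. It is a hypothetical helper, not the collector's code; extracting metric names from each query needs a real PromQL parser and is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rulesResponse models the subset of the Prometheus GET /api/v1/rules response used here.
type rulesResponse struct {
	Status string `json:"status"`
	Data   struct {
		Groups []struct {
			Name  string `json:"name"`
			Rules []struct {
				Name  string `json:"name"`
				Query string `json:"query"`
				Type  string `json:"type"` // "recording" or "alerting"
			} `json:"rules"`
		} `json:"groups"`
	} `json:"data"`
}

// RuleUsage mirrors the fields shown in the /api/v1/metrics sample output (hypothetical type).
type RuleUsage struct {
	PromLink   string `json:"prom_link"`
	GroupName  string `json:"group_name"`
	Name       string `json:"name"`
	Expression string `json:"expression"`
}

// splitRules converts rule groups into recording and alerting rule usages.
func splitRules(promLink string, body []byte) (recording, alerting []RuleUsage, err error) {
	var resp rulesResponse
	if err = json.Unmarshal(body, &resp); err != nil {
		return nil, nil, err
	}
	if resp.Status != "success" {
		return nil, nil, fmt.Errorf("unexpected status %q", resp.Status)
	}
	for _, g := range resp.Data.Groups {
		for _, r := range g.Rules {
			ru := RuleUsage{PromLink: promLink, GroupName: g.Name, Name: r.Name, Expression: r.Query}
			switch r.Type {
			case "recording":
				recording = append(recording, ru)
			case "alerting":
				alerting = append(alerting, ru)
			}
		}
	}
	return recording, alerting, nil
}

func main() {
	body := []byte(`{"status": "success", "data": {"groups": [
  {"name": "node-exporter.rules", "rules": [
    {"name": "instance:node_num_cpu:sum", "type": "recording",
     "query": "count without (cpu, mode) (node_cpu_seconds_total{job=\"node\",mode=\"idle\"})"}]},
  {"name": "node-exporter", "rules": [
    {"name": "NodeCPUHighUsage", "type": "alerting",
     "query": "sum without (mode) (avg without (cpu) (rate(node_cpu_seconds_total{job=\"node\",mode!=\"idle\"}[2m]))) * 100 > 90"}]}
]}}`)
	recording, alerting, err := splitRules("https://prometheus.demo.do.prometheus.io", body)
	if err != nil {
		panic(err)
	}
	for _, r := range recording {
		fmt.Printf("recording %s/%s: %s\n", r.GroupName, r.Name, r.Expression)
	}
	for _, r := range alerting {
		fmt.Printf("alerting %s/%s: %s\n", r.GroupName, r.Name, r.Expression)
	}
}
```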
The Perses collector fetches dashboards from Perses through its HTTP API and extracts the metrics used in variables and panels.
Refer to the complete configuration here
Example:
perses_collector:
  enable: true
  perses_client:
    url: "https://demo.perses.dev"
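The dashboard links reported in the usage output follow the Perses API path <perses URL>/api/v1/projects/<project>/dashboards/<name>, as the /api/v1/metrics sample shows. A hypothetical helper that rebuilds such a link from the configured perses_client URL could look like this; it assumes project and dashboard names contain no slash.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dashboardURL rebuilds a Perses dashboard API link in the shape reported in the
// "dashboards" usage lists: <perses URL>/api/v1/projects/<project>/dashboards/<name>.
// Hypothetical helper; project and name are assumed not to contain "/".
func dashboardURL(persesURL, project, name string) (string, error) {
	u, err := url.Parse(persesURL)
	if err != nil {
		return "", err
	}
	// Trim a trailing slash on the base URL so the path is not doubled.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/v1/projects/" + project + "/dashboards/" + name
	return u.String(), nil
}

func main() {
	link, err := dashboardURL("https://demo.perses.dev", "perses", "nodeexporterfull")
	if err != nil {
		panic(err)
	}
	fmt.Println(link) // https://demo.perses.dev/api/v1/projects/perses/dashboards/nodeexporterfull
}
```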
The Grafana collector fetches dashboards from Grafana through its HTTP API and extracts the metrics used in panels.
Refer to the complete configuration here
Example:
grafana_collector:
  enable: true
  grafana_client:
    url: "https://demo.grafana.dev"
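Grafana's HTTP API returns a dashboard's JSON model from GET /api/dashboards/uid/<uid>. Panels that query Prometheus usually keep the PromQL in each target's expr field, and collapsed rows nest their panels in a panels list. The sketch below, which is not the Metrics Usage implementation and models only an assumed subset of fields, pulls those expressions out; turning them into metric names would still need a PromQL parser, and Grafana template variables would have to be handled first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// panel is the subset of a Grafana panel used here. Prometheus targets keep
// their PromQL query in "expr"; collapsed rows nest their panels in "panels".
type panel struct {
	Targets []struct {
		Expr string `json:"expr"`
	} `json:"targets"`
	Panels []panel `json:"panels"`
}

// dashboardResponse models the subset of GET /api/dashboards/uid/<uid> used here.
type dashboardResponse struct {
	Dashboard struct {
		Panels []panel `json:"panels"`
	} `json:"dashboard"`
}

// collectExprs walks panels (including nested row panels) and returns every non-empty expr.
func collectExprs(panels []panel) []string {
	var exprs []string
	for _, p := range panels {
		for _, t := range p.Targets {
			if t.Expr != "" {
				exprs = append(exprs, t.Expr)
			}
		}
		exprs = append(exprs, collectExprs(p.Panels)...)
	}
	return exprs
}

// dashboardExprs decodes a dashboard response body and returns its PromQL expressions.
func dashboardExprs(body []byte) ([]string, error) {
	var resp dashboardResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return collectExprs(resp.Dashboard.Panels), nil
}

func main() {
	body := []byte(`{"dashboard": {"panels": [
  {"type": "timeseries", "targets": [{"expr": "rate(node_cpu_seconds_total[5m])"}]},
  {"type": "row", "panels": [{"targets": [{"expr": "node_load1"}]}]}
]}}`)
	exprs, err := dashboardExprs(body)
	if err != nil {
		panic(err)
	}
	for _, e := range exprs {
		fmt.Println(e)
	}
}
```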
There are several ways of installing Metrics Usage:
Download precompiled binaries from the GitHub releases page. It is recommended to use the latest release available.
Docker images are available on Docker Hub.
To try it out with Docker:
docker run --name metrics-usage -d -p 127.0.0.1:8080:8080 persesdev/metrics-usage
To build from source, you’ll need Go version 1.23 or higher.
Start by cloning the repository:
git clone https://github.com/perses/metrics-usage.git
cd metrics-usage
Then build the web assets and Metrics Usage itself with:
make build
Then run it with your configuration file:
./bin/metrics-usage --config=your_config.yml