Skip to content

Commit

Permalink
dashboards: overview: use native histograms in status (#7627)
Browse files Browse the repository at this point in the history
* dashboards: overview: use native histograms in status

Allow switching between basing status on classic or native version
of cortex_request_duration_seconds.

Related to #7154

Signed-off-by: György Krajcsovits <[email protected]>
  • Loading branch information
krajorama authored Apr 8, 2024
1 parent d23ddb6 commit f4048a5
Show file tree
Hide file tree
Showing 8 changed files with 221 additions and 52 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,8 @@
* `MimirIngesterFailsEnforceStrongConsistencyOnReadPath`
* [ENHANCEMENT] Dashboards: add in-flight queries scaling metric panel for ruler-querier. #7749
* [ENHANCEMENT] Dashboards: renamed rows in the "Remote ruler reads" and "Remote ruler reads resources" dashboards to match the actual component names. #7750
* [ENHANCEMENT] Dashboards: allow switching between using classic of native histograms in dashboards. #7627
* Overview dashboard, Status panel, `cortex_request_duration_seconds` metric.
* [BUGFIX] Dashboards: Fix regular expression for matching read-path gRPC ingester methods to include querying of exemplars, label-related queries, or active series queries. #7676
* [BUGFIX] Dashboards: Fix user id abbreviations and column heads for Top Tenants dashboard. #7724

Expand Down
2 changes: 2 additions & 0 deletions operations/helm/charts/mimir-distributed/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,8 @@ Entries should include a reference to the Pull Request that introduced the chang
* [ENHANCEMENT] Recording rules: add native histogram recording rules to `cortex_request_duration_seconds`. #7528
* [ENHANCEMENT] Make the port used in ServiceMonitor for kube-state-metrics configurable. #7507
* [ENHANCEMENT] Produce a clearer error messages when multiple X-Scope-OrgID headers are present. #7704
* [ENHANCEMENT] Dashboards: allow switching between using classic of native histograms in dashboards. #7627
* Overview dashboard, Status panel, `cortex_request_duration_seconds` metric.
* [BUGFIX] Metamonitoring: update dashboards to drop unsupported `step` parameter in targets. #7157
* [BUGFIX] Recording rules: drop rules for metrics removed in 2.0: `cortex_memcache_request_duration_seconds` and `cortex_cache_request_duration_seconds`. #7514
* [BUGFIX] Store-gateway: setting "resources.requests.memory" with a quantity that used power-of-ten SI suffix, caused an error. #7506
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9441,7 +9441,7 @@ data:
"uid": "$datasource"
},
"exemplar": false,
"expr": "(\n # gRPC errors are not tracked as 5xx but \"error\".\n sum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\",status_code=~\"5.*|error\"}[$__rate_interval]))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\"}[$__rate_interval]))\n",
"expr": "(\n # gRPC errors are not tracked as 5xx but \"error\".\n sum(histogram_count(rate(cortex_request_duration_seconds{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\",status_code=~\"5.*|error\"}[$__rate_interval])))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(histogram_count(rate(cortex_request_duration_seconds{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\"}[$__rate_interval])))\n < ($latency_metrics * -Inf)",
"instant": false,
"legendFormat": "Writes",
"range": true
Expand All @@ -9451,7 +9451,27 @@ data:
"uid": "$datasource"
},
"exemplar": false,
"expr": "(\n sum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\",status_code=~\"5.*\"}[$__rate_interval]))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\"}[$__rate_interval]))\n",
"expr": "(\n # gRPC errors are not tracked as 5xx but \"error\".\n sum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\",status_code=~\"5.*|error\"}[$__rate_interval]))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))\", route=~\"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics\"}[$__rate_interval]))\n < ($latency_metrics * +Inf)",
"instant": false,
"legendFormat": "Writes",
"range": true
},
{
"datasource": {
"uid": "$datasource"
},
"exemplar": false,
"expr": "(\n # gRPC errors are not tracked as 5xx but \"error\".\n sum(histogram_count(rate(cortex_request_duration_seconds{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\",status_code=~\"5.*|error\"}[$__rate_interval])))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(histogram_count(rate(cortex_request_duration_seconds{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\"}[$__rate_interval])))\n < ($latency_metrics * -Inf)",
"instant": false,
"legendFormat": "Reads",
"range": true
},
{
"datasource": {
"uid": "$datasource"
},
"exemplar": false,
"expr": "(\n # gRPC errors are not tracked as 5xx but \"error\".\n sum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\",status_code=~\"5.*|error\"}[$__rate_interval]))\n or\n # Handle the case no failure has been tracked yet.\n vector(0)\n)\n/\nsum(rate(cortex_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\", route=~\"(prometheus|api_prom)_api_v1_.+\"}[$__rate_interval]))\n < ($latency_metrics * +Inf)",
"instant": false,
"legendFormat": "Reads",
"range": true
Expand Down Expand Up @@ -10921,6 +10941,35 @@ data:
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"current": {
"selected": true,
"text": "classic",
"value": "1"
},
"description": "Choose between showing latencies based on low precision classic or high precision native histogram metrics.",
"hide": 0,
"includeAll": false,
"label": "Latency metrics",
"multi": false,
"name": "latency_metrics",
"options": [
{
"selected": false,
"text": "native",
"value": "-1"
},
{
"selected": true,
"text": "classic",
"value": "1"
}
],
"query": "native : -1,classic : 1",
"skipUrlSync": false,
"type": "custom",
"useTags": false
}
]
},
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit f4048a5

Please sign in to comment.