Enhance API server monitoring (gardener#7639)
* Add API server metrics to allowlist

Gardener currently supports Kubernetes v1.20 to v1.25.

In Kubernetes v1.23, `apiserver_registered_watchers` is deprecated in favor
of `apiserver_longrunning_requests`. The deprecated metric is hidden in v1.24.
This commit adds the new metric, `apiserver_longrunning_requests`, to the
allowlist.
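
As a rough illustration of what the allowlist does, the sketch below joins a
few metric name patterns into an anchored alternation of the same shape as the
`keep` regex in the scrape config and checks which names pass. The helper
names `keepRegex` and `isAllowed` and the shortened list are assumptions made
for this example; they are not taken from the Gardener code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepRegex joins metric name patterns into an anchored alternation,
// i.e. "^(a|b|c)$". Hypothetical helper, for illustration only.
func keepRegex(patterns []string) string {
	return "^(" + strings.Join(patterns, "|") + ")$"
}

// isAllowed reports whether a metric name matches the keep regex built
// from the given patterns. Hypothetical helper, for illustration only.
func isAllowed(patterns []string, metric string) bool {
	return regexp.MustCompile(keepRegex(patterns)).MatchString(metric)
}

func main() {
	// Shortened allowlist; the real one contains many more entries.
	allowlist := []string{
		"apiserver_current_inqueue_requests",
		"apiserver_longrunning_requests", // the entry added by this commit
		"apiserver_response_sizes_.+",
		"apiserver_registered_watchers",
	}

	fmt.Println(keepRegex(allowlist))
	fmt.Println(isAllowed(allowlist, "apiserver_longrunning_requests"))
	fmt.Println(isAllowed(allowlist, "apiserver_dropped_requests_total"))
}
```

A metric that matches none of the alternatives is dropped at scrape time, so a
dashboard that queries it stays empty.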

Co-authored-by: Istvan Zoltan Ballok <[email protected]>
Co-authored-by: Jeremy Rickards <[email protected]>

* Adjust the PromQL query to support all the K8s versions

The PromQL expression:

    sum by (group, version, kind) (apiserver_registered_watchers)
  + on () group_left ()
    absent(apiserver_longrunning_requests) * 0
or
  sum by (group, version, resource) (apiserver_longrunning_requests)

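returns the newer metric `apiserver_longrunning_requests` (>= 1.23) if it is
present; otherwise it returns `apiserver_registered_watchers` (< 1.23). This
works because `absent(apiserver_longrunning_requests)` yields a value only when
the newer metric has no series, so the left-hand side of `or` is non-empty only
in that case.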
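
The outcome of the fallback can be modeled in a few lines of Go. This is only
a sketch of the resulting behavior, not of PromQL evaluation; the function
name `selectWatchSeries` and the map-based representation of series are
assumptions made for this example.

```go
package main

import "fmt"

// selectWatchSeries models the result of the fallback expression: if the
// newer metric has any series, only those are returned; otherwise the older
// metric's series are returned unchanged (adding 0 does not alter them).
// Keys stand for label sets, values for the aggregated sample values.
func selectWatchSeries(registeredWatchers, longrunningRequests map[string]float64) map[string]float64 {
	if len(longrunningRequests) > 0 {
		return longrunningRequests
	}
	return registeredWatchers
}

func main() {
	older := map[string]float64{"apps/v1/Deployment": 4}
	newer := map[string]float64{"apps/v1/deployments": 6}

	// Cluster without the newer metric: the older series are used.
	fmt.Println(selectWatchSeries(older, nil))
	// Cluster exposing both metrics: only the newer series are used.
	fmt.Println(selectWatchSeries(older, newer))
}
```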

Note that the "total" query used the `count` aggregation, which returns the
number of series rather than the number of watches and is therefore not
meaningful here. This commit fixes that as well: the registered watchers /
long-running requests are now summed to get the total.
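
The difference between the two aggregations can be sketched as follows. The
sample values are invented for illustration, and `countSeries` and `sumSeries`
are hypothetical helpers that mimic what `count(...)` and `sum(...)` return
for a single instant.

```go
package main

import "fmt"

// countSeries mimics count(...): it returns how many series exist,
// regardless of their values.
func countSeries(series map[string]float64) int {
	return len(series)
}

// sumSeries mimics sum(...): it adds up the values of all series.
func sumSeries(series map[string]float64) float64 {
	total := 0.0
	for _, v := range series {
		total += v
	}
	return total
}

func main() {
	// Invented numbers: watches per resource.
	watches := map[string]float64{"pods": 10, "secrets": 5, "configmaps": 2}

	fmt.Println("count:", countSeries(watches)) // number of series
	fmt.Println("sum:", sumSeries(watches))     // total number of watches
}
```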

Co-authored-by: Istvan Zoltan Ballok <[email protected]>
Co-authored-by: Jeremy Rickards <[email protected]>

* Fix the "Dropped Requests" panel of Kubernetes API Server Details

This panel was not showing any data, for two reasons that apply to different
Kubernetes versions:

- apiserver_dropped_requests_total was removed from the allowlist, see
  gardener#3502. This means that this panel was not showing any data in any
  Kubernetes version.

- apiserver_dropped_requests_total is deprecated in 1.24 and removed in
  1.25. So in Kubernetes clusters >= 1.25, this panel would have been empty
  for this reason as well.

The replacement metric, `apiserver_request_terminations_total`, is already
allowlisted and has been available since Kubernetes v1.17, so we can simply
use it for a semantically similar query.
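
The new panel query keeps only terminations with `code="429"` and groups them
by request labels. The sketch below mirrors that filter-and-group step on
in-memory samples; the `termination` type, its fields, and the helper
`droppedByVerbAndResource` are assumptions made for this example, and only two
of the grouping labels are shown.

```go
package main

import "fmt"

// termination is a hypothetical, simplified sample of
// apiserver_request_terminations_total: a few labels plus the counter
// increase observed over some window.
type termination struct {
	Code, Verb, Resource string
	Increase             float64
}

// droppedByVerbAndResource keeps only terminations with HTTP code 429 and
// adds up their increases per "verb resource" key.
func droppedByVerbAndResource(samples []termination) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		if s.Code != "429" {
			continue // other termination codes are not counted as dropped
		}
		out[s.Verb+" "+s.Resource] += s.Increase
	}
	return out
}

func main() {
	// Invented samples for illustration.
	samples := []termination{
		{Code: "429", Verb: "LIST", Resource: "pods", Increase: 3},
		{Code: "429", Verb: "LIST", Resource: "pods", Increase: 2},
		{Code: "504", Verb: "GET", Resource: "secrets", Increase: 7},
	}
	fmt.Println(droppedByVerbAndResource(samples))
}
```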

Co-authored-by: Istvan Zoltan Ballok <[email protected]>
Co-authored-by: Jeremy Rickards <[email protected]>

---------

Co-authored-by: Istvan Zoltan Ballok <[email protected]>
2 people authored and andrerun committed Jul 6, 2023
1 parent f7cf800 commit 88d5675
Showing 5 changed files with 31 additions and 15 deletions.
@@ -2748,9 +2748,11 @@
"steppedLine": true,
"targets": [
{
"expr": "sum(apiserver_registered_watchers{pod=~\"$apiserver\"})by(kind)",
"exemplar": true,
"expr": " sum by (kind) (apiserver_registered_watchers{pod=~\"$apiserver\",kind!=\"\"})\n + on () group_left ()\n absent(apiserver_longrunning_requests) * 0\nor\n sum by (resource) (apiserver_longrunning_requests{pod=~\"$apiserver\",verb=\"WATCH\"})",
"hide": false,
"interval": "",
"legendFormat": "{{kind}}",
"legendFormat": "{{resource}}{{kind}}",
"refId": "A"
}
],
@@ -902,13 +902,17 @@
"hiddenSeries": false,
"id": 55,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"max": true,
"min": false,
"rightSide": true,
"show": true,
"sort": "max",
"sortDesc": true,
"total": false,
"values": false
"values": true
},
"lines": true,
"linewidth": 1,
@@ -927,9 +931,11 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(apiserver_dropped_requests_total[$rate])) by (requestKind)",
"legendFormat": "{{ requestKind }}",
"exemplar": true,
"expr": "sum(rate(apiserver_request_terminations_total{code=\"429\"}[$__rate_interval])) by (verb, group, version, resource, subresource)",
"hide": false,
"interval": "",
"legendFormat": "{{verb}} {{group}}/{{version}}/{{resource}} {{subresource}}",
"refId": "A"
}
],
@@ -939,7 +945,7 @@
"timeShift": null,
"title": "Dropped Requests",
"tooltip": {
"shared": true,
"shared": false,
"sort": 0,
"value_type": "individual"
},
@@ -1319,17 +1319,21 @@
"steppedLine": false,
"targets": [
{
"expr": "apiserver_registered_watchers",
"exemplar": true,
"expr": " sum by (group, version, kind) (apiserver_registered_watchers)\n + on () group_left ()\n absent(apiserver_longrunning_requests) * 0\nor\n sum by (group, version, resource) (apiserver_longrunning_requests)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{group}}/{{version}}/{{kind}}",
"legendFormat": "{{group}}/{{version}}/{{resource}}{{kind}}",
"refId": "A"
},
{
"expr": "count(apiserver_registered_watchers)\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "total",
"exemplar": true,
"expr": "sum(\n sum by (group, version, kind) (apiserver_registered_watchers)\n + on () group_left ()\n absent(apiserver_longrunning_requests) * 0\n or\n sum by (group, version, resource) (apiserver_longrunning_requests)\n)",
"hide": false,
"interval": "",
"legendFormat": "Total",
"refId": "B"
}
],
@@ -1353,15 +1357,17 @@
},
"yaxes": [
{
"$$hashKey": "object:244",
"decimals": 0,
"format": "short",
"label": "Count Watches",
"logBase": 10,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:245",
"format": "short",
"label": null,
"logBase": 1,
2 changes: 2 additions & 0 deletions pkg/operation/botanist/component/kubeapiserver/monitoring.go
@@ -43,6 +43,7 @@ const (
monitoringMetricApiserverCRDWebhookConversionDurationSeconds = "apiserver_crd_webhook_conversion_duration_seconds_.+"
monitoringMetricApiserverCurrentInflightRequests = "apiserver_current_inflight_requests"
monitoringMetricApiserverCurrentInqueueRequests = "apiserver_current_inqueue_requests"
monitoringMetricApiserverLongrunningRequests = "apiserver_longrunning_requests"
monitoringMetricApiserverResponseSizes = "apiserver_response_sizes_.+"
monitoringMetricApiserverRegisteredWatchers = "apiserver_registered_watchers"
monitoringMetricApiserverRequestDurationSeconds = "apiserver_request_duration_seconds_.+"
@@ -218,6 +219,7 @@ var (
monitoringMetricApiserverCRDWebhookConversionDurationSeconds,
monitoringMetricApiserverCurrentInflightRequests,
monitoringMetricApiserverCurrentInqueueRequests,
monitoringMetricApiserverLongrunningRequests,
monitoringMetricApiserverResponseSizes,
monitoringMetricApiserverRegisteredWatchers,
monitoringMetricApiserverRequestDurationSeconds,
@@ -71,7 +71,7 @@ relabel_configs:
metric_relabel_configs:
- source_labels: [ __name__ ]
action: keep
regex: ^(authentication_attempts|authenticated_user_requests|apiserver_admission_controller_admission_duration_seconds_.+|apiserver_admission_webhook_admission_duration_seconds_.+|apiserver_admission_step_admission_duration_seconds_.+|apiserver_admission_webhook_rejection_count|apiserver_audit_event_total|apiserver_audit_error_total|apiserver_audit_requests_rejected_total|apiserver_latency_seconds|apiserver_crd_webhook_conversion_duration_seconds_.+|apiserver_current_inflight_requests|apiserver_current_inqueue_requests|apiserver_response_sizes_.+|apiserver_registered_watchers|apiserver_request_duration_seconds_.+|apiserver_request_terminations_total|apiserver_request_total|apiserver_request_count|apiserver_storage_transformation_duration_seconds_.+|apiserver_storage_transformation_operations_total|apiserver_init_events_total|apiserver_watch_events_sizes_.+|apiserver_watch_events_total|etcd_db_total_size_in_bytes|apiserver_storage_db_total_size_in_bytes|etcd_object_counts|apiserver_storage_objects|etcd_request_duration_seconds_.+|go_.+|process_max_fds|process_open_fds|watch_cache_capacity_increase_total|watch_cache_capacity_decrease_total|watch_cache_capacity|apiserver_cache_list_.+|apiserver_storage_list_.+)$
regex: ^(authentication_attempts|authenticated_user_requests|apiserver_admission_controller_admission_duration_seconds_.+|apiserver_admission_webhook_admission_duration_seconds_.+|apiserver_admission_step_admission_duration_seconds_.+|apiserver_admission_webhook_rejection_count|apiserver_audit_event_total|apiserver_audit_error_total|apiserver_audit_requests_rejected_total|apiserver_latency_seconds|apiserver_crd_webhook_conversion_duration_seconds_.+|apiserver_current_inflight_requests|apiserver_current_inqueue_requests|apiserver_longrunning_requests|apiserver_response_sizes_.+|apiserver_registered_watchers|apiserver_request_duration_seconds_.+|apiserver_request_terminations_total|apiserver_request_total|apiserver_request_count|apiserver_storage_transformation_duration_seconds_.+|apiserver_storage_transformation_operations_total|apiserver_init_events_total|apiserver_watch_events_sizes_.+|apiserver_watch_events_total|etcd_db_total_size_in_bytes|apiserver_storage_db_total_size_in_bytes|etcd_object_counts|apiserver_storage_objects|etcd_request_duration_seconds_.+|go_.+|process_max_fds|process_open_fds|watch_cache_capacity_increase_total|watch_cache_capacity_decrease_total|watch_cache_capacity|apiserver_cache_list_.+|apiserver_storage_list_.+)$
`

expectedAlertingRule = `groups: