From 74d075186a76dc76c508de57f54ac8aef923c218 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz
Date: Thu, 14 Nov 2019 14:53:54 +0100
Subject: [PATCH 1/4] Document control plane monitoring

---
 .../cluster-administration/monitoring.md      | 105 ++++++++++++++++++
 data/concepts.yml                             |   1 +
 2 files changed, 106 insertions(+)
 create mode 100644 content/en/docs/concepts/cluster-administration/monitoring.md

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
new file mode 100644
index 0000000000000..607bb7c1a1347
--- /dev/null
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -0,0 +1,105 @@
+---
+title: Metrics For The Kubernetes Control Plane
+reviewers:
+- brancz
+- logicalhan
+- RainbowMango
+content_template: templates/concept
+weight: 60
+---
+
+{{% capture overview %}}
+
+System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
+
+Metrics in Kubernetes control plane components are exposed in Prometheus text format.
+
+{{% /capture %}}
+
+{{% capture body %}}
+
+## Metrics in Kubernetes
+
+In most cases those metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.
+
+Examples of those components:
+* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
+* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
+* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
+* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
+* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
+
+Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor` and `/metrics/resource` endpoints. Those metrics do not have same lifecycle.
+
+If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
+For example:
+```
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: prometheus
+rules:
+  - nonResourceURLs:
+      - "/metrics"
+    verbs:
+      - get
+```
+
+## Metric lifecycle
+
+Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion
+
+Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
+
+Stable metrics can be guaranteed to not change; Specifically, stability means:
+
+* the metric itself will not be deleted (or renamed)
+* the type of metric will not be modified
+* no labels can be added or removed from this metric
+
+Deprecated metric signal that the metric will eventually be deleted; to find which version, you need to check annotation, which includes from which kubernetes version that metric will be considered deprecated.
+
+Before deprecation:
+
+```
+# HELP some_counter this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+After deprecation:
+
+```
+# HELP some_counter (Deprecated since 1.15.0) this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.
+
+Deleted metrics will no longer be available.
+
+
+## Show Hidden Metrics
+
+As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This intends to be used as an escape hatch for admins if they missed the migration of the metrics deprecated in the last release.
+
+The flag `show-hidden-metrics-for-version` takes a version for which you want to show metrics deprecated in that release. The version is expressed as x.y, where x is the major version, y is the minor version. The patch version is not needed even though a metrics can be deprecated in a patch release, the reason for that is the metrics deprecation policy runs against the minor release.
+
+The flag can only take the previous minor version as it's value. All metrics hidden in previous will be emitted if admins set the previous version to `show-hidden-metrics-for-version`. The too old version is not allowed because this violates the metrics deprecated policy.
+
+Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. According to metrics deprecated policy, we can reach the following conclusion:
+
+* In release `1.n`, the metric is deprecated, and it can be emitted by default.
+* In release `1.n+1`, the metric is hidden by default and it can be emitted by command line `show-hidden-metrics-for-version=1.n`.
+* In release `1.n+2`, the metric should be removed from the codebase. No escape hatch anymore.
+
+If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set hidden metrics via command line: `--show-hidden-metrics=1.12` and remember to remove this metric dependency before upgrading to `1.14`
+
+{{% /capture %}}
+
+{{% capture whatsnext %}}
+* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
+* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
+* Read about the [Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior )
+{{% /capture %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index 998b265e0666f..a20948c8abdd3 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -116,6 +116,7 @@ toc:
   - docs/concepts/cluster-administration/networking.md
   - docs/concepts/cluster-administration/network-plugins.md
   - docs/concepts/cluster-administration/logging.md
+  - docs/concepts/cluster-administration/monitoring.md
   - docs/concepts/cluster-administration/kubelet-garbage-collection.md
   - docs/concepts/cluster-administration/federation.md
   - docs/concepts/cluster-administration/sysctl-cluster.md

From 9e37dc82c7e0446b0bfb3f542b16431dbab76b52 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz
Date: Wed, 15 Jan 2020 09:19:42 +0100
Subject: [PATCH 2/4] Update content/en/docs/concepts/cluster-administration/monitoring.md

Co-Authored-By: Tim Bannister
---
 content/en/docs/concepts/cluster-administration/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index 607bb7c1a1347..9258b0453de57 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -77,7 +77,7 @@ some_counter 0

 Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.

-Deleted metrics will no longer be available.
+Once a metric is deleted, the metric is not published. You cannot change this using an override.


 ## Show Hidden Metrics

From c069d23ca68138e300ea079de8edd49633709cb1 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz
Date: Wed, 15 Jan 2020 09:19:55 +0100
Subject: [PATCH 3/4] Update content/en/docs/concepts/cluster-administration/monitoring.md

Co-Authored-By: Tim Bannister
---
 content/en/docs/concepts/cluster-administration/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index 9258b0453de57..b2b941e49657b 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -75,7 +75,7 @@ After deprecation:
 some_counter 0
 ```

-Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.
+Once a metric is hidden then by default the metrics is not published for scraping. To use a hidden metric, you need to override the configuration for the relevant cluster component.

 Once a metric is deleted, the metric is not published. You cannot change this using an override.

From b346861ac1eb8efa0e9dd7d90148a9847f941de2 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz
Date: Wed, 15 Jan 2020 09:19:55 +0100
Subject: [PATCH 4/4] Merge controller-metrics.md into monitoring.md

---
 .../controller-metrics.md                     | 50 -------------------
 .../cluster-administration/monitoring.md      | 35 +++++++++++--
 data/concepts.yml                             |  1 -
 3 files changed, 31 insertions(+), 55 deletions(-)
 delete mode 100644 content/en/docs/concepts/cluster-administration/controller-metrics.md

diff --git a/content/en/docs/concepts/cluster-administration/controller-metrics.md b/content/en/docs/concepts/cluster-administration/controller-metrics.md
deleted file mode 100644
index 57ed5c16d657a..0000000000000
--- a/content/en/docs/concepts/cluster-administration/controller-metrics.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: Controller manager metrics
-content_template: templates/concept
-weight: 100
----
-
-{{% capture overview %}}
-Controller manager metrics provide important insight into the performance and health of
-the controller manager.
-
-{{% /capture %}}
-
-{{% capture body %}}
-## What are controller manager metrics
-
-Controller manager metrics provide important insight into the performance and health of the controller manager.
-These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
-etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
-to gauge the health of a cluster.
-
-Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
-These metrics can be used to monitor health of persistent volume operations.
-
-For example, for GCE these metrics are called:
-
-```
-cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
-cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
-cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
-cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
-cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
-cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
-```
-
-
-
-## Configuration
-
-
-In a cluster, controller-manager metrics are available from `http://localhost:10252/metrics`
-from the host where the controller-manager is running.
-
-The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
-
-In a production environment you may want to configure prometheus or some other metrics scraper
-to periodically gather these metrics and make them available in some kind of time series database.
-
-{{% /capture %}}
-
-
diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index b2b941e49657b..92b74b6634c22 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -6,13 +6,15 @@ reviewers:
 - RainbowMango
 content_template: templates/concept
 weight: 60
+aliases:
+- controller-metrics.md
 ---

 {{% capture overview %}}

 System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.

-Metrics in Kubernetes control plane components are exposed in Prometheus text format.
+Metrics in Kubernetes control plane are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.

 {{% /capture %}}

@@ -20,7 +22,7 @@ Metrics in Kubernetes control plane components are exposed in Prometheus text fo

 ## Metrics in Kubernetes

-In most cases those metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.
+In most cases metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.

 Examples of those components:
 * {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
@@ -29,7 +31,10 @@ Examples of those components:
 * {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
 * {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}

-Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor` and `/metrics/resource` endpoints. Those metrics do not have same lifecycle.
+In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
+to periodically gather these metrics and make them available in some kind of time series database.
+
+Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have same lifecycle.

 If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
 For example:
@@ -55,7 +60,6 @@ Stable metrics can be guaranteed to not change; Specifically, stability means:

 * the metric itself will not be deleted (or renamed)
 * the type of metric will not be modified
-* no labels can be added or removed from this metric

 Deprecated metric signal that the metric will eventually be deleted; to find which version, you need to check annotation, which includes from which kubernetes version that metric will be considered deprecated.

@@ -96,6 +100,29 @@ Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. Accor

 If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set hidden metrics via command line: `--show-hidden-metrics=1.12` and remember to remove this metric dependency before upgrading to `1.14`

+## Component metrics
+
+### kube-controller-manager metrics
+
+Controller manager metrics provide important insight into the performance and health of the controller manager.
+These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
+etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
+to gauge the health of a cluster.
+
+Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
+These metrics can be used to monitor health of persistent volume operations.
+
+For example, for GCE these metrics are called:
+
+```
+cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+```
+
 {{% /capture %}}

 {{% capture whatsnext %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index a20948c8abdd3..51974d25c14f4 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -123,7 +123,6 @@ toc:
   - docs/concepts/cluster-administration/authenticate-across-clusters-kubeconfig.md
   - docs/concepts/cluster-administration/master-node-communication.md
   - docs/concepts/cluster-administration/proxies.md
-  - docs/concepts/cluster-administration/controller-metrics.md
   - docs/concepts/cluster-administration/device-plugins.md
 - title: Policies
   section:
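
The deprecation annotation documented in this series (`# HELP some_counter (Deprecated since 1.15.0) ...`) could be checked mechanically against a scraped `/metrics` payload. The following is a minimal sketch, not part of the patch, assuming the annotation always takes the exact form shown in the "After deprecation" example. The function names `find_deprecated_metrics` and `hidden_flag_value` and the metric `other_counter` are hypothetical, introduced only for illustration.

```python
import re

# Pattern for the HELP annotation shown in the "After deprecation" example:
#   # HELP <name> (Deprecated since X.Y.Z) <help text>
# Assumption: the annotation always has exactly this shape.
DEPRECATED_HELP = re.compile(
    r"^# HELP (\S+) \(Deprecated since (\d+)\.(\d+)\.(\d+)\)"
)


def find_deprecated_metrics(exposition_text):
    """Map metric name -> "X.Y.Z" for HELP lines carrying a deprecation annotation."""
    deprecated = {}
    for line in exposition_text.splitlines():
        match = DEPRECATED_HELP.match(line)
        if match:
            name, major, minor, patch = match.groups()
            deprecated[name] = f"{major}.{minor}.{patch}"
    return deprecated


def hidden_flag_value(deprecated_since):
    """Drop the patch version, giving the x.y form that
    show-hidden-metrics-for-version is documented to take."""
    major, minor = deprecated_since.split(".")[:2]
    return f"{major}.{minor}"


# Sample payload: some_counter comes from the patch; other_counter is hypothetical.
sample = """\
# HELP some_counter (Deprecated since 1.15.0) this counts things
# TYPE some_counter counter
some_counter 0
# HELP other_counter this also counts things
# TYPE other_counter counter
other_counter 3
"""

found = find_deprecated_metrics(sample)
print(found)
for metric, version in found.items():
    print(metric, "-> show-hidden-metrics-for-version=" + hidden_flag_value(version))
```

Run against a real component's output, a check like this would list the metrics a dashboard should stop depending on before the next minor upgrade, per the 1.n / 1.n+1 / 1.n+2 schedule described in the patch.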