From 74d075186a76dc76c508de57f54ac8aef923c218 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz <siarkowicz@google.com>
Date: Thu, 14 Nov 2019 14:53:54 +0100
Subject: [PATCH 1/4] Document control plane monitoring

---
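
The deprecation notice documented in this patch is carried in the metric's HELP text, as in the `some_counter` example. A minimal sketch of how a scraper-side check might pick it up, assuming the annotation always has the form `(Deprecated since x.y.z)` at the start of the HELP string. The function name, the regular expression and the `other_counter` sample metric are hypothetical and are not part of any Kubernetes or Prometheus API.

```python
import re

# Assumed annotation shape, taken from the "After deprecation" example:
#   # HELP some_counter (Deprecated since 1.15.0) this counts things
_DEPRECATED_HELP = re.compile(
    r"^# HELP (?P<name>\S+) \(Deprecated since (?P<version>\d+\.\d+\.\d+)\)"
)


def deprecated_metrics(exposition_text):
    """Return {metric_name: version} for metrics whose HELP line is annotated."""
    found = {}
    for line in exposition_text.splitlines():
        match = _DEPRECATED_HELP.match(line)
        if match:
            found[match.group("name")] = match.group("version")
    return found


# some_counter comes from the documentation example; other_counter is made up.
sample = """\
# HELP some_counter (Deprecated since 1.15.0) this counts things
# TYPE some_counter counter
some_counter 0
# HELP other_counter this counts other things
# TYPE other_counter counter
other_counter 3
"""

print(deprecated_metrics(sample))
```

Matching on the HELP line only keeps the sketch independent of the metric type and of any labels on the samples.
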
 .../cluster-administration/monitoring.md      | 105 ++++++++++++++++++
 data/concepts.yml                             |   1 +
 2 files changed, 106 insertions(+)
 create mode 100644 content/en/docs/concepts/cluster-administration/monitoring.md

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
new file mode 100644
index 0000000000000..607bb7c1a1347
--- /dev/null
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -0,0 +1,105 @@
+---
+title: Metrics For The Kubernetes Control Plane
+reviewers:
+- brancz
+- logicalhan
+- RainbowMango
+content_template: templates/concept
+weight: 60
+---
+
+{{% capture overview %}}
+
+System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
+
+Metrics in Kubernetes control plane components are exposed in Prometheus text format.
+
+{{% /capture %}}
+
+{{% capture body %}}
+
+## Metrics in Kubernetes
+
+In most cases those metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.
+
+Examples of those components:
+* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
+* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
+* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
+* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
+* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
+
+Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor` and `/metrics/resource` endpoints. Those metrics do not have same lifecycle.
+
+If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
+For example:
+```
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: prometheus
+rules:
+  - nonResourceURLs:
+      - "/metrics"
+    verbs:
+      - get
+```
+
+## Metric lifecycle
+
+Alpha metric →  Stable metric →  Deprecated metric →  Hidden metric → Deletion
+
+Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
+
+Stable metrics are guaranteed not to change. Specifically, stability means:
+
+* the metric itself will not be deleted (or renamed)
+* the type of metric will not be modified
+* no labels can be added or removed from this metric
+
+Deprecated metrics signal that the metric will eventually be deleted. To find in which version, check the annotation, which states the Kubernetes version from which the metric is considered deprecated.
+
+Before deprecation:
+
+```
+# HELP some_counter this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+After deprecation:
+
+```
+# HELP some_counter (Deprecated since 1.15.0) this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.
+
+Deleted metrics will no longer be available.
+
+
+## Show Hidden Metrics
+
+As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed the migration of the metrics deprecated in the last release.
+
+The flag `show-hidden-metrics-for-version` takes the version for which you want to show metrics deprecated in that release. The version is expressed as x.y, where x is the major version and y is the minor version. The patch version is not needed even though a metric can be deprecated in a patch release, because the metrics deprecation policy runs against minor releases.
+
+The flag can only take the previous minor version as its value. All metrics hidden in the previous minor version are emitted if admins set `show-hidden-metrics-for-version` to that version. An older version is not allowed because it would violate the metrics deprecation policy.
+
+Take metric `A` as an example, and assume that `A` is deprecated in `1.n`. According to the metrics deprecation policy, we can reach the following conclusions:
+
+* In release `1.n`, the metric is deprecated, and it is emitted by default.
+* In release `1.n+1`, the metric is hidden by default, and it can be emitted by setting the command line flag `--show-hidden-metrics-for-version=1.n`.
+* In release `1.n+2`, the metric should be removed from the codebase. There is no escape hatch anymore.
+
+If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set hidden metrics via command line: `--show-hidden-metrics-for-version=1.12` and remember to remove this metric dependency before upgrading to `1.14`.
+
+{{% /capture %}}
+
+{{% capture whatsnext %}}
+* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
+* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
+* Read about the [Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)
+{{% /capture %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index 998b265e0666f..a20948c8abdd3 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -116,6 +116,7 @@ toc:
   - docs/concepts/cluster-administration/networking.md
   - docs/concepts/cluster-administration/network-plugins.md
   - docs/concepts/cluster-administration/logging.md
+  - docs/concepts/cluster-administration/monitoring.md
   - docs/concepts/cluster-administration/kubelet-garbage-collection.md
   - docs/concepts/cluster-administration/federation.md
   - docs/concepts/cluster-administration/sysctl-cluster.md

From 9e37dc82c7e0446b0bfb3f542b16431dbab76b52 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
Date: Wed, 15 Jan 2020 09:19:42 +0100
Subject: [PATCH 2/4] Update
 content/en/docs/concepts/cluster-administration/monitoring.md

Co-Authored-By: Tim Bannister <tim@scalefactory.com>
---
 content/en/docs/concepts/cluster-administration/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index 607bb7c1a1347..9258b0453de57 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -77,7 +77,7 @@ some_counter 0
 
 Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.
 
-Deleted metrics will no longer be available.
+Once a metric is deleted, the metric is not published. You cannot change this using an override.
 
 
 ## Show Hidden Metrics

From c069d23ca68138e300ea079de8edd49633709cb1 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
Date: Wed, 15 Jan 2020 09:19:55 +0100
Subject: [PATCH 3/4] Update
 content/en/docs/concepts/cluster-administration/monitoring.md

Co-Authored-By: Tim Bannister <tim@scalefactory.com>
---
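
The page states that a metric deprecated in `1.n` is still emitted in `1.n`, hidden by default in `1.n+1`, and removed in `1.n+2`, and that `show-hidden-metrics-for-version` only accepts the previous minor version. A minimal sketch of that rule, assuming versions are given as `x.y` strings within one major version. The helper names are hypothetical and do not correspond to anything in the Kubernetes codebase.

```python
def parse_minor(version):
    """Parse an 'x.y' string into (major, minor) integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def metric_stage(deprecated_in, release):
    """Lifecycle stage, in `release`, of a metric deprecated in `deprecated_in`."""
    dep_major, dep_minor = parse_minor(deprecated_in)
    rel_major, rel_minor = parse_minor(release)
    if rel_major != dep_major:
        raise ValueError("this sketch only compares releases within one major version")
    age = rel_minor - dep_minor
    if age < 0:
        return "not yet deprecated"
    if age == 0:
        return "deprecated"  # still emitted by default
    if age == 1:
        return "hidden"      # emitted only when the flag is set
    return "removed"         # no escape hatch


def allowed_flag_value(release):
    """The only value the flag accepts in `release`: the previous minor version."""
    major, minor = parse_minor(release)
    if minor == 0:
        raise ValueError("no previous minor version within this major version")
    return f"{major}.{minor - 1}"


# The upgrade example from the page: metric A deprecated in 1.12.
for release in ("1.12", "1.13", "1.14"):
    print(release, metric_stage("1.12", release))
print("flag value on 1.13:", allowed_flag_value("1.13"))
```
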
 content/en/docs/concepts/cluster-administration/monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index 9258b0453de57..b2b941e49657b 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -75,7 +75,7 @@ After deprecation:
 some_counter 0
 ```
 
-Hidden metrics will no longer be exposed by default; to use a hidden metric, you need to override the configuration for the relevant cluster component.
+Once a metric is hidden, by default the metric is not published for scraping. To use a hidden metric, you need to override the configuration for the relevant cluster component.
 
 Once a metric is deleted, the metric is not published. You cannot change this using an override.
 

From b346861ac1eb8efa0e9dd7d90148a9847f941de2 Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
Date: Wed, 15 Jan 2020 09:19:55 +0100
Subject: [PATCH 4/4] Merge controller-metrics.md into monitoring.md

---
 .../controller-metrics.md                     | 50 -------------------
 .../cluster-administration/monitoring.md      | 35 +++++++++++--
 data/concepts.yml                             |  1 -
 3 files changed, 31 insertions(+), 55 deletions(-)
 delete mode 100644 content/en/docs/concepts/cluster-administration/controller-metrics.md

diff --git a/content/en/docs/concepts/cluster-administration/controller-metrics.md b/content/en/docs/concepts/cluster-administration/controller-metrics.md
deleted file mode 100644
index 57ed5c16d657a..0000000000000
--- a/content/en/docs/concepts/cluster-administration/controller-metrics.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: Controller manager metrics
-content_template: templates/concept
-weight: 100
----
-
-{{% capture overview %}}
-Controller manager metrics provide important insight into the performance and health of
-the controller manager.
-
-{{% /capture %}}
-
-{{% capture body %}}
-## What are controller manager metrics
-
-Controller manager metrics provide important insight into the performance and health of the controller manager.
-These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
-etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
-to gauge the health of a cluster.
-
-Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
-These metrics can be used to monitor health of persistent volume operations.
-
-For example, for GCE these metrics are called:
-
-```
-cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
-cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
-cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
-cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
-cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
-cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
-```
-
-
-
-## Configuration
-
-
-In a cluster, controller-manager metrics are available from `http://localhost:10252/metrics`
-from the host where the controller-manager is running.
-
-The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
-
-In a production environment you may want to configure prometheus or some other metrics scraper
-to periodically gather these metrics and make them available in some kind of time series database.
-
-{{% /capture %}}
-
-
diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
index b2b941e49657b..92b74b6634c22 100644
--- a/content/en/docs/concepts/cluster-administration/monitoring.md
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -6,13 +6,15 @@ reviewers:
 - RainbowMango
 content_template: templates/concept
 weight: 60
+aliases:
+- controller-metrics.md
 ---
 
 {{% capture overview %}}
 
 System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
 
-Metrics in Kubernetes control plane components are exposed in Prometheus text format.
+Metrics in the Kubernetes control plane are emitted in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
 
 {{% /capture %}}
 
@@ -20,7 +22,7 @@ Metrics in Kubernetes control plane components are exposed in Prometheus text fo
 
 ## Metrics in Kubernetes
 
-In most cases those metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.
+In most cases metrics are available on the `/metrics` endpoint of the HTTP server. For components that don't expose the endpoint by default, it can be enabled using the `--bind-address` flag.
 
 Examples of those components:
 * {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
@@ -29,7 +31,10 @@ Examples of those components:
 * {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
 * {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
 
-Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor` and `/metrics/resource` endpoints. Those metrics do not have same lifecycle.
+In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
+to periodically gather these metrics and make them available in some kind of time series database.
+
+Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics on the `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have the same lifecycle.
 
 If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
 For example:
@@ -55,7 +60,6 @@ Stable metrics can be guaranteed to not change; Specifically, stability means:
 
 * the metric itself will not be deleted (or renamed)
 * the type of metric will not be modified
-* no labels can be added or removed from this metric
 
 Deprecated metrics signal that the metric will eventually be deleted. To find in which version, check the annotation, which states the Kubernetes version from which the metric is considered deprecated.
 
@@ -96,6 +100,29 @@ Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. Accor
 
 If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set hidden metrics via command line: `--show-hidden-metrics-for-version=1.12` and remember to remove this metric dependency before upgrading to `1.14`.
 
+## Component metrics
+
+### kube-controller-manager metrics
+
+Controller manager metrics provide important insight into the performance and health of the controller manager.
+These metrics include common Go language runtime metrics such as the goroutine count, and controller-specific metrics such as
+etcd request latencies or cloud provider (AWS, GCE, OpenStack) API latencies that can be used
+to gauge the health of a cluster.
+
+Starting from Kubernetes 1.7, detailed cloud provider metrics are available for storage operations for GCE, AWS, vSphere and OpenStack.
+These metrics can be used to monitor the health of persistent volume operations.
+
+For example, for GCE these metrics are called:
+
+```
+cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+```
+
 {{% /capture %}}
 
 {{% capture whatsnext %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index a20948c8abdd3..51974d25c14f4 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -123,7 +123,6 @@ toc:
   - docs/concepts/cluster-administration/authenticate-across-clusters-kubeconfig.md
   - docs/concepts/cluster-administration/master-node-communication.md
   - docs/concepts/cluster-administration/proxies.md
-  - docs/concepts/cluster-administration/controller-metrics.md
   - docs/concepts/cluster-administration/device-plugins.md
   - title: Policies
     section: