From ab7ae543b531149b344bbd66c49e1101f125190c Mon Sep 17 00:00:00 2001
From: Marek Siarkowicz <siarkowicz@google.com>
Date: Thu, 14 Nov 2019 14:53:54 +0100
Subject: [PATCH] Document control plane monitoring

---
 .../cluster-administration/monitoring.md      | 103 ++++++++++++++++++
 data/concepts.yml                             |   1 +
 2 files changed, 104 insertions(+)
 create mode 100644 content/en/docs/concepts/cluster-administration/monitoring.md

diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
new file mode 100644
index 0000000000000..012056f71dbb5
--- /dev/null
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -0,0 +1,103 @@
+---
+title: Monitoring Control Plane Components
+reviewers:
+- brancz
+- logicalhan
+- RainbowMango
+content_template: templates/concept
+weight: 60
+---
+
+{{% capture overview %}}
+
+Metrics exposed by system components give better insight into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
+
+{{% /capture %}}
+
+{{% capture body %}}
+
+## Metrics in Kubernetes
+
+Kubernetes control plane components publish metrics in the Prometheus text format. In most cases, these metrics are available on the `/metrics` endpoint of the component's HTTP server. For components that don't expose this endpoint by default, it can be enabled using the `--bind-address` flag.
+
+Examples of those components:
+* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
+* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
+* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
+* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
+* kubelet (metrics exposed on `/metrics/cadvisor` and `/metrics/resource` do not carry the same stability guarantees)
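+
+As an illustration, the following Prometheus scrape configuration is a minimal sketch for collecting kube-apiserver metrics. It assumes that Prometheus runs inside the cluster under a ServiceAccount that is authorized to read `/metrics`, and it discovers API server endpoints through the default `kubernetes` Service in the `default` namespace. The job name is arbitrary; adapt the discovery and TLS settings to your environment.
+
+```yaml
+scrape_configs:
+  # Hypothetical job name; any unique name works.
+  - job_name: kubernetes-apiservers
+    # Discover scrape targets from Kubernetes Endpoints objects.
+    kubernetes_sd_configs:
+      - role: endpoints
+    # kube-apiserver serves /metrics (Prometheus' default metrics_path) over HTTPS.
+    scheme: https
+    tls_config:
+      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+    # Authenticate with the ServiceAccount token mounted into the Prometheus Pod.
+    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
+    relabel_configs:
+      # Keep only the https port of the default/kubernetes Service.
+      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
+        action: keep
+        regex: default;kubernetes;https
+```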
+
+If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
+For example:
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: prometheus
+rules:
+  - nonResourceURLs:
+      - "/metrics"
+    verbs:
+      - get
+```
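+
+A ClusterRole only takes effect once it is bound to a subject. The following is a minimal sketch of a ClusterRoleBinding that grants the ClusterRole above to a ServiceAccount; the ServiceAccount name `prometheus` and the namespace `monitoring` are assumptions, so substitute the identity your monitoring system actually runs as.
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: prometheus
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  # Refers to the ClusterRole defined above.
+  name: prometheus
+subjects:
+  # Hypothetical ServiceAccount that Prometheus runs as.
+  - kind: ServiceAccount
+    name: prometheus
+    namespace: monitoring
+```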
+
+## Metric lifecycle
+
+Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion
+
+Alpha metrics have no stability guarantees, so they can be modified or deleted at any time.
+
+Stable metrics are guaranteed not to change. Specifically:
+
+* the metric itself will not be deleted or renamed
+* the type of the metric will not be modified
+* no labels will be added to or removed from the metric
+
+Deprecated metrics signal that the metric will eventually be deleted. They are annotated with the Kubernetes version from which the metric is considered deprecated; the annotation is prepended to the metric's HELP text.
+
+Before deprecation:
+
+```
+# HELP some_counter this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+After deprecation:
+
+```
+# HELP some_counter (Deprecated from 1.15) this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+Hidden metrics are no longer exposed by default. To use a hidden metric, you need to override the configuration of the relevant cluster component.
+
+Deleted metrics are no longer available.
+
+
+## Show hidden metrics
+
+As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed migrating away from metrics deprecated in the previous release.
+
+The flag `show-hidden-metrics-for-version` takes the version whose deprecated metrics you want to show. The version is expressed as x.y, where x is the major version and y is the minor version. The patch version is not needed: even though a metric can be deprecated in a patch release, the metrics deprecation policy operates on minor releases.
+
+The flag can only take the previous minor version as its value. If admins set `show-hidden-metrics-for-version` to the previous version, all metrics deprecated in that version, and therefore hidden in the current release, are emitted again. Older versions are not allowed because they would violate the metrics deprecation policy.
+
+Take metric `A` as an example, and assume that `A` is deprecated in `1.n`. According to the metrics deprecation policy:
+
+* In release `1.n`, the metric is deprecated and is still emitted by default.
+* In release `1.n+1`, the metric is hidden by default and can be emitted by passing the command-line flag `--show-hidden-metrics-for-version=1.n`.
+* In release `1.n+2`, the metric should be removed from the codebase, and no escape hatch remains.
+
+So, to enable metric `A` in release `1.n+1`, admins should pass `--show-hidden-metrics-for-version=1.n` on the command line.
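+
+For example, on a cluster where kube-apiserver runs as a static Pod, passing the flag could look like the following sketch. It assumes a concrete case in which metric `A` was deprecated in 1.16 and the cluster runs 1.17; the image tag is illustrative, and all other kube-apiserver flags are omitted.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: kube-apiserver
+  namespace: kube-system
+spec:
+  containers:
+    - name: kube-apiserver
+      # Illustrative image tag; assumes the cluster runs release 1.17.
+      image: k8s.gcr.io/kube-apiserver:v1.17.0
+      command:
+        - kube-apiserver
+        # Re-emit metrics deprecated in 1.16, which are hidden by default in 1.17.
+        - --show-hidden-metrics-for-version=1.16
+        # Other kube-apiserver flags are omitted from this sketch.
+```
+
+Remove the flag before upgrading to 1.18: by then those metrics are deleted, and `1.16` is no longer the previous minor version, so the value is not allowed.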
+
+{{% /capture %}}
+
+{{% capture whatsnext %}}
+* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
+* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
+{{% /capture %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index 998b265e0666f..a20948c8abdd3 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -116,6 +116,7 @@ toc:
   - docs/concepts/cluster-administration/networking.md
   - docs/concepts/cluster-administration/network-plugins.md
   - docs/concepts/cluster-administration/logging.md
+  - docs/concepts/cluster-administration/monitoring.md
   - docs/concepts/cluster-administration/kubelet-garbage-collection.md
   - docs/concepts/cluster-administration/federation.md
   - docs/concepts/cluster-administration/sysctl-cluster.md