diff --git a/content/en/docs/concepts/cluster-administration/monitoring.md b/content/en/docs/concepts/cluster-administration/monitoring.md
new file mode 100644
index 0000000000000..71f7e90c9c703
--- /dev/null
+++ b/content/en/docs/concepts/cluster-administration/monitoring.md
@@ -0,0 +1,97 @@
+---
+title: Monitoring Control Plane Components
+reviewers:
+- brancz
+- logicalhan
+- RainbowMango
+content_template: templates/concept
+weight: 60
+---
+
+{{% capture overview %}}
+
+Metrics from system components give a better look into what is happening inside those components. Metrics are particularly useful for building dashboards and alerts.
+
+{{% /capture %}}
+
+{{% capture body %}}
+
+## Metrics in Kubernetes
+
+Kubernetes control plane components use Prometheus as the default client for exposing metrics. In most cases those metrics are available on the `/metrics` endpoint of the HTTP server. For components that don't expose the endpoint by default, it can be enabled using the `--bind-address` flag.
+
+Examples of those components:
+* kube-controller-manager
+* kube-proxy
+* kube-apiserver
+* kube-scheduler
+* kubelet (metrics exposed on `/metrics/cadvisor` and `/metrics/resource` do not have the same guarantee)
+
+For clusters with RBAC enabled, accessing metrics requires authorization through a service account with a `ClusterRole` that allows access to the `/metrics` URL.
+
+```
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: prometheus
+rules:
+  - nonResourceURLs:
+      - "/metrics"
+    verbs:
+      - get
+```
+
+## Metric Lifecycle
+
+Alpha metric -> Stable metric -> Deprecated metric -> Hidden metric -> Deletion
+
+Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
+
+Stable metrics are guaranteed not to change, except that a metric may be marked deprecated for a future Kubernetes version.
+By "not change", we mean three things:
+
+* the metric itself will not be deleted (or renamed)
+* the type of metric will not be modified
+* no labels can be added or removed from this metric
+
+The list of currently supported stable metrics can be found [here](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml).
+
+Deprecated metrics are annotated with the Kubernetes version from which the metric is considered deprecated. When a stable metric undergoes the deprecation process, we are signaling that the metric will eventually be deleted.
+
+Before deprecation:
+
+```
+# HELP some_counter this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+After deprecation:
+
+```
+# HELP some_counter (Deprecated from 1.15) this counts things
+# TYPE some_counter counter
+some_counter 0
+```
+
+Hidden metrics are no longer exposed by default and require manual component configuration by cluster administrators.
+
+Deleted metrics are no longer available.
+
+
+## Show Hidden Metrics
+
+As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed the migration of the metrics deprecated in the last release.
+
+The flag `show-hidden-metrics-for-version` takes a version for which you want to show metrics deprecated in that release. The version is expressed as x.y, where x is the major version and y is the minor version. The patch version is not needed even though a metric can be deprecated in a patch release, because the metrics deprecation policy runs against the minor release.
+
+The flag can only take the previous minor version as its value. All metrics hidden in the previous release will be emitted if admins set `show-hidden-metrics-for-version` to the previous version. An older version is not allowed because it would violate the metrics deprecation policy.
+
+Take metric `A` as an example, and assume that `A` is deprecated in `1.n`. According to the metrics deprecation policy, we can reach the following conclusions:
+
+* In release `1.n`, the metric is deprecated, and it is still emitted by default.
+* In release `1.n+1`, the metric is hidden by default, and it can be emitted with the command-line flag `show-hidden-metrics-for-version=1.n`.
+* In release `1.n+2`, the metric should be removed from the codebase. There is no escape hatch anymore.
+
+So, if admins want to enable metric `A` in release `1.n+1`, they should set the command-line flag to `1.n`, that is, `show-hidden-metrics-for-version=1.n`.
+
+{{% /capture %}}
diff --git a/data/concepts.yml b/data/concepts.yml
index 998b265e0666f..a92c34667e16b 100644
--- a/data/concepts.yml
+++ b/data/concepts.yml
@@ -113,6 +113,7 @@ toc:
  - docs/concepts/cluster-administration/certificates.md
  - docs/concepts/cluster-administration/cloud-providers.md
  - docs/concepts/cluster-administration/manage-deployment.md
+ - docs/concepts/cluster-administration/monitoring.md
  - docs/concepts/cluster-administration/networking.md
  - docs/concepts/cluster-administration/network-plugins.md
  - docs/concepts/cluster-administration/logging.md