
KubeControllerManagerDown & KubeSchedulerDown firing on kubeadm 1.18 cluster #718

Open
jeanluclariviere opened this issue Oct 8, 2020 · 41 comments


@jeanluclariviere

jeanluclariviere commented Oct 8, 2020

What happened?
Deploying kube-prometheus release-0.6 to a kubeadm-bootstrapped bare-metal cluster causes the KubeControllerManagerDown and KubeSchedulerDown alerts to fire.

Did you expect to see something different?
Alerts should not fire as everything is up.

How to reproduce it (as minimally and precisely as possible):
Deploy release-0.6 with the config below to a kubeadm-bootstrapped cluster running 1.18.x

  • Prometheus Operator version:
    prometheus-operator:v0.42.1

  • Manifests:

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-all-namespaces.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-anti-affinity.libsonnet')
  {
    _config+:: {
      namespace: 'monitoring',
    },
  };

{ ['setup/0namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor is separated so that it can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }

Anything else we need to know?:
This issue is related to Kubernetes 1.18.x; it appears a few changes were made to the kube-controller-manager and kube-scheduler.

Firstly, version 1.18+ now uses the more secure HTTPS ports (10257 for kube-controller-manager, 10259 for kube-scheduler) and disables HTTP by default. Unfortunately, the --secure-port used by both kube-controller-manager and kube-scheduler is bound to 127.0.0.1 and not 0.0.0.0.

As a result, metrics cannot be collected until the bound address for both of these is updated.

The workaround:
Updating the manifests in /etc/kubernetes/manifests/ to use --bind-address=0.0.0.0 for both the scheduler and the controller manager will relaunch the pods with the correct bind address, but these settings will not survive a kubeadm upgrade.
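
To verify the new bind address took effect, something like the following can be run on a control-plane node (a sketch; before the change the sockets show 127.0.0.1, afterwards 0.0.0.0, and a 403 from curl without a bearer token is still fine here since it proves the port is reachable):

# check what the two components are bound to (10257 controller-manager, 10259 scheduler)
netstat -tunap | grep -E '10257|10259'
# <control-plane-ip> is a placeholder for the node's address
curl -sk https://<control-plane-ip>:10257/metrics -o /dev/null -w '%{http_code}\n'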

In order to persist the settings, the kubeadm-config configmap in the kube-system namespace should also be edited to include the following:

    controllerManager:
      extraArgs:
        bind-address: 0.0.0.0
    scheduler:
      extraArgs:
        bind-address: 0.0.0.0
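
For reference, one way to make that edit (the controllerManager and scheduler blocks above live under the ClusterConfiguration document stored in that ConfigMap):

kubectl -n kube-system edit configmap kubeadm-config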

I understand this isn't a bug directly related to kube-prometheus, but I didn't find this documented anywhere and had been scratching my head for a day looking at this. Hoping this will help someone else in the future.

@paulfantom
Member

We do have a patch to disable those alerts and ServiceMonitors in clusters which don't expose access to controller manager and scheduler metrics. But yes, our documentation is lacking in this field.

@brancz
Collaborator

brancz commented Oct 16, 2020

It would be amazing if you could propose a PR to add what you described in "the workaround" where you expected to find this documentation when you were looking for it! :)

@dnsmap

dnsmap commented Nov 17, 2020

yes

@jeanluclariviere
Author

@brancz sorry, I've been meaning to get back to this but I keep getting sidetracked with other things. Do you think the Troubleshooting section on the main page would be a suitable place for this? If yes, I can put something together linking to either the patch for disabling those checks if on a managed cluster, or for updating kubeadm to use 0.0.0.0 instead of the loopback address for the secure port (at a user's discretion, of course).

@brancz
Collaborator

brancz commented Nov 26, 2020

Yes! I think the troubleshooting guide is a great place because that's most likely what people look at when they encounter this.

@KeithTt

KeithTt commented Dec 9, 2020

I have changed the bind address, but it did not work. It is strange.

@johntostring

johntostring commented Dec 28, 2020

@KeithTt I have encountered the same trouble recently; it took me 5 days to figure it out.
My K8s cluster was created via kubeadm and the version is 1.19.2.

Check these points:

  1. Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
  2. Create a Service and make sure it has a label (for me it's k8s-app: kube-scheduler) matching the ServiceMonitor's spec.selector.matchLabels
  3. Make sure the Service selects the right pod via the right label (for me it's component: kube-scheduler)
  4. It took me a long time to find this last one 😤: the Service's port name (for me it's https-metrics) must match the ServiceMonitor's spec.endpoints.port
  5. Do the same checks for kube-controller-manager

FYI, my YAML files:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler
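
If the alerts still fire, the label and port matching from points 2-4 can be checked with standard kubectl commands (a sketch):

# the Service must actually select the kube-scheduler pod (endpoints must not be empty)
kubectl -n kube-system get endpoints kube-scheduler
# the port *name* here must equal spec.endpoints.port in the ServiceMonitor
kubectl -n kube-system get service kube-scheduler -o jsonpath='{.spec.ports[*].name}{"\n"}'
# and the ServiceMonitor's selector must match the Service's labels
kubectl -n monitoring get servicemonitor kube-scheduler -o jsonpath='{.spec.selector.matchLabels}{"\n"}'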

@ksa-real

ksa-real commented Jan 19, 2021

Note that both the kube-scheduler and kube-controller-manager serving certificates only contain localhost and 127.0.0.1. This is in contrast to the etcd ones, which contain the actual hostname and node IP. So at the moment, both the scheduler and the controller-manager require insecureSkipVerify: true.

Binding to 0.0.0.0 may have some security implications, like exposing previously hidden endpoints to the cluster and maybe even to the public.
insecureSkipVerify is also more of a temporary workaround.
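
A quick way to see what the serving certificate actually contains (a sketch, run on a control-plane node; use port 10257 for the controller-manager):

echo | openssl s_client -connect 127.0.0.1:10259 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'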

I see two approaches:

  • Design the component endpoints with the Prometheus scraper in mind: i.e. provide a certificate that includes the master nodes' IPs (similar to etcd) and bind to the main network interface instead of 127.0.0.1. This likely requires support from kubeadm.
  • Delegate scraping to some collector run as a DaemonSet. E.g. node-exporter could act as such: it would proxy the request on the node and respond to Prometheus with the result. (Suboptimal, but controllable on the Prometheus side.)

@jeanluclariviere
Author

Binding to 0.0.0.0 may have some security implications, like exposing previously hidden endpoints to the cluster and maybe even to the public.

I totally agree, which is partially why I haven't made time to submit the PR for updating the troubleshooting docs - that and I'm swamped.

So at this moment, both scheduler and controller-manager require insecureSkipVerify: true

Yea, if what you're saying is true then that would be necessary - I don't recall having to set this value though. Does Prometheus even check for valid certificates on the endpoints it scrapes?

Design component endpoints with Prometheus scraper in mind: i.e. they provide a certificate that includes master nodes' IPs (similar to etcd). Bind to the main network interface instead of 127.0.0.1. This likely requires support by kubeadm.

I think this solution is more realistic - I set the bind address to 0.0.0.0 out of laziness, and I suspect most folks are like me (not that this is a good thing!)

@ksa-real

So at this moment, both scheduler and controller-manager require insecureSkipVerify: true

Yea, if what you're saying is true than that would be necessary - I don't recall having to set this value though. Does prometheus even check for valid certificates on the endpoints it scrapes?

It does. Scraping fails otherwise.

Design component endpoints with Prometheus scraper in mind: i.e. they provide a certificate that includes master nodes' IPs (similar to etcd). Bind to the main network interface instead of 127.0.0.1. This likely requires support by kubeadm.

I think this solution is more realistic - I set the bind address to 0.0.0.0 out of laziness, and I suspect most folks are like me (not that this is a good thing!)

At the moment I have no idea how to do it. bind-address is passed to the scheduler/controller-manager manifests from extraArgs: bind-address: 0.0.0.0. This parameter is static across all nodes, and the only "universal" addresses are 127.0.0.1 and 0.0.0.0, whereas the wanted value is the node IP address. The certificate is generated for localhost/127.0.0.1 regardless of the bind-address. So the right config does not seem possible with kubeadm init phase control-plane controller-manager --config kubeadm.yml.

Also, I'm not sure whether anything relies on the scheduler/controller-manager being bound to 127.0.0.1. Unlike etcd, these can be bound to only a single address AFAIK.

@ksa-real

kubernetes/kubeadm#2244 - related issue about kube-scheduler and kube-controller-manager certificates.

@omerozery

omerozery commented Feb 2, 2021

It seems that in k8s v1.20.2 (probably even before, I didn't check)
the 0.0.0.0 workaround is no longer working, since the default insecure address is already 0.0.0.0 (both for the scheduler and the controller manager) but insecure serving is disabled by default, and it looks like the only solution is to use the deprecated port parameter to re-enable the insecure listener.
In short, the workaround should be configured like so:

controllerManager:
  extraArgs:
    port: '10252'
scheduler:
  extraArgs:
    port: '10251'

even though the documentation says:
the default port for the scheduler is 10251 (Ref: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/)
the default port for the controller-manager is not specified (Ref: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/)

Both /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml come with - --port=0 out of the box,
which disables the insecure listener.

If someone can confirm or deny this I would appreciate it, because I don't like this solution even as a workaround.

@weibeld

weibeld commented Feb 5, 2021

@omerozery I think the default value for --bind-address for both kube-controller-manager and kube-scheduler v1.20 is 0.0.0.0, in which case the issue reported by the OP wouldn't occur. But the point is that kubeadm by default sets these values to 127.0.0.1. So, to revert these settings made by kubeadm, the workaround is

controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0

as posted by the OP.
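
For anyone wondering where those fragments go: in a full kubeadm ClusterConfiguration they sit at the top level, roughly like this (a sketch; the apiVersion and kubernetesVersion values are only illustrative):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0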

Regarding your workaround, kube-controller-manager v1.20 doesn't have a --port flag anymore, so your configuration likely wouldn't work (kube-scheduler v1.20 still has the --port flag). Unless you report that you could indeed enable port 10252 for kube-controller-manager.

@omerozery

omerozery commented Feb 7, 2021

I wrote the comment above after I tried it myself.
I was using the "bind-address" workaround on kubeadm-deployed v1.18.2 clusters because, like you said, it was needed, and everything worked fine (putting aside the security issue).
The last couple of days I was working on deploying v1.20.2 clusters (using kubeadm),
and this configuration didn't make any difference (the netstat command below returned nothing and Prometheus couldn't scrape the metrics).
The only thing that opened the ports and listened on 0.0.0.0 was the "port" workaround.
This is the output from my k8s master when using the "port" workaround:

[root@my-k8s-master ~]# cat /etc/kubernetes/manifests/kube-scheduler.yaml  | egrep -i '\--bind|\--port'
    - --bind-address=127.0.0.1
    - --port=10251
[root@my-k8s-master ~]# netstat -tunap | grep -i 10251
tcp6       0      0 :::10251                :::*                    LISTEN      12058/kube-schedule 
[root@my-k8s-master ~]# cat /etc/kubernetes/manifests/kube-controller-manager.yaml  | egrep -i '\--bind|\--port'
    - --bind-address=127.0.0.1
    - --port=10252
[root@my-k8s-master ~]# netstat -tunap | grep -i 10252
tcp6       0      0 :::10252                :::*                    LISTEN      12065/kube-controll 

Clearly the 127.0.0.1 is ignored. Am I missing something?

@ksa-real

ksa-real commented Feb 7, 2021

@omerozery
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
--port is non-existent, there is only --secure-port now. Default is 10257 for controller-manager, 10259 for the scheduler.

@weibeld Note that --bind-address=0.0.0.0 may expose your metrics to the internet in a production environment. I see two solutions at the moment:

  • Apply firewall rules to drop all connections except those going to the node IP address (a rough iptables sketch follows this list)
  • Patch kubeadm.yml with extraArgs bind-address: <NODE IP ADDRESS> on every control-plane node before doing kubeadm init phase control-plane controller-manager --config kubeadm.yml (same for the scheduler).
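
A rough sketch of the firewall option, assuming the components now bind to 0.0.0.0 on their secure ports (the address below is a placeholder):

NODE_IP=192.168.123.123   # placeholder, use the node's internal address
# drop anything aimed at the metrics ports unless it targets the node's internal IP
iptables -A INPUT -p tcp -m multiport --dports 10257,10259 ! -d "${NODE_IP}" -j DROP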

I made comments in the linked kubeadm issue. Generally, IMO the kubeadm authors haven't thought this through. The best option would be using node IP addresses instead of 127.0.0.1 by default for --bind-address and the probes.

kubernetes/kubeadm#2388

@neolit123

neolit123 commented Feb 8, 2021

see my comment here:
kubernetes/kubeadm#2388 (comment)

I made comments in linked kubeadm issue. Generally, IMO kubeadm authors haven't thought it through. The best would be using node IP addresses instead of 127.0.0.1 by default for --bind-address and probes.

we've seen the requests about it, but the response has been that we don't want to expose the components outside of localhost just for metrics.

the current best workaround:

  • on each control-plane node, create a bash script that writes the files kube-scheduler.json and kube-controller-manager.json in a folder, both with the following contents (a sketch of such a script follows this list):
[
	{ "op": "add", "path": "/spec/containers/0/command/-", "value": "--bind-address=SOME_IP" },
	{ "op": "replace", "path": "/spec/containers/0/livenessProbe/httpGet/host", "value": "SOME_IP" },
	{ "op": "replace", "path": "/spec/containers/0/startupProbe/httpGet/host", "value": "SOME_IP" }
]
  • the bash script should replace SOME_IP with the IP you want to bind to.
  • call the bash script, then call kubeadm init/join/upgrade with --experimental-patches=thepatchfolder.
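
A sketch of such a script, assuming the patch folder is /etc/kubernetes/patches and that hostname -I returns the wanted address first:

#!/bin/sh
NODE_IP="$(hostname -I | awk '{print $1}')"
PATCH_DIR=/etc/kubernetes/patches
mkdir -p "${PATCH_DIR}"
# write the same JSON patch for both components, with this node's IP filled in
for component in kube-scheduler kube-controller-manager; do
  cat > "${PATCH_DIR}/${component}.json" <<EOF
[
  { "op": "add", "path": "/spec/containers/0/command/-", "value": "--bind-address=${NODE_IP}" },
  { "op": "replace", "path": "/spec/containers/0/livenessProbe/httpGet/host", "value": "${NODE_IP}" },
  { "op": "replace", "path": "/spec/containers/0/startupProbe/httpGet/host", "value": "${NODE_IP}" }
]
EOF
done
# afterwards: kubeadm init/join/upgrade ... --experimental-patches "${PATCH_DIR}"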

of course, if there are sufficient votes about this change request let's comment on kubernetes/kubeadm#2388

@omerozery

omerozery commented Feb 8, 2021

@ksa-real @weibeld.
The docs and your comments above (which are basically references to the docs) do not reflect what is actually happening on kubeadm-deployed v1.20.2 clusters. Repeating it won't make it true; please try it before you comment.

@ksa-real

ksa-real commented Feb 8, 2021

@ksa-real @weibeld.
The docs and your comments above (which are basically references to the docs) do not reflect what is actually happening on kubeadm-deployed v1.20.2 clusters. Repeating it won't make it true; please try it before you comment.

I meant don't use the deprecated --port at all. Bind to 0.0.0.0 or the node IP and scrape HTTPS 10257/10259, ignoring the certificate.

netstat -tunap | grep -i 1025[79]

@weibeld

weibeld commented Feb 10, 2021

The proxy workaround can be relatively easily implemented by running an HAProxy container with the following configuration as a DaemonSet on each master node:

defaults
  mode http
  timeout connect 5000ms
  timeout client 5000ms
  timeout server 5000ms
  default-server maxconn 10

frontend kube-controller-manager
  bind ${NODE_IP}:10257
  http-request deny if !{ path /metrics }
  default_backend kube-controller-manager
backend kube-controller-manager
  server kube-controller-manager 127.0.0.1:10257 ssl verify none

frontend kube-scheduler
  bind ${NODE_IP}:10259
  http-request deny if !{ path /metrics }
  default_backend kube-scheduler
backend kube-scheduler
  server kube-scheduler 127.0.0.1:10259 ssl verify none

Note the following:

  • The $NODE_IP environment variable (which is the desired IP address that the proxy should listen on) can be passed into the HAProxy Pod with a fieldRef:
    env:
    - name: NODE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
  • The proxy skips the validation of the TLS server certificate of kube-controller-manager and kube-scheduler (verify none). This is due to the kubeadm defaults setting up kube-controller-manager and kube-scheduler with a self-signed TLS server certificate for serving HTTPS that is not available at a known location (see Provide proper certificates for kube-scheduler and kube-controller-manager kubernetes/kubeadm#2244 and kubeadm doest't configure kube-scheduler pods to mount k8s cert directories kubernetes/kubernetes#80063). However, this could be changed by using either --tls-private-key-file and --tls-cert-file or --cert-dir on kube-controller-manager and kube-scheduler, in which case it should be possible to validate the TLS server certificate.
  • The proxy serves only HTTP; however, if HTTPS is really necessary, this could be adapted in the HAProxy configuration (in which case the serving certificate could be freely chosen).
  • The proxy accepts only requests to the /metrics endpoint to not expose any other functionality of the backing services. If necessary, this could be further restricted in the HAProxy configuration by, e.g. only allowing requests from a certain IP address range.
  • The proxy Pods must run in the hostNetwork so that they can access the loopback interfaces of the corresponding kube-controller-manager and kube-scheduler Pods.

After deploying the DaemonSet, you can scrape the metrics of kube-controller-manager and kube-scheduler on http://<NODE_IP>:10257/metrics and http://<NODE_IP>:10259/metrics.
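
For completeness, a minimal DaemonSet sketch for running this proxy (assuming the configuration above is stored in a ConfigMap named haproxy-metrics-proxy in kube-system under the key haproxy.cfg; the image tag, names, and control-plane node label are only illustrative):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: haproxy-metrics-proxy
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: haproxy-metrics-proxy
  template:
    metadata:
      labels:
        app: haproxy-metrics-proxy
    spec:
      # hostNetwork so the proxy can reach the components on the node's loopback
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: haproxy
        image: haproxy:2.4
        env:
        # listen address, resolved by the ${NODE_IP} references in the config above
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        volumeMounts:
        - name: config
          mountPath: /usr/local/etc/haproxy
      volumes:
      - name: config
        configMap:
          name: haproxy-metrics-proxy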

The main disadvantage of this workaround is that the names of the kube-controller-manager and kube-scheduler Pods are lost. Prometheus discovers only the names of the proxy Pods, which is not really useful. Prometheus does discover the names of the nodes the Pods run on (in the __meta_kubernetes_pod_node_name label), so it's at least possible to tell that a given metric belongs to the kube-controller-manager or kube-scheduler of that node; however, there seems to be no easy way to deduce the exact name of that kube-controller-manager or kube-scheduler Pod.

@VanLiuZhi

Thanks! But I think this isn't a good idea.

@jgerry2002

We do have a patch to disable those alerts and ServiceMonitors in clusters which don't expose access to controller manager and scheduler metrics. But yes, our documentation is lacking in this field.

Does anyone know where this patch is? On any 1.20 or above cluster, basically none of these workarounds, including HAProxy, ever exposes those metrics. It's probably best to offer a quick way to disable this entirely until folks make a decision about this some day.

@paulfantom
Member

paulfantom commented Jul 7, 2021

Try setting platform: 'kubeadm' as described in https://github.com/prometheus-operator/kube-prometheus#cluster-creation-tools. The patch will be applied automatically along with optimizations for that platform.

The patch is in https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/managed-cluster.libsonnet
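
For the "disable it entirely" case, importing that addon in example.jsonnet is enough; a minimal sketch (the namespace value is only an example):

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  // per the comments above, this addon disables the scheduler/controller-manager
  // alerts and ServiceMonitors for clusters where those components are not reachable
  (import 'kube-prometheus/addons/managed-cluster.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring',
      },
    },
  };

{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }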

@budimanjojo

@KeithTt I have encountered the same trouble recently; it took me 5 days to figure it out.
My K8s cluster was created via kubeadm and the version is 1.19.2.

Check these points:

  1. Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
  2. Create a Service and make sure it has a label (for me it's k8s-app: kube-scheduler) matching the ServiceMonitor's spec.selector.matchLabels
  3. Make sure the Service selects the right pod via the right label (for me it's component: kube-scheduler)
  4. It took me a long time to find this last one 😤: the Service's port name (for me it's https-metrics) must match the ServiceMonitor's spec.endpoints.port
  5. Do the same checks for kube-controller-manager

FYI, my YAML files:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler

@jgerry2002 I have this working, but you need to match the service label to app.kubernetes.io/name: kube-scheduler. But I have another problem now. I have this alert:

name: TargetDown
expr: 100 * (count by(job, namespace, service) (up == 0) / count by(job, namespace, service) (up)) > 10
for: 10m

firing now instead; it is complaining that the kube-scheduler and kube-controller-manager pods in kube-system are down 😆

@budimanjojo

Try setting platform: 'kubeadm' as described in https://github.com/prometheus-operator/kube-prometheus#cluster-creation-tools. The patch will be applied automatically along with optimizations for that platform.

The patch is in https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/managed-cluster.libsonnet

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

@paulfantom
Member

paulfantom commented Jul 7, 2021

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

Actually, no. Selecting kubeadm as the platform will add 2 Service objects which should allow scraping data from the scheduler and controller-manager (both components need to be configured to expose metrics). You can see this in kubeadm.libsonnet, which is applied when selecting the correct platform: https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/platforms/kubeadm.libsonnet.

Disabling them and using the managed-cluster.libsonnet addon is the last resort for cases when you cannot have access to kube-scheduler or kube-controller-manager (for example in EKS).

@budimanjojo

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

Actually, no. Selecting kubeadm as the platform will add 2 Service objects which should allow scraping data from the scheduler and controller-manager (both components need to be configured to expose metrics). You can see this in kubeadm.libsonnet, which is applied when selecting the correct platform: https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/platforms/kubeadm.libsonnet.

Disabling and using managed-cluster.libsonnet addon is the last resort for cases when you cannot have access to kube-scheduler nor kube-controller-manager (for example in EKS).

Thank you for the explanation. I got it now 😊

@jgerry2002

My K8S cluster was created via kubeadm and the version is 1.19.2

Have you tested on 1.20+? I'm also using Tanzu, which is sort of EKS-ish with some of the security and other (attempted) abstraction in there, which is why I figured I'd ask for an option to disable these checks cleanly and try to figure out a way to add that piece of monitoring back in later on.

@budimanjojo

I just tried to create the manifests for the kubeadm platform and I can't find the Service manifests anywhere. Am I doing something wrong? This is the modified example.jsonnet which I called using build.sh:

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  // Uncomment the following imports to enable its patches
  // (import 'kube-prometheus/addons/anti-affinity.libsonnet') +
  // (import 'kube-prometheus/addons/managed-cluster.libsonnet') +
  // (import 'kube-prometheus/addons/node-ports.libsonnet') +
  // (import 'kube-prometheus/addons/static-etcd.libsonnet') +
  // (import 'kube-prometheus/addons/custom-metrics.libsonnet') +
  // (import 'kube-prometheus/addons/external-metrics.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring-system',
        platform: 'kubeadm',
      },
    },
  };

{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }

I modified the namespace intentionally to see whether it changes in the generated manifests, but no Service manifest for kube-scheduler or kube-controller-manager is generated anywhere.

@paulfantom
Member

paulfantom commented Jul 7, 2021

Those should be created as files with kubernetes- prefix, possibly kubernetes-kubeControllerManagerPrometheusDiscoveryService.yaml

@budimanjojo

Those should be created as files with kubernetes- prefix, possibly kubernetes-kubeControllerManagerPrometheusDiscoveryService.yaml

There's no such file created.

╰ ls -la manifests | grep kubernetes
-rw-r--r-- 1 budiman disk   64268 Jul  7 22:08 kubernetes-prometheusRule.yaml
-rw-r--r-- 1 budiman disk    6905 Jul  7 22:08 kubernetes-serviceMonitorApiserver.yaml
-rw-r--r-- 1 budiman disk     447 Jul  7 22:08 kubernetes-serviceMonitorCoreDNS.yaml
-rw-r--r-- 1 budiman disk    6424 Jul  7 22:08 kubernetes-serviceMonitorKubeControllerManager.yaml
-rw-r--r-- 1 budiman disk    7240 Jul  7 22:08 kubernetes-serviceMonitorKubelet.yaml
-rw-r--r-- 1 budiman disk     537 Jul  7 22:08 kubernetes-serviceMonitorKubeScheduler.yaml

I also tried changing it to platforms: 'kubeadm', platform: 'kubespray', and platforms: 'kubespray'.
Nothing works. I don't know why it doesn't work T.T

@freym

freym commented Jul 9, 2021

It's also not working for me.

Version: 0.8

minimal example.jsonnet:

local kp = (import 'kube-prometheus/main.libsonnet') +
{
  values+:: {
    common+: {
      platform: 'kubeadm',
    },
  },
};

{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }

@freym

freym commented Jul 9, 2021

Got it.
The doc is wrong: it is not in common, it is in kubePrometheus. Working example.jsonnet:

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  {
    values+:: {
      kubePrometheus+: {
        platform: 'kubeadm',
      },
    },
  };

{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }

@budimanjojo

@freym thank you!

@paulfantom
Member

Since this issue is tagged with kind/documentation I need to clarify that the doc for release-0.8 specifies explicitly that the platform value is set with $.values.kubePrometheus.platform. This has changed on the main branch and now we are using $.values.common.platform. The plan is to have $.values.kubePrometheus renamed/replaced/removed, as it should be used only to configure a few alerts that are in https://github.com/prometheus-operator/kube-prometheus/tree/main/jsonnet/kube-prometheus/components/mixin

@LanDinh

LanDinh commented Dec 12, 2021

Just in case someone else, like me, is having trouble understanding what everyone here is talking about because they are quite new to all of this: I use the Helm chart prometheus-community/kube-prometheus-stack to deploy kube-prometheus and Docker Desktop for my local development cluster. I had posted a question on the prometheus-community mailing list about a problem that turned out to be this one, and once I figured it out I posted my explanation there of why each step is necessary to fix each sub-problem: https://groups.google.com/g/prometheus-users/c/_aI-HySJ-xM/m/kqrL1FYVCQAJ - I hope this helps!

@panboo0106

@KeithTt I have encountered the same trouble recently; it took me 5 days to figure it out. My K8s cluster was created via kubeadm and the version is 1.19.2.

Check these points:

  1. Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
  2. Create a Service and make sure it has a label (for me it's k8s-app: kube-scheduler) matching the ServiceMonitor's spec.selector.matchLabels
  3. Make sure the Service selects the right pod via the right label (for me it's component: kube-scheduler)
  4. It took me a long time to find this last one 😤: the Service's port name (for me it's https-metrics) must match the ServiceMonitor's spec.endpoints.port
  5. Do the same checks for kube-controller-manager

FYI, my YAML files:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler

Same for me: I changed the Service's port name from http-metrics to https-metrics and it works.

@angelsclare

@KeithTt I have encountered the same trouble recently; it took me 5 days to figure it out. My K8s cluster was created via kubeadm and the version is 1.19.2.

Check these points:

  1. Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
  2. Create a Service and make sure it has a label (for me it's k8s-app: kube-scheduler) matching the ServiceMonitor's spec.selector.matchLabels
  3. Make sure the Service selects the right pod via the right label (for me it's component: kube-scheduler)
  4. It took me a long time to find this last one 😤: the Service's port name (for me it's https-metrics) must match the ServiceMonitor's spec.endpoints.port
  5. Do the same checks for kube-controller-manager

FYI, my YAML files:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler

Same for me: I changed the Service's port name from http-metrics to https-metrics and it works.

Thank you, your method solved the problem for me.

@justbuilding

justbuilding commented Mar 28, 2023

When you open Prometheus you will find three alerts.
One is Watchdog, which always stays in the firing state; it is used to check that the Prometheus alerting pipeline itself is working, so it can be ignored.
The other two are KubeControllerManagerDown and KubeSchedulerDown.
They fire because no Services were created for kube-controller-manager and kube-scheduler.


# to see which labels and selector this Service needs, check the ServiceMonitor:
kubectl get ServiceMonitor kube-controller-manager  -n monitoring -oyaml


cat << "EOF" > kube-scheduler-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler
EOF

kubectl apply -f kube-scheduler-svc.yaml

# to see which labels and selector this Service needs, check the ServiceMonitor:
kubectl get ServiceMonitor kube-controller-manager  -n monitoring -oyaml

cat << "EOF" > kube-controller-manager-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - name: https-metrics
    port: 10257
  selector:
    component: kube-controller-manager
EOF

kubectl apply -f kube-controller-manager-svc.yaml


@davhdavh

davhdavh commented Aug 2, 2024

ok, mega-hack incoming:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: socat-hack-for-kubeadm-bind-on-localhost-only
  namespace: kube-system
  labels:
    app: socat-hack
spec:
  selector:
    matchLabels:
      app: socat-hack
  template:
    metadata:
      labels:
        app: socat-hack
    spec:
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      hostNetwork: true
      containers:
      - name: socat-etcd
        image: alpine/socat
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        command: ["/bin/sh"]
        args:
        - -c
        - socat TCP4-LISTEN:2381,reuseaddr,bind=${NODE_IP},fork TCP4:127.0.0.1:2381
        ports:
        - containerPort: 2381
      - name: socat-kube-controller-manager
        image: alpine/socat
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        command: ["/bin/sh"]
        args:
        - -c
        - socat TCP4-LISTEN:10257,reuseaddr,bind=${NODE_IP},fork TCP4:127.0.0.1:10257
        ports:
        - containerPort: 10257
      - name: socat-kube-scheduler
        image: alpine/socat
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        command: ["/bin/sh"]
        args:
        - -c
        - socat TCP4-LISTEN:10259,reuseaddr,bind=${NODE_IP},fork TCP4:127.0.0.1:10259
        ports:
        - containerPort: 10259
      - name: socat-kube-proxy
        image: alpine/socat
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        command: ["/bin/sh"]
        args:
        - -c
        - socat TCP4-LISTEN:10249,reuseaddr,bind=${NODE_IP},fork TCP4:127.0.0.1:10249
        ports:
        - containerPort: 10249

Why does this work? Because e.g. etcd has bound itself to 127.0.0.1:2381, nothing is bound to 192.168.123.123:2381 (assuming your node IP is 192.168.123.123).
So we can start a DaemonSet where socat forwards all TCP traffic arriving on the host IP to 127.0.0.1 on port 2381.
Then, whenever Prometheus (following the ServiceMonitor spec) tries to connect to port 2381 on the node IP, because that is what the configured Service resolves to, it actually hits socat, which forwards the traffic to the actual etcd port and returns the correct answer.

Client
   |
   | HTTP Request to Node IP (e.g., 192.168.123.123:2381)
   V
+--------------------------------+
| Node (192.168.123.123)         |
|                                |
|  +--------------------------+  |
|  | DaemonSet Pod            |  |
|  |                          |  |
|  |  +--------------------+  |  |
|  |  | socat              |  |  |
|  |  |                    |  |  |
|  |  | TCP4-LISTEN:2381   |  |  |
|  |  | bind=${NODE_IP}    |  |  |
|  |  | fork               |  |  |
|  |  | TCP4:127.0.0.1:2381|  |  |
|  |  +--------------------+  |  |
|        |
|        | HTTP Request to 127.0.0.1:2381
|        V
| +----------------------------+ |
| | Localhost (127.0.0.1)      | |
| |                            | |
| |  +----------------------+  | |
| |  | etcd                 |  | |
| |  | Listening on         |  | |
| |  | 127.0.0.1:2381       |  | |
| |  +----------------------+  | |
| |                            | |
| +----------------------------+ |
+--------------------------------+
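
A quick way to sanity-check the forwarders from the node itself (a sketch; 192.168.123.123 is the placeholder node IP from above, and a 403 without a bearer token on the HTTPS ports still proves the path works):

# etcd metrics are plain HTTP on 2381; scheduler/controller-manager are HTTPS
curl -s http://192.168.123.123:2381/metrics | head -n 3
curl -sk https://192.168.123.123:10259/metrics -o /dev/null -w 'scheduler: %{http_code}\n'
curl -sk https://192.168.123.123:10257/metrics -o /dev/null -w 'controller-manager: %{http_code}\n'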

@pschichtel

I don't think this is as hacky as you make it sound. It's pretty much what I suggested in the k0s issue, except simpler compared to mine.

@danny-does-stuff

Here is an awesome writeup I found that gives a more beginner-friendly analysis of this issue https://groups.google.com/g/prometheus-users/c/_aI-HySJ-xM/m/kqrL1FYVCQAJ
