
[prometheus-kube-stack] Unhealthy Targets: controller-manager, etcd, proxy, kube-scheduler #1704

Closed
PhilipMay opened this issue Jan 12, 2022 · 15 comments
Labels: bug (Something isn't working), lifecycle/stale

Comments

@PhilipMay

PhilipMay commented Jan 12, 2022

Describe the bug

I installed kube-prometheus-stack with version 30.0.1 and appVersion 0.53.1.

I run helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace -f kube-prometheus-stack-helm-values.yaml

With this config:

grafana:
  enabled: false

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn-crypto-global
          resources:
            requests:
              storage: 10Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn-crypto-global
          resources:
            requests:
              storage: 10Gi

So it is a relatively "pure" config, but the GUI is telling me that these targets are unhealthy:

serviceMonitor/prometheus/prometheus-kube-prometheus-kube-controller-manager/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-etcd/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-proxy/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-scheduler/0 (0/1 up)

See screenshot: [screenshot of the Prometheus targets page]

What's your helm version?

version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.17.4"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"archive", BuildDate:"2021-12-16T20:16:11Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack with version: 30.0.1 and appVersion: 0.53.1

What's the chart version?

version: 30.0.1 and appVersion: 0.53.1

What happened?

see above

What you expected to happen?

No response

How to reproduce it?

IMO there should not be such errors in a default installation, or there should be documentation on how to fix / avoid them.

Enter the changed values of values.yaml?

see above

Enter the command that you execute and failing/misfunctioning.

see above

Anything else we need to know?

No response

@PhilipMay added the bug (Something isn't working) label on Jan 12, 2022
@PhilipMay (Author)

Hello dear maintainer team,
is there any comment on this? I think it might be a pretty bad bug for other people trying to use this nice stack...

@andrewgkew (Contributor)

@PhilipMay where have you installed this? A managed k8s instance (EKS, AKS, GKE)?

@PhilipMay (Author)

PhilipMay commented Jan 19, 2022

@PhilipMay where have you installed this? A managed k8s instance (EKS, AKS, GKE)?

Bare-metal single-node cluster installed with kubeadm, using flannel.

  • Kubernetes Version: 1.23.1
  • containerd Version: 1.4.12
  • Debian GNU/Linux 11 (bullseye), kernel 5.10.0-10-amd64
  • flannel 0.16.1
  • Longhorn 1.2.3

@andrewgkew (Contributor)

Just wanted to rule out the issue of managed clusters (those services aren't accessible when the control plane is managed), which is why I asked.

One thing off the top of my head: check the metrics bind address of kube-proxy. By default it's set to 127.0.0.1, so Prometheus won't be able to scrape the metrics endpoint; update that to 0.0.0.0.
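
For example, a quick way to check the current value before changing it (a minimal sketch, assuming a kubeadm-style kube-proxy ConfigMap in kube-system):

# inspect the current metrics bind address of kube-proxy
kubectl -n kube-system get configmap kube-proxy -o yaml | grep metricsBindAddress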

@loudmouth

loudmouth commented Jan 21, 2022

Is there a configuration that can be set via values to disable these in managed environments (GKE, AKS, etc.), since, as pointed out, those services aren't accessible in those managed environments?

Assuming this is the place? https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L45

@KristapsT

KristapsT commented Jan 26, 2022

Running the latest version (30.2.0) of kube-prometheus-stack in a managed AKS cluster.

@loudmouth to disable scraping of certain components altogether, you can do that in their respective blocks; for example, to disable kubeControllerManager scraping, you can do it here: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L970
I believe that disables both scraping and rule creation.

However, not every component should be disabled in managed clusters; in my experience with AKS, only kubeControllerManager and kubeScheduler need to be disabled.
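
For example, a minimal values.yaml sketch of that (the kubeControllerManager.enabled and kubeScheduler.enabled keys are from the chart's values.yaml linked above):

kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false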

kubeProxy metrics are accessible in AKS; however, the Service object that kube-prometheus-stack creates to monitor kubeProxy is wrong. Here is its manifest:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: prometheus-operator
  creationTimestamp: "2021-08-26T07:30:55Z"
  labels:
    app: kube-prometheus-stack-kube-proxy
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 30.2.0
    chart: kube-prometheus-stack-30.2.0
    heritage: Helm
    jobLabel: kube-proxy
    release: kube-prometheus-stack
  name: kube-prometheus-stack-kube-proxy
  namespace: kube-system
  resourceVersion: "74298045"
  uid: cc296f5e-9d72-4667-856f-9e24886a1055
spec:
  clusterIP: None
  clusterIPs:
  - None
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 10249
    protocol: TCP
    targetPort: 10249
  selector:
    k8s-app: kube-proxy
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

And here are the labels of the kube-proxy pod inside an AKS cluster, version 1.21.2:

labels:
  component: kube-proxy
  controller-revision-hash: b8f7b747b
  pod-template-generation: "7"
  tier: node

The label k8s-app: kube-proxy doesn't exist on the pod, yet the Service uses it as a selector, so it will never match.
As far as I can tell there is no workaround within the Helm chart itself; it requires a change in whatever creates the Service.

Edit: my mistake, you can specify a custom selector for the kubeProxy Service: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L1276-L1277
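
For reference, a minimal sketch of that override (the key path follows the values.yaml lines linked above; the selector value is an assumption based on the AKS pod labels shown earlier):

kubeProxy:
  service:
    selector:
      component: kube-proxy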

@dkrizic

dkrizic commented Feb 6, 2022

I just saw your question.

kube-controller-manager and kube-scheduler need this setting

--bind-address=0.0.0.0

etcd needs

--listen-metrics-urls=http://0.0.0.0:2381

As a first quick test you can change the files in /etc/kubernetes/manifests, or you can configure it in the ClusterConfiguration (ConfigMap kubeadm-config in namespace kube-system) with entries like:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
...
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
...
etcd:
  extraArgs:
    listen-metrics-urls: http://0.0.0.0:2381
...
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
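
A minimal sketch of applying this on an existing kubeadm cluster (editing the stored ClusterConfiguration affects future kubeadm upgrade runs, while editing the static pod manifests takes effect immediately because the kubelet restarts them when the files change):

# persist the flags for future kubeadm upgrades
kubectl -n kube-system edit configmap kubeadm-config

# apply immediately by editing the static pod manifests on the control-plane node
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
sudo vi /etc/kubernetes/manifests/etcd.yaml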

@PhilipMay (Author)

Thanks @dkrizic

What about kube-proxy? How do I configure that?

@dkrizic

dkrizic commented Feb 8, 2022

Edit the kube-proxy ConfigMap in namespace kube-system:

data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 192.168.0.0/16
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    detectLocalMode: ""
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
...

and add/change the line

metricsBindAddress: 0.0.0.0

If I remember correctly, you need to run

kubectl -n kube-system rollout restart daemonset kube-proxy

to activate it
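
A quick way to confirm it worked (sketch; 10249 is the default kube-proxy metrics port, replace <node-ip> with an actual node address):

curl -s http://<node-ip>:10249/metrics | head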

@stale

stale bot commented Mar 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale

stale bot commented Mar 27, 2022

This issue is being automatically closed due to inactivity.

The stale bot closed this as completed on Mar 27, 2022.
@superbrothers (Contributor)

superbrothers commented Apr 16, 2022

It can also be solved by deploying a proxy server to expose the metrics endpoints for each component.

# based on https://github.com/kubermatic/kubeone/issues/1215#issuecomment-992471229
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy-config
  namespace: monitoring
data:
  haproxy.cfg: |
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    frontend kube-controller-manager
      bind ${NODE_IP}:10257
      mode tcp
      default_backend kube-controller-manager

    backend kube-controller-manager
      mode tcp
      server kube-controller-manager 127.0.0.1:10257

    frontend kube-scheduler
      bind ${NODE_IP}:10259
      mode tcp
      default_backend kube-scheduler

    backend kube-scheduler
      mode tcp
      server kube-scheduler 127.0.0.1:10259

    frontend kube-proxy
      bind ${NODE_IP}:10249
      http-request deny if !{ path /metrics }
      default_backend kube-proxy

    backend kube-proxy
      server kube-proxy 127.0.0.1:10249

    frontend etcd
      bind ${NODE_IP}:2381
      http-request deny if !{ path /metrics }
      default_backend etcd

    backend etcd
      server etcd 127.0.0.1:2381
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      containers:
      - env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: docker.io/haproxy:2.5
        name: haproxy
        securityContext:
          allowPrivilegeEscalation: false
          runAsUser: 99 # 'haproxy' user
        volumeMounts:
        - mountPath: /usr/local/etc/haproxy
          name: config
      hostNetwork: true
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      volumes:
      - configMap:
          name: metrics-proxy-config
        name: config

If you built your cluster with kubeadm, the port number of the etcd metrics endpoint is different from the kube-prometheus-stack default value, so the following change to values.yaml is required:

kubeEtcd:
  service:
    port: 2381
    targetPort: 2381

It may be possible to deploy this proxy server as an option for kube-prometheus-stack.
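
As a rough check of the HTTP frontends defined in the haproxy.cfg above (sketch; replace <node-ip> with a control-plane node address):

curl -s http://<node-ip>:2381/metrics | head    # etcd via the haproxy frontend
curl -s http://<node-ip>:10249/metrics | head   # kube-proxy via the haproxy frontend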

@quasimodo-r

(quoting @dkrizic's comment above)
kube-controller-manager and kube-scheduler need --bind-address=0.0.0.0; etcd needs --listen-metrics-urls=http://0.0.0.0:2381. As a first quick test you can change the files in /etc/kubernetes/manifests, or configure it in the ClusterConfiguration (ConfigMap kubeadm-config in namespace kube-system). [...]

Thank you for this, it was much appreciated!!

Just to sum things up for anybody struggling with these errors:

edit the manifests in the folder /etc/kubernetes/manifests/

kube-scheduler.yaml:
--bind-address=127.0.0.1
to
--bind-address=0.0.0.0

kube-controller-manager.yaml:
--bind-address=127.0.0.1
to
--bind-address=0.0.0.0

etcd.yaml:
--listen-metrics-urls=http://127.0.0.1:2381
to
--listen-metrics-urls=http://0.0.0.0:2381

Fixed my issues after a reboot; I guess the same could have been done by restarting the daemonset?

And yes, I changed the livenessProbe: and startupProbe: 127.0.0.1 addresses to 0.0.0.0 too.
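
For reference, a minimal excerpt of what the changed line looks like in one of the static pod manifests (a sketch based on the steps above; the kubelet restarts static pods whenever the manifest file changes, so a full reboot should not strictly be necessary):

# /etc/kubernetes/manifests/kube-scheduler.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0   # previously 127.0.0.1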

@bmgante

bmgante commented Mar 21, 2023

@quasimodo-r
I am running a minikube cluster with the latest version of kube-prometheus-stack and noticed this same problem.
For scheduler and controller-manager I was able to pass the extra args, but I can't figure out how to change the listen-metrics-urls for etcd. Do you have any idea how to achieve this using minikube?


Thanks

@mvtab

mvtab commented Jul 30, 2024

(quoting @superbrothers' comment above)
It can also be solved by deploying a proxy server to expose the metrics endpoints for each component. [...]
* https://gist.github.com/superbrothers/089fabaa888d2a56e7c98400fe32c95b

If you built your cluster with kubeadm, the port number of the etcd metrics endpoint is different from the kube-prometheus-stack default value [...]

Can we stop suggesting that everyone change the bind addresses to 0.0.0.0, and promote this reply everywhere instead? (Documentation too?)


9 participants