
[prometheus-kube-stack] Unhealthy Targets: controller-manager, etcd, proxy, kube-scheduler #1704

Closed
PhilipMay opened this issue Jan 12, 2022 · 15 comments
Labels: bug (Something isn't working), lifecycle/stale

Comments

@PhilipMay

PhilipMay commented Jan 12, 2022

Describe the bug

I installed kube-prometheus-stack with version 30.0.1 and appVersion 0.53.1.

I run helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace -f kube-prometheus-stack-helm-values.yaml

With this config:

grafana:
  enabled: false

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn-crypto-global
          resources:
            requests:
              storage: 10Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn-crypto-global
          resources:
            requests:
              storage: 10Gi

So it is a relatively "pure" config, but the GUI is telling me that these targets are unhealthy:

serviceMonitor/prometheus/prometheus-kube-prometheus-kube-controller-manager/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-etcd/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-proxy/0 (0/1 up)
serviceMonitor/prometheus/prometheus-kube-prometheus-kube-scheduler/0 (0/1 up)

See screenshot: [screenshot of the Prometheus targets page]

What's your helm version?

version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.17.4"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"archive", BuildDate:"2021-12-16T20:16:11Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack with version: 30.0.1 and appVersion: 0.53.1

What's the chart version?

version: 30.0.1 and appVersion: 0.53.1

What happened?

see above

What you expected to happen?

No response

How to reproduce it?

IMO there should not be such errors in a default installation, or there should be documentation on how to fix / avoid them.

Enter the changed values of values.yaml?

see above

Enter the command that you execute and failing/misfunctioning.

see above

Anything else we need to know?

No response

@PhilipMay added the bug (Something isn't working) label on Jan 12, 2022
@PhilipMay (Author)

Hello dear maintainer team,
is there any comment on this? I think it might be a pretty bad bug for other people trying to use this nice stack...

@andrewgkew (Contributor)

@PhilipMay where have you installed this? A managed k8s instance (EKS, AKS, GKE)?

@PhilipMay (Author)

PhilipMay commented Jan 19, 2022

@PhilipMay where have you installed this? A managed k8s instance (EKS, AKS, GKE)?

Bare-metal single-node cluster installed with kubeadm, using flannel.

  • Kubernetes Version: 1.23.1
  • containerd Version: 1.4.12
  • Debian GNU/Linux 11 (bullseye), kernel 5.10.0-10-amd64
  • flannel 0.16.1
  • Longhorn 1.2.3

@andrewgkew (Contributor)

Just wanted to rule out the issue of managed clusters (those services aren't accessible when the control plane is managed), which is why I asked.

One thing off the top of my head: check the metrics bind address of kube-proxy. By default it's set to 127.0.0.1, so Prometheus won't be able to scrape the metrics endpoint; update that to 0.0.0.0.
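
For example, a quick way to check the current value before changing it (a minimal sketch, assuming a kubeadm-style kube-proxy ConfigMap in kube-system):

# inspect the current metrics bind address of kube-proxy
kubectl -n kube-system get configmap kube-proxy -o yaml | grep metricsBindAddress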

@loudmouth

loudmouth commented Jan 21, 2022

Is there a configuration that can be set via values to disable these in managed environments (GKE, AKS, etc.), since, as pointed out, those services aren't accessible in those managed environments?

Assuming this is the place? https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L45

@KristapsT

KristapsT commented Jan 26, 2022

Running the latest version (30.2.0) of kube-prometheus-stack in a managed AKS cluster.

@loudmouth to disable scraping of certain components altogether, you can do that in their respective blocks; for example, to disable kubeControllerManager scraping, you can do it here: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L970
I believe that disables both scraping and rule creation.

However, not every component should be disabled in managed clusters; in my experience with AKS, only kubeControllerManager and kubeScheduler need to be disabled.
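
For example, a minimal values.yaml sketch of that (the kubeControllerManager.enabled and kubeScheduler.enabled keys are from the chart's values.yaml linked above):

kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false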

kubeProxy metrics are accessible in AKS; however, the Service object that kube-prometheus-stack creates to monitor kubeProxy is wrong. Here is its manifest:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: prometheus-operator
  creationTimestamp: "2021-08-26T07:30:55Z"
  labels:
    app: kube-prometheus-stack-kube-proxy
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 30.2.0
    chart: kube-prometheus-stack-30.2.0
    heritage: Helm
    jobLabel: kube-proxy
    release: kube-prometheus-stack
  name: kube-prometheus-stack-kube-proxy
  namespace: kube-system
  resourceVersion: "74298045"
  uid: cc296f5e-9d72-4667-856f-9e24886a1055
spec:
  clusterIP: None
  clusterIPs:
  - None
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 10249
    protocol: TCP
    targetPort: 10249
  selector:
    k8s-app: kube-proxy
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

And here are the labels of the kube-proxy pod inside an AKS cluster, version 1.21.2:

labels:
  component: kube-proxy
  controller-revision-hash: b8f7b747b
  pod-template-generation: "7"
  tier: node

The label k8s-app: kube-proxy doesn't exist on the pod, yet the Service uses it as a selector, so it will never match.
As far as I can tell there is no workaround within the Helm chart itself; it requires a change in whatever creates the Service.

Edit: my mistake, you can specify a custom selector for the kubeProxy Service: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L1276-L1277
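
For reference, a minimal sketch of that override (the key path follows the values.yaml lines linked above; the selector value is an assumption based on the AKS pod labels shown earlier):

kubeProxy:
  service:
    selector:
      component: kube-proxy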

@dkrizic

dkrizic commented Feb 6, 2022

I just saw your question.

kube-controller-manager and kube-scheduler need this setting

--bind-address=0.0.0.0

etcd needs

--listen-metrics-urls=http://0.0.0.0:2381

As a first quick test you can change the files in /etc/kubernetes/manifests, or you can configure it in the ClusterConfiguration (ConfigMap kubeadm-config in namespace kube-system) with entries like:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
...
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
...
etcd:
  extraArgs:
    listen-metrics-urls: http://0.0.0.0:2381
...
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
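
A minimal sketch of applying this on an existing kubeadm cluster (editing the stored ClusterConfiguration affects future kubeadm upgrade runs, while editing the static pod manifests takes effect immediately because the kubelet restarts them when the files change):

# persist the flags for future kubeadm upgrades
kubectl -n kube-system edit configmap kubeadm-config

# apply immediately by editing the static pod manifests on the control-plane node
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
sudo vi /etc/kubernetes/manifests/etcd.yaml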

@PhilipMay (Author)

Thanks @dkrizic

What about kube-proxy? How do I configure that?

@dkrizic

dkrizic commented Feb 8, 2022

Edit the kube-proxy ConfigMap in namespace kube-system:

data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 192.168.0.0/16
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    detectLocalMode: ""
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
...

and add/change the line

metricsBindAddress: 0.0.0.0

If I remember correctly, you need to run

kubectl -n kube-system rollout restart daemonset kube-proxy

to activate it
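
A quick way to confirm it worked (sketch; 10249 is the default kube-proxy metrics port, replace <node-ip> with an actual node address):

curl -s http://<node-ip>:10249/metrics | head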

@stale

stale bot commented Mar 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale

stale bot commented Mar 27, 2022

This issue is being automatically closed due to inactivity.

The stale bot closed this as completed on Mar 27, 2022.
@superbrothers (Contributor)

superbrothers commented Apr 16, 2022

It can also be solved by deploying a proxy server to expose the metrics endpoints for each component.

# based on https://github.com/kubermatic/kubeone/issues/1215#issuecomment-992471229
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy-config
  namespace: monitoring
data:
  haproxy.cfg: |
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    frontend kube-controller-manager
      bind ${NODE_IP}:10257
      mode tcp
      default_backend kube-controller-manager

    backend kube-controller-manager
      mode tcp
      server kube-controller-manager 127.0.0.1:10257

    frontend kube-scheduler
      bind ${NODE_IP}:10259
      mode tcp
      default_backend kube-scheduler

    backend kube-scheduler
      mode tcp
      server kube-scheduler 127.0.0.1:10259

    frontend kube-proxy
      bind ${NODE_IP}:10249
      http-request deny if !{ path /metrics }
      default_backend kube-proxy

    backend kube-proxy
      server kube-proxy 127.0.0.1:10249

    frontend etcd
      bind ${NODE_IP}:2381
      http-request deny if !{ path /metrics }
      default_backend etcd

    backend etcd
      server etcd 127.0.0.1:2381
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      containers:
      - env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: docker.io/haproxy:2.5
        name: haproxy
        securityContext:
          allowPrivilegeEscalation: false
          runAsUser: 99 # 'haproxy' user
        volumeMounts:
        - mountPath: /usr/local/etc/haproxy
          name: config
      hostNetwork: true
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      volumes:
      - configMap:
          name: metrics-proxy-config
        name: config

If you built your cluster with kubeadm, the port number of the etcd metrics endpoint is different from the kube-prometheus-stack default value, so the following change to values.yaml is required:

kubeEtcd:
  service:
    port: 2381
    targetPort: 2381

It may be possible to deploy this proxy server as an option for kube-prometheus-stack.
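
As a rough check of the HTTP frontends defined in the haproxy.cfg above (sketch; replace <node-ip> with a control-plane node address):

curl -s http://<node-ip>:2381/metrics | head    # etcd via the haproxy frontend
curl -s http://<node-ip>:10249/metrics | head   # kube-proxy via the haproxy frontend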

@quasimodo-r

(quoting @dkrizic's comment above)
kube-controller-manager and kube-scheduler need --bind-address=0.0.0.0; etcd needs --listen-metrics-urls=http://0.0.0.0:2381. As a first quick test you can change the files in /etc/kubernetes/manifests, or configure it in the ClusterConfiguration (ConfigMap kubeadm-config in namespace kube-system). [...]

Thank you for this, it was much appreciated!!

Just to sum things up for anybody struggling with these errors:

edit the manifests in the folder /etc/kubernetes/manifests/

kube-scheduler.yaml:
--bind-address=127.0.0.1
to
--bind-address=0.0.0.0

kube-controller-manager.yaml:
--bind-address=127.0.0.1
to
--bind-address=0.0.0.0

etcd.yaml:
--listen-metrics-urls=http://127.0.0.1:2381
to
--listen-metrics-urls=http://0.0.0.0:2381

Fixed my issues after a reboot; I guess the same could have been done by restarting the daemonset?

And yes, I changed the livenessProbe: and startupProbe: 127.0.0.1 addresses to 0.0.0.0 too.
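
For reference, a minimal excerpt of what the changed line looks like in one of the static pod manifests (a sketch based on the steps above; the kubelet restarts static pods whenever the manifest file changes, so a full reboot should not strictly be necessary):

# /etc/kubernetes/manifests/kube-scheduler.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=0.0.0.0   # previously 127.0.0.1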

@bmgante

bmgante commented Mar 21, 2023

@quasimodo-r
I am running a minikube cluster with the latest version of kube-prometheus-stack and noticed this same problem.
For scheduler and controller-manager I was able to pass the extra args, but I can't figure out how to change the listen-metrics-urls for etcd. Do you have any idea how to achieve this using minikube?


Thanks

@mvtab

mvtab commented Jul 30, 2024

(quoting @superbrothers' comment above)
It can also be solved by deploying a proxy server to expose the metrics endpoints for each component. [...]
* https://gist.github.com/superbrothers/089fabaa888d2a56e7c98400fe32c95b

If you built your cluster with kubeadm, the port number of the etcd metrics endpoint is different from the kube-prometheus-stack default value [...]

Can we stop suggesting that everyone change the bind addresses to 0.0.0.0, and promote this reply everywhere instead? (Documentation too?)


9 participants