
[kube-prometheus-stack] kubernetes components unhealthy #771

Closed · ricosega opened this issue Mar 17, 2021 · 4 comments
Labels
bug (Something isn't working) · lifecycle/stale

Comments


ricosega commented Mar 17, 2021

Describe the bug
Some Kubernetes components are shown as unhealthy:

  • kube-controller-manager
  • kube-proxy
  • kube-etcd
  • kube-scheduler

I've seen that the problem is that these components are reachable only from localhost, not from inside the cluster.
What steps do I have to follow so Prometheus can scrape their metrics without exposing the components to everyone?
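
For reference, on kubeadm clusters this localhost-only binding comes from the default component flags. Illustrative excerpts of stock kubeadm defaults (not copied from this cluster):

# /etc/kubernetes/manifests/kube-controller-manager.yaml (static pod, excerpt)
    - --bind-address=127.0.0.1            # same default for kube-scheduler
# /etc/kubernetes/manifests/etcd.yaml (excerpt)
    - --listen-metrics-urls=http://127.0.0.1:2381
# kube-system/kube-proxy ConfigMap, key config.conf (excerpt)
    metricsBindAddress: ""                # empty falls back to 127.0.0.1:10249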

Version of Helm and Kubernetes:

Helm Version:

$ helm version
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}

Kubernetes Version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:51:19Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

Which chart:
kube-prometheus-stack

Which version of the chart:
14.0.1

What happened:
Some Kubernetes components are shown as unhealthy:

kube-controller-manager
kube-proxy
kube-etcd
kube-scheduler

What you expected to happen:
For these components to be healthy.

How to reproduce it (as minimally and precisely as possible):
It is a recently created cluster and the values.yaml is the default.

Helm values set after installation/upgrade:

helm get values my-release

USER-SUPPLIED VALUES:
additionalPrometheusRulesMap: {}
alertmanager:
  alertmanagerSpec:
    additionalPeers: []
    affinity: {}
    alertmanagerConfigNamespaceSelector: {}
    alertmanagerConfigSelector: {}
    clusterAdvertiseAddress: false
    configMaps: []
    containers: []
    externalUrl: null
    forceEnableClusterMode: false
    image:
      repository: monitoring/quay.io/prometheus/alertmanager
      sha: ""
      tag: v0.21.0
    initContainers: []
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    nodeSelector: {}
    paused: false
    podAntiAffinity: ""
    podAntiAffinityTopologyKey: kubernetes.io/hostname
    podMetadata: {}
    portName: web
    priorityClassName: ""
    replicas: 1
    resources: {}
    retention: 120h
    routePrefix: /
    secrets: []
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    storage: {}
    tolerations: []
    topologySpreadConstraints: []
    useExistingSecret: false
    volumeMounts: []
    volumes: []
  apiVersion: v2
  config:
    global:
      resolve_timeout: 5m
    receivers:
    - name: "null"
    route:
      group_by:
      - job
      group_interval: 5m
      group_wait: 30s
      receiver: "null"
      repeat_interval: 12h
      routes:
      - match:
          alertname: Watchdog
        receiver: "null"
    templates:
    - /etc/alertmanager/config/*.tmpl
  enabled: true
  ingress:
    annotations: {}
    enabled: true
    hosts:
    - alertmanager.mydomain
    labels: {}
    paths: []
    tls: []
  ingressPerReplica:
    annotations: {}
    enabled: false
    hostDomain: ""
    hostPrefix: ""
    labels: {}
    paths: []
    tlsSecretName: ""
    tlsSecretPerReplica:
      enabled: false
      prefix: alertmanager
  podDisruptionBudget:
    enabled: false
    maxUnavailable: ""
    minAvailable: 1
  secret:
    annotations: {}
  service:
    additionalPorts: []
    annotations: {}
    clusterIP: ""
    externalIPs: []
    labels: {}
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    nodePort: 30903
    port: 9093
    targetPort: 9093
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  serviceMonitor:
    bearerTokenFile: null
    interval: ""
    metricRelabelings: []
    relabelings: []
    scheme: ""
    selfMonitor: true
    tlsConfig: {}
  servicePerReplica:
    annotations: {}
    enabled: false
    loadBalancerSourceRanges: []
    nodePort: 30904
    port: 9093
    targetPort: 9093
    type: ClusterIP
  templateFiles: {}
  tplConfig: false
commonLabels: {}
coreDns:
  enabled: true
  service:
    port: 9153
    targetPort: 9153
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
defaultRules:
  additionalRuleLabels: {}
  annotations: {}
  appNamespacesTarget: .*
  create: true
  labels: {}
  rules:
    alertmanager: true
    etcd: true
    general: true
    k8s: true
    kubeApiserver: true
    kubeApiserverAvailability: true
    kubeApiserverError: true
    kubeApiserverSlos: true
    kubePrometheusGeneral: true
    kubePrometheusNodeAlerting: true
    kubePrometheusNodeRecording: true
    kubeScheduler: true
    kubeStateMetrics: true
    kubelet: true
    kubernetesAbsent: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    network: true
    node: true
    prometheus: true
    prometheusOperator: true
    time: true
  runbookUrl: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#
fullnameOverride: ""
global:
  imagePullSecrets: []
  rbac:
    create: true
    pspAnnotations: {}
    pspEnabled: true
grafana:
  additionalDataSources:
  - access: proxy
    editable: true
    isDefault: true
    jsonData:
      timeInterval: 30s
    name: mon
    type: prometheus
    url: http://prometheus-kube-prometheus-prometheus:9090/
  adminPassword: admin
  defaultDashboardsEnabled: true
  enabled: true
  extraConfigmapMounts: []
  ingress:
    annotations: {}
    enabled: true
    hosts:
    - grafana.mydomain
    labels: {}
    path: /
    tls: []
  namespaceOverride: ""
  service:
    portName: service
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    path: /metrics
    relabelings: []
    selfMonitor: true
  sidecar:
    dashboards:
      annotations: {}
      enabled: true
      label: grafana_dashboard
      multicluster: false
    datasources:
      annotations: {}
      createPrometheusReplicasDatasources: false
      defaultDatasourceEnabled: false
      enabled: true
      label: grafana_datasource
kube-state-metrics:
  kubeconfig:
    enabled: false
    secret: 
  namespaceOverride: ""
  podSecurityPolicy:
    enabled: true
  rbac:
    create: true
kubeApiServer:
  enabled: true
  relabelings: []
  serviceMonitor:
    interval: ""
    jobLabel: component
    metricRelabelings: []
    selector:
      matchLabels:
        component: apiserver
        provider: kubernetes
  tlsConfig:
    insecureSkipVerify: false
    serverName: kubernetes
kubeControllerManager:
  enabled: true
  endpoints: []
  service:
    port: 10252
    selector:
      component: kube-controller-manager
    targetPort: 10252
  serviceMonitor:
    https: false
    insecureSkipVerify: true
    interval: ""
    metricRelabelings: []
    relabelings: []
    serverName: null
kubeDns:
  enabled: false
  service:
    dnsmasq:
      port: 10054
      targetPort: 10054
    skydns:
      port: 10055
      targetPort: 10055
  serviceMonitor:
    dnsmasqMetricRelabelings: []
    dnsmasqRelabelings: []
    interval: ""
    metricRelabelings: []
    relabelings: []
kubeEtcd:
  enabled: true
  endpoints: []
  service:
    port: 2379
    targetPort: 2379
  serviceMonitor:
    caFile: ""
    certFile: ""
    insecureSkipVerify: false
    interval: ""
    keyFile: ""
    metricRelabelings: []
    relabelings: []
    scheme: http
    serverName: ""
kubeProxy:
  enabled: true
  endpoints: []
  service:
    port: 10249
    targetPort: 10249
  serviceMonitor:
    https: false
    interval: ""
    metricRelabelings: []
    relabelings: []
kubeScheduler:
  enabled: true
  endpoints: []
  service:
    port: 10251
    targetPort: 10251
  serviceMonitor:
    https: false
    insecureSkipVerify: false
    interval: ""
    metricRelabelings: []
    relabelings: []
    serverName: null
kubeStateMetrics:
  enabled: true
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
    selectorOverride: {}
kubeTargetVersionOverride: ""
kubelet:
  enabled: true
  namespace: kube-system
  serviceMonitor:
    cAdvisor: true
    cAdvisorMetricRelabelings: []
    cAdvisorRelabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    https: true
    interval: ""
    metricRelabelings: []
    probes: true
    probesMetricRelabelings: []
    probesRelabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    resource: false
    resourcePath: /metrics/resource/v1alpha1
    resourceRelabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
nameOverride: ""
namespaceOverride: ""
nodeExporter:
  enabled: true
  jobLabel: jobLabel
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
    scrapeTimeout: ""
prometheus:
  additionalPodMonitors: []
  additionalRulesForClusterRole: []
  additionalServiceMonitors: []
  annotations: {}
  enabled: true
  ingress:
    annotations: {}
    enabled: true
    hosts:
    - prometheus.mydomain
    labels: {}
    paths: []
    tls: []
  ingressPerReplica:
    annotations: {}
    enabled: false
    hostDomain: ""
    hostPrefix: ""
    labels: {}
    paths: []
    tlsSecretName: ""
    tlsSecretPerReplica:
      enabled: false
      prefix: prometheus
  podDisruptionBudget:
    enabled: false
    maxUnavailable: ""
    minAvailable: 1
  podSecurityPolicy:
    allowedCapabilities: []
    allowedHostPaths: []
    volumes: []
  prometheusSpec:
    additionalAlertManagerConfigs: []
    additionalAlertRelabelConfigs: []
    additionalPrometheusSecretsAnnotations: {}
    additionalScrapeConfigs: []
    additionalScrapeConfigsSecret: {}
    affinity: {}
    alertingEndpoints: []
    allowOverlappingBlocks: false
    apiserverConfig: {}
    arbitraryFSAccessThroughSMs: false
    configMaps: []
    containers: []
    disableCompaction: false
    enableAdminAPI: false
    enforcedSampleLimit: false
    evaluationInterval: ""
    externalLabels: {}
    externalUrl: ""
    ignoreNamespaceSelectors: false
    image:
      repository: monitoring/quay.io/prometheus/prometheus
      sha: ""
      tag: v2.24.0
    initContainers: []
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    nodeSelector: {}
    overrideHonorLabels: false
    overrideHonorTimestamps: false
    paused: false
    podAntiAffinity: ""
    podAntiAffinityTopologyKey: kubernetes.io/hostname
    podMetadata: {}
    podMonitorNamespaceSelector: {}
    podMonitorSelector: {}
    podMonitorSelectorNilUsesHelmValues: true
    portName: web
    priorityClassName: ""
    probeNamespaceSelector: {}
    probeSelector: {}
    probeSelectorNilUsesHelmValues: true
    prometheusExternalLabelName: ""
    prometheusExternalLabelNameClear: false
    prometheusRulesExcludedFromEnforce: false
    query: {}
    queryLogFile: false
    remoteRead: []
    remoteWrite: []
    remoteWriteDashboards: false
    replicaExternalLabelName: ""
    replicaExternalLabelNameClear: false
    replicas: 1
    resources: {}
    retention: 10d
    retentionSize: ""
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector: {}
    ruleSelectorNilUsesHelmValues: true
    scrapeInterval: ""
    scrapeTimeout: ""
    secrets: []
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector: {}
    serviceMonitorSelectorNilUsesHelmValues: true
    shards: 1
    storageSpec: {}
    thanos: {}
    tolerations: []
    topologySpreadConstraints: []
    volumeMounts: []
    volumes: []
    walCompression: false
  service:
    annotations: {}
    clusterIP: ""
    externalIPs: []
    labels: {}
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    nodePort: 30090
    port: 9090
    sessionAffinity: ""
    targetPort: 9090
    type: ClusterIP
  serviceAccount:
    create: true
    name: ""
  serviceMonitor:
    bearerTokenFile: null
    interval: ""
    metricRelabelings: []
    relabelings: []
    scheme: ""
    selfMonitor: true
    tlsConfig: {}
  servicePerReplica:
    annotations: {}
    enabled: false
    loadBalancerSourceRanges: []
    nodePort: 30091
    port: 9090
    targetPort: 9090
    type: ClusterIP
  thanosIngress:
    annotations: {}
    enabled: false
    hosts: []
    labels: {}
    nodePort: 30901
    paths: []
    servicePort: 10901
    tls: []
  thanosService:
    annotations: {}
    enabled: false
    labels: {}
    port: 10901
    portName: grpc
    targetPort: grpc
prometheus-node-exporter:
  extraArgs:
  - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
  - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
  namespaceOverride: ""
  podLabels:
    jobLabel: node-exporter
prometheusOperator:
  admissionWebhooks:
    caBundle: ""
    certManager:
      enabled: false
    enabled: true
    failurePolicy: Fail
    patch:
      affinity: {}
      enabled: true
      image:
        pullPolicy: IfNotPresent
        repository: monitoring/jettech/kube-webhook-certgen
        sha: ""
        tag: v1.5.0
      nodeSelector: {}
      podAnnotations: {}
      priorityClassName: ""
      resources: {}
      tolerations: []
  affinity: {}
  alertmanagerInstanceNamespaces: []
  configReloaderCpu: 100m
  configReloaderMemory: 50Mi
  denyNamespaces: []
  dnsConfig: {}
  enabled: true
  hostNetwork: false
  image:
    pullPolicy: IfNotPresent
    repository: monitoring/quay.io/prometheus-operator/prometheus-operator
    sha: ""
    tag: v0.46.0
  kubeletService:
    enabled: true
    namespace: kube-system
  namespaces: {}
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  prometheusConfigReloaderImage:
    repository: monitoring/quay.io/prometheus-operator/prometheus-config-reloader
    sha: ""
    tag: v0.46.0
  prometheusInstanceNamespaces: []
  resources: {}
  secretFieldSelector: ""
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  service:
    additionalPorts: []
    annotations: {}
    clusterIP: ""
    externalIPs: []
    labels: {}
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    nodePort: 30080
    nodePortTls: 30443
    type: ClusterIP
  serviceAccount:
    create: true
    name: ""
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
    scrapeTimeout: ""
    selfMonitor: true
  thanosRulerInstanceNamespaces: []
  tls:
    enabled: true
    internalPort: 10250
    tlsMinVersion: VersionTLS13
  tolerations: []

ricosega added the bug label Mar 17, 2021
@RobatBender

Same problem

ricosega (Author) commented Mar 17, 2021

Well, I just found that this seems to be a common issue when creating clusters with kubeadm, so in case it helps anyone with the same problem, I solved it following:

prometheus-operator/kube-prometheus#718 (comment)
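
One alternative, if exposing these ports on the node network is acceptable, is to rebind the components instead of proxying them. A minimal sketch for kube-proxy, assuming kubeadm's standard ConfigMap layout (edit the config.conf key via kubectl -n kube-system edit cm kube-proxy, then restart the kube-proxy pods):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249   # expose /metrics on all interfaces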

I created the following .yml files. This first one is a DaemonSet running only on the master nodes, because etcd, kube-controller-manager and kube-scheduler are found on them:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-prometheus-cp
  namespace: prometheus
data:
  haproxy.cfg: |+
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    # Expose each localhost-only metrics endpoint on the node IP, restricted to /metrics.
    frontend kube-controller-manager
      bind ${NODE_IP}:10257
      http-request deny if !{ path /metrics }
      default_backend kube-controller-manager
    backend kube-controller-manager
      server kube-controller-manager 127.0.0.1:10257 ssl verify none

    frontend kube-scheduler
      bind ${NODE_IP}:10259
      http-request deny if !{ path /metrics }
      default_backend kube-scheduler
    backend kube-scheduler
      server kube-scheduler 127.0.0.1:10259 ssl verify none

    frontend kube-etcd
      bind ${NODE_IP}:2381
      http-request deny if !{ path /metrics }
      default_backend kube-etcd
    backend kube-etcd
      server kube-etcd 127.0.0.1:2381 check

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: haproxy-prometheus-cp
  namespace: prometheus
  labels:
    k8s-app: haproxy-prometheus-cp
spec:
  selector:
    matchLabels:
      k8s-app: haproxy-prometheus-cp
      name: haproxy-prometheus-cp
  template:
    metadata:
      labels:
        k8s-app: haproxy-prometheus-cp
        name: haproxy-prometheus-cp
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
      containers:
      - image: k8s/haproxy:lts
        imagePullPolicy: Always
        name: haproxy
        env:
        - name: NODE_IP   # the node's host IP; HAProxy expands ${NODE_IP} in the bind lines
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP 
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
          - name: haproxy-prometheus-cp
            mountPath: /usr/local/etc/haproxy/haproxy.cfg
            subPath: haproxy.cfg         
      volumes:
        - name: haproxy-prometheus-cp
          configMap:
            name: haproxy-prometheus-cp          
      hostNetwork: true   # so the bound ports are reachable on the node IP
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
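
With the proxies in place, the chart's scrape settings presumably need to point at the HAProxy frontends instead of the old defaults. A sketch of the matching values.yaml overrides (ports taken from the frontends above; Prometheus scrapes plain HTTP through the proxy):

kubeControllerManager:
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: false        # HAProxy handles TLS toward 127.0.0.1:10257
kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: false
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381    # etcd's plain-HTTP metrics listener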

And the following will run on every node, since kube-proxy runs on all of them:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-prometheus-nodes
  namespace: prometheus
data:
  haproxy.cfg: |+
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    frontend kube-proxy
      bind ${NODE_IP}:10249
      http-request deny if !{ path /metrics }
      default_backend kube-proxy
    backend kube-proxy
      server kube-proxy 127.0.0.1:10249 check

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: haproxy-prometheus-nodes
  namespace: prometheus
  labels:
    k8s-app: haproxy-prometheus-nodes
spec:
  selector:
    matchLabels:
      k8s-app: haproxy-prometheus-nodes
      name: haproxy-prometheus-nodes
  template:
    metadata:
      labels:
        k8s-app: haproxy-prometheus-nodes
        name: haproxy-prometheus-nodes
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - image: k8s/haproxy:lts
        imagePullPolicy: Always
        name: haproxy
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP            
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
          - name: haproxy-prometheus-nodes
            mountPath: /usr/local/etc/haproxy/haproxy.cfg
            subPath: haproxy.cfg         
      volumes:
        - name: haproxy-prometheus-nodes
          configMap:
            name: haproxy-prometheus-nodes          
      hostNetwork: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
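
kube-proxy already matches the chart default (kubeProxy.service.port: 10249), so no values change is needed for it. To sanity-check the proxies from inside the cluster before Prometheus scrapes them, a throwaway pod along these lines works (hypothetical name; substitute a real node IP):

apiVersion: v1
kind: Pod
metadata:
  name: metrics-check          # hypothetical one-off verification pod
  namespace: prometheus
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl:latest
    args: ["-sS", "http://<node-ip>:10249/metrics"]   # repeat for 10257, 10259 and 2381 on masters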

Hope this can help!


stale bot commented Apr 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.


stale bot commented May 1, 2021

This issue is being automatically closed due to inactivity.

stale bot closed this as completed May 1, 2021