
[kube-prometheus-stack] cannot scrape Etcd, Scheduler and Controller-Manager #1236

Closed
kristvanbesien opened this issue Aug 7, 2021 · 3 comments
Labels: bug, lifecycle/stale

Comments


kristvanbesien commented Aug 7, 2021

Describe the bug.

When deploying the kube-prometheus-stack chart on a fresh Kubernetes cluster built using kubeadm, the resulting Prometheus is not able to scrape the metrics of etcd, the Scheduler and the Controller-Manager.

What's your helm version?

v3.5.4

What's your kubectl version?

1.21.1

Which chart?

kube-prometheus-stack

What's the chart version?

17.1.1

What happened?

Installed a Kubernetes cluster using kubeadm with minimal modifications, the only modification being setting the podSubnet.

Installed a Prometheus stack using this chart.

Once installed, it turned out that the ServiceMonitors for etcd, the Controller-Manager and the Scheduler are not able to scrape the metrics. They show lots of errors of the following kind:
"Get "https://192.168.3.82:10257/metrics": dial tcp 192.168.3.82:10257: connect: connection refused"

What you expected to happen?

I expected the whole stack to be fully functional.

How to reproduce it?

On a fresh kubernetes cluster:
helm install prometheus -f values.yaml --namespace prometheus --create-namespace prometheus-community/kube-prometheus-stack

Enter the changed values of values.yaml?

grafana:
  adminPassword: B1ackb0x
  service:
    type: LoadBalancer
    annotations:
        metallb.universe.tf/allow-shared-ip: prometheus
alertmanager:
  service:
    type: LoadBalancer
    annotations:
        metallb.universe.tf/allow-shared-ip: prometheus
  
prometheus:
  service:
    type: LoadBalancer
    annotations:
        metallb.universe.tf/allow-shared-ip: prometheus
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: rook-ceph-block
          resources:
            requests:
              storage: 50Gi    

kubeControllerManager:
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true

kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
  
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
  

Enter the command that you execute and that is failing/misfunctioning.

helm install prometheus -f values.yaml --namespace prometheus --create-namespace prometheus-community/kube-prometheus-stack

Anything else we need to know?

I suspect that the core problem is that etcd, the Scheduler and the Controller-Manager expose their metrics only on localhost, so the scraper cannot collect them.
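
For reference, the kubeadm-generated static pod manifests on a control-plane node look roughly like the excerpt below (a sketch assuming default kubeadm settings; exact flags vary by version). The loopback bind addresses are why connections from Prometheus to the node IP on 10257, 10259 and 2381 are refused:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
- command:
  - kube-controller-manager
  - --bind-address=127.0.0.1                      # metrics on :10257, loopback only

# /etc/kubernetes/manifests/kube-scheduler.yaml (excerpt)
- command:
  - kube-scheduler
  - --bind-address=127.0.0.1                      # metrics on :10259, loopback only

# /etc/kubernetes/manifests/etcd.yaml (excerpt)
- command:
  - etcd
  - --listen-metrics-urls=http://127.0.0.1:2381   # metrics endpoint, loopback only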

Searching for solutions, I only found that I could change this in the static pod manifest files on my Kubernetes nodes. But these manifest files get overwritten at the next update, and I suspect that the defaults are sane, so listening only on localhost is by design.
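
For illustration, the kind of change I mean would look roughly like this when expressed as a kubeadm ClusterConfiguration rather than a direct manifest edit (an untested sketch using the v1beta2 kubeadm API; note that binding to 0.0.0.0 exposes these metrics ports beyond the node):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"                        # expose :10257 on the node IP
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"                        # expose :10259 on the node IP
etcd:
  local:
    extraArgs:
      listen-metrics-urls: "http://0.0.0.0:2381"   # expose the etcd metrics port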

So probably something else must be done. However I have not been able to find out what that something else is. So either:

  • By default the chart should do something so that the metrics get scraped from localhost (creating a service maybe?)
    or
  • This needs to be documented somewhere.
kristvanbesien added the bug label on Aug 7, 2021
stale bot commented Sep 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label on Sep 6, 2021
stale bot commented Sep 20, 2021

This issue is being automatically closed due to inactivity.

stale bot closed this as completed on Sep 20, 2021
zentavr commented Dec 13, 2021

The workaround for etcd is provided here: #204 (comment)
