[prometheus-kube-stack] Unhealthy Targets: controller-manager, etcd, proxy, kube-scheduler #1704
Hello dear maintainer team!
@PhilipMay where have you installed this? A managed k8s instance (EKS, AKS, GKE)?
Bare-metal one-node cluster installed with kubeadm, with flannel.
Just wanted to rule out managed clusters (those services aren't accessible when the control plane is managed), which is why I asked. One thing off the top of my head: check the metrics bind address of kube-proxy. By default it's set to 127.0.0.1, so Prometheus won't be able to scrape the metrics endpoint; update that to 0.0.0.0.
Is there a configuration that can be set for this? Assuming this is the place: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L45
Running the latest version (30.2.0) of kube-prometheus-stack in a managed AKS cluster. @loudmouth to disable scraping of certain components altogether, you can do that in their respective blocks (e.g. by setting `enabled: false`). However, not every component should be disabled in managed clusters; in my experience with AKS, only kubeControllerManager and kubeScheduler need to be disabled. kubeProxy metrics are accessible in AKS, but the Service object that kube-prometheus-stack creates to monitor kubeProxy is wrong. Here is the manifest of it:
And here are the labels of the kube-proxy pod inside an AKS cluster, version 1.21.2.

Edit: my mistake, you can specify a custom selector for the kubeProxy service: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L1276-L1277
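Putting the pieces of this comment together, a minimal values.yaml fragment along those lines might look like the sketch below. The `enabled` and `selector` keys come from the chart's values.yaml linked above; the selector labels shown are placeholders you would replace with the labels actually present on your cluster's kube-proxy pods.

```yaml
# Sketch only: disable control-plane targets a managed AKS cluster does not
# expose, and point the kubeProxy Service at the pods via a custom selector.
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  service:
    # Hypothetical labels -- check your own pods first, e.g.
    #   kubectl get pods -n kube-system --show-labels
    selector:
      component: kube-proxy
```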
I just saw your question. kube-controller-manager and kube-scheduler need their metrics bind address changed, and etcd needs its metrics listen URL changed. As a first quick test you can change the files in /etc/kubernetes/manifests, or you can configure that in the ClusterConfiguration (configmap kubeadm-config in namespace kube-system) with the corresponding entries.
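For kubeadm-built clusters, the ClusterConfiguration entries being described might look like the following sketch. The `extraArgs` keys shown are standard kubeadm flags, but verify them against your kubeadm version before applying.

```yaml
# Sketch of a ClusterConfiguration fragment (configmap kubeadm-config in
# namespace kube-system) that binds the control-plane metrics endpoints to
# all interfaces instead of 127.0.0.1.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
etcd:
  local:
    extraArgs:
      listen-metrics-urls: "http://0.0.0.0:2381"
```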
Thanks @dkrizic, what about kube-proxy? How do I configure that?
Edit the kube-proxy configmap in namespace kube-system and add/change the metrics bind address line. If I remember correctly, you need to restart the kube-proxy daemonset to activate it.
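Concretely, the line in question lives in the `config.conf` key of the kube-proxy configmap; a sketch, assuming a standard kubeadm-generated KubeProxyConfiguration:

```yaml
# Relevant part of the kube-proxy configmap
# (kubectl -n kube-system edit configmap kube-proxy):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# change from the default "127.0.0.1:10249" so Prometheus can scrape it
metricsBindAddress: "0.0.0.0:10249"
```

After editing, the restart would be something like `kubectl -n kube-system rollout restart daemonset kube-proxy`.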
It can also be solved by deploying a proxy server to expose the metrics endpoints of each component:

```yaml
# based on https://github.com/kubermatic/kubeone/issues/1215#issuecomment-992471229
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy-config
  namespace: monitoring
data:
  haproxy.cfg: |
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    frontend kube-controller-manager
      bind ${NODE_IP}:10257
      mode tcp
      default_backend kube-controller-manager
    backend kube-controller-manager
      mode tcp
      server kube-controller-manager 127.0.0.1:10257

    frontend kube-scheduler
      bind ${NODE_IP}:10259
      mode tcp
      default_backend kube-scheduler
    backend kube-scheduler
      mode tcp
      server kube-scheduler 127.0.0.1:10259

    frontend kube-proxy
      bind ${NODE_IP}:10249
      http-request deny if !{ path /metrics }
      default_backend kube-proxy
    backend kube-proxy
      server kube-proxy 127.0.0.1:10249

    frontend etcd
      bind ${NODE_IP}:2381
      http-request deny if !{ path /metrics }
      default_backend etcd
    backend etcd
      server etcd 127.0.0.1:2381
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      containers:
        - env:
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          image: docker.io/haproxy:2.5
          name: haproxy
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 99 # 'haproxy' user
          volumeMounts:
            - mountPath: /usr/local/etc/haproxy
              name: config
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
      volumes:
        - configMap:
            name: metrics-proxy-config
          name: config
```

If you built your cluster with kubeadm, the port number of the etcd metrics endpoint differs from the kube-prometheus-stack default value, so the following change to values.yaml is also required:

```yaml
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
```

It may be possible to deploy this proxy server as an option of kube-prometheus-stack.
Thank you for this, it was much appreciated! To sum things up for anybody struggling with these errors: editing the manifests in the folder /etc/kubernetes/manifests/ (kube-scheduler.yaml, kube-controller-manager.yaml, etcd.yaml) fixed my issues after a reboot; I guess the same could have been done with a daemonset restart? And yes, I changed the livenessProbe: and startupProbe: 127.0.0.1 addresses to 0.0.0.0 too.
@quasimodo-r Thanks
Can we stop people from suggesting that everyone change the bind addresses to 0.0.0.0, and promote this reply everywhere instead? (Documentation too?)
Describe the bug
I installed kube-prometheus-stack with version: 30.0.1 and appVersion: 0.53.1.
I run:
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace -f kube-prometheus-stack-helm-values.yaml
With this config:
So it is a relatively "pure" config. But the GUI is telling me that these targets are unhealthy:
See screenshot:
What's your helm version?
version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.17.4"}
What's your kubectl version?
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"archive", BuildDate:"2021-12-16T20:16:11Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Which chart?
kube-prometheus-stack with version: 30.0.1 and appVersion: 0.53.1
What's the chart version?
version: 30.0.1 and appVersion: 0.53.1
What happened?
see above
What you expected to happen?
No response
How to reproduce it?
IMO there should not be such errors in a default installation, or there should be documentation on how to fix / avoid them.
Enter the changed values of values.yaml?
see above
Enter the command that you execute and failing/misfunctioning.
see above
Anything else we need to know?
No response