
metrics-server error because it doesn't contain any IP SANs #196

Closed
doubledna opened this issue Dec 29, 2018 · 43 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@doubledna

After I finished deploying metrics-server, I checked the log and found the following error:
E1229 07:09:05.013998 1 summary.go:97] error while getting metrics summary from Kubelet kube-node3(172.16.52.132:10250): Get https://172.16.52.132:10250/stats/summary/: x509: cannot validate certificate for 172.16.52.132 because it doesn't contain any IP SANs

@doubledna
Author

I'm using metrics-server version v0.2.1.

@yueyongyue

yueyongyue commented Jan 3, 2019

Same problem with v0.3.1.
Workaround: add the following to metrics-server/deploy/1.8+/metrics-server-deployment.yaml:

command:
- /metrics-server
- --kubelet-insecure-tls

@DirectXMan12
Contributor

Please make sure that your certificates are signed for the right names, and that your nodes are surfacing those names.
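
One way to check both halves of that advice is sketched below (the IP and port are taken from the error message at the top of this thread; adjust for your cluster):

# Which addresses/names do the nodes surface?
kubectl get nodes -o wide

# Which SANs does the kubelet's serving certificate actually contain?
echo | openssl s_client -connect 172.16.52.132:10250 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'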

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abdennour
Did you do something like this:

command:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP

Remove --kubelet-preferred-address-types=InternalIP.

@ltmleo

ltmleo commented Dec 16, 2020

Same problem with v0.3.1.
Workaround: add the following to metrics-server/deploy/1.8+/metrics-server-deployment.yaml:

command:
- /metrics-server
- --kubelet-insecure-tls

Actually, in the new version you just add it to args:

args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --kubelet-insecure-tls

Thank you, solved my problem!

@hmahdiany

Same problem; --kubelet-insecure-tls solved it.
Thank you.

@lx1036

lx1036 commented Mar 30, 2021

--kubelet-insecure-tls

--kubelet-insecure-tls is for testing purposes only. It just skips the issue; it doesn't resolve it.

@raspitakesovertheworld

raspitakesovertheworld commented Apr 19, 2021

Why is this issue still happening, and why is the workaround still required? I just set up a new cluster (k8s 1.20.6) from scratch (Ubuntu 20.04 LTS), and when I deployed the metrics server: bang, bug.

@serathius
Contributor

serathius commented Apr 20, 2021

The default configuration of metrics-server assumes that your cluster is set up with reasonable (not in any way great) security, i.e. a basic certificate setup. This is a pretty good assumption for most production clusters. On the other hand, most local development tools (minikube, kind, k3d, etc.) skip proper certificate setup, which requires passing --kubelet-insecure-tls to metrics-server.

Having a secure default setup, with a simple way to disable it for local development, seems like a good trade-off, but let me know if you think otherwise.

@raspitakesovertheworld

Hmm, I set up the cluster via kubeadm, so security should be set up properly. I would understand if minikube and kind took shortcuts on security, since they are not meant for production use, but kubeadm should not.

@Jeansen

Jeansen commented Apr 25, 2021

I agree with @raspitakesovertheworld. I did the same with kubeadm, and only ran into problems with the metrics server ;-(

@sam-sre

sam-sre commented May 9, 2021

Same thing here. I used kubeadm to bootstrap a K8s v1.21 cluster and ran into the same issue.
@serathius

@serathius
Contributor

serathius commented May 9, 2021

Who said that the default kubeadm configuration is secure?

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#cannot-use-the-metrics-server-securely-in-a-kubeadm-cluster

@sam-sre

sam-sre commented May 9, 2021

Who said that the default kubeadm configuration is secure?

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#cannot-use-the-metrics-server-securely-in-a-kubeadm-cluster

No one.
I'm just saying that the issue is still there.
If I don't add --kubelet-insecure-tls, the metrics-server Pod fails to start.
If I add --kubelet-insecure-tls, the pod runs successfully with no errors, but kubectl top nodes gives:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Logs from the metrics-server Pod:

I0509 13:49:37.732595       1 serving.go:325] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0509 13:49:38.399178       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0509 13:49:38.399296       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0509 13:49:38.399306       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0509 13:49:38.399311       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0509 13:49:38.399228       1 secure_serving.go:197] Serving securely on [::]:4443
I0509 13:49:38.399241       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0509 13:49:38.399249       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0509 13:49:38.399264       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0509 13:49:38.402788       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0509 13:49:38.499480       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0509 13:49:38.499660       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0509 13:49:38.503018       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

@serathius
Contributor

serathius commented May 9, 2021

To avoid passing --kubelet-insecure-tls, please follow the link I provided. It describes how to set up kubelet serving certs on kubeadm.
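
For reference, the linked doc boils down to enabling serverTLSBootstrap in the kubelet configuration that kubeadm hands to every node. A minimal sketch of such a kubeadm init config (the exact apiVersion values vary by kubeadm release, so treat them as assumptions):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true

With this set, kubelets request serving certificates from the cluster CA instead of self-signing them; each CSR still has to be approved (kubectl certificate approve) before the kubelet can serve with the new certificate.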

@sam-sre

sam-sre commented May 9, 2021

To avoid passing --kubelet-insecure-tls, please follow the link I provided. It describes how to set up kubelet serving certs on kubeadm.

Actually I was also looking at this document, but they wrote:

If you have already created the cluster you must adapt it by doing the following:

Find and edit the kubelet-config-1.21 ConfigMap in the kube-system namespace. In that ConfigMap, the config key has a KubeletConfiguration document as its value. Edit the KubeletConfiguration document to set serverTLSBootstrap: true.

But in a kubeadm-deployed cluster there is no config key in the kubelet-config-1.21 ConfigMap.
There is this:

data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration

which you cannot edit as they described.
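
Presumably the same edit then just goes under the kubelet key instead of the config key. A sketch, assuming the ConfigMap layout quoted above:

data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    serverTLSBootstrap: true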

@serathius
Contributor

Please file an issue against kubeadm.

@kirandeshpande1990

Same problem with v0.3.1.
Workaround: add the following to metrics-server/deploy/1.8+/metrics-server-deployment.yaml:

command:
- /metrics-server
- --kubelet-insecure-tls

Thanks!

lisenet added a commit to lisenet/kubernetes-homelab that referenced this issue Oct 28, 2021
@omniproc

omniproc commented Nov 12, 2021

--kubelet-insecure-tls

This is no solution, but a security issue waiting to happen. Is anyone still interested in an actual solution, or do we simply disable TLS verification everywhere now?

The problem seems to be that metrics-server uses IPs to communicate with the nodes. It's totally legitimate for the node certificates not to contain IPs in the SAN, since that is considered bad security practice (see RFC 6125 for more details).
Why doesn't metrics-server just use the node names, as provided by CoreDNS and available in the certificate SAN?

@omniproc

/reopen

@k8s-ci-robot
Contributor

@omniproc: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@omniproc omniproc mentioned this issue Nov 12, 2021
@omniproc

omniproc commented Nov 12, 2021

Looking more into this, the problem seems to be that by default the metrics-server manifest uses this order:

--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

However, the docs clearly say the default order should be:

--kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP

Quote from the docs:

--kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])

Explicitly setting this as an arg for the metrics-server seems to actually fix the issue.
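
Concretely, that would mean overriding the flag in the Deployment's container args, along these lines (a sketch based on the manifest args quoted earlier in this thread; only the address-types line changes):

args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP
- --kubelet-use-node-status-port

Note this only helps if the node hostnames/DNS names actually resolve from inside the cluster and match the kubelet certificate SANs; otherwise scraping fails with a different error.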

@serathius
Contributor

serathius commented Nov 16, 2021

However, the docs clearly say the default order should be:

--kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP

The part of the docs you are referring to is about command-line flags. It's true that the command-line flag defaults don't match those we use in the manifests. We changed the config because it supports more clusters out of the box: most cluster distributions configure Nodes with Hostname as an address, but it is not a domain resolvable by cluster DNS. We also found that using DNS can be problematic, as it can be unreliable, so we preferred to avoid a dependency on DNS. Using IP addresses meant that we could not validate kubelet certificates, but we found that most K8s distros don't provide kubelets with certificates signed by the k8s CA, meaning users would still need to disable certificate validation anyway.

So in the end the default configuration is not the most secure one, because a secure setup requires a lot of preparation by the cluster administrator to reconfigure cluster-level configuration. That is work most users don't want to do, or simply can't, as they cannot change the cluster config. Overall, metrics-server security is a very broad topic, as there are a lot of other places that would need improvements (#576 #545); however, there has been little interest from security folks in addressing those issues, as the attack vector via metrics-server is not very critical. Basically, an attacker could inject an invalid metric value into HPA that could cause an unneeded scale-up.

However, if you are interested in this topic and would like to contribute security improvements, I would be happy to help with that.

@omniproc

omniproc commented Nov 17, 2021

@serathius

OK, I get the reasoning behind the decision to use those default values. The thing that bothers me is that the implications aren't communicated well by the docs.
When you run into the issue mentioned above, the first thing most people will do is google it, which leads them to this issue and similar ones, and all of them serve the same "solution": disable TLS verification.
In my opinion, at least the official documentation should make it clear that:

  • Disabling TLS verification has security implications and is meant for dev/test only
  • The order of the defaults when using the CLI is different from the order of the defaults when using the manifest
  • Using the default IP verification is considered at least bad practice for production environments, and users should change their PKI setup so that the other options, such as InternalDNS, are included in the node certificates

I agree that ultimately it's the responsibility of the administrator who runs metrics-server to take care of those things, but it can't hurt to warn / remind people, even more so when every Google search so far leads the unaware admin to think disabling TLS is the solution.

@vdellaglio

For me, what solved the issue was: https://particule.io/en/blog/kubeadm-metrics-server/

Long story short:

  • edit the kubelet-config ConfigMap in kube-system
  • edit each node's kubelet YAML config file
  • approve the CSRs (see the sketch below)
  • recycle the metrics-server pod (I did)
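
For the CSR-approval step, a minimal sketch (CSR names differ per cluster, and <csr-name> is a placeholder; approve only the kubelet-serving CSRs you expect):

# List pending CSRs, then approve the kubelet-serving ones
kubectl get csr
kubectl certificate approve <csr-name>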

@huxulm

huxulm commented Dec 9, 2021

--kubelet-insecure-tls

--kubelet-insecure-tls is for testing purposes only. It just skips the issue; it doesn't resolve it.

You said something but said nothing 😅

@travnewmatic

travnewmatic commented Jan 6, 2022

I am curious to know what The Right Way to fix this is.

It's definitely possible to work around the issue with --kubelet-insecure-tls, but that definitely doesn't feel like a good solution.

It seems like metrics-server expects proper certificates to exist for each node before it is added to the cluster?

After reading #196 (comment) I think I understand the situation (I think).

tl;dr: there is a Right Way, but it's kind of complicated, and leaving it insecure isn't that big of a deal.

So just use --kubelet-insecure-tls.

OR DON'T!!

https://particule.io/en/blog/kubeadm-metrics-server/ does work!

This is basically a copy of a section in the Kubernetes documentation, Certificate Management with kubeadm:

  • create a new cluster (there are instructions for an existing cluster, but I just made a new cluster because mine was empty) using that KubeletConfiguration thing, documented here
  • kubectl certificate approve the CSRs generated by kubeadm
  • kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml without any modification to the metrics-server configuration

@charlielin

I am curious to know what The Right Way to fix this is.
[...]
https://particule.io/en/blog/kubeadm-metrics-server/ does work!

Thanks! It WORKS!

@pcgeek86

I'm having the same error with metrics-server version 0.6.1 running on Linode Kubernetes Engine (LKE), k8s version 1.23.

Installation command:

helm install --namespace metrics metrics-server/metrics-server --generate-name

In the Metrics Server Pod logs, I just see this:

92.168.143.61 because it doesn't contain any IP SANs" node="lke60627-94194-627bc5588479"
I0511 21:56:53.747234       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0511 21:56:58.272958       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0511 21:57:03.748846       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0511 21:57:06.607078       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.203.181:10250/metrics/resource\": x509: cannot validate certificate for 192.168.203.181 because it doesn't contain any IP SANs" node="lke60627-94194-627bc558aaae"
E0511 21:57:06.609430       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.143.61:10250/metrics/resource\": x509: cannot validate certificate for 192.168.143.61 because it doesn't contain any IP SANs" node="lke60627-94194-627bc5588479"
I0511 21:57:13.753194       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0511 21:57:21.603808       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.143.61:10250/metrics/resource\": x509: cannot validate certificate for 192.168.143.61 because it doesn't contain any IP SANs" node="lke60627-94194-627bc5588479"
E0511 21:57:21.603960       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.203.181:10250/metrics/resource\": x509: cannot validate certificate for 192.168.203.181 because it doesn't contain any IP SANs" node="lke60627-94194-627bc558aaae"
I0511 21:57:23.749247       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0511 21:57:33.745705       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0511 21:57:36.599133       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.203.181:10250/metrics/resource\": x509: cannot validate certificate for 192.168.203.181 because it doesn't contain any IP SANs" node="lke60627-94194-627bc558aaae"
E0511 21:57:36.611810       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.143.61:10250/metrics/resource\": x509: cannot validate certificate for 192.168.143.61 because it doesn't contain any IP SANs" node="lke60627-94194-627bc5588479"

@khmarochos

I'd only like to mention that the article contains a piece of advice to edit /var/lib/kubelet/config.yaml, which is not always correct. In some clusters (e.g. those created by Kubespray), kubelet uses /etc/kubernetes/kubelet-config.yaml.

Thus, one can make changes to /var/lib/kubelet/config.yaml, restart kubelet, and then be surprised that there are no CSRs. One should look up the kubelet's startup parameters and then change the correct configuration file.

Cheers 🍻
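
One hedged way to find which config file the running kubelet actually loads (on systemd hosts; the --config flag points at the file to edit):

# kubeadm-style installs put --config in the systemd drop-in
systemctl cat kubelet | grep -- --config
# or inspect the live process's arguments
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -- '--config'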

@awalker125

awalker125 commented Nov 11, 2022

Obviously not for production, but if you just have a local dev cluster you created with sudo kubeadm init and you want to add metrics to it quickly, this one-liner will do it:

kubectl -n kube-system patch deployment metrics-server --type=json \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

@advissor

@awalker125 Works like a charm!
This helped with the metrics server on a self-hosted cluster created via kubeadm.

If the change is not applied immediately, restart the metrics-server deployment so the pods pick up the patched spec:

kubectl rollout restart deployment metrics-server -n kube-system

@phosae

phosae commented Apr 23, 2023

I solved it in kind and wrote a detailed post:

https://www.zeng.dev/post/2023-kubeadm-enable-kubelet-serving-certs/
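
The gist of that approach for a fresh kind cluster, as a sketch (kind can patch the KubeletConfiguration through kubeadmConfigPatches; double-check field names against the linked post):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  serverTLSBootstrap: true
nodes:
- role: control-plane

The kubelet-serving CSRs still need to be approved once the nodes come up.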

@v-wan

v-wan commented Sep 16, 2023

Brilliant @phosae, thanks for the detailed solution for kind.

@softlion

softlion commented Sep 28, 2023

My solution for a running cluster with metrics-server already installed (I installed it with Helm).

On the control plane:

k -n kube-system edit configmap kubelet-config

Append serverTLSBootstrap: true in the kubelet: section and save.

On each node:

sudo nano /var/lib/kubelet/config.yaml

Append serverTLSBootstrap: true at the bottom and save.
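
The tail of the file would then look roughly like this (a sketch, assuming an otherwise default kubeadm kubelet config):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# ...existing settings unchanged...
serverTLSBootstrap: true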

sudo systemctl restart kubelet

for kubeletcsr in `kubectl -n kube-system get csr | grep kubernetes.io/kubelet-serving | awk '{ print $1 }'`; do kubectl certificate approve $kubeletcsr; done

Verify:

k logs -f -n kube-system `k get pods -n kube-system | grep metrics-server | awk '{ print $1 }'`

kubectl top pods --all-namespaces

Done.
No need to restart the metrics-server pod.

@LilMonk

LilMonk commented Dec 1, 2023

I solved it in kind and wrote a detailed post:

https://www.zeng.dev/post/2023-kubeadm-enable-kubelet-serving-certs/

This worked for my local kind setup. Thank you @phosae

@salehhoushangi

The thing is, when you add --kubelet-insecure-tls it does not matter what the secure port is, so this configuration works for me:

containers:
- args:
  - --kubelet-insecure-tls
  - --cert-dir=/tmp
  - --secure-port=10250
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s

@adamency

adamency commented Apr 5, 2024

The definitive answer for people who set up their cluster with kubeadm

(which I believe is the majority of people here)

is to make this configuration change on all their kubelets: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubelet-serving-certs

I.e., use certs actually signed by the API server CA instead of self-signed certificates.

This requires you to approve the certificates manually, which is precisely why this cannot be the default init configuration for kubeadm.

PS: People need to stop sharing their own opinions, articles, advice, etc., as absolutely ALL answers here fall into 2 categories:

  • advising --kubelet-insecure-tls, which is NOT SAFE
  • telling you to use serverTLSBootstrap in the kubelet configuration, which is precisely what is already described in the official Kubernetes docs linked above. PLEASE STOP DUPLICATING INFORMATION (and adding noise everywhere)

@LKD-PIX

LKD-PIX commented May 27, 2024

I am having this issue on a "Kubernetes the hard way" cluster that I created from scratch, without kubeadm, kind, Helm or the like.
I didn't really understand the root cause of this issue from the conversation, nor do I see a way for me to fix it.
Scraping the control node itself works fine for me, but I can't scrape my worker node.
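
On a hard-way cluster there is no bootstrap machinery to lean on: the kubelet serving certificate on each worker has to be (re)issued with the node's name and IP in its SANs, signed by the CA that the scraper trusts. A hypothetical sketch with openssl (the file names, CN, and IP are placeholders, the IP borrowed from the error at the top of this thread; adapt to your PKI):

# CSR for the worker's kubelet serving certificate
openssl req -new -newkey rsa:2048 -nodes \
  -keyout kube-node3-key.pem -out kube-node3.csr \
  -subj "/CN=system:node:kube-node3/O=system:nodes"

# Sign it with the cluster CA, adding hostname and IP SANs
openssl x509 -req -in kube-node3.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out kube-node3.pem -days 365 \
  -extfile <(printf "subjectAltName=DNS:kube-node3,IP:172.16.52.132")

Then point the kubelet at the new pair (tlsCertFile / tlsPrivateKeyFile in its KubeletConfiguration) and restart it on that node.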
