
Metrics server issue with hostname resolution of kubelet and apiserver unable to communicate with metric-server clusterIP #131

Closed
vikranttkamble opened this issue Sep 3, 2018 · 73 comments

Comments

@vikranttkamble

Metrics-server is unable to resolve the hostname to scrape metrics from the kubelet.

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

I figured it's not resolving the hostname via kube-dns,

as mentioned in the following issues: #105 (comment)
and #97.

I did try kubectl -n kube-system edit deploy metrics-server, but the metrics-server pod entered an error state.

Describing the apiservice v1beta1.metrics.k8s.io gives the message:

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

10.101.248.96 is the ClusterIP of the metrics-server service.

@MIBc
Contributor

MIBc commented Sep 3, 2018

@vikranttkamble you can try --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
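
For anyone unsure where that flag goes, a minimal sketch of the relevant part of the metrics-server Deployment pod spec follows (the container name and image tag here are illustrative, not prescriptive):

      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.0   # illustrative tag
        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP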

@damascenorakuten

I'm having the same issue. +1

@juan-vg

juan-vg commented Sep 3, 2018

I think the main problem is that the hostname resolution is performed through the internal DNS server (which is what the metrics-server pod uses by default). That server contains the pod/service entries, but not the cluster-node ones. AFAIK the cluster nodes are not in that scope, so they can't be resolved via that DNS. The InternalIP should be queried from the API instead.
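
A quick way to see which addresses the API actually reports for each node, and therefore what metrics-server can use without relying on cluster DNS (assumes kubectl access):

$ kubectl get nodes -o wide
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'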

@damascenorakuten

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (It was failing before because it could not resolve the hostname of the node). There's something strange happening though, I can see the metrics now from HPA but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"

@amolredhat

amolredhat commented Sep 3, 2018

Vikrant and I are working on the same servers. We are now able to edit the metrics-server deployment with the command below:
kubectl -n kube-system edit deploy metrics-server
But we are still facing proxy issues.

$ kubectl describe apiservice v1beta1.metrics.k8s.io

Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-09-03T12:36:06Z
  Resource Version:    985112
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 ed81fe44-af75-11e8-8333-ac162d793244
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2018-09-03T12:36:06Z
    Message:               no response from https://10.101.212.101:443: Get https://10.101.212.101:443: Proxy Error ( Connection refused )
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request

@amolredhat

In the metrics-server logs we found the following:

E0903 15:36:38.239003 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:nvm250d00: unable to fetch metrics from Kubelet nvm250d00 (10.130.X.X): Get https://10.130.X.X:10250/stats/summary/: x509: cannot validate certificate for 10.130.X.X because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:nvmbd1aow270d00: unable to fetch metrics from Kubelet

@MIBc
Contributor

MIBc commented Sep 4, 2018

It works when the kubelet flag "--authorization-mode=AlwaysAllow" and the metrics-server flag "--kubelet-insecure-tls" are set.

@MIBc
Contributor

MIBc commented Sep 4, 2018

I think metrics-server needs to be authorized to access the kubelet if authorization-mode=webhook is used.
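
For reference, a sketch of the cluster-side RBAC that kubelet webhook authorization checks against, assuming metrics-server runs under the metrics-server ServiceAccount in kube-system (this mirrors the upstream manifests; verify against your version):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "nodes/stats", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system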

@amolredhat

We also got an SSL issue and a socket connection refused issue, and resolved them with the following parameters in metrics-server-deployment.yaml:

containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.2.1
        command:
        - /metrics-server
        - --source=kubernetes.summary_api:''?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true
        - --requestheader-allowed-names=

We are currently facing a proxy issue and are working on it.

@vikranttkamble
Author

@MIBc is --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP the parameter for the proxy issue?

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

Also, for InternalIP do we have to put the actual IP, or just keep it as the literal InternalIP?

@juan-vg

juan-vg commented Sep 4, 2018

@amolredhat The '--source' flag is unavailable right now (v0.3.0-alpha.1)

I (finally) got it to work by setting the following args:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

It works like a charm!

@kaiterramike

@juan-vg awesome, this works for me too (metrics-server-amd64:v0.3.0 on k8s 1.10.3). Btw, so as not to duplicate the entrypoint set in the Dockerfile, consider using args: instead:

        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

@DirectXMan12
Contributor

If you're going to use InternalIP, you should probably set up your node's serving certs to list the IP as an alternative name. You generally don't want to pass kubelet-insecure-tls except in testing setups.
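
If you're unsure what the kubelet serving cert currently contains, a quick way to check its SANs (the paths are typical kubeadm defaults and may differ on your distribution; the IP is a placeholder):

# on the node
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -text | grep -A1 "Subject Alternative Name"

# or against the live endpoint
openssl s_client -connect <node-internal-ip>:10250 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"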

@wilsonjackson

This seems to be a blocking issue for 0.3.0 when running a kops deployment on AWS using a private network topology.

dial tcp: lookup ip-x-x-x-x.us-west-2.compute.internal on 100.64.0.10:53: no such host

Naturally kubedns can't resolve that hostname. I tried setting dnsPolicy: Default in the metrics-server deployment, which skirts the DNS issue, but then I see this:

x509: certificate signed by unknown authority

Not really sure what to do with that. I don't want to start monkeying with my node's certs without knowing exactly what I'm fixing. For now I've had to revert to metrics-server 0.2.1.
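
For anyone trying the same dnsPolicy workaround, it is a pod-level field in the metrics-server Deployment; a sketch is below. It makes the pod inherit the node's resolv.conf instead of the cluster DNS, but as noted above it does not address the certificate error:

  template:
    spec:
      # pod-level field: use the node's resolv.conf instead of kube-dns/CoreDNS
      dnsPolicy: Default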

@DirectXMan12
Contributor

You're the second person to mention issues with kops (#133), so I'm starting to think that kops sets up its certs differently than expected. Basically, the issue is that whatever the kops kubelet serving certs are, they aren't signed by the default kubernetes CA. Can we maybe get a kops maintainer in here to comment?

@amolredhat

@wilsonjackson @DirectXMan12
This was observed because of the proxy; requests were not being served internally. We configured the proxy server on one of the master servers with a NoProxy configuration for the internal IPs.

And it worked!

Also, we changed some parameters in kubernetes/manifests/kube-apiserver.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --authorization-mode=Node,RBAC
    #- --authorization-mode=AlwaysAllow
    #- --kubelet_tls_verify=True
    - --advertise-address=MASTERIP
    - --allow-privileged=true
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    #- --disable-admission-plugins=
    # https://github.com/kubernetes/website/issues/6012, https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
    - --enable-admission-plugins=NodeRestriction,DefaultStorageClass,PersistentVolumeClaimResize,PersistentVolumeLabel
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt


@724399396

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

@Demon-DK

Demon-DK commented Sep 19, 2018

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

@DirectXMan12
Contributor

Awesome. I'm going to close this issue, but feel free to ping me if you think it's not solved yet.

@kidlj

kidlj commented Sep 26, 2018

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

Hi, I'm using kube-dns instead of coredns, and I have my node's /etc/hosts set properly, and it still fails:

E0926 11:30:18.620009       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube: unable to fetch metrics from Kubelet kube (kube): Get https://kube:10250/stats/summary/: dial tcp: lookup kube on 10.96.0.10:53: no such host

@xiaotian45123

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

The host uses /etc/hosts for resolution. How can this be handled better?

@xiaotian45123

coredns, and I have my node's /etc/hosts set properly, and it still fails:

Has the problem been solved?

@Demon-DK

Demon-DK commented Sep 27, 2018

Hi, I'm using kube-dns instead of coredns, and I have my node's /etc/hosts set properly, and it still fails:

E0926 11:30:18.620009       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube: unable to fetch metrics from Kubelet kube (kube): Get https://kube:10250/stats/summary/: dial tcp: lookup kube on 10.96.0.10:53: no such host

Hi,
I'd recommend starting by making things clearer:

$ kubectl exec -it -n <metrics-server-namespace> metrics-server-xxxx -- sh
/ # nslookup kube

** because in your logs the requests are being made to https://kube:<port>/bla/bla/bla

I assume your nslookup request will fail.
If I'm right, you have to investigate your cluster DNS settings, and in that case this is not a metrics-server issue.

@TracyBin

If you're going to use InternalIP, you should probably set up your node's serving certs to list the IP as an alternative name. You generally don't want to pass kubelet-insecure-tls except in testing setups.
@DirectXMan12 How do I configure a node to use two hostnames?

@TracyBin

@originsmike Another problem came up after modifying the TLS and InternalIP settings:

[root@192 ~]# docker logs -f fa55e7f7343a
I1010 10:40:01.108023       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1010 10:40:33.308883       1 serve.go:96] Serving securely on [::]:443
I1010 10:40:33.609544       1 logs.go:49] http: TLS handshake error from 172.20.0.1:49456: EOF
E1010 10:41:02.208299       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
E1010 10:41:32.116815       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]

@shimpikk

shimpikk commented Oct 20, 2018

I am facing a slightly different issue here. I don't know whether it is a metrics-server problem or an API server one, but I thought I'd post here. Please see the command output and logs below and let me know what's wrong.

I believe the API server is not able to contact the metrics-server over internal IPs.

# kubectl -n kube-system describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-10-20T16:10:05Z
  Resource Version:    638754
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 9b8e655c-d482-11e8-9794-0050569160a8
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2018-10-20T16:10:05Z
    Message:               no response from https://10.99.121.153:443: Get https://10.99.121.153:443: dial tcp 10.99.121.153:443: connect: no route to host
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

# kubectl -n kube-system get svc -o wide
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE    SELECTOR
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   2d8h   k8s-app=kube-dns
kubernetes-dashboard   NodePort    10.97.142.189   <none>        80:31378/TCP    5d7h   k8s-app=kubernetes-dashboard
metrics-server         ClusterIP   10.99.121.153   <none>        443/TCP         104m   k8s-app=metrics-server

Logs from APIServer

E1020 17:56:51.069077 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.99.121.153:443: dial tcp 10.99.121.153:443: connect: no route to host
I1020 17:56:52.046598 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1020 17:56:52.046777 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1020 17:56:52.046803 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1020 17:56:52.250466 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

@shimpikk

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

Would you please provide some details on how to do it?

@shimpikk

@DirectXMan12,
do you have any input on the issue above?
I have created a cluster on CentOS 7.

@nabheet

nabheet commented Jul 30, 2019

@silverbackdan Hopefully I can help. I assume you already know about CSRs. You can start with https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#create-a-certificate-signing-request-object-to-send-to-the-kubernetes-api.

Once you approve and download the cert, put the private key from your CSR in /var/lib/kubelet/pki/kubelet.key and the cert in /var/lib/kubelet/pki/kubelet.crt. Then restart kubelet service.

service kubelet restart (or systemctl)

you are good to go.
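
A rough sketch of that flow, assuming a recent openssl and the v1beta1 CSR API from that era; every name, IP, and path below is a placeholder to adapt:

# 1. Generate a key and CSR on the node with the hostname and InternalIP as SANs
openssl req -new -newkey rsa:2048 -nodes \
  -keyout kubelet.key -out kubelet.csr \
  -subj "/CN=system:node:<node-name>/O=system:nodes" \
  -addext "subjectAltName=DNS:<node-name>,IP:<node-internal-ip>"

# 2. Submit it as a CertificateSigningRequest and approve it
kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: <node-name>-serving
spec:
  request: $(base64 kubelet.csr | tr -d '\n')
  usages: ["digital signature", "key encipherment", "server auth"]
EOF
kubectl certificate approve <node-name>-serving

# 3. Fetch the signed cert, install it, and restart the kubelet
kubectl get csr <node-name>-serving -o jsonpath='{.status.certificate}' | base64 -d > kubelet.crt
cp kubelet.key /var/lib/kubelet/pki/kubelet.key
cp kubelet.crt /var/lib/kubelet/pki/kubelet.crt
systemctl restart kubelet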

@silverbackdan

Thanks very much for this info. kubectl get csr is a new one to me and that link is very helpful about CSRs - I know about them but not so much in the context outlined in that link.

Just to check, I'd be replacing the key and cert already at those paths because they do exist. I imagine they are just self-signed which is what is being used at the moment and therefore is seen as insecure?

@nabheet

nabheet commented Jul 31, 2019

Yes, that is correct. They are self-signed certs and you will be replacing them with certs signed by your Kubernetes cluster CA. I wish the kubeadm join command would do this but IIRC I think there is a chicken and egg problem here. I am not sure exactly how but for some reason "they" decided to not implement this replacement feature. I think there is a closed issue in the K8S repo regarding this too. I can try to find it if needed. But hopefully, you should be set now.

@nabheet

nabheet commented Jul 31, 2019

Also, I wonder if this would avoid the issue - https://kubernetes.io/docs/tasks/tls/certificate-rotation/.
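
That page covers kubelet serving-certificate bootstrap/rotation; the relevant knob is in the KubeletConfiguration, roughly as in this sketch (on older releases this also needs the RotateKubeletServerCertificate feature gate, and the resulting serving CSRs still have to be approved):

# /var/lib/kubelet/config.yaml (KubeletConfiguration), sketch only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
rotateCertificates: true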

@silverbackdan

Perhaps, specifically the --bootstrap-kubeconfig flag https://kubernetes.io/docs/tasks/tls/certificate-rotation/#understanding-the-certificate-rotation-configuration

I had to run this command to change the cert SANs for the API and to listen on different IPs

kubeadm init phase certs all \
  --apiserver-advertise-address=0.0.0.0 \
  --apiserver-cert-extra-sans=10.244.0.1,11.0.0.10,example.com

It'd have been nice if that command could also sign the cert with the cluster's CA. Anyway, I see where things are going wrong now and can use your very helpful notes to take me further.

@nabheet

nabheet commented Jul 31, 2019

Nice! I was reading that same page too. There might be more learnings on the page you linked for me too. Queued for when I have some learning time. :-)

@AdelBouridah

I had the same issue, resolved by adding:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

However, the metrics server now returns used resources only for the master node (always unknown for the worker nodes).

NB: deployment was done with Vagrant (hence kubeadm).

Do you have any hint to solve this?

@electrocucaracha

I'm still getting the error. I'm using Kubespray v2.11.0 to deploy Kubernetes v1.15.3 with the following instruction:

helm install stable/metrics-server --set args[0]="--kubelet-insecure-tls" --set args[1]="--kubelet-preferred-address-types=InternalIP" --name metrics-server

@carlylelu

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (It was failing before because it could not resolve the hostname of the node). There's something strange happening though, I can see the metrics now from HPA but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"


I just got the same situation on my metrics-server: unable to fetch pod metrics for pod default/xxxxxxxx: no metrics known for pod "default/xxxxxxxx". @damascenorakuten
Did you fix it already?

@serathius
Contributor

@carlylelu the appearance of this log does not mean that there is a problem, nor does it point to a possible cause.
More info: #349 (comment)
This issue is about host resolution. Example log: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

@lucj

lucj commented Nov 12, 2019

Got the same issue (host resolution) when installing metrics-server in a DO cluster. I think I did not have the issue a couple of days ago :(

@weshouman

One small detail that may get overlooked: in case a proxy is used, this could be caused by forgetting to add 10.0.0.0/8 to the no_proxy list.
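
A sketch of what that looks like on the metrics-server container, assuming the proxy is injected via environment variables (the proxy address and the exact NO_PROXY entries are illustrative):

        env:
        - name: HTTPS_PROXY
          value: "http://proxy.example.com:3128"   # illustrative
        - name: NO_PROXY
          value: "10.0.0.0/8,.cluster.local,localhost,127.0.0.1"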

@veton

veton commented Apr 23, 2020

In recent versions of metrics-server, where there is no "command" or "metrics-server-deployment.yaml", the following helped me

  1. Open deployment editor
    kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
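
Roughly the same change as a one-shot patch instead of an interactive edit (assumes the metrics-server container is the first in the pod spec and already has an args list):

kubectl -n kube-system patch deploy metrics-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=InternalIP"}
]'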

@progapandist

@veton 's answer is the most on-point and up to date, thank you! 🙏

@HisokaHous

In recent versions of metrics-server, where there is no "command" or "metrics-server-deployment.yaml", the following helped me

  1. Open deployment editor
    kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

Thanks, that solved my problem too

@justinaslelys

Adding --kubelet-preferred-address-types=InternalIP flag helped me to fix metrics-server 0.3.6 after enabling NodeLocal DNS Cache.

@edwardfberliner

edwardfberliner commented Jan 27, 2021

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

I tried this suggestion and it worked for me.

My problem:
All my nodes have the same InternalIP (10.0.2.15) due to the VirtualBox/NAT configuration, so with --kubelet-preferred-address-types=InternalIP my metrics-server could only query its own node. I then tried --kubelet-preferred-address-types=Hostname, but DNS was incorrectly resolving the hostname (when I used nslookup on the DNS server it returned an incorrect IP address, so I don't know what is wrong with my configuration). There are other solutions using a NodePort or a LoadBalancer, but they don't seem applicable to my situation since the metrics-server is already inside the cluster, and they are more complicated as well.

Here is my working yaml file:
metrics-server-components.zip

You can compare it to the one on GitHub.

Note: the IP addresses I used for the hostAliases are the ones that were assigned to the nodes for the pod network by Flannel, so they are reachable from all of the containers.

@QooGeek

QooGeek commented Feb 3, 2021

[root@k8s-master1:~]# kubectl get svc -n kube-system -o wide
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns         ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   20d   k8s-app=kube-dns
metrics-server   ClusterIP   10.99.131.42   <none>        443/TCP                  20d   k8s-app=metrics-server

[root@k8s-master1:~]# kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-01-14T05:44:07Z
  Resource Version:    6313248
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 09c2cee7-af53-4920-bd7e-1617c4068da3
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-01-14T05:44:07Z
    Message:               failing or missing response from https://10.99.131.42:443/apis/metrics.k8s.io/v1beta1: Get "https://10.99.131.42:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.99.131.42:443: connect: connection refused
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

[root@k8s-master1:~]# kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

[root@k8s-master1:~]# kubectl logs -f -n kube-system metrics-server-69fbcb864b-jvkkr
[..........]
I0203 10:38:52.792663 1 round_trippers.go:443] GET https://k8s-master1:10250/stats/summary?only_cpu_and_memory=true 200 OK in 29 milliseconds
I0203 10:38:52.823497 1 round_trippers.go:443] GET https://k8s-node1:10250/stats/summary?only_cpu_and_memory=true 200 OK in 54 milliseconds
I0203 10:38:52.842695 1 round_trippers.go:443] GET https://k8s-node2:10250/stats/summary?only_cpu_and_memory=true 200 OK in 70 milliseconds
I0203 10:38:52.844152 1 scraper.go:168] ScrapeMetrics: time: 101.319542ms, nodes: 4, pods: 60
I0203 10:38:52.844195 1 server.go:138] ...Storing metrics...
I0203 10:38:52.844264 1 server.go:143] ...Cycle complete

[root@k8s-master1:~]# cat metrics-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: ip/library/metrics-server:v0.4.1
        imagePullPolicy: IfNotPresent
        args:
        - --v=6
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      hostAliases:
      - hostnames:
        - k8s-master1
        ip: 192.xx.x.x
      - hostnames:
        - k8s-node1
        ip: 192.x.x.x
      - hostnames:
        - k8s-node2
        ip: 192.x.x.x
      - hostnames:
        - k8s-node3
        ip: 192.x.x.x
      nodeSelector:
        kubernetes.io/os: linux

@edwardfberliner

some other things to try:

verify that the 192.x.x.x IP addresses are in the pod network; check the kube-controller-manager process for the "--cluster-cidr=" option, which was configured during kubeadm init with the "--pod-network-cidr" option

verify that the pod network was configured to use the correct interface on each node (e.g. the flanneld command has an "--iface" option); I had to change my daemonset/kube-flannel-ds configuration to use the flannel (pod) network interface; otherwise it would default to an internal (VirtualBox) network interface

check your kube-proxy process; if you see "--hostname-override=...", try removing this argument from the daemonset/kube-proxy yaml configuration

try setting "- --enable-aggregator-routing=true" in kube-apiserver
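
On a kubeadm cluster that last flag goes into the kube-apiserver static pod manifest, roughly as in this sketch (existing flags stay as they are):

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true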
