
Metrics server issue with hostname resolution of kubelet and apiserver unable to communicate with metric-server clusterIP #131

Closed
vikranttkamble opened this issue Sep 3, 2018 · 73 comments

Comments

@vikranttkamble

Metrics-server is unable to resolve the hostname to scrape metrics from the kubelet.

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

I figured it's not resolving the hostname via kube-dns,

as mentioned in the following issues: #105 (comment)
and #97.

I did try kubectl -n kube-system edit deploy metrics-server, but the metrics-server pod entered an error state.

Describing the apiservice v1beta1.metrics.k8s.io gives the message:

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

10.101.248.96 is the ClusterIP of the metrics-server service.

@MIBc
Contributor

MIBc commented Sep 3, 2018

@vikranttkamble you can try --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
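
For anyone unsure where that flag goes, a minimal sketch of the relevant part of the metrics-server Deployment pod spec follows (the container name and image tag here are illustrative, not prescriptive):

      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.0   # illustrative tag
        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP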

@damascenorakuten

I'm having the same issue. +1

@juan-vg

juan-vg commented Sep 3, 2018

I think the main problem is that the hostname resolution is performed through the internal DNS server (which is what the metrics-server pod uses by default). That server contains the pod/service entries, but not the cluster-node ones. AFAIK the cluster nodes are not in that scope, so they can't be resolved via that DNS. The InternalIP should be queried from the API instead.
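
A quick way to see which addresses the API actually reports for each node, and therefore what metrics-server can use without relying on cluster DNS (assumes kubectl access):

$ kubectl get nodes -o wide
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'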

@damascenorakuten

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (It was failing before because it could not resolve the hostname of the node). There's something strange happening though, I can see the metrics now from HPA but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"

@amolredhat

amolredhat commented Sep 3, 2018

Vikrant and I are working on the same servers. We are now able to edit the metrics-server deployment with the command below:
kubectl -n kube-system edit deploy metrics-server
But we are still facing proxy issues.

$ kubectl describe apiservice v1beta1.metrics.k8s.io

Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-09-03T12:36:06Z
  Resource Version:    985112
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 ed81fe44-af75-11e8-8333-ac162d793244
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2018-09-03T12:36:06Z
    Message:               no response from https://10.101.212.101:443: Get https://10.101.212.101:443: Proxy Error ( Connection refused )
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request

@amolredhat

In the metrics-server logs we found the following:

E0903 15:36:38.239003 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:nvm250d00: unable to fetch metrics from Kubelet nvm250d00 (10.130.X.X): Get https://10.130.X.X:10250/stats/summary/: x509: cannot validate certificate for 10.130.X.X because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:nvmbd1aow270d00: unable to fetch metrics from Kubelet

@MIBc
Contributor

MIBc commented Sep 4, 2018

It works when the kubelet flag "--authorization-mode=AlwaysAllow" and the metrics-server flag "--kubelet-insecure-tls" are set.

@MIBc
Contributor

MIBc commented Sep 4, 2018

I think metrics-server needs to be authorized to access the kubelet if authorization-mode=webhook is used.
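
For reference, a sketch of the cluster-side RBAC that kubelet webhook authorization checks against, assuming metrics-server runs under the metrics-server ServiceAccount in kube-system (this mirrors the upstream manifests; verify against your version):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "nodes/stats", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system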

@amolredhat

We also got an SSL issue and a socket connection refused issue, and resolved them with the following parameters in metrics-server-deployment.yaml:

containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.2.1
        command:
        - /metrics-server
        - --source=kubernetes.summary_api:''?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true
        - --requestheader-allowed-names=

We are currently facing a proxy issue and are working on it.

@vikranttkamble
Author

@MIBc is --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP the parameter for the proxy issue?

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

Also, for InternalIP do we have to put the actual IP, or just keep it as the literal InternalIP?

@juan-vg

juan-vg commented Sep 4, 2018

@amolredhat The '--source' flag is unavailable right now (v0.3.0-alpha.1)

I (finally) got it to work by setting the following args:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

It works like a charm!

@kaiterramike

@juan-vg awesome, this works for me too (metrics-server-amd64:v0.3.0 on k8s 1.10.3). Btw, so as not to duplicate the entrypoint set in the Dockerfile, consider using args: instead:

        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

@DirectXMan12
Contributor

If you're going to use InternalIP, you should probably set up your node's serving certs to list the IP as an alternative name. You generally don't want to pass kubelet-insecure-tls except in testing setups.
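
If you're unsure what the kubelet serving cert currently contains, a quick way to check its SANs (the paths are typical kubeadm defaults and may differ on your distribution; the IP is a placeholder):

# on the node
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -text | grep -A1 "Subject Alternative Name"

# or against the live endpoint
openssl s_client -connect <node-internal-ip>:10250 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"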

@wilsonjackson

This seems to be a blocking issue for 0.3.0 when running a kops deployment on AWS using a private network topology.

dial tcp: lookup ip-x-x-x-x.us-west-2.compute.internal on 100.64.0.10:53: no such host

Naturally kubedns can't resolve that hostname. I tried setting dnsPolicy: Default in the metrics-server deployment, which skirts the DNS issue, but then I see this:

x509: certificate signed by unknown authority

Not really sure what to do with that. I don't want to start monkeying with my node's certs without knowing exactly what I'm fixing. For now I've had to revert to metrics-server 0.2.1.
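
For anyone trying the same dnsPolicy workaround, it is a pod-level field in the metrics-server Deployment; a sketch is below. It makes the pod inherit the node's resolv.conf instead of the cluster DNS, but as noted above it does not address the certificate error:

  template:
    spec:
      # pod-level field: use the node's resolv.conf instead of kube-dns/CoreDNS
      dnsPolicy: Default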

@DirectXMan12
Contributor

You're the second person to mention issues with kops (#133), so I'm starting to think that kops sets up its certs differently than expected. Basically, the issue is that whatever the kops kubelet serving certs are, they aren't signed by the default kubernetes CA. Can we maybe get a kops maintainer in here to comment?

@amolredhat

@wilsonjackson @DirectXMan12
This was observed because of the proxy; requests were not being served internally. We configured the proxy server on one of the master servers with a NoProxy configuration for the internal IPs.

And it worked!

Also, we changed some parameters in kubernetes/manifests/kube-apiserver.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --authorization-mode=Node,RBAC
    #- --authorization-mode=AlwaysAllow
    #- --kubelet_tls_verify=True
    - --advertise-address=MASTERIP
    - --allow-privileged=true
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    #- --disable-admission-plugins=
    # https://github.com/kubernetes/website/issues/6012, https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
    - --enable-admission-plugins=NodeRestriction,DefaultStorageClass,PersistentVolumeClaimResize,PersistentVolumeLabel
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt


@724399396

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

@Demon-DK

Demon-DK commented Sep 19, 2018

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

@DirectXMan12
Contributor

Awesome. I'm going to close this issue, but feel free to ping me if you think it's not solved yet.

@kidlj

kidlj commented Sep 26, 2018

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

Hi, I'm using kube-dns instead of coredns, and I have my node's /etc/hosts set properly, and it still fails:

E0926 11:30:18.620009       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube: unable to fetch metrics from Kubelet kube (kube): Get https://kube:10250/stats/summary/: dial tcp: lookup kube on 10.96.0.10:53: no such host

@xiaotian45123

Obviously the issue is in
lookup <hostname-ip> in <dns-service-ip>..... no such host

In my case coreDNS is used for cluster DNS resolution.
By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

So we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.

Eventually, I just added my nodes' hostname records to my external DNS service, and that's it.
Metrics are now collected successfully.

The host uses /etc/hosts for resolution. How can this be handled better?

@xiaotian45123

coredns, and I have my node's /etc/hosts set properly, and it still fails:

Has the problem been solved?

@Demon-DK

Demon-DK commented Sep 27, 2018

Hi, I'm using kube-dns instead of coredns, and I have my node's /etc/hosts set properly, and it still fails:

E0926 11:30:18.620009       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube: unable to fetch metrics from Kubelet kube (kube): Get https://kube:10250/stats/summary/: dial tcp: lookup kube on 10.96.0.10:53: no such host

Hi,
I'd recommend starting by making things clearer:

$ kubectl exec -it -n <metrics-server-namespace> metrics-server-xxxx -- sh
/ # nslookup kube

** because in your logs the requests are being made to https://kube:<port>/bla/bla/bla

I assume your nslookup request will fail.
If I'm right, you have to investigate your cluster DNS settings, and in that case this is not a metrics-server issue.

@TracyBin

If you're going to use InternalIP, you should probably set up your node's serving certs to list the IP as an alternative name. You generally don't want to pass kubelet-insecure-tls except in testing setups.
@DirectXMan12 How do I configure a node to use two hostnames?

@TracyBin

@originsmike Another problem came up after modifying the TLS and InternalIP settings:

[root@192 ~]# docker logs -f fa55e7f7343a
I1010 10:40:01.108023       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1010 10:40:33.308883       1 serve.go:96] Serving securely on [::]:443
I1010 10:40:33.609544       1 logs.go:49] http: TLS handshake error from 172.20.0.1:49456: EOF
E1010 10:41:02.208299       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
E1010 10:41:32.116815       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]

@shimpikk

shimpikk commented Oct 20, 2018

I am facing a slightly different issue here. I don't know whether it is a metrics-server problem or an API server one, but I thought I'd post here. Please see the command output and logs below and let me know what's wrong.

I believe the API server is not able to contact the metrics-server over internal IPs.

# kubectl -n kube-system describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-10-20T16:10:05Z
  Resource Version:    638754
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 9b8e655c-d482-11e8-9794-0050569160a8
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2018-10-20T16:10:05Z
    Message:               no response from https://10.99.121.153:443: Get https://10.99.121.153:443: dial tcp 10.99.121.153:443: connect: no route to host
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

# kubectl -n kube-system get svc -o wide
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE    SELECTOR
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   2d8h   k8s-app=kube-dns
kubernetes-dashboard   NodePort    10.97.142.189   <none>        80:31378/TCP    5d7h   k8s-app=kubernetes-dashboard
metrics-server         ClusterIP   10.99.121.153   <none>        443/TCP         104m   k8s-app=metrics-server

Logs from APIServer

E1020 17:56:51.069077 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.99.121.153:443: dial tcp 10.99.121.153:443: connect: no route to host
I1020 17:56:52.046598 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1020 17:56:52.046777 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1020 17:56:52.046803 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1020 17:56:52.250466 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

@shimpikk

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

Would you please provide some details on how to do it?

@shimpikk

@DirectXMan12,
do you have any input on the issue above?
I have created a cluster on CentOS 7.

@nabheet

nabheet commented Jul 30, 2019

@silverbackdan Hopefully I can help. I assume you already know about CSRs. You can start with https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#create-a-certificate-signing-request-object-to-send-to-the-kubernetes-api.

Once you approve and download the cert, put the private key from your CSR in /var/lib/kubelet/pki/kubelet.key and the cert in /var/lib/kubelet/pki/kubelet.crt. Then restart kubelet service.

service kubelet restart (or systemctl)

you are good to go.
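
A rough sketch of that flow, assuming a recent openssl and the v1beta1 CSR API from that era; every name, IP, and path below is a placeholder to adapt:

# 1. Generate a key and CSR on the node with the hostname and InternalIP as SANs
openssl req -new -newkey rsa:2048 -nodes \
  -keyout kubelet.key -out kubelet.csr \
  -subj "/CN=system:node:<node-name>/O=system:nodes" \
  -addext "subjectAltName=DNS:<node-name>,IP:<node-internal-ip>"

# 2. Submit it as a CertificateSigningRequest and approve it
kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: <node-name>-serving
spec:
  request: $(base64 kubelet.csr | tr -d '\n')
  usages: ["digital signature", "key encipherment", "server auth"]
EOF
kubectl certificate approve <node-name>-serving

# 3. Fetch the signed cert, install it, and restart the kubelet
kubectl get csr <node-name>-serving -o jsonpath='{.status.certificate}' | base64 -d > kubelet.crt
cp kubelet.key /var/lib/kubelet/pki/kubelet.key
cp kubelet.crt /var/lib/kubelet/pki/kubelet.crt
systemctl restart kubelet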

@silverbackdan

Thanks very much for this info. kubectl get csr is a new one to me and that link is very helpful about CSRs - I know about them but not so much in the context outlined in that link.

Just to check, I'd be replacing the key and cert already at those paths because they do exist. I imagine they are just self-signed which is what is being used at the moment and therefore is seen as insecure?

@nabheet

nabheet commented Jul 31, 2019

Yes, that is correct. They are self-signed certs and you will be replacing them with certs signed by your Kubernetes cluster CA. I wish the kubeadm join command would do this but IIRC I think there is a chicken and egg problem here. I am not sure exactly how but for some reason "they" decided to not implement this replacement feature. I think there is a closed issue in the K8S repo regarding this too. I can try to find it if needed. But hopefully, you should be set now.

@nabheet

nabheet commented Jul 31, 2019

Also, I wonder if this would avoid the issue - https://kubernetes.io/docs/tasks/tls/certificate-rotation/.
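
That page covers kubelet serving-certificate bootstrap/rotation; the relevant knob is in the KubeletConfiguration, roughly as in this sketch (on older releases this also needs the RotateKubeletServerCertificate feature gate, and the resulting serving CSRs still have to be approved):

# /var/lib/kubelet/config.yaml (KubeletConfiguration), sketch only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
rotateCertificates: true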

@silverbackdan

Perhaps, specifically the --bootstrap-kubeconfig flag https://kubernetes.io/docs/tasks/tls/certificate-rotation/#understanding-the-certificate-rotation-configuration

I had to run this command to change the cert SANs for the API and to listen on different IPs

kubeadm init phase certs all \
  --apiserver-advertise-address=0.0.0.0 \
  --apiserver-cert-extra-sans=10.244.0.1,11.0.0.10,example.com

It'd have been nice if that command could also sign the cert with the cluster's CA. Anyway, I see where things are going wrong now and can use your very helpful notes to take me further.

@nabheet

nabheet commented Jul 31, 2019

Nice! I was reading that same page too. There might be more learnings on the page you linked for me too. Queued for when I have some learning time. :-)

@AdelBouridah

I had the same issue, resolved by adding:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

However, the metrics server now returns used resources only for the master node (always unknown for the worker nodes).

NB: deployment was done with Vagrant (hence kubeadm).

Do you have any hint to solve this?

@electrocucaracha

I'm still getting the error. I'm using Kubespray v2.11.0 to deploy Kubernetes v1.15.3 with the following instruction:

helm install stable/metrics-server --set args[0]="--kubelet-insecure-tls" --set args[1]="--kubelet-preferred-address-types=InternalIP" --name metrics-server

@carlylelu

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (It was failing before because it could not resolve the hostname of the node). There's something strange happening though, I can see the metrics now from HPA but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"


I just got the same situation on my metrics-server: unable to fetch pod metrics for pod default/xxxxxxxx: no metrics known for pod "default/xxxxxxxx". @damascenorakuten
Did you fix it already?

@serathius
Contributor

@carlylelu the appearance of this log does not mean that there is a problem, nor does it point to a possible cause.
More info: #349 (comment)
This issue is about host resolution. Example log: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

@lucj

lucj commented Nov 12, 2019

Got the same issue (host resolution) when installing metrics-server in a DO cluster. I think I did not have the issue a couple of days ago :(

@weshouman

One small detail that may get overlooked: in case a proxy is used, this could be caused by forgetting to add 10.0.0.0/8 to the no_proxy list.
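
A sketch of what that looks like on the metrics-server container, assuming the proxy is injected via environment variables (the proxy address and the exact NO_PROXY entries are illustrative):

        env:
        - name: HTTPS_PROXY
          value: "http://proxy.example.com:3128"   # illustrative
        - name: NO_PROXY
          value: "10.0.0.0/8,.cluster.local,localhost,127.0.0.1"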

@veton

veton commented Apr 23, 2020

In recent versions of metrics-server, where there is no "command" or "metrics-server-deployment.yaml", the following helped me

  1. Open deployment editor
    kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
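
Roughly the same change as a one-shot patch instead of an interactive edit (assumes the metrics-server container is the first in the pod spec and already has an args list):

kubectl -n kube-system patch deploy metrics-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=InternalIP"}
]'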

@progapandist

@veton 's answer is the most on-point and up to date, thank you! 🙏

@HisokaHous

In recent versions of metrics-server, where there is no "command" or "metrics-server-deployment.yaml", the following helped me

  1. Open deployment editor
    kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

Thanks, that solved my problem too

@justinaslelys

Adding --kubelet-preferred-address-types=InternalIP flag helped me to fix metrics-server 0.3.6 after enabling NodeLocal DNS Cache.

@edwardfberliner

edwardfberliner commented Jan 27, 2021

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

I tried this suggestion and it worked for me.

My problem:
All my nodes have the same InternalIP (10.0.2.15) due to the VirtualBox/NAT configuration, so with --kubelet-preferred-address-types=InternalIP my metrics-server could only query its own node. I then tried --kubelet-preferred-address-types=Hostname, but DNS was incorrectly resolving the hostname (when I used nslookup on the DNS server it returned an incorrect IP address, so I don't know what is wrong with my configuration). There are other solutions using a NodePort or a LoadBalancer, but they don't seem applicable to my situation since the metrics-server is already inside the cluster, and they are more complicated as well.

Here is my working yaml file:
metrics-server-components.zip

You can compare it to the one on GitHub.

Note: the IP addresses I used for the hostAliases are the ones that were assigned to the nodes for the pod network by Flannel, so they are reachable from all of the containers.

@QooGeek

QooGeek commented Feb 3, 2021

[root@k8s-master1:~]# kubectl get svc -n kube-system -o wide
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns         ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   20d   k8s-app=kube-dns
metrics-server   ClusterIP   10.99.131.42   <none>        443/TCP                  20d   k8s-app=metrics-server

[root@k8s-master1:~]# kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-01-14T05:44:07Z
  Resource Version:    6313248
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 09c2cee7-af53-4920-bd7e-1617c4068da3
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-01-14T05:44:07Z
    Message:               failing or missing response from https://10.99.131.42:443/apis/metrics.k8s.io/v1beta1: Get "https://10.99.131.42:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.99.131.42:443: connect: connection refused
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

[root@k8s-master1:~]# kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

[root@k8s-master1:~]# kubectl logs -f -n kube-system metrics-server-69fbcb864b-jvkkr
[..........]
I0203 10:38:52.792663 1 round_trippers.go:443] GET https://k8s-master1:10250/stats/summary?only_cpu_and_memory=true 200 OK in 29 milliseconds
I0203 10:38:52.823497 1 round_trippers.go:443] GET https://k8s-node1:10250/stats/summary?only_cpu_and_memory=true 200 OK in 54 milliseconds
I0203 10:38:52.842695 1 round_trippers.go:443] GET https://k8s-node2:10250/stats/summary?only_cpu_and_memory=true 200 OK in 70 milliseconds
I0203 10:38:52.844152 1 scraper.go:168] ScrapeMetrics: time: 101.319542ms, nodes: 4, pods: 60
I0203 10:38:52.844195 1 server.go:138] ...Storing metrics...
I0203 10:38:52.844264 1 server.go:143] ...Cycle complete

[root@k8s-master1:~]# cat metrics-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: ip/library/metrics-server:v0.4.1
        imagePullPolicy: IfNotPresent
        args:
        - --v=6
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      hostAliases:
      - hostnames:
        - k8s-master1
        ip: 192.xx.x.x
      - hostnames:
        - k8s-node1
        ip: 192.x.x.x
      - hostnames:
        - k8s-node2
        ip: 192.x.x.x
      - hostnames:
        - k8s-node3
        ip: 192.x.x.x
      nodeSelector:
        kubernetes.io/os: linux

@edwardfberliner

some other things to try:

verify that the 192.x.x.x IP addresses are in the pod network; check the kube-controller-manager process for the "--cluster-cidr=" option, which was configured during kubeadm init with the "--pod-network-cidr" option

verify that the pod network was configured to use the correct interface on each node (e.g. the flanneld command has an "--iface" option); I had to change my daemonset/kube-flannel-ds configuration to use the flannel (pod) network interface; otherwise it would default to an internal (VirtualBox) network interface

check your kube-proxy process; if you see "--hostname-override=...", try removing this argument from the daemonset/kube-proxy yaml configuration

try setting "- --enable-aggregator-routing=true" in kube-apiserver
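
On a kubeadm cluster that last flag goes into the kube-apiserver static pod manifest, roughly as in this sketch (existing flags stay as they are):

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true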
