This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Kube UI not working #1367

Closed
andreimc opened this issue Apr 20, 2016 · 15 comments

@andreimc
Contributor

I get this when I spin up a new cluster.

Internal Server Error (500)

Get https://10.254.0.1:443/api/v1/replicationcontrollers: dial tcp 10.254.0.1:443: getsockopt: connection refused

[Screenshot: Kube UI showing the Internal Server Error (500)]

- Ansible version: 1.9.4
- Python version: 2.7.9
- Git commit hash or branch: a86bf60
- Cloud Environment: Vagrant
- Terraform version: 0.6.4.11
@ryane
Contributor

ryane commented Apr 21, 2016

I am also having a problem with the UI on AWS. But, instead of the 500 error, I am getting a 504 Gateway Time-out error from the nginx proxy.

@andreimc
Contributor Author

@ryane I also get that. I think it has something to do with the kubeworker: sometimes it comes up, other times it doesn't.

@ryane ryane modified the milestone: 1.1 Apr 22, 2016
@ryane
Contributor

ryane commented Apr 22, 2016

This appears to be the source of the problem (at least on AWS):

[Service]
ExecStart=/usr/bin/kubelet \
  --api-servers=http://localhost:8085 \
  --register-schedulable=false \
  --hostname-override={{ ansible_hostname }} \
  ...

On AWS, ansible_hostname ends up being something like ip-10-1-1-31. But, this is not resolvable within the cluster:

$ cat /etc/hostname
ip-10-1-1-31.ec2.internal

$ host ip-10-1-1-31.ec2.internal
ip-10-1-1-31.ec2.internal has address 10.1.1.31

$ host ip-10-1-1-31
Host ip-10-1-1-31 not found: 3(NXDOMAIN)
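
For anyone comparing the two variables mentioned here, a quick hypothetical debug playbook (illustrative only, not from this repo) shows the difference: `ansible_hostname` is the short gathered hostname (`ip-10-1-1-31` on AWS), while `inventory_hostname` is whatever name the host has in the Ansible inventory.

```yaml
# Hypothetical sketch: print both variables side by side on every host.
# ansible_hostname comes from gathered facts (the OS short hostname);
# inventory_hostname is the name Ansible addresses the host by.
- hosts: all
  tasks:
    - debug:
        msg: "ansible_hostname={{ ansible_hostname }} inventory_hostname={{ inventory_hostname }}"
```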

Open questions:

  1. Do we need the hostname-override option? What hostname is used without it? If it is needed, is there a safer variable (inventory_hostname, maybe?) that we can use instead of ansible_hostname?

  2. Why isn't ip-10-1-1-31 resolvable in DNS?

    $ sudo yum list installed mantl-dns -q
    Installed Packages
    mantl-dns.x86_64    1.1.0-1.centos    @asteris-mantl-rpm
    
    $ cat /etc/resolv.conf.mantl-dns
    # this config installed by mantl-dns
    
    # search is added here so that the system can address nodes by their simplest
    # name (x.node.consul becomes x.node or simply x.) This will work for service
    # addressing as well, so you can use y.service instead of y.service.consul
    options ndots:2
    search node.consul consul
    nameserver 127.0.0.1
    
    $ cat /etc/resolv.conf.masq
    ; generated by /usr/sbin/dhclient-script
    search ec2.internal
    nameserver 10.1.0.2
    
  3. set persistent, friendly hostname #1374 resolves the problem without any other changes. And, perhaps might fix other things (elasticsearch-executor fails to load on a worker node. #1195). Is setting the hostname this way a good option? Will it cause other problems?
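
If `hostname-override` is kept, the change floated in question 1 would just swap the variable in the unit template. A sketch, assuming the remaining flags stay as in the existing unit (the trailing `...` stands for the flags elided above):

```ini
# Sketch of the proposed change: use inventory_hostname (the name Ansible
# manages the host by) instead of the gathered ansible_hostname, which on
# AWS is the unresolvable short form like ip-10-1-1-31.
[Service]
ExecStart=/usr/bin/kubelet \
  --api-servers=http://localhost:8085 \
  --register-schedulable=false \
  --hostname-override={{ inventory_hostname }} \
  ...
```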

ping @BrianHicks @Zogg @stevendborrelli

@BrianHicks
Contributor

The Kubernetes integration brings in some older code that the core team didn't write. It's entirely possible that I missed something when upgrading it. #1374 seems like the best solution; the hostname does need to be resolvable.

@ryane
Contributor

ryane commented Apr 22, 2016

I'm thinking we should also change the service file to use inventory_hostname so that it is consistent.

Any idea why DNS is not working for the internal AWS hostname?

@BrianHicks
Contributor

BrianHicks commented Apr 22, 2016 via email

@ryane
Contributor

ryane commented Apr 22, 2016

$ sudo /usr/sbin/dnsmasq -d
dnsmasq: started, version 2.66 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth
dnsmasq: using nameserver 127.0.0.1#153 for domain cluster.local
dnsmasq: using nameserver 127.0.0.1#8600 for domain consul
dnsmasq: reading /etc/resolv.conf.masq
dnsmasq: using nameserver 10.1.0.2#53
dnsmasq: using nameserver 127.0.0.1#153 for domain cluster.local
dnsmasq: using nameserver 127.0.0.1#8600 for domain consul
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: time 1461334116
dnsmasq: cache size 150, 0/0 cache insertions re-used unexpired cache entries.
dnsmasq: queries forwarded 2, queries answered locally 1
dnsmasq: server 127.0.0.1#8600: queries sent 2, retried or failed 0
dnsmasq: server 127.0.0.1#153: queries sent 0, retried or failed 0
dnsmasq: server 10.1.0.2#53: queries sent 0, retried or failed 0

Is it that queries are not getting forwarded to the internal dns servers (only consul) for some reason?

On GCE, I see something like this in the logs:

dnsmasq: server 127.0.0.1#8600: queries sent 3, retried or failed 0
dnsmasq: server 127.0.0.1#153: queries sent 0, retried or failed 0
dnsmasq: server 169.254.169.254#53: queries sent 12, retried or failed 0

@BrianHicks
Contributor

Is that what you see when you try to resolve {ip}.ec2.internal? IIRC debugging mode logs every query, its results, and where they came from. Did you try that?

@ryane
Contributor

ryane commented Apr 22, 2016

RE: #1346 (comment)

ah! My cluster is down now but I'll try that in a bit.

@ryane
Contributor

ryane commented Apr 22, 2016

$ sudo /usr/sbin/dnsmasq -d -q
dnsmasq: started, version 2.66 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth
dnsmasq: using nameserver 127.0.0.1#153 for domain cluster.local
dnsmasq: using nameserver 127.0.0.1#8600 for domain consul
dnsmasq: reading /etc/resolv.conf.masq
dnsmasq: using nameserver 10.1.0.2#53
dnsmasq: using nameserver 127.0.0.1#153 for domain cluster.local
dnsmasq: using nameserver 127.0.0.1#8600 for domain consul
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: read /etc/hosts - 12 addresses
dnsmasq: query[A] ip-10-1-1-125.node.consul from 127.0.0.1
dnsmasq: forwarded ip-10-1-1-125.node.consul to 127.0.0.1
dnsmasq: query[A] ip-10-1-1-125.consul from 127.0.0.1
dnsmasq: forwarded ip-10-1-1-125.consul to 127.0.0.1
dnsmasq: query[A] ip-10-1-1-125 from 127.0.0.1
dnsmasq: config ip-10-1-1-125 is NODATA-IPv4
dnsmasq: query[AAAA] ip-10-1-1-125 from 127.0.0.1
dnsmasq: config ip-10-1-1-125 is NODATA-IPv6
dnsmasq: query[MX] ip-10-1-1-125 from 127.0.0.1
dnsmasq: forwarded ip-10-1-1-125 to 10.1.0.2
dnsmasq: time 1461340612
dnsmasq: cache size 150, 0/0 cache insertions re-used unexpired cache entries.
dnsmasq: queries forwarded 3, queries answered locally 2
dnsmasq: server 127.0.0.1#8600: queries sent 2, retried or failed 0
dnsmasq: server 127.0.0.1#153: queries sent 0, retried or failed 0
dnsmasq: server 10.1.0.2#53: queries sent 1, retried or failed 0
dnsmasq: Host                                     Address                        Flags     Expires
dnsmasq: localhost.localdomain                    ::1                            6F I   H
dnsmasq: localhost.localdomain                    127.0.0.1                      4F I   H
dnsmasq: resching-aws-control-03                  10.1.3.152                     4FRI   H
dnsmasq: resching-aws-kubeworker-001              10.1.1.125                     4FRI   H
dnsmasq: resching-aws-control-02                  10.1.2.170                     4FRI   H
dnsmasq: resching-aws-control-01                  10.1.1.204                     4FRI   H
dnsmasq: resching-aws-worker-002                  10.1.2.208                     4FRI   H
dnsmasq: resching-aws-worker-003                  10.1.3.240                     4FRI   H
dnsmasq: resching-aws-worker-001                  10.1.1.229                     4FRI   H
dnsmasq: resching-aws-worker-004                  10.1.1.36                      4FRI   H
dnsmasq: localhost6.localdomain6                  ::1                            6F I   H
dnsmasq: localhost4                               127.0.0.1                      4F I   H
dnsmasq: localhost                                ::1                            6FRI   H
dnsmasq: localhost                                127.0.0.1                      4FRI   H
dnsmasq: resching-aws-edge-01                     10.1.1.232                     4FRI   H
dnsmasq: localhost6                               ::1                            6F I   H
dnsmasq: localhost4.localdomain4                  127.0.0.1                      4F I   H
dnsmasq: resching-aws-kubeworker-002              10.1.2.243                     4FRI   H
$ dig ip-10-1-1-125 @10.1.0.2

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> ip-10-1-1-125 @10.1.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 21487
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ip-10-1-1-125.                 IN      A

;; AUTHORITY SECTION:
.                       49      IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2016042200 1800 900 604800 86400

;; Query time: 0 msec
;; SERVER: 10.1.0.2#53(10.1.0.2)
;; WHEN: Fri Apr 22 16:00:24 UTC 2016
;; MSG SIZE  rcvd: 117

$ dig ip-10-1-1-125.ec2.internal @10.1.0.2

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> ip-10-1-1-125.ec2.internal @10.1.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59043
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ip-10-1-1-125.ec2.internal.    IN      A

;; ANSWER SECTION:
ip-10-1-1-125.ec2.internal. 20  IN      A       10.1.1.125

;; Query time: 1 msec
;; SERVER: 10.1.0.2#53(10.1.0.2)
;; WHEN: Fri Apr 22 16:00:33 UTC 2016
;; MSG SIZE  rcvd: 71

I guess the internal AWS nameservers just don't resolve the short ip-10-1-1-125 hostnames. Do you interpret that the same way?

@BrianHicks
Contributor

The nameservers don't resolve it because dig doesn't respect search paths. And in this case, even with +search dig would be reading from /etc/resolv.conf instead of /etc/resolv.conf.masq so it's not a fair test.

I do see that dnsmasq forwarded the IP to the upstream nameserver in the log:

dnsmasq: forwarded ip-10-1-1-125 to 10.1.0.2

But it doesn't have any information about how it applied the search paths, which are clearly not being applied if the query is failing.

This is turning into two different issues at this point. We need to change the Kubelet hostname to the inventory name instead of the IP, but we also need to fix this search config in dnsmasq.
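
The expansion order in the query log above (node.consul, then consul, then the bare name) is what a stub resolver would generate from the mantl-dns search list, and ec2.internal is never tried because it isn't in that list. A rough sketch emulating glibc-style search expansion with the values from /etc/resolv.conf.mantl-dns (the logic, not the real resolver code):

```shell
#!/bin/sh
# Sketch: emulate glibc-style search-list expansion using the values from
# /etc/resolv.conf.mantl-dns ("options ndots:2", "search node.consul consul").
# A name with fewer than ndots dots is tried against each search domain
# first and only as-is at the end, matching the dnsmasq query log above.
ndots=2
search_list="node.consul consul"

expand() {
  name=$1
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name"                    # enough dots: try the literal name first
  fi
  for domain in $search_list; do
    echo "$name.$domain"            # then each search domain in order
  done
  if [ "$dots" -lt "$ndots" ]; then
    echo "$name"                    # literal name only as a last resort
  fi
}

expand ip-10-1-1-125
```

Since ec2.internal only appears in /etc/resolv.conf.masq (which dnsmasq reads for upstream servers, not for client search paths), the short AWS names never get the ec2.internal suffix appended; that is the search-config gap being split out into its own issue.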

@ryane
Contributor

ryane commented Apr 22, 2016

ah, makes sense.

We already have #1376 for the kubelet hostname, which can hopefully close out this issue (although the AWS problem looks to be slightly different from the problem @andreimc is experiencing on Vagrant). I opened #1377 for the dnsmasq issue.

@andreimc
Contributor Author

@ryane I am happy for you to close this issue. I think that if the search configuration is updated, it will also work in Vagrant.

@stevendborrelli
Contributor

After applying #1374, I can bring up that endpoint.

@ryane
Contributor

ryane commented May 3, 2016

The AWS problem is resolved in #1374. The DNS fix is tracked in #1377. Kubernetes-on-Vagrant issues are being tracked in #1365. Going to close this one.
