
Cluster DNS is broken #503

Closed
CecileRobertMichon opened this issue Apr 7, 2020 · 1 comment · Fixed by #504
@CecileRobertMichon (Contributor):

From conformance tests:

 [Fail] [sig-network] DNS [It] should provide DNS for the cluster  [Conformance]
test/e2e/network/dns_common.go:556
STEP: Running these commands on jessie: for i in `seq 1 600`; do check="$$(dig +notcp +noall +answer +search kubernetes.default.svc.cluster.local A)" && test -n "$$check" && echo OK > /results/jessie_udp@kubernetes.default.svc.cluster.local;check="$$(dig +tcp +noall +answer +search kubernetes.default.svc.cluster.local A)" && test -n "$$check" && echo OK > /results/jessie_tcp@kubernetes.default.svc.cluster.local;podARec=$$(hostname -i| awk -F. '{print $$1"-"$$2"-"$$3"-"$$4".dns-2961.pod.cluster.local"}');check="$$(dig +notcp +noall +answer +search $${podARec} A)" && test -n "$$check" && echo OK > /results/jessie_udp@PodARecord;check="$$(dig +tcp +noall +answer +search $${podARec} A)" && test -n "$$check" && echo OK > /results/jessie_tcp@PodARecord;sleep 1; done
STEP: creating a pod to probe DNS
STEP: submitting the pod to kubernetes
STEP: retrieving the pod
STEP: looking for the results for each expected name from probers
Apr  6 15:45:53.045: INFO: Unable to read wheezy_udp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:46:23.111: INFO: Unable to read wheezy_tcp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:46:53.178: INFO: Unable to read wheezy_udp@PodARecord from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:47:23.329: INFO: Unable to read wheezy_tcp@PodARecord from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:47:53.402: INFO: Unable to read jessie_udp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:48:23.471: INFO: Unable to read jessie_tcp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)

Calico seems okay:

From a control plane node:

sudo ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.5     | node-to-node mesh | up    | 23:45:53 | Established |
| 10.1.0.4     | node-to-node mesh | up    | 23:45:57 | Established |
| 10.0.0.6     | node-to-node mesh | up    | 23:48:30 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status

From the worker node:

sudo ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.4     | node-to-node mesh | up    | 23:45:56 | Established |
| 10.0.0.5     | node-to-node mesh | up    | 23:45:56 | Established |
| 10.0.0.6     | node-to-node mesh | up    | 23:48:28 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.

Followed steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ and deployed a dnsutils pod:

kubectl exec -ti dnsutils -- nslookup kubernetes.default

;; connection timed out; no servers could be reached

command terminated with exit code 1
 kubectl exec dnsutils cat /etc/resolv.conf                     
search default.svc.cluster.local svc.cluster.local cluster.local 1gujdos4yxfulgew4jyguatide.jx.internal.cloudapp.net
nameserver 10.96.0.10
options ndots:5
 kubectl exec -ti dnsutils -- nslookup google.com 8.8.8.8
Server:     8.8.8.8
Address:    8.8.8.8#53
** server can't find google.com.1gujdos4yxfulgew4jyguatide.jx.internal.cloudapp.net: SERVFAIL
command terminated with exit code 1
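The same debugging doc also suggests checking whether the DNS pods themselves are healthy before digging further. A minimal sketch of those checks (label, namespace, and Service name are the upstream Kubernetes defaults; a cluster with a customized DNS deployment may differ):

```shell
# Check that the DNS pods are running (CoreDNS still uses the legacy kube-dns label)
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Inspect CoreDNS logs for errors such as upstream timeouts
kubectl logs -n kube-system -l k8s-app=kube-dns

# Verify the kube-dns Service exists and has endpoints backing it
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns
```

If the Service has no endpoints, pod-to-DNS traffic has nowhere to go, which would match the timeouts seen above.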
@CecileRobertMichon CecileRobertMichon self-assigned this Apr 7, 2020
@CecileRobertMichon CecileRobertMichon added this to the v0.5 milestone Apr 7, 2020
@CecileRobertMichon CecileRobertMichon added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Apr 7, 2020
@CecileRobertMichon (Contributor, Author):

Root cause described here: https://docs.projectcalico.org/v3.0/reference/public-cloud/azure

The default Calico CNI config doesn't work with Azure.
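For reference, the linked Calico doc explains that Azure's network fabric does not carry IP-in-IP traffic, so a pool using IPIP encapsulation (Calico's default) silently drops cross-node pod traffic; it recommends VXLAN encapsulation (or Azure user-defined routes) instead. A hedged sketch of that kind of change against a stock calico-node DaemonSet, using upstream Calico env var names (this is illustrative, not necessarily what #504 does; pool env vars only take effect when the pool is created, so an existing pool may need to be edited directly):

```shell
# Disable IP-in-IP and switch the default IPv4 pool to VXLAN,
# since Azure's fabric does not carry IPIP traffic.
kubectl -n kube-system set env daemonset/calico-node \
  CALICO_IPV4POOL_IPIP=Never \
  CALICO_IPV4POOL_VXLAN=Always

# Restart calico-node so the pods pick up the new settings
kubectl -n kube-system rollout restart daemonset/calico-node
```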
