
Cluster DNS is broken #503

Closed
CecileRobertMichon opened this issue Apr 7, 2020 · 1 comment · Fixed by #504
@CecileRobertMichon (Contributor):

From conformance tests:

 [Fail] [sig-network] DNS [It] should provide DNS for the cluster  [Conformance]
test/e2e/network/dns_common.go:556
STEP: Running these commands on jessie: for i in `seq 1 600`; do check="$$(dig +notcp +noall +answer +search kubernetes.default.svc.cluster.local A)" && test -n "$$check" && echo OK > /results/jessie_udp@kubernetes.default.svc.cluster.local;check="$$(dig +tcp +noall +answer +search kubernetes.default.svc.cluster.local A)" && test -n "$$check" && echo OK > /results/jessie_tcp@kubernetes.default.svc.cluster.local;podARec=$$(hostname -i| awk -F. '{print $$1"-"$$2"-"$$3"-"$$4".dns-2961.pod.cluster.local"}');check="$$(dig +notcp +noall +answer +search $${podARec} A)" && test -n "$$check" && echo OK > /results/jessie_udp@PodARecord;check="$$(dig +tcp +noall +answer +search $${podARec} A)" && test -n "$$check" && echo OK > /results/jessie_tcp@PodARecord;sleep 1; done
STEP: creating a pod to probe DNS
STEP: submitting the pod to kubernetes
STEP: retrieving the pod
STEP: looking for the results for each expected name from probers
Apr  6 15:45:53.045: INFO: Unable to read wheezy_udp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:46:23.111: INFO: Unable to read wheezy_tcp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:46:53.178: INFO: Unable to read wheezy_udp@PodARecord from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:47:23.329: INFO: Unable to read wheezy_tcp@PodARecord from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:47:53.402: INFO: Unable to read jessie_udp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)
Apr  6 15:48:23.471: INFO: Unable to read jessie_tcp@kubernetes.default.svc.cluster.local from pod dns-2961/dns-test-94072041-586f-43b0-a5a1-7161fad8b35c: the server is currently unable to handle the request (get pods dns-test-94072041-586f-43b0-a5a1-7161fad8b35c)

Calico seems okay:

From a control plane node:

sudo ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.5     | node-to-node mesh | up    | 23:45:53 | Established |
| 10.1.0.4     | node-to-node mesh | up    | 23:45:57 | Established |
| 10.0.0.6     | node-to-node mesh | up    | 23:48:30 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status

From the worker node:

sudo ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.4     | node-to-node mesh | up    | 23:45:56 | Established |
| 10.0.0.5     | node-to-node mesh | up    | 23:45:56 | Established |
| 10.0.0.6     | node-to-node mesh | up    | 23:48:28 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.

Followed steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ and deployed a dnsutils pod:

kubectl exec -ti dnsutils -- nslookup kubernetes.default

;; connection timed out; no servers could be reached

command terminated with exit code 1
 kubectl exec dnsutils cat /etc/resolv.conf                     
search default.svc.cluster.local svc.cluster.local cluster.local 1gujdos4yxfulgew4jyguatide.jx.internal.cloudapp.net
nameserver 10.96.0.10
options ndots:5
 kubectl exec -ti dnsutils -- nslookup google.com 8.8.8.8
Server:     8.8.8.8
Address:    8.8.8.8#53
** server can't find google.com.1gujdos4yxfulgew4jyguatide.jx.internal.cloudapp.net: SERVFAIL
command terminated with exit code 1
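The same debugging doc also suggests checking whether the DNS pods themselves are healthy before digging further. A minimal sketch of those checks (label, namespace, and Service name are the upstream Kubernetes defaults; a cluster with a customized DNS deployment may differ):

```shell
# Check that the DNS pods are running (CoreDNS still uses the legacy kube-dns label)
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Inspect CoreDNS logs for errors such as upstream timeouts
kubectl logs -n kube-system -l k8s-app=kube-dns

# Verify the kube-dns Service exists and has endpoints backing it
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns
```

If the Service has no endpoints, pod-to-DNS traffic has nowhere to go, which would match the timeouts seen above.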
@CecileRobertMichon CecileRobertMichon self-assigned this Apr 7, 2020
@CecileRobertMichon CecileRobertMichon added this to the v0.5 milestone Apr 7, 2020
@CecileRobertMichon CecileRobertMichon added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Apr 7, 2020
@CecileRobertMichon (Contributor, Author):

Root cause described here: https://docs.projectcalico.org/v3.0/reference/public-cloud/azure

The default Calico CNI config doesn't work with Azure.
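For reference, the linked Calico doc explains that Azure's network fabric does not carry IP-in-IP traffic, so a pool using IPIP encapsulation (Calico's default) silently drops cross-node pod traffic; it recommends VXLAN encapsulation (or Azure user-defined routes) instead. A hedged sketch of that kind of change against a stock calico-node DaemonSet, using upstream Calico env var names (this is illustrative, not necessarily what #504 does; pool env vars only take effect when the pool is created, so an existing pool may need to be edited directly):

```shell
# Disable IP-in-IP and switch the default IPv4 pool to VXLAN,
# since Azure's fabric does not carry IPIP traffic.
kubectl -n kube-system set env daemonset/calico-node \
  CALICO_IPV4POOL_IPIP=Never \
  CALICO_IPV4POOL_VXLAN=Always

# Restart calico-node so the pods pick up the new settings
kubectl -n kube-system rollout restart daemonset/calico-node
```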
