
kubelet/dns issues when switching to coredns #3979

Closed
McSlow opened this issue Jan 4, 2019 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


McSlow commented Jan 4, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug

Environment:
1.12.4

Kubespray version (commit) (git rev-parse --short HEAD):
2.8.1

Network plugin used:
calico

Hi,
I think there is an issue involving Ubuntu 18.04 (or any other OS that ships with systemd-resolved active), kubelet, and CoreDNS.
We upgraded our cluster from 1.11.x to 1.12.4 using the latest Kubespray manifest. This switches DNS from kube-dns to CoreDNS, which is fine so far.
The problem is that CoreDNS ended up in a crash loop because it detected a config loop. This comes from the fact that kubelet copies the host's /etc/resolv.conf into the pods (including the CoreDNS pod), and that resolv.conf contains a 127.0.0.53 entry pointing at systemd-resolved, a DNS cache that is active by default on Ubuntu 18.04.

The result is that CoreDNS detects a resolver loop, and other pods believe there is a resolver inside their own pod (i.e. 127.0.0.53), which isn't true.

So in order to run correctly, kubelet either needs to be given a dedicated upstream DNS config that omits the 127.0.0.53 entry, or systemd-resolved needs to be switched off. A minimal sketch of the first option follows.
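One common way to do that (a sketch only, not necessarily what Kubespray does) is to point kubelet at the upstream resolv.conf that systemd-resolved maintains at /run/systemd/resolve/resolv.conf, rather than the stub /etc/resolv.conf, either via the --resolv-conf flag or the resolvConf field of the kubelet config file:

# Sketch, assuming a systemd-resolved host: hand pods the real upstream
# resolver list instead of the 127.0.0.53 stub.
# As a kubelet flag:
#   --resolv-conf=/run/systemd/resolve/resolv.conf
# Or in the KubeletConfiguration file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf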

Switching off systemd-resolved (or dnsmasq, etc.) isn't great either, because kubelet itself ignores /etc/hosts for DNS lookups and crashes after the DNS cache is disabled: kubelet doesn't seem to handle the search-domain config in resolv.conf correctly and is then unable to resolve its own non-FQDN hostname (so it won't find something like k8s-node-01). It works with a DNS cache because those usually mix /etc/hosts into their results.

The bug should be reproducible with a fresh Ubuntu 18.04 setup (with the defaults enabled) and CoreDNS enabled.


McSlow commented Jan 4, 2019

By the way: I'm aware that this might be more of a bug in kubelet, because it's a bad idea to just copy the node's resolv.conf (for the reasons above), but I guess Kubespray needs to handle this somehow (e.g. by creating a sane resolv.conf for kubelet).


wangxf1987 commented Jan 4, 2019

I have run into an issue like this when I set the DNS value on the network interface. Even though the CoreDNS config is generated automatically, the old value is restored as soon as the network is restarted (see the note after the interface config below).

TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="enp7s0f0"
UUID="9aa68d8c-39bb-493e-9a8c-16a89bf6b69c"
DEVICE="enp7s0f0"
ONBOOT="yes"
IPADDR="172.19.xx.220"
PREFIX="24"
DNS1='123.123.123.123'
GATEWAY="172.19.xx.254"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_PRIVACY="no"
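
If the goal is to stop the network scripts / NetworkManager from rewriting /etc/resolv.conf with the DNS1 value above on every restart, one hedged option (assuming NetworkManager manages this interface) is to tell NetworkManager to leave resolv.conf alone and maintain the file yourself:

# /etc/NetworkManager/NetworkManager.conf (sketch): stop NetworkManager
# from rewriting /etc/resolv.conf when the network restarts.
[main]
dns=none

# Then restart NetworkManager and manage /etc/resolv.conf manually:
#   systemctl restart NetworkManager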

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 11, 2019
@yelhouti

I'm having the same problem with CentOS 7 (I know, not great); the exact error log is:
[FATAL] plugin/loop: Loop (127.0.0.1:52939 -> :53) detected for zone "."
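
That log line is CoreDNS's loop plugin detecting the same situation described above: the default Corefile forwards queries to the nameservers from the node's /etc/resolv.conf, and one of those points back at a local resolver. Besides fixing kubelet's resolv-conf, a common workaround from the loop plugin's troubleshooting notes is to forward to an explicit upstream instead; a sketch is below (the 8.8.8.8 upstream is just a placeholder, substitute your real resolver):

# Corefile sketch: replace the resolv.conf-based forward with an explicit
# upstream so the loop through the local stub resolver cannot occur.
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # was: forward . /etc/resolv.conf
    forward . 8.8.8.8
    cache 30
    loop
    reload
}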

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

@jqiuyin: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


jqiuyin commented Oct 24, 2022

Is there any solution to this problem?
I had the same problem
