
kubelet/dns issues when switching to coredns #3979

Closed
McSlow opened this issue Jan 4, 2019 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


McSlow commented Jan 4, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug

Environment:
1.12.4

Kubespray version (commit) (git rev-parse --short HEAD):
2.8.1

Network plugin used:
calico

Hi,
I think there is an issue involving Ubuntu 18.04 (or any other OS that ships with systemd-resolved active), kubelet, and CoreDNS.
We upgraded our cluster from 1.11.x to 1.12.4 using the latest Kubespray manifest. This switches DNS from kube-dns to CoreDNS, which is fine so far.
The problem is that CoreDNS ended up in a crash loop because it detected a config loop. This comes from the fact that kubelet copies the host's /etc/resolv.conf into the pods (including the CoreDNS pod), and that resolv.conf contains a 127.0.0.53 entry pointing at systemd-resolved, a DNS cache that is active by default on Ubuntu 18.04.

The result is that CoreDNS detects a resolver loop, and other pods believe there is a resolver inside their own pod (i.e. 127.0.0.53), which isn't true.

So in order to run correctly, kubelet either needs to be given a dedicated upstream DNS config that omits the 127.0.0.53 entry, or systemd-resolved needs to be switched off. A minimal sketch of the first option follows.
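One common way to do that (a sketch only, not necessarily what Kubespray does) is to point kubelet at the upstream resolv.conf that systemd-resolved maintains at /run/systemd/resolve/resolv.conf, rather than the stub /etc/resolv.conf, either via the --resolv-conf flag or the resolvConf field of the kubelet config file:

# Sketch, assuming a systemd-resolved host: hand pods the real upstream
# resolver list instead of the 127.0.0.53 stub.
# As a kubelet flag:
#   --resolv-conf=/run/systemd/resolve/resolv.conf
# Or in the KubeletConfiguration file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf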

Switching off systemd-resolved (or dnsmasq, etc.) isn't great either, because kubelet itself ignores /etc/hosts for DNS lookups and crashes after the DNS cache is disabled: kubelet doesn't seem to handle the search-domain config in resolv.conf correctly and is then unable to resolve its own non-FQDN hostname (so it won't find something like k8s-node-01). It works with a DNS cache because those usually mix /etc/hosts into their results.

The bug should be reproducible with a fresh Ubuntu 18.04 setup (with the defaults enabled) and CoreDNS enabled.


McSlow commented Jan 4, 2019

By the way: I'm aware that this might be more of a bug in kubelet, because it's a bad idea to just copy the node's resolv.conf (for the reasons above), but I guess Kubespray needs to handle this somehow (e.g. by creating a sane resolv.conf for kubelet).


wangxf1987 commented Jan 4, 2019

I have run into an issue like this when I set the DNS value on the network interface. Even though the CoreDNS config is generated automatically, the old value is restored as soon as the network is restarted (see the note after the interface config below).

TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="enp7s0f0"
UUID="9aa68d8c-39bb-493e-9a8c-16a89bf6b69c"
DEVICE="enp7s0f0"
ONBOOT="yes"
IPADDR="172.19.xx.220"
PREFIX="24"
DNS1='123.123.123.123'
GATEWAY="172.19.xx.254"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_PRIVACY="no"
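
If the goal is to stop the network scripts / NetworkManager from rewriting /etc/resolv.conf with the DNS1 value above on every restart, one hedged option (assuming NetworkManager manages this interface) is to tell NetworkManager to leave resolv.conf alone and maintain the file yourself:

# /etc/NetworkManager/NetworkManager.conf (sketch): stop NetworkManager
# from rewriting /etc/resolv.conf when the network restarts.
[main]
dns=none

# Then restart NetworkManager and manage /etc/resolv.conf manually:
#   systemctl restart NetworkManager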

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 11, 2019
@yelhouti

I'm having the same problem with CentOS 7 (I know, not great); the exact error log is:
[FATAL] plugin/loop: Loop (127.0.0.1:52939 -> :53) detected for zone "."
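
That log line is CoreDNS's loop plugin detecting the same situation described above: the default Corefile forwards queries to the nameservers from the node's /etc/resolv.conf, and one of those points back at a local resolver. Besides fixing kubelet's resolv-conf, a common workaround from the loop plugin's troubleshooting notes is to forward to an explicit upstream instead; a sketch is below (the 8.8.8.8 upstream is just a placeholder, substitute your real resolver):

# Corefile sketch: replace the resolv.conf-based forward with an explicit
# upstream so the loop through the local stub resolver cannot occur.
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # was: forward . /etc/resolv.conf
    forward . 8.8.8.8
    cache 30
    loop
    reload
}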

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

@jqiuyin: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


jqiuyin commented Oct 24, 2022

Is there any solution to this problem?
I had the same problem
