
coredns has caching plugin installed which causes non-authoritative responses most of the time #1512

Closed
joejulian opened this issue Apr 18, 2019 · 13 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@joejulian

What keywords did you search in kubeadm issues before filing this one?

coredns cache

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

What happened?

The coredns configmap is:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-04-17T21:48:42Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "244"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 9164b9b9-615a-11e9-b7a6-0a76de0932ee

What you expected to happen?

When querying a service DNS name, I expect the result to be authoritative (the "aa" flag set), i.e.:

;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

This is a successful query as expected:

; <<>> DiG 9.12.3-P4 <<>> +search kube-dns
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2917
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 913282e2b788674b (echoed)
;; QUESTION SECTION:
;kube-dns.kube-system.svc.cluster.local.	IN A

;; ANSWER SECTION:
kube-dns.kube-system.svc.cluster.local.	5 IN A	10.0.0.10

;; Query time: 0 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Apr 18 03:24:31 UTC 2019
;; MSG SIZE  rcvd: 133

This is an unsuccessful query:

; <<>> DiG 9.12.3-P4 <<>> +search kube-dns
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3084
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 2ac1a88733f8227d (echoed)
;; QUESTION SECTION:
;kube-dns.kube-system.svc.cluster.local.	IN A

;; ANSWER SECTION:
kube-dns.kube-system.svc.cluster.local.	3 IN A	10.0.0.10

;; Query time: 0 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Apr 18 03:24:47 UTC 2019
;; MSG SIZE  rcvd: 133

This is unsuccessful because the query was not made within the first second of the record entering the cache. Once the TTL of the entry has dropped below its original value, the response is, by definition, no longer authoritative, because it is being served from the cache instead of from the authoritative zone data.

How to reproduce it (as minimally and precisely as possible)?

Install a cluster.

kubectl --generator=run-pod/v1 -n kube-system run tmp --rm -it --image alpine -- /bin/sh -c 'apk update && apk add bind-tools && sh'
# dig kube-dns.kube-system.svc.cluster.local. ; sleep 1; dig kube-dns.kube-system.svc.cluster.local.

The first query will be authoritative because that query populates the cache. One second later, the next query is served from cache and not authoritative.
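
One way to watch just the header flags change from inside the test pod (a small sketch, assuming bind-tools is installed as in the command above):

dig +noall +comments kube-dns.kube-system.svc.cluster.local. | grep ';; flags:'   # expect "qr aa rd" when the answer comes from the kubernetes plugin
sleep 1
dig +noall +comments kube-dns.kube-system.svc.cluster.local. | grep ';; flags:'   # one second later: served from cache, "aa" is missing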

Anything else we need to know?

This problem was reported to me as affecting some of our customers' software, written in Python, which simply fails if the DNS response is not authoritative.

Removing the cache 30 line from the ConfigMap resolves this problem. Caching should not be necessary unless mirroring a high latency remote zone.

@joejulian
Author

joejulian commented Apr 18, 2019

/sig network

@k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Apr 18, 2019
@neolit123
Member

@joejulian

Removing the cache 30 line from the ConfigMap resolves this problem. Caching should not be necessary unless mirroring a high latency remote zone.

cc @chrisohaver @rajansandeep

what is your take on the cache 30 value and this use case?

@neolit123 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Apr 18, 2019
@chrisohaver

The cache plugin in the default Corefile is used to reduce traffic to the upstream DNS.
As the default Corefile is structured, it also happens to cache k8s responses (for 5 seconds, the default TTL for the kubernetes plugin).
There is no significant performance benefit to CoreDNS caching the k8s responses, because the kubernetes plugin more or less keeps its own cache (as part of the k8s client-go API watch).

If you don't want kubernetes records to be cached, you have a couple of options, each with possible drawbacks:

  1. You can set the TTL for kubernetes records to zero (a sketch follows this list). This will prevent them from entering the cache. In theory this could confuse clients that look at the TTL, but I think the TTL is usually ignored by DNS clients.
  2. You can remove the cache plugin. This would result in increased traffic, and higher average latency for queries of external zones.
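
For illustration, a rough sketch of what option 1 could look like in this cluster, hand-editing the live ConfigMap (the stanza below mirrors the Corefile quoted earlier with only a ttl line added; treat it as an example, not the exact kubeadm-generated file):

kubectl -n kube-system edit configmap coredns
# then, inside the Corefile, add "ttl 0" to the kubernetes block:
#     kubernetes cluster.local in-addr.arpa ip6.arpa {
#        pods insecure
#        upstream
#        ttl 0
#        fallthrough in-addr.arpa ip6.arpa
#     }
# the "reload" plugin in the default Corefile should pick the change up on its
# own once the kubelet syncs the ConfigMap volume (this can take a minute or
# two); deleting the CoreDNS pods applies it immediately:
kubectl -n kube-system delete pod -l k8s-app=kube-dns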

FWIW, I recall a recent issue requesting that we cache kubernetes records for longer than 5 seconds. It's hard to pick a default value that suits everyone.

@joejulian
Author

That would be true if kubeadm allowed the Corefile to be configured, but that's been brought up before, too, and rejected as too difficult to maintain during the upgrade process.

@joejulian
Author

My view on this is that caching entries breaks existing user software, while not caching merely increases latency. One is breaking; the other is inconvenient. IMHO, the breakage should take precedence, and optimizing for a particular use case should be the responsibility of the cluster maintainer.

@neolit123
Member

That would be true if kubeadm allowed the Corefile to be configured, but that's been brought up before, too, and rejected as too difficult to maintain during the upgrade process.

that is true; the umbrella ticket for allowing such customization of kubeadm-generated manifests is here:
#1379

if the CoreDNS maintainers give their +1 on modifying the default Corefile in kubeadm, we can proceed to change it; otherwise this ticket should be closed and mentioned in a comment in the above ticket (e.g. "allow customization of the CoreDNS deployment").

is modifying the coredns config map of a running cluster and restarting the pods a viable, immediate solution for you?

@joejulian
Author

That is our immediate workaround, yes.
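
Concretely, that amounts to something like the following (a sketch; it assumes the default kubeadm CoreDNS deployment, whose pods carry the k8s-app=kube-dns label):

kubectl -n kube-system edit configmap coredns           # delete the "cache 30" line from the Corefile
kubectl -n kube-system delete pod -l k8s-app=kube-dns   # optional: recreate the pods so the change applies immediately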

@chrisohaver

IMHO, the breakage should take precedent and optimizing

It depends on how wide the breakage is. I don't think it is common for clients to reject non-authoritative responses, but I'm not a DNS expert. Is this a Python-wide thing, or is it something specific to your customer's application?

If this is something that is fairly common, then we should accommodate it in the default config. If it turns out to be an unusual special case, then probably not.

@samba

samba commented Apr 18, 2019

@chrisohaver it appears to be common to Python, via the socket.getaddrinfo call, if I'm reading correctly, in cases like this:
https://stackoverflow.com/questions/54778160/python-requests-library-not-resolving-non-authoritative-dns-lookups

@chrisohaver

chrisohaver commented Apr 18, 2019

In CoreDNS you can disable the cache so that all local cluster-zone responses will be authoritative, but that won't change responses from upstream servers. Those would mostly be non-authoritative, retrieved from the caches of intermediate recursive servers. This is normal, and thus it is confounding that Python should only be able to resolve names directly from authoritative servers.

I just sanity-checked this on a k8s cluster running CoreDNS with cache enabled: in my test, Python 3 (3.6.5) seems to be fine with non-aa responses from CoreDNS.

>>> import socket
>>> socket.getaddrinfo("kubernetes.default.svc.cluster.local.", 0, 0, 0, socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'kubernetes.default.svc.cluster.local.', ('10.96.0.1', 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, 'kubernetes.default.svc.cluster.local.', ('10.96.0.1', 0))]
>>> socket.getaddrinfo("kubernetes.default.svc.cluster.local.", 0, 0, 0, socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'kubernetes.default.svc.cluster.local.', ('10.96.0.1', 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, 'kubernetes.default.svc.cluster.local.', ('10.96.0.1', 0))]

... and the CoreDNS logs ...

2019-04-18T21:14:59.987Z [INFO] 172.17.0.4:36730 - 37402 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.00078162s
2019-04-18T21:15:01.179Z [INFO] 172.17.0.4:39902 - 2144 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,rd 106 0.000080777s

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2019
@neolit123
Member

this is the only report we saw about this, which tells me that having the cache enabled by default is tolerable. you can already patch the Corefile as you see fit and the change should persist across upgrades.

punting to #1379, where we can possibly enable kustomization of the Corefile; the coredns addon might also become external to kubeadm one day (but still mandatory).

@chrisohaver

FWIW, as of v1.5.1, CoreDNS cache responses are always authoritative. coredns/coredns#2885.
However, upgrading to 1.5.1 has pitfalls, because it removes some configuration options that some users may still be using, which could require manual modification of the configuration file.
