coredns has caching plugin installed which causes non-authoritative responses most of the time #1512
/sig network
what is your take on the
The cache plugin in the default Corefile is used to reduce traffic to the upstream DNS. If you don't want kubernetes records to be cached, you have a couple of options, each with possible drawbacks.
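For illustration, one possible approach along those lines (a sketch only, and an assumption about what such a Corefile could look like, not the commenter's exact proposal) is to split the default Corefile into two server blocks, so cluster records are answered directly by the kubernetes plugin while only forwarded queries are cached:

    # Sketch: serve cluster zones without cache, keep cache for everything else.
    cluster.local:53 in-addr.arpa:53 ip6.arpa:53 {
        errors
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # Drawback: reverse lookups that fall through here are no longer forwarded upstream.
    }
    .:53 {
        errors
        health
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

The trade-off is that every kubernetes answer now hits the plugin directly, which is exactly the load the cache was meant to avoid.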
FWIW, I recall a recent issue opened requesting that we cache kubernetes records for longer than 5 seconds. It's hard to pick a default value that suits everyone.
That would be true if kubeadm allowed the Corefile to be configured, but that's been brought up before, too, and rejected as too difficult to maintain during the upgrade process.
My view on this is that caching entries breaks existing user software, and not caching increases latency. One is breaking, the other is inconvenient. IMHO, the breakage should take precedence, and optimizing for the use case should be the responsibility of the cluster maintainer.
that is true, the umbrella ticket for allowing such customization of kubeadm generated manifests is here:
if the coredns maintainers give their +1 on modifying the default Corefile in kubeadm, we can proceed to change it; otherwise this ticket should be closed and mentioned in a comment in the above ticket - e.g. "allow customization of the CoreDNS deployment".
is modifying the coredns config map of a running cluster and restarting the pods a viable, immediate solution for you?
That is our immediate workaround, yes.
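For reference, that workaround amounts to editing the coredns ConfigMap and recreating the pods so they reload the Corefile; a minimal sketch, assuming the default kubeadm deployment where the CoreDNS pods carry the k8s-app=kube-dns label:

    # Edit the Corefile (for example, remove or narrow the cache directive).
    kubectl -n kube-system edit configmap coredns
    # Delete the CoreDNS pods; the Deployment recreates them with the new config.
    kubectl -n kube-system delete pod -l k8s-app=kube-dns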
It depends on how wide the breakage is. I don't think it's common for clients to reject non-authoritative responses - but I'm not a DNS expert. Is this a Python-wide thing, or is it something specific to your customer's application? If this is something that is fairly common, then we should accommodate it in the default config. If it turns out to be an unusual special case, then probably not.
@chrisohaver it appears to be common to Python, via the
In CoreDNS you can disable cache so all local cluster zone responses will be authoritative. But it won't change responses from upstream servers. They would mostly be non-authoritative, retrieved from the cache of intermediate recursive servers. This is normal, which makes it confounding that Python would only be able to resolve names that come directly from authoritative servers. I just sanity checked this on a k8s cluster running CoreDNS with cache enabled: in my test, Python3 (3.6.5) seems to be fine with non-aa responses from CoreDNS.
... and the CoreDNS logs ...
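For anyone wanting to repeat that kind of sanity check, here is a rough sketch (an assumption about how it could be run, not the exact test above; it assumes a pullable python:3.6 image):

    # Resolve a cluster Service name from inside the cluster; Python's standard
    # resolver path goes through CoreDNS via the pod's /etc/resolv.conf.
    kubectl run dns-check -it --rm --restart=Never --image=python:3.6 -- \
      python3 -c "import socket; print(socket.getaddrinfo('kubernetes.default.svc.cluster.local', 443))"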
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
this is the only report we saw about this, which tells me that having the cache enabled by default is tolerable. you can already patch the Corefile as you see fit and the change should persist across upgrades. punting to #1379
FWIW, as of v1.5.1, CoreDNS cache responses are always authoritative. coredns/coredns#2885. |
What keywords did you search in kubeadm issues before filing this one?
coredns cache
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Kernel (e.g. uname -a): Linux ip-10-0-2-199.us-west-2.compute.internal 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Others:
What happened?
The coredns configmap is:
What you expected to happen?
When querying a Service DNS name, I expect the result to be authoritative ("aa"), i.e.:
This is a successful query as expected:
This is an unsuccessful query:
This is unsuccessful because it is not within the first second of the entry being updated. Once the TTL of the entry is less than the minimum TTL, it is, by definition, no longer authoritative, because it is being served from the cache instead of from the authoritative domain entry.
How to reproduce it (as minimally and precisely as possible)?
Install a cluster.
Query a Service DNS name twice: the first query will be authoritative because that query populates the cache; one second later, the next query is served from the cache and is not authoritative.
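One way to observe the difference is to check the aa bit in the dig header; a sketch, assuming the cluster DNS Service is at the default kubeadm address 10.96.0.10 (and a CoreDNS version older than 1.5.1, per the note above):

    # Run the same query twice and compare the "flags:" line in the header.
    # The first answer is served by the kubernetes plugin and includes the aa flag;
    # a later answer served from the cache omits it.
    dig @10.96.0.10 kubernetes.default.svc.cluster.local +noall +comments
    sleep 2
    dig @10.96.0.10 kubernetes.default.svc.cluster.local +noall +comments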
Anything else we need to know?
This problem was reported to me as affecting some of our customer's software, written in Python, which fails outright if the DNS response is not authoritative.
Removing the cache 30 line from the ConfigMap resolves this problem. Caching should not be necessary unless mirroring a high-latency remote zone.
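For orientation, the cache 30 directive in question sits in the kubeadm-generated Corefile, which in this release line typically looks roughly like the following (an illustration of the usual default, not necessarily this cluster's exact ConfigMap):

    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }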