High level of i/o timeout erros in nodelocaldns pod when coredns pod on same node #480

rtmie · 2021-09-14T15:09:00Z

Environment

coredns version 1.8.0
nodelocaldns version v1.17.0
kubernetes version : 1.20.8
kube-proxy in ipvs mode

Config

configmap:

data:
  Corefile: |
    .:53 {
        errors
        cache {
                success 9984 30
                denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10
        forward . 192.168.128.10 {
                force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
        }

Issue Description

We are seeing an issue with a high level of i/o timeout errors in the node-local-dns daemonset pods on those nodes where the coredns pod is also running .

Running

for pod in `kubectl -n kube-system get po -l k8s-app=node-local-dns -o name|cut -f2 -d '/'`;do echo "$(kubectl -n kube-system get po $pod -o wide --no-headers| awk '{ print $1,$7,$5 }') $(kubectl -n kube-system logs $pod|grep ERRO|wc -l)";done

When sorting above output based on error count we see massive increase in error count for those nodes where coredns is running.

Reproduce

Running a k8s job with a dnsperf pod in it - with 3 fqdns to lookup - 1 on cluster service, one service in AWS and one external service

On a node where coredns is also running

Statistics:

  Queries sent:         1500000
  Queries completed:    1500000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1499884 (99.99%), SERVFAIL 116 (0.01%)
  Average packet size:  request 52, response 224
  Run time (s):         300.000132
  Queries per second:   4999.997800

  Average Latency (s):  0.000260 (min 0.000044, max 1.067598)
  Latency StdDev (s):   0.012459

On a node where coredns is not also running

Statistics:

  Queries sent:         1500000
  Queries completed:    1500000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1500000 (100.00%)
  Average packet size:  request 52, response 224
  Run time (s):         300.000153
  Queries per second:   4999.997450

  Average Latency (s):  0.000208 (min 0.000045, max 1.082059)
  Latency StdDev (s):   0.010506

cluster service resolution vs external resolution

when test set is all k8s service fqdns - SERVFAIL errors much lower than when test set is all external fqdns.

On node where no coredns is running: 0 servfail
On node where coredns is running and all on cluster fqdns: 8 servfail from 1500000 queries
On node where coredns is running and all on external fqdns: 164 servfail from 1500000 queries

When we scale coredns down to one replica and run dns perf on the same node we get no servfails int he test

The text was updated successfully, but these errors were encountered:

prameshj · 2021-09-17T18:24:19Z

Are you able to run a packet capture on one of the nodes that runs coreDNS and nodelocaldns and see the response? Any logs in the corresponding coreDNS pod?

prameshj · 2021-09-17T18:24:32Z

/assign @rtmie

rtmie · 2021-09-20T14:38:27Z

@prameshj I enabled logging via coredns core file.

No Fails seen in the coredns pod log (I do see some nxdomain denials but that is for a known missing service in my cluster, non related to the FQDNs in my dnsperf test).

However in the log of the node localdns pod I do see this error for each SERVFAIL in the dnsperf test:

"[ERROR] plugin/errors: 2 <external.fqdn.mydomain> A: dial tcp 192.168.128.10:53: i/o timeout\n"
--

(192.168.128.10 is the coredns service)

prameshj · 2021-09-20T23:09:53Z

The i/o timeout most likely means coreDNS pod did not respond in 2 seconds. NodeLocalDNS(which uses coreDNS) has a 2s read timeout for forwarded requests. It sends the request via TCP to coreDNS.

Could this particular coreDNS pod be getting overloaded? How many requests per second is that pod getting? Perhaps the test is hitting the limits of the qps that can be handled.

In case of external FQDNS, the i/o timeout log message will have the upstream server IP.

rtmie · 2021-09-21T08:08:06Z

The consistent behaviour that I see is that where I run the dnsperf load on a node that does not have coredns pod running I get zero i/o timeout errors in the node-local-dns. When I run it on a node where there is a coredns I see the tiomouts.

I wonder if conntrack causes the node-local-dns to hotspot the coredns that is local to it on the node causing it to overload? (we are using ipvs for kube-proxy).

Right now I have dnsperf running with qps of 5000 for 300 seconds.

I will test with a lower qps.

@prameshj is the 2 second timeout on the node-local-dns be configured?

rtmie · 2021-09-21T10:34:09Z

@prameshj when I run DNS perf job qith QPS of 5k on a node where there is no coredns pod I get no SERVFAIL.

When I run on a node where coredns is running I reduce qps to 500 and still SEVFAIL @0.01%

prameshj · 2021-09-23T18:51:20Z

I wonder if conntrack causes the node-local-dns to hotspot the coredns that is local to it on the node causing it to overload? (we are using ipvs for kube-proxy).

Can you check the dns_request_count metric on the various coreDNS pods and see if a specific one is getting more queries?

@prameshj is the 2 second timeout on the node-local-dns be configured?

I don't see it as configurable. The forward plugin sets an overall timeout of 5s to handle the query - https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/forward/forward.go#L97

The connect call sets a write/read deadline of maxTimeout and readTImeout respectively. https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/forward/connect.go#L99, https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/forward/connect.go#L109

These are set to 2s right now. https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/forward/persistent.go#L160, https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/forward/proxy.go#L79

So, the behavior is - forward plugin will try to get a response from an upstream within 2s and contact a different one if there was no response from the previous. It will do this for 5s before failing the query. @chrisohaver is that accurate?

rtmie · 2021-09-27T10:04:47Z

@prameshj thanks for the info -I was away for a few days. Will go through your feedback today/tomorrow.

gjkim42 · 2021-09-28T02:49:17Z

I think I have the same issue.
There are quite a lot of timeout requests from nodelocaldns to coredns. (about 70% of DNS requests are timeout in worst case node)
And I just realized that only nodes having coredns pod have this issue as @rtmie said.

chrisohaver · 2021-09-28T03:34:17Z

@chrisohaver is that accurate?

Yes. A self adjusting dial timeout comes into play too, but that’s the general idea. None of it is configurable.

gjkim42 · 2021-09-28T15:03:17Z

The i/o timeout most likely means coreDNS pod did not respond in 2 seconds. NodeLocalDNS(which uses coreDNS) has a 2s read timeout for forwarded requests. It sends the request via TCP to coreDNS.

In my case, I think coreDNS is working fine (it does not make any timeout). However nodelocaldns rarely makes any connection with coreDNS service(192.168.0.3).

# conntrack -L | grep 192.168.0.3
tcp      6 108 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51422 dport=53 [UNREPLIED] src=192.168.89.1 dst=192.168.126.0 sport=53 dport=19944 mark=0 use=1
tcp      6 107 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51414 dport=53 [UNREPLIED] src=192.168.76.1 dst=192.168.126.0 sport=53 dport=45629 mark=0 use=1
tcp      6 110 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51438 dport=53 [UNREPLIED] src=192.168.87.1 dst=192.168.126.0 sport=53 dport=22125 mark=0 use=1
tcp      6 105 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51394 dport=53 [UNREPLIED] src=192.168.66.2 dst=192.168.126.0 sport=53 dport=42584 mark=0 use=1
tcp      6 111 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51444 dport=53 [UNREPLIED] src=192.168.117.1 dst=192.168.126.0 sport=53 dport=43203 mark=0 use=1
tcp      6 109 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51430 dport=53 [UNREPLIED] src=192.168.72.1 dst=192.168.126.0 sport=53 dport=48859 mark=0 use=1
tcp      6 7 TIME_WAIT src=192.168.0.3 dst=192.168.0.3 sport=50610 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=11486 [ASSURED] mark=0 use=1
tcp      6 103 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51372 dport=53 [UNREPLIED] src=192.168.84.1 dst=192.168.126.0 sport=53 dport=35905 mark=0 use=1
tcp      6 86392 ESTABLISHED src=192.168.0.3 dst=192.168.0.3 sport=51454 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=42359 [ASSURED] mark=0 use=1
udp      17 20 src=192.168.0.3 dst=192.168.0.3 sport=51515 dport=53 [UNREPLIED] src=192.168.84.1 dst=192.168.126.0 sport=53 dport=53046 mark=0 use=1
tcp      6 102 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51366 dport=53 [UNREPLIED] src=192.168.111.4 dst=192.168.126.0 sport=53 dport=39537 mark=0 use=1
tcp      6 106 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51404 dport=53 [UNREPLIED] src=192.168.93.1 dst=192.168.126.0 sport=53 dport=33050 mark=0 use=1
udp      17 16 src=192.168.0.3 dst=192.168.0.3 sport=49564 dport=53 [UNREPLIED] src=192.168.111.4 dst=192.168.126.0 sport=53 dport=18936 mark=0 use=1
udp      17 23 src=192.168.0.3 dst=192.168.0.3 sport=37426 dport=53 [UNREPLIED] src=192.168.124.1 dst=192.168.126.0 sport=53 dport=65084 mark=0 use=1
tcp      6 67 TIME_WAIT src=192.168.0.3 dst=192.168.0.3 sport=51056 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=55891 [ASSURED] mark=0 use=1
tcp      6 104 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51384 dport=53 [UNREPLIED] src=192.168.124.1 dst=192.168.126.0 sport=53 dport=58595 mark=0 use=1
udp      17 27 src=192.168.0.3 dst=192.168.0.3 sport=59777 dport=53 [UNREPLIED] src=192.168.66.2 dst=192.168.126.0 sport=53 dport=55549 mark=0 use=1
udp      17 13 src=192.168.0.3 dst=192.168.0.3 sport=32790 dport=53 [UNREPLIED] src=192.168.78.1 dst=192.168.126.0 sport=53 dport=41575 mark=0 use=1

gjkim42 · 2021-09-30T12:04:38Z

Maybe I found a workaround for this - change ipvs scheduler to sh(source hash) from rr(round robin).
However, I am not sure why it addresses this issue.

@rtmie @prameshj
Any updates on this?

vivekpd15 · 2021-09-30T20:05:11Z

+1

prameshj · 2021-09-30T20:28:12Z

Maybe I found a workaround for this - change ipvs scheduler to sh(source hash) from rr(round robin).

I am not too familiar with the IPVS loadbalancing modes, but Source hash looks up the destination based on the source IP address of the packet. In case of NodeLocalDNS reaching out to upstream, the source IP will always be the same (the node IP address, since NodeLocalDNS is in hostNetwork mode). It is possible that it picks the same coreDNS pod every-time as a result. With round-robin, it is more spread out.

Are you able to lookup the dns request count(exported by coreDNS pods, NOT nodelocaldns) grouped by coreDNS pod name? Are some pods getting way more than others? Is it possible that those coreDNS pods were scheduled first?

gjkim42 · 2021-09-30T22:23:27Z

@prameshj

Are you able to lookup the dns request count(exported by coreDNS pods, NOT nodelocaldns) grouped by coreDNS pod name? Are some pods getting way more than others? Is it possible that those coreDNS pods were scheduled first?

Yes, one coredns pod is getting a lot of requests than others - which is not desirable for me.

It is possible that it picks the same coreDNS pod every-time as a result. With round-robin, it is more spread out.

Also, I am having another problem that DNS lookup from nodes to the coredns's service IP only fails.
https://calicousers.slack.com/archives/CPEPF833L/p1633007149311200
https://kubernetes.slack.com/archives/C09QYUH5W/p1632813013251600

DNS lookup

from pods to every coredns's pod IP succeeds

from pods to every coredns's service IP succeeds

from nodes to every coredns's pod IP succeeds

from nodes to coredns's service IP only fails

And also HTTP request from nodes to http://coredns's service IP/metrics succeeds. (the coredns service is reachable by HTTP protocol)

As I know, nodelocaldns is supposed to use the host node's network namespace. So I doubt that nodelocaldns has ever succeeded in DNS lookup to the coredns's service IP.

EDIT: The issue turned out to be a timeout issue rather than a connectivity issue.

prameshj · 2021-09-30T22:34:07Z

Do you also have NetworkPolicy rules configured? This could be an interaction between iptables rules setup for NodeLocalDNS and network policy maybe.

Is the issue reproducible in:

iptables mode?
With NodeLocalDNS disabled in IPVS mode?
WIth NetworkPolicy disabled(in case you have it enabled)?

gjkim42 · 2021-10-01T00:53:13Z

Do you also have NetworkPolicy rules configured? This could be an interaction between iptables rules setup for NodeLocalDNS and network policy maybe.

$ k get globalnetworkpolicies.crd.projectcalico.org -A
No resources found
$ k get networkpolicies.crd.projectcalico.org -A
No resources found
$ k get networkpolicies.networking.k8s.io -A
No resources found

I think I don't have any NetworkPolicies as I know.

rtmie · 2021-10-04T09:12:29Z

@gjkim42 - changing the ipvs scheduler in kube-proxy from rr to sh did not make a difference on my dnsperf test.

rtmie · 2021-10-04T09:16:49Z

@prameshj wrt your comment on networkpolicies and impact on iptables - I am not clear on what I should be looking for there that would cause the effect I am seeing? We make extensive use of network policies.

gjkim42 · 2021-10-05T14:45:00Z

@prameshj

Is the issue reproducible in:

iptables mode?

The issue is reproducible in the iptables mode

Will disable nodelocaldns tomorrow.

There are many invalid records in the failing nodes. (Non-failing nodes have few invalid records)

# conntrack -S
cpu=0           found=0 invalid=685 ignore=11379 insert=0 insert_failed=0 drop=0 early_drop=0 error=677 search_restart=3 
cpu=1           found=0 invalid=39 ignore=10805 insert=0 insert_failed=0 drop=0 early_drop=0 error=39 search_restart=5 
cpu=2           found=0 invalid=214 ignore=10635 insert=0 insert_failed=0 drop=0 early_drop=0 error=212 search_restart=1 
cpu=3           found=0 invalid=121 ignore=10403 insert=0 insert_failed=0 drop=0 early_drop=0 error=119 search_restart=6 
cpu=4           found=0 invalid=289 ignore=10193 insert=0 insert_failed=0 drop=0 early_drop=0 error=289 search_restart=5 
cpu=5           found=0 invalid=43 ignore=10656 insert=0 insert_failed=0 drop=0 early_drop=0 error=41 search_restart=0 
cpu=6           found=0 invalid=64 ignore=10285 insert=0 insert_failed=0 drop=0 early_drop=0 error=64 search_restart=5 
cpu=7           found=0 invalid=84 ignore=10538 insert=0 insert_failed=0 drop=0 early_drop=0 error=82 search_restart=8 
cpu=8           found=0 invalid=274 ignore=11307 insert=0 insert_failed=0 drop=0 early_drop=0 error=274 search_restart=4 
cpu=9           found=0 invalid=152 ignore=10371 insert=0 insert_failed=0 drop=0 early_drop=0 error=152 search_restart=4 
cpu=10          found=0 invalid=103 ignore=10941 insert=0 insert_failed=0 drop=0 early_drop=0 error=103 search_restart=6 
cpu=11          found=0 invalid=697 ignore=10919 insert=0 insert_failed=0 drop=0 early_drop=0 error=697 search_restart=3 
cpu=12          found=0 invalid=303 ignore=9353 insert=0 insert_failed=0 drop=0 early_drop=0 error=303 search_restart=0 
cpu=13          found=0 invalid=28 ignore=9726 insert=0 insert_failed=0 drop=0 early_drop=0 error=28 search_restart=2 
cpu=14          found=0 invalid=414 ignore=9260 insert=0 insert_failed=0 drop=0 early_drop=0 error=414 search_restart=4 
cpu=15          found=0 invalid=687 ignore=9864 insert=0 insert_failed=0 drop=0 early_drop=0 error=687 search_restart=2 
cpu=16          found=0 invalid=113 ignore=8785 insert=0 insert_failed=0 drop=0 early_drop=0 error=113 search_restart=3 
cpu=17          found=0 invalid=521 ignore=9293 insert=0 insert_failed=0 drop=0 early_drop=0 error=521 search_restart=0 
cpu=18          found=0 invalid=986 ignore=9984 insert=0 insert_failed=0 drop=0 early_drop=0 error=986 search_restart=3 
cpu=19          found=0 invalid=284 ignore=10705 insert=0 insert_failed=0 drop=0 early_drop=0 error=284 search_restart=1 
cpu=20          found=0 invalid=59 ignore=9476 insert=0 insert_failed=0 drop=0 early_drop=0 error=59 search_restart=2 
cpu=21          found=0 invalid=656 ignore=10483 insert=0 insert_failed=0 drop=0 early_drop=0 error=656 search_restart=4 
cpu=22          found=0 invalid=277 ignore=10489 insert=0 insert_failed=0 drop=0 early_drop=0 error=277 search_restart=0 
cpu=23          found=0 invalid=238 ignore=11165 insert=0 insert_failed=0 drop=0 early_drop=0 error=238 search_restart=2

gjkim42 · 2021-10-05T15:53:39Z

In my case, disabling the offloading for the VXLAN interface completely solves the issue.
I've used calico cni with VXLAN mode.

https://kubernetes.slack.com/archives/C09QYUH5W/p1633448073291500?thread_ts=1632813013.251600&cid=C09QYUH5W

projectcalico/calico#3145

prameshj · 2021-10-05T17:23:25Z

Glad you found the issue @gjkim42

@prameshj wrt your comment on networkpolicies and impact on iptables - I am not clear on what I should be looking for there that would cause the effect I am seeing? We make extensive use of network policies.

I was wondering if there is some iptables rule that drops requests from nodelocaldns to coreDNS pod if it is in the same node. Are requests dropped 100% if they are talking within the same node?

Also is this the same issue as what @gjkim42 mentioned?

@rtmie

rtmie · 2021-10-05T18:29:26Z

@prameshj "Are requests dropped 100% if they are talking within the same node?"

No just a relatively high servfail rate by comparison with running dnsperf workload on a node where there is no coredns

say. approx 0.02% vs 0%

prameshj · 2021-11-05T18:13:45Z

@rtmie could anything in the CNI be handling intra-node traffic differently than inter-node?

I would suggest trying out in a 1-node cluster with a single coreDNS pod as well, so that all requests from nodelocaldns are forced to go to the local coreDNS pod. If requests are dropped more frequently in this case, a packet capture can be the next step.

k8s-triage-robot · 2022-02-03T19:04:16Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-03-05T19:31:54Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2022-04-04T20:31:35Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-04-04T20:31:53Z

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen

Mark this issue or PR as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rahul-paigavan · 2022-09-01T19:07:08Z

/reopen

I am facing the same issue and don't see any solution posted here.
Is there any way I can fix this?

k8s-ci-robot · 2022-09-01T19:07:26Z

@rahul-paigavan: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

I am facing the same issue and don't see any solution posted here.
Is there any way I can fix this?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

smoke · 2022-09-02T08:17:25Z

@rahul-paigavan In most modern K8S systems the DNS and TLS layer is abused by not properly using Keep Alive (HTTP / TCP) of the connections, this causing DNS to be queiried as well TLS to be handshaked on each request - here is a great blog post about that https://www.lob.com/blog/use-http-keep-alive in the context of NodeJS, but others are similar.

Other than that you should check and optimize the DNS zones inside the Nodelocal, look for:

instead of Nodelocal to forward each and every request to K8S DNS, you can get some zones on the fast path
you can tune the Nodelocal to properly cache and pre-fetch commonly queried stuff
you should tune and disable IPv6 if not used and possible
you should tune the ndots setting and / or use the proper DNS policy for that service https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
you should tune and beware of Linux connection tracking and DNS https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts (as well check https://kb.isc.org/docs/aa-01183) and smartly use TCP on the relevant layers to avoid that, and of course use UDP on other places

Overall the DNS and TLS layer is the most abused and most neglected layer by Engineers, but it is the most important thing in a distrubuted, clusterized system - so put decent enough of your time and efforts in to that as you will regret it!

k8s-ci-robot assigned rtmie Sep 17, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 3, 2022

whites11 mentioned this issue Mar 5, 2022

Node Local DNS Cache giantswarm/roadmap#828

Closed

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2022

k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 5, 2022

k8s-ci-robot closed this as completed Apr 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High level of i/o timeout erros in nodelocaldns pod when coredns pod on same node #480

High level of i/o timeout erros in nodelocaldns pod when coredns pod on same node #480

rtmie commented Sep 14, 2021 •

edited

Loading

prameshj commented Sep 17, 2021

prameshj commented Sep 17, 2021

rtmie commented Sep 20, 2021 •

edited

Loading

prameshj commented Sep 20, 2021

rtmie commented Sep 21, 2021 •

edited

Loading

rtmie commented Sep 21, 2021

prameshj commented Sep 23, 2021

rtmie commented Sep 27, 2021

gjkim42 commented Sep 28, 2021

chrisohaver commented Sep 28, 2021

gjkim42 commented Sep 28, 2021

gjkim42 commented Sep 30, 2021

vivekpd15 commented Sep 30, 2021

prameshj commented Sep 30, 2021

gjkim42 commented Sep 30, 2021 •

edited

Loading

prameshj commented Sep 30, 2021

gjkim42 commented Oct 1, 2021 •

edited

Loading

rtmie commented Oct 4, 2021

rtmie commented Oct 4, 2021

gjkim42 commented Oct 5, 2021

gjkim42 commented Oct 5, 2021

prameshj commented Oct 5, 2021

rtmie commented Oct 5, 2021 •

edited

Loading

prameshj commented Nov 5, 2021

k8s-triage-robot commented Feb 3, 2022

k8s-triage-robot commented Mar 5, 2022

k8s-triage-robot commented Apr 4, 2022

k8s-ci-robot commented Apr 4, 2022

rahul-paigavan commented Sep 1, 2022

k8s-ci-robot commented Sep 1, 2022

smoke commented Sep 2, 2022 •

edited

Loading

High level of i/o timeout erros in nodelocaldns pod when coredns pod on same node #480

High level of i/o timeout erros in nodelocaldns pod when coredns pod on same node #480

Comments

rtmie commented Sep 14, 2021 • edited Loading

Environment

Config

Issue Description

Reproduce

On a node where coredns is also running

On a node where coredns is not also running

cluster service resolution vs external resolution

prameshj commented Sep 17, 2021

prameshj commented Sep 17, 2021

rtmie commented Sep 20, 2021 • edited Loading

prameshj commented Sep 20, 2021

rtmie commented Sep 21, 2021 • edited Loading

rtmie commented Sep 21, 2021

prameshj commented Sep 23, 2021

rtmie commented Sep 27, 2021

gjkim42 commented Sep 28, 2021

chrisohaver commented Sep 28, 2021

gjkim42 commented Sep 28, 2021

gjkim42 commented Sep 30, 2021

vivekpd15 commented Sep 30, 2021

prameshj commented Sep 30, 2021

gjkim42 commented Sep 30, 2021 • edited Loading

prameshj commented Sep 30, 2021

gjkim42 commented Oct 1, 2021 • edited Loading

rtmie commented Oct 4, 2021

rtmie commented Oct 4, 2021

gjkim42 commented Oct 5, 2021

gjkim42 commented Oct 5, 2021

prameshj commented Oct 5, 2021

rtmie commented Oct 5, 2021 • edited Loading

prameshj commented Nov 5, 2021

k8s-triage-robot commented Feb 3, 2022

k8s-triage-robot commented Mar 5, 2022

k8s-triage-robot commented Apr 4, 2022

k8s-ci-robot commented Apr 4, 2022

rahul-paigavan commented Sep 1, 2022

k8s-ci-robot commented Sep 1, 2022

smoke commented Sep 2, 2022 • edited Loading

rtmie commented Sep 14, 2021 •

edited

Loading

rtmie commented Sep 20, 2021 •

edited

Loading

rtmie commented Sep 21, 2021 •

edited

Loading

gjkim42 commented Sep 30, 2021 •

edited

Loading

gjkim42 commented Oct 1, 2021 •

edited

Loading

rtmie commented Oct 5, 2021 •

edited

Loading

smoke commented Sep 2, 2022 •

edited

Loading