arp_cache: neighbor table overflow! #4533
Using a hook, sure! See https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md for docs on hooks.
Thanks @chrislovecnm, but don't you think the default max ARP table size should be larger? My cluster isn't even that big: it has ~60 nodes and roughly 2000-3000 pods, I guess. What would be the downside of permitting a larger ARP table?
I would check with sig-node. Really a kernel question.
So I think we want to keep gc_thresh1 at 0: kubernetes/kubernetes#23395. I don't see a lot of problems in raising gc_thresh2 and gc_thresh3, but I'm not sure whether we should do this across the board automatically. I think it probably depends on your networking mode: modes that tunnel traffic will only use a single ARP entry per node, whereas modes that don't tunnel will use an ARP entry per pod. But I'm not really sure. Which network mode are you using @felipejfc?
@justinsb you must be right, I use Calico.
For future reference if someone needs it, I've used the following hook:
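The exact hook wasn't captured above, but a minimal sketch, assuming the systemd-unit hook form described in the kops cluster_spec docs linked earlier, with purely illustrative threshold values, would look something like this:

```yaml
spec:
  hooks:
  - name: increase-neigh-gc-thresh.service
    roles:
    - Node
    - Master
    manifest: |
      # Raise the kernel's neighbor-table GC thresholds before workloads start.
      # The values 28000/32000 are illustrative, not a recommendation.
      Type=oneshot
      ExecStart=/sbin/sysctl -w net.ipv4.neigh.default.gc_thresh2=28000
      ExecStart=/sbin/sysctl -w net.ipv4.neigh.default.gc_thresh3=32000
```

With Type=oneshot, systemd runs each ExecStart line in order, so both thresholds are raised before the unit completes.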
This was causing real damage to my cluster; I saw big performance improvements in several services after increasing gc_thresh3 and gc_thresh2.
@caseydavenport any comments?
Seems sensible to me. Using Calico, each node will have an ARP entry for each pod running on that node, so if you've got high pod density / pod churn, adjusting makes sense.
I think it's probably bridged vs. not bridged that makes the difference here. If all the pods are on a bridge, the host will only need a single ARP entry, but for routed pods the host will need an ARP entry for each.
@caseydavenport I guess every node will also have an ARP entry for each of the pods running on other nodes as well, right? At least the ones they communicate with?
No, it shouldn't have one for every pod, because the nodes themselves are the next hops for traffic, not individual pod IPs. Instead, you'll get an ARP entry for each node in the cluster. So a given node's ARP cache should roughly be (number of nodes in the cluster) + (number of pods running on that node).
@caseydavenport is it possible that Calico never cleans nodes that were deleted from the cluster out of the ARP table? I'm using AWS and this seems to be the case, take a look:
I use cluster autoscaler and nodes are being started and deleted all the time. This is the output from a machine that's only 4 hours old, and it has 1174 entries in its ARP table despite my cluster only having around 60 nodes... and there are these IPs that seem to belong to brokers that are no longer alive and stay in the INCOMPLETE state.
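The actual ARP table output wasn't preserved in this copy of the thread, but checks along these lines (assuming standard iproute2 and procps tooling on the node) surface the symptom being described: the total entry count and the number of unresolvable INCOMPLETE entries.

```bash
# Total IPv4 neighbor (ARP) table entries on this node
ip -4 neigh show | wc -l

# Entries stuck in INCOMPLETE state (next hops that no longer exist)
ip -4 neigh show | grep -c INCOMPLETE

# Current garbage-collection thresholds for the neighbor table
sysctl net.ipv4.neigh.default.gc_thresh1 \
       net.ipv4.neigh.default.gc_thresh2 \
       net.ipv4.neigh.default.gc_thresh3
```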
@felipejfc while Calico isn't responsible for modifying the ARP table directly, I suspect this is a result of the same root cause as this issue: #3224. Basically, Calico node configuration isn't getting cleaned up when nodes go away, so Calico will continue to try to reach those nodes and thus will create a bunch of ARP entries which it can't complete (since the nodes are no longer there). Adding the node controller to kops should fix this as well.
…CALICO_K8S_NODE_REF in calico-node, this commit fixes kubernetes#3224 and kubernetes#4533
The kernel should be expiring stale ARP entries. It seems like the bug referenced in https://forums.aws.amazon.com/thread.jspa?messageID=572171 wasn't actually forwarded on to the kernel devs? I think the ideal kernel behaviour would be to GC entries down to the minimum, but to GC beyond that minimum for stale entries.
Worth noting: if you're seeing the arp_cache: neighbor table overflow! message, perf issues abound when this happens because the neighbour table is locked while the synchronous GC is performed. As such, you'll definitely want to ensure that gc_thresh3 stays well above the number of neighbor entries your nodes actually need.
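For reference, a hypothetical /etc/sysctl.d drop-in, with illustrative values and the kernel's (roughly stated) semantics for each threshold as comments:

```bash
# /etc/sysctl.d/99-neigh.conf (hypothetical; values are illustrative)

# gc_thresh1: entries below this count are never garbage collected.
# Kept at 0, per kubernetes/kubernetes#23395 above, so stale entries
# can always be reclaimed.
net.ipv4.neigh.default.gc_thresh1 = 0

# gc_thresh2: soft maximum; above this, entries older than ~5 seconds
# become eligible for garbage collection.
net.ipv4.neigh.default.gc_thresh2 = 28000

# gc_thresh3: hard maximum; once the table is full, allocating a new
# entry forces the synchronous GC (and logs "neighbor table overflow!").
net.ipv4.neigh.default.gc_thresh3 = 32000
```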
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Can this be closed now? |
sure |
What kops version are you running? The command kops version will display this information.
1.8
What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.8.6
What cloud provider are you using?
aws
I'm seeing a lot of arp_cache table overflow messages in my production cluster. Reading this blog post about large clusters, https://blog.openai.com/scaling-kubernetes-to-2500-nodes/, they say the solution is to increase the maximum size of the ARP cache table. Can I configure these sysctl options:
net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh3
using kops?
thanks!