Excessive "Failed to receive from netlink: no buffer space available" errors #779

zihaoyu · 2017-07-14T23:50:13Z

We run flannel for our Kubernetes clusters, but not as a CNI plugin yet. We still use the old approach: run flannel, have it write the properties file, and let Docker read the --bip value from that file.

We noticed a large amount of the following errors when the cluster is under load:

Jul 14 15:11:46 ip-10-72-134-42.ec2.internal flannel-wrapper[1528]: E0714 15:11:46.419175    1528 device.go:222] Failed to receive from netlink: no buffer space available

Expected Behavior

We would expect to see far fewer of these errors in the flannel logs.

Current Behavior

We see a lot of such errors. Below is a Kibana screenshot.

Possible Solution

Steps to Reproduce (for bugs)

Scale the Kubernetes cluster to ~300 minions.
Increase load/traffic on the cluster.
Check the flannel logs.

Context

We are seeing network timeouts in almost all of the microservices in the cluster. We are not sure whether they are related to these errors, but it looks highly suspicious.

Your Environment

Flannel version: v0.7.0
Backend used (e.g. vxlan or udp): vxlan
Etcd version: v3.1.9
Kubernetes version (if used): v1.5.7
Operating System and version: CoreOS 1353.7.0 stable
Link to your project (optional):


tomdee · 2017-07-21T22:12:28Z

It looks like it might be worth trying to increase the receive buffer size of the netlink socket.
From https://www.netfilter.org/documentation/FAQ/netfilter-faq-4.html:

these are standard Netlink sockets, and you can tune their receive buffer sizes via /proc/sys/net/core, sysctl, or use the SO_RCVBUF socket option on the file descriptor.

The file descriptor is available by calling GetFd() on nlsock here - https://github.com/coreos/flannel/blob/master/backend/vxlan/device.go#L214

zihaoyu mentioned this issue Jul 16, 2017

Errors in log using vxlan backend under CPU load #414

Closed

tomdee mentioned this issue Jul 27, 2017

backend/vxlan: simplify vxlan processing #785

Merged

tomdee added the area/performance label Aug 14, 2017

tomdee closed this as completed in #785 Aug 14, 2017

jpiper mentioned this issue Mar 10, 2018

Missing routes with many nodes on vxlan #958

Closed
