
Excessive "Failed to receive from netlink: no buffer space available" errors #779

Closed
zihaoyu opened this issue Jul 14, 2017 · 1 comment

zihaoyu commented Jul 14, 2017

We run flannel for our Kubernetes clusters, but not as a CNI plugin yet. We still do it the old way: run flannel, have it write the properties file, and let docker read the --bip flag from that file.

We noticed a large number of the following errors when the cluster is under load:

Jul 14 15:11:46 ip-10-72-134-42.ec2.internal flannel-wrapper[1528]: E0714 15:11:46.419175    1528 device.go:222] Failed to receive from netlink: no buffer space available"

Expected Behavior

Flannel should log far fewer of these errors, ideally none.

Current Behavior

We see a lot of such errors. Below is a Kibana screenshot.

[Image: Kibana screenshot, "screen shot 2017-07-14 at 7 49 08 pm"]

Possible Solution

Steps to Reproduce (for bugs)

  1. Scale Kubernetes cluster to ~300 minions.
  2. Increase load/traffic on the cluster.
  3. Check flannel logs.

Context

We are seeing network timeouts in almost all of our microservices in the cluster. We are not sure whether they are related to these errors, but it looks highly suspicious.

Your Environment

  • Flannel version: v0.7.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: v3.1.9
  • Kubernetes version (if used): v1.5.7
  • Operating System and version: CoreOS 1353.7.0 stable
  • Link to your project (optional):

tomdee (Contributor) commented Jul 21, 2017

It looks like it might be worth trying to increase the receive buffer size of the netlink socket.
From https://www.netfilter.org/documentation/FAQ/netfilter-faq-4.html:

these are standard Netlink sockets, and you can tune their receive buffer sizes via /proc/sys/net/core, sysctl, or use the SO_RCVBUF socket option on the file descriptor.

The file descriptor is available by calling GetFd() on nlsock here - https://github.com/coreos/flannel/blob/master/backend/vxlan/device.go#L214
