Docker completely stuck - unregister_netdevice: waiting for lo to become free. Usage count = 1 #254
Comments
We are also seeing this. We wrapped phantomjs with a timeout script ( http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3 ), which seemed to help, but we can still reproduce this easily on deploys when lots of containers are running (300+). It looks like the container does not shut down because phantomjs is locked, and docker gets stuck trying to unregister the net device. Even after the container processes are long gone, something in docker hangs on to the device. I saw a bug upstream somewhere but I can't seem to find it now, and I have seen this behavior on ubuntu and debian as well, so I am assuming that this is in docker's hands, but someone may be able to follow up on that better than me. |
Upstream ticket: moby/moby#5618. The issue is widespread across docker installs on multiple OSes. |
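For anyone who wants to try the same kind of wrapper, here is a minimal sketch of the idea in Python. It is not the linked timeout3 script; the function name `run_with_timeout` and the exit code 124 on timeout (borrowed from GNU `timeout`) are assumptions chosen for illustration. It only bounds how long the wrapped process may run and does not address the kernel-side reference leak.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or 124 if it exceeded timeout_s.

    subprocess.run kills the child and waits for it when the timeout
    expires, so no stray process is left behind by this wrapper.
    The name and the 124 convention are hypothetical choices.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return 124


if __name__ == "__main__":
    # Example: wrap an arbitrary command given on the command line,
    # e.g. `python wrapper.py phantomjs script.js`, with a 60 s limit.
    sys.exit(run_with_timeout(sys.argv[1:], 60))
```

The 60 second limit in the usage example is arbitrary; pick whatever bound fits the workload.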
This issue requires a reboot of a node, which messes everything up. |
@rkononov are you running ubuntu within your containers? There is a long running ubuntu bug report, with activity here : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 |
@popsikle yep we're using ubuntu |
+1, using ubuntu. Very critical bug unfortunately |
Is this still an issue with the later versions of CoreOS and Docker? |
Closing due to inactivity. |
It looks like this may be a kernel bug with a patch released upstream, though I haven't tested it myself. |
I'm having this bug with a home-compiled Kernel 4.3 running on KVM + Debian8 |
Also experiencing this issue. CentOS 7.2 running 4.3.3 (elrepo) kernel, guest is Debian running nginx (nginx:latest official stock container). Please let me know if there is any way I can help. |
Having this problem on CoreOS alpha 921.0.0, on a cluster brought up by your very own |
@iameli Can you give 928.0.0 a shot? This has the 4.4 kernel with the patch mentioned by @binarybana. |
kudos!!! I have been seeing this all the time! |
@crawford Sure. The cluster is up and running, I'll check back in here if it breaks. |
@crawford I took over for @iameli. FYI, 928.0.0 still exhibits the problem. Happened to one node after about 6 days of uptime. The other node in the cluster is fine, strangely. I did notice influxdb and heapster running out of their allocated cgroup memory and getting OOM-killed, so I scaled both of those down to zero (since we don't need them as far as I know). The kernel message is a bit different this time... I don't see a stack trace, but docker does go zombie. Here's the tail end of
|
have this issue on debian running a centos7 image. hadoop@jin: Server: |
This issue happens regularly on my CoreOS machines
Is there a workaround for this issue? |
@crawford We're still receiving this on
|
The bug that will never die. |
Still experience this under heavy load on Docker version 1.12.6, build 7392c3b/1.12.6 on 4.4.39-34.54.amzn1.x86_64 AWS Linux AMI. |
@cjpetrus @jamesjryan It sounds like you're using Amazon Linux, not Container Linux. Perhaps you meant to comment on moby/moby#5618 or the AWS support forums? |
@euank indeed I did, thanks. |
I've got the same issue. Attached are all of the logs produced by the ecs-logs-collector script. |
Same issue on centos 7.3 bare metal docker host with docker 1.13.1 |
@truongexp, @virtuman, it looks like neither of you are using Container Linux. Perhaps you meant to post on moby/moby#5618 or to contact your respective distro vendor's issue trackers. I don't see any reports yet for one of our releases with a 4.9.9 kernel, but it might be too soon for me to get my hopes up. |
Same issue when I mounted an NFS volume in a privileged container. |
Another source of leakage has been fixed by torvalds/linux@f647821, which is part of 4.11. |
torvalds/linux@b7c8487 is in the same batch of commits, is also in 4.11, and fixes another possible cause |
moby/moby#5618 references torvalds/linux@d747a7a which is in 4.12 |
If you're not using Container Linux, you're looking for moby/moby#5618. I'm optimistically closing this bug with the assertion that kernel 4.12 (in Container Linux beta/alpha channels now) should have fixed all the common causes. There hasn't been much noise on this issue actually related to Container Linux for a while now, which I also think is a good sign we might be in the clear to close it. If you can still reproduce this issue on an up-to-date Container Linux machine, please do give a shout. |
Centos 6.8 Fix: |
Same issue with CentOS Linux release 7.3.1611 and Docker 1.12.6. The only container running is nginx:stable-alpine. |
RHEL: same issue, running Docker EE under heavy load. |
Centos 7 — kernel:unregister_netdevice: waiting for lo to become free. Usage count = 2 |
Same issue on CentOS 7 in AWS. |
Locking this issue. If you're not using CoreOS Container Linux, please see moby/moby#5618. If you're seeing this issue on a current release of Container Linux, please open a new bug. |
After a few hours of running phantomjs inside docker containers (we are constantly rotating them), we started getting "unregister_netdevice: waiting for lo to become free. Usage count = 1", and systemd shows that the docker.service unit became unresponsive. Restarting docker.service doesn't help.
callstack:
Coreos version
uname -a:
docker -H 0.0.0.0:2375 version:
Also, I've noticed that the docker process became a zombie:
699 ? Zsl 36:30 [docker] <defunct>
Not sure that there is an easy way to reproduce :(
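Since the symptom is a recognizable kernel log line, a small scanner can at least flag affected nodes before docker goes fully unresponsive. The following is a rough sketch, assuming the message format quoted in this thread; the function name `find_stuck_devices` is hypothetical, and how the log lines are obtained (for example from `dmesg` or the journal) is left to the caller.

```python
import re

# Matches the kernel message quoted in this issue, capturing the
# interface name (e.g. "lo") and the reported usage count.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. "
    r"Usage count = (\d+)"
)


def find_stuck_devices(lines):
    """Return (device, usage_count) for every matching log line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

A non-empty result only indicates the message was logged; it does not say which container or process is holding the reference.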