docker ps hangs #1654
@chancez pointed out that moby/moby#13885 and moby/libnetwork#1507 might be related. |
We run 5 machines and get this hang ~weekly (for the last 3 months). Is there anything we could run on the machines when the hang happens again? |
Machines are 8 CPU / 30 GB. During the last hang we were able to start new containers while |
I am also experiencing this issue with the following setup: CoreOS Version 1185.3.0. Not only does docker ps hang, docker-compose also times out:
Nov 10 10:59:20 <aws_dns>.eu-west-1.compute.internal docker-compose[5872]: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
Nov 10 10:59:20 <aws_dns>.eu-west-1.compute.internal docker-compose[5872]: If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60). In the Docker
I am able to temporarily resolve the issue by manually restarting |
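For reference, the COMPOSE_HTTP_TIMEOUT mentioned in that error message is an environment variable read by docker-compose (value in seconds). Raising it only papers over the daemon hang, but a minimal sketch of how it would be set looks like this:

    # Raise the docker-compose client timeout to 5 minutes (value in seconds).
    # This only hides the symptom; the daemon itself is still hanging.
    export COMPOSE_HTTP_TIMEOUT=300
    docker-compose up -d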
@Jwpe when this happens again, try
|
@matti thanks for the tip, will do! |
@matti for me |
@Jwpe (or anyone else reading this) -- please send SIGUSR1 to the docker daemon process and paste in the logs. |
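For anyone unsure how to do that, here is a minimal sketch assuming a systemd-managed daemon on CoreOS (docker.service is the default unit name):

    # Find the daemon's main PID via systemd and send SIGUSR1 to trigger a
    # goroutine stack dump.
    DOCKER_PID=$(systemctl show -p MainPID docker.service | cut -d= -f2)
    sudo kill -USR1 "$DOCKER_PID"
    # The stack dump lands in the daemon's log, i.e. the journal on CoreOS.
    sudo journalctl -u docker.service --since -5min | less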
@matti I get a huge goroutine stack dump on sending that signal:
|
@Jwpe yes... please paste the rest of it... @juhazi over here formatted our stacktrace here: https://gist.github.com/juhazi/fbf22602561b719e9480f8be4f8a4740 |
and for the record: this time all daemon APIs were jammed:
|
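One way to check whether the daemon API is answering at all, independent of the docker CLI, is to hit the socket directly. A hedged sketch, assuming the default socket path and a curl build with unix-socket support (7.40+):

    # Ping the daemon API directly over its unix socket, with a 10 s timeout,
    # so a hang in the CLI can be told apart from a hang in the daemon itself.
    sudo curl --max-time 10 --unix-socket /var/run/docker.sock http://localhost/_ping
    sudo curl --max-time 10 --unix-socket /var/run/docker.sock http://localhost/version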
We are also seeing this behavior on some 1185.3.0 nodes (on VMWare ESXi) where we are using systemd units. Here are some goroutine stack dumps of 3 different machines that had the issue. In our case the units also created the following error messages:
Even though the stop/kill/rm said the container was not there, which looks a lot like moby/moby#21198 and might be related. Just to be complete, we are also running cadvisor and a custom container that calls list containers about every 5 minutes. |
We are also running cadvisor (as the cluster runs http://www.kontena.io) |
Having the same issue since upgrading to 1185.3.0, here are the logs: https://gist.github.com/anonymous/0ad15cacc028c38ff7759abba7ace198 |
Same here at Digital Ocean. Let me know if you need more logs, happy to help.
|
I've updated CoreOS 1185.3.0 to 1192.2.0 (Docker 1.11.2 to 1.12.1). Will keep you guys posted! |
Has anyone had any joy here? I'm seeing this every two or three restarts of my containers, and it's currently limiting my ability to repeatably deploy. |
We had to roll back to 1122.3.0 and it is working. Version 1185.3.0 is not usable: every ~24 hours the docker daemon becomes unresponsive. |
Daemon jammed again, this time not even |
@bkleef are you seeing this issue still on 1192.2.0? Thanks! |
@Jwpe yes it is fixed at 1192.2.0! |
@Jwpe It's not fixed for us in 1192.2.0. |
I don't think that bug is related; it was opened in June 2015, and this started happening only recently with the latest CoreOS versions. If that were the bug, it should have happened in older CoreOS versions too. |
@victorgp It is almost certainly related if you read all the comments, or at least the latest ones; it just happens more often under Docker v1.11.0 and up, which is only in the latest CoreOS (stable), not the previous one. Anyway, my suggestion has been working for me for about a week, and it was not too hard to rewrite my services that need to run processes or new containers to |
@Raffo I haven't encountered the issue since upgrading to 1192.2.0, but I might just not have recreated the scenario where it occurs yet. |
We're also seeing this on 1185.3.0. At least on the one system it's currently happening on |
We had another hang. We are now waiting for the Docker 1.13 release and its inclusion in CoreOS: zalando-incubator/kubernetes-on-aws#167 |
I was also able to reproduce this issue using https://github.com/crosbymichael/docker-stress on a Kubernetes worker node running CoreOS Stable 1185.3.0. After upgrading to CoreOS Beta 1235.1.0 I haven't been able to reproduce it. Whereas running 5 concurrent docker_stress workers would kill CoreOS Stable after a few minutes, I was able to run with 10 and 15 concurrent workers until test completion using CoreOS Beta.
CoreOS Stable 1185.3.0: kernel 4.7.3, docker 1.11.2
CoreOS Beta 1235.1.0: kernel 4.8.6, docker 1.12.3 |
Just as an FYI, I'm not entirely sure this is CoreOS specific. We are running into the same issues on CentOS. It feels more related to Docker at the moment, but I've been unable to get a dump out of Docker on failure to validate. |
@mward29 yes, it's a Docker bug and fixed in 1.13 (and will be backported to 1.12 AFAIK, see moby/moby#28889) --- still it should be fixed in CoreOS (by upgrading to fixed Docker as soon as it's released). |
We are updating to Docker 1.12.4, which contains the upstream fixes for the docker daemon deadlocks (moby/moby#29095, moby/moby#29141). It will be available in the alpha later this week. You can reopen this if problems persist. |
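A quick way to confirm which image and daemon version a node actually ended up with after the update (assuming the standard CLI and /etc/os-release):

    # Report the OS image and the daemon (server) version on the node.
    grep -E '^(NAME|VERSION)=' /etc/os-release
    docker version --format 'server: {{.Server.Version}}'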
This kept happening to me about 10% of the time. For non-alpha releases, I solved this by adding a unit that constantly checks docker version and restarts Docker if it hangs.
Edit: s/version/ps |
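A minimal sketch of the kind of watchdog described above, assuming a systemd-managed daemon; the 30 s timeout and 60 s interval are arbitrary choices:

    #!/bin/bash
    # Poll `docker ps` with a hard timeout and restart the daemon when it stops
    # answering. Meant to be run from a simple systemd service or cron job.
    while true; do
        if ! timeout 30 docker ps > /dev/null 2>&1; then
            echo "docker ps did not return within 30s, restarting docker" | systemd-cat -t docker-watchdog
            systemctl restart docker.service
        fi
        sleep 60
    done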
How do you stop the stdout of docker logs? |
@hjacobs I am seeing this on 1.12.6. Is this fixed, and if yes, in which version?
docker version Server:
strace for docker ps:
|
I'm also seeing this in docker server version 1.12.6 |
sudo systemctl status docker.service after docker ps: systemctl status docker.service |
That seems reasonable; docker is a socket-activated service. |
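When the CLI hangs like this, it can help to see which syscall docker ps is stuck in and whether anything is listening on the socket at all. A hedged sketch, assuming the default socket path:

    # Show where the client blocks (typically a read on the unix socket).
    sudo strace -f -e trace=connect,sendto,recvfrom,read,write docker ps
    # Confirm that something is actually listening on the Docker socket.
    sudo ss -xlp | grep docker.sock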
This ticket is unfortunately starting to attract spurious follow-ups not related to Container Linux. I'm thus going to lock this to prevent further unrelated comments. |
Issue Report
Bug
CoreOS Version
1185.3.0
Environment
AWS and GCE confirmed. Likely all environments.
Expected Behavior
docker ps
properly returns the list of running containers.
Actual Behavior
docker ps
eventually hangs (not sure which conditions cause it yet).
Reproduction Steps
In @matti's case:
Other Information
strace: