docker ps hangs #965
Comments
@yifan-gu you were showing me this behavior the other day, but I forget what was causing it. Can you refresh my memory? |
@crawford @rwky |
Running into this as well, see kubernetes/kubernetes#17525 |
@rwky have you experienced this after your initial report? |
@mischief yep at least once a week. |
@rwky any tips on reproducing it? is it intermittent, or are you doing something to work around it? |
I don't know how to reproduce it predictably; however, the server in question spawns and destroys (using the --rm flag) several docker containers every minute. I suspect that just running a CoreOS server that does that for a few days will trigger the hang. The only way to resolve it is to reboot the server (you could probably kill and restart docker, but rebooting does the job). |
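For anyone trying to reproduce the workload described above, here is a minimal sketch of a container churn loop. It is not a confirmed reproducer: the `busybox` image, the `true` command, and the function name `churn_containers` are assumptions chosen for illustration, and the `run` parameter exists only so the loop can be exercised without a Docker daemon.

```python
import subprocess
import time


def churn_containers(iterations, image="busybox", run=subprocess.run, pause=0.0):
    """Start and remove short-lived containers; return how many runs failed.

    `image` and the `true` command are illustrative assumptions, not taken
    from the original report.
    """
    failures = 0
    for _ in range(iterations):
        # --rm removes the container as soon as it exits, as in the report.
        result = run(
            ["docker", "run", "--rm", image, "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
        time.sleep(pause)
    return failures
```

On a test host, something like `churn_containers(1000, pause=5.0)` left running for a long time would approximate "several containers every minute"; according to the report, it may take days before a hang appears, if it appears at all.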
I'm seeing the same problem now. (I'm on IRC and will post the content of the IRC discussion here.) I'll have to reboot the server in 2 hours; in the meantime, if you want me to run commands, let me know! OK, I'm rebooting now. Here is the IRC discussion: https://framabin.org/?3b8e73977c597b06#7wIlBFnWRaGO0+o34QkfauXwhm2kSvbk+NInpxe6kes= |
Same for me on docker 1.10.1 after intensive use of the dockerui web app (https://hub.docker.com/r/crosbymichael/dockerui/). |
I experienced a similar hang with CoreOS stable (835.12.0). Okay, I have no idea if this is the same issue, but it is reproducible with this command: |
+1 for a rapid resolution of this issue on CoreOS 970.1.0 with Docker 1.10.1 with kubernetes 1.1.7. Is there an active investigation of the problem? There was a suggestion that it might be a kernel issue (moby/moby#13885 (comment)) but I am not sure how it is addressed or should be addressed in CoreOS. |
@gopinatht can you give the Beta channel a shot? That has the upstream fix for device mapper. |
@crawford Thanks for the info. I will give it a try. Just to confirm, the alpha channel does not have the fix (I am using the absolute latest version of Alpha)? Also, |
Sorry, I missed the CoreOS version before. That kernel also has the patch. I doubt it will behave any differently. |
@crawford No worries. Any other insights on where I can investigate? I am kind of stuck on what I should do next. This issue has become a blocker for my team using a CoreOS/kubernetes/docker solution. |
Guys, I logged an issue with Docker team and they are indicating that the docker build on CoreOS is different enough (different Go version and custom build on CoreOS) (moby/moby#20871 (comment)) for this to be more of a CoreOS issue. Help or pointers from CoreOS team would be appreciated. |
I really don't know where to start on this one. We are going to bump to Docker 1.10.2 in the next Alpha and then bump to Go 1.5.3 in the following one. So, by this time next week, we will have a mostly vanilla Docker shipping in CoreOS. Given how long this bug has been open, I don't expect the version bump to change anything, but at least we'll be able to debug with the Docker folks a bit easier. Let's reevaluate the situation next week. |
To help debugging. If I add the |
@vpal Thanks for this info. My system is a little different in that I am not using hyperkube on my AWS cluster. But I wonder why mounting /sys read-only should solve the docker hang issue. I am not able to make the connection. |
@crawford I investigated the issue further and posted the results of my investigation there: moby/moby#20871 (comment) FYI. |
I asked this question on the docker bug page but thought it would be relevant here as well: Could this issue be related in any way to the hairpin NAT setting? Is it possible there is some sort of thread deadlock on a device? When I ran |
@gopinatht as of 983.0.0, we are shipping Docker built with Go 1.5.3, so that should eliminate a few variables. |
@crawford Thanks for the heads-up. I will check it out. |
I came across the same issue with docker 1.8.3. I noticed docker was built with Go version 1.4.2. [root@abhi ]# docker version Server: |
Is this still an issue with 1.10.3? I haven't seen this one in a while (in fact, I forgot about this bug). |
@crawford I've not seen this on 1.10.3 so far as I can remember, the server I normally see it on runs the beta channel so has been on 1.10.3 for a while. Just checked my logs, not seen it since May 3rd. |
@rwky Thank you for checking. Let's go ahead and close this one. We can reopen it if it turns out to still be a problem. |
seeing this still:
|
We see this quite often on 1068.6.0 and Kubernetes 1.3.3. Please consider reopening this issue.
Also seeing these a lot in
|
This just happened to me again; however, at the same time I'm experiencing moby/moby#5618, which I suspect is the cause. @feelobot @zihaoyu perhaps you are experiencing the same thing? |
@rwky Not sure if it is related, but I did see
|
Yep that looks like it's the problem. |
This is happening for me too. Docker version 1.10.3, build 3cd164c CoreOS stable (1068.8.0) Kernel Version: 4.6.3-coreos
dmesg shows the following
We are using the Flannel driver |
Echo @rwky: reported details here: moby/moby#5618 (comment). I'm probably going to try @sercand's suggestion (moby/moby#5618 (comment)). It should probably also be mentioned that this of course affects, by default, clusters built by the |
sounds like a dup of #254 |
@mischief, it doesn't look like a dup, because we run containers the same way as @rwky (every minute with --rm) and I can see that docker hangs exactly on the container, long before
|
I don't have
strace:
so Coreos:
running in google cloud platform. |
Since this is a new bug, let's move this conversation over to #1654. |
systemctl restart docker |
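The restart workaround mentioned in the thread could be automated along these lines. This is a sketch under assumptions, not a recommended fix: the function name `restart_if_hung` and the 30-second timeout are hypothetical, and several reporters above found that only a reboot cleared the hang, so a daemon restart may not be enough.

```python
import subprocess


def restart_if_hung(
    probe_cmd=("docker", "ps"),
    restart_cmd=("systemctl", "restart", "docker"),
    timeout_seconds=30.0,  # arbitrary threshold chosen for illustration
):
    """Run probe_cmd; if it does not finish in time, run restart_cmd.

    Returns True if a restart was attempted, False if the probe finished.
    """
    try:
        subprocess.run(
            list(probe_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The probe was killed after the timeout; treat the daemon as hung.
        subprocess.run(list(restart_cmd), check=True)
        return True
    return False
```

Run periodically (for example from a systemd timer), this only papers over the symptom; it does nothing about the underlying cause discussed in moby/moby#5618.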
After running various containers for a while, docker ps hangs for no apparent reason. All other docker commands work.
Docker info:
Server:
Version: 1.8.3
API version: 1.20
Go version: go1.5.1
Git commit: cedd534-dirty
Built: Fri Oct 16 04:20:25 UTC 2015
OS/Arch: linux/amd64
Containers: 23
Images: 520
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.2-coreos
Operating System: CoreOS 835.1.0
CPUs: 8
Total Memory: 31.46 GiB
ID: KOW4:OCYI:PTET:G2F7:BPHH:Y5GU:XMXT:U2D3:5SKA:MO63:SE2Y:I42C
I ran strace docker ps via toolbox to see if there's a clue of what's going on and it produced
So it looks like the client is writing the GET request but the server isn't responding.
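The same symptom can be checked without strace by talking to the daemon's Unix socket directly with a timeout. The sketch below assumes the default socket path `/var/run/docker.sock` and the container-list endpoint `GET /containers/json`; the exact request in the original strace output was not preserved, so the request shown here is an assumption. The function name `probe_docker_socket` is hypothetical.

```python
import socket


def probe_docker_socket(path="/var/run/docker.sock", timeout_seconds=5.0):
    """Send a container-list request and return the HTTP status line.

    Returns None if the request was written but no reply arrived within
    timeout_seconds, which matches the hang described above.
    """
    request = (
        b"GET /containers/json HTTP/1.1\r\n"
        b"Host: docker\r\n"
        b"Connection: close\r\n\r\n"
    )
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_seconds)
        sock.connect(path)
        sock.sendall(request)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return None  # request sent, server silent
    # First line of the response, e.g. the HTTP status line.
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

A `None` result means the daemon accepted the connection and the request but never answered; a status line means it is still responding to this endpoint.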