docker fails to mount the block device for the container on devicemapper #4036
Comments
Same here, running Ubuntu Precise with the lts-raring kernel (3.8.0-35-generic).
This is due to a race with udev. The problem is that starting a container creates a dm device, activates it, immediately deactivates it, and then activates it again. This races with udev's device-node creation, so you can end up in a state where the device node created by udev is removed by docker.
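To make the ordering problem concrete, here is a toy model in Python. It is a sketch under loud assumptions: the event names (`dm_activate`, `udev_create_node`, `docker_remove_node`, ...) are hypothetical labels, not real libdevmapper or udev calls, and the model only tracks two booleans. It shows that the same kinds of steps are harmless when each node change follows the dm operation it belongs to, but leave an active device with no /dev/mapper node when docker's node removal lands after udev has created the node for the re-activation.

```python
# Toy model of the activate/deactivate/activate race described above.
# Event names are hypothetical labels, not real libdevmapper/udev APIs.

def final_state(schedule):
    """Apply events in order; return (device_active, node_present)."""
    active = False
    node = False
    for event in schedule:
        if event == "dm_activate":
            active = True
        elif event == "dm_deactivate":
            active = False
        elif event == "udev_create_node":
            node = True
        elif event == "docker_remove_node":
            node = False
        else:
            raise ValueError(f"unknown event {event!r}")
    return active, node


def is_broken(schedule):
    """Broken = the device is active but its /dev/mapper node is missing."""
    active, node = final_state(schedule)
    return active and not node


# Each node change happens right after the dm operation it belongs to.
in_order = [
    "dm_activate", "udev_create_node",
    "dm_deactivate", "docker_remove_node",
    "dm_activate", "udev_create_node",
]

# udev is slow: its node creation lands before docker's pending removal.
racy = [
    "dm_activate",
    "dm_deactivate",
    "dm_activate",
    "udev_create_node",
    "docker_remove_node",
]

print(is_broken(in_order))  # False
print(is_broken(racy))      # True
```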
Runtime.Register() called driver.Get()/Put() in order to read back the basefs of the container. However, this is not needed, as the basefs is read during container.Mount() anyway, and basefs is only valid while mounted (and all current calls satisfy this). This seems minor, but it is actually problematic, as the Get/Put pair will create a spurious mount/unmount cycle that is not needed and slows things down. Additionally, it will create a spurious devicemapper activate/deactivate cycle that causes races with udev as seen in moby#4036. With this change devicemapper is now race-free, and container startup is slightly faster. Docker-DCO-1.1-Signed-off-by: Alexander Larsson <[email protected]> (github: alexlarsson)
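Why dropping the extra Get()/Put() pair helps can be shown with a toy reference counter. This is a minimal sketch, assuming the driver activates and mounts the device only when the reference count goes from 0 to 1, and unmounts and deactivates it when the count drops back to 0. `RefCountedDevice` and `register_then_start` are hypothetical names, not Docker's Go API.

```python
class RefCountedDevice:
    """Toy stand-in for a graph driver device that is mounted on demand."""

    def __init__(self):
        self.refs = 0
        self.activations = 0  # number of activate + mount cycles started

    def get(self):
        if self.refs == 0:
            # 0 -> 1: a real driver would activate and mount the device here.
            self.activations += 1
        self.refs += 1

    def put(self):
        if self.refs == 0:
            raise RuntimeError("put() without a matching get()")
        self.refs -= 1
        # When refs reaches 0, a real driver would unmount and deactivate here.


def register_then_start(spurious_get_put):
    """Return how many activation cycles registering and starting a container costs."""
    dev = RefCountedDevice()
    if spurious_get_put:
        # Old Register(): Get() to read basefs, then Put() straight away.
        dev.get()
        dev.put()
    dev.get()  # container start mounts the device
    return dev.activations


print(register_then_start(True))   # 2
print(register_then_start(False))  # 1
```

Every extra 0 -> 1 transition is one more activate/deactivate cycle, and therefore one more chance to lose the race with udev.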
@alexlarsson Could we work around it somehow? It looks like every non-Fedora system has this problem.
@unclejack #4067 fixes it
The problem seems to be fixed for runs, but not for builds. When building this Dockerfile:
docker fails with the same error:
@discordianfish I'll have a look at that.
CmdRun() first calls run() and then wait() to wait for it to exit, then it runs commit(). The run command will mount the container and the container exiting will unmount it. Then the commit will immediately mount it again to do a diff. This seems minor, but it is actually problematic, as the Get/Put pair will create a spurious mount/unmount cycle that is not needed and slows things down. Additionally, it will create a spurious devicemapper activate/deactivate cycle that causes races with udev as seen in moby#4036. To ensure that we only unmount once, we split up run() into create() and run() and reference the mount until after the commit(). With this change docker build on devicemapper is now race-free, and slightly faster. Docker-DCO-1.1-Signed-off-by: Alexander Larsson <[email protected]> (github: alexlarsson)
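The same counting argument applies to the build path. The sketch below uses a hypothetical `activation_cycles` helper and assumes each `get` mounts (and activates) the device only on a 0 -> 1 transition of the reference count. It compares the old run/exit/commit sequence with one where create() takes a reference that is held until after commit().

```python
def activation_cycles(ops):
    """Count how often a mount refcount goes 0 -> 1 over a list of 'get'/'put' ops.

    Each 0 -> 1 transition stands for one devicemapper activate + mount cycle.
    """
    refs = 0
    cycles = 0
    for op in ops:
        if op == "get":
            if refs == 0:
                cycles += 1
            refs += 1
        elif op == "put":
            if refs == 0:
                raise ValueError("put without a matching get")
            refs -= 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return cycles


# Old CmdRun(): run() mounts, the exiting container unmounts, commit() mounts again.
old_flow = ["get", "put", "get", "put"]

# Split flow: create() takes a reference that is only dropped after commit().
new_flow = [
    "get",         # create(): hold a reference
    "get", "put",  # run() ... container exits
    "get", "put",  # commit(): diff while still mounted
    "put",         # drop the create() reference
]

print(activation_cycles(old_flow))  # 2
print(activation_cycles(new_flow))  # 1
```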
#4096 fixes a similar spurious mount/unmount cycle during build.
While the above fixes will fix this for most people, we should probably leave this open to track the actual udev race, as it would be good to have a fix for that too.
The above fixes are a big improvement, but when restarting a container it sometimes still happens:
Stopping the container will typically cause it to unmount; to keep it mounted over the stop/start cycle, we acquire a temporary reference to it during this time. This helps with moby#4036. Docker-DCO-1.1-Signed-off-by: Alexander Larsson <[email protected]> (github: alexlarsson)
@discordianfish The restart issue should be fixed in #4180
These issues aren't occurring any more on any of my test systems.
I believe it is still possible to run into the race if you, e.g., do:
Short description of what I believe is happening: fish_: it's a lot more complex
Reopening, since the actual problem is still there.
I hate to be participant 132 in a huge thread, but after skimming this ticket, I think I "should" be OK, but am not. Using CentOS 7:
This is an example of the error I receive:
This happens 5-20% of the time, it seems. A few days ago, I had Docker totally trash all its containers. I researched some, and learned that on CentOS 7 hosts the preferred method is a "raw" LVM thin-provisioned pool, so that's what I did:
I'm not listing here all the LVM stuff I did, but it was working for a few days...
docker info
It definitely seems to be a race condition when my Jenkins tries to spawn a handful of containers at once. I manually tried this on the server:
And it ran flawlessly. However, if I background them instead, to force all 25 to try to create at once:
So 2 of 25 failed. And they "stay dead":
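The backgrounded experiment above can be scripted so that failures are counted rather than eyeballed. This is a sketch, not the reporter's actual commands (those were not preserved): `launch_all` and `count_failures` are hypothetical helpers, the container command is borrowed from the loop in the original report (`docker run busybox echo test`), and 25 matches the number of containers mentioned.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def launch_all(cmd, n):
    """Start n copies of cmd at (roughly) the same time; return their exit codes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [
            pool.submit(
                subprocess.run,
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            for _ in range(n)
        ]
        return [f.result().returncode for f in futures]


def count_failures(exit_codes):
    """Number of commands that exited with a non-zero status."""
    return sum(1 for code in exit_codes if code != 0)


if __name__ == "__main__":
    codes = launch_all(["docker", "run", "busybox", "echo", "test"], 25)
    failures = count_failures(codes)
    print(f"{failures} of {len(codes)} failed")
    sys.exit(1 if failures else 0)
```

Using one thread per command keeps all `docker run` processes in flight at once, which is the condition under which the failures were reported; a sequential loop rarely triggers them.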
@AaronDMarasco-VSI @RoelVdP could you open a new issue instead? (but feel free to link to this issue). This discussion is already really long, and I think it's better to start a "fresh" one. |
I'm getting...
For no apparent reason. I'm guessing this is related? I'm trying to run a golang app in Docker on Ubuntu 14.04.
@EwanValentine this doesn't look related to this one; I think it's #18098
I'm not sure how they're related, @thaJeztah; there doesn't seem to be a suggested fix or definitive cause in that reference :(
Similar error while trying to build with docker 1.9.0 and 1.9.1:
The build works after retrying a few times. Here is my docker info:
My host OS is OS X El Capitan 10.11.2 + VirtualBox 5.10 + Vagrant 1.7.4.
@thisiswangle I have the same problem too; my setup is running on an Ubuntu 14.04 guest OS (x86_64, kernel 3.13.0-77-generic).
I'm getting this error every 2-3 builds... I have to restart the command and then it works. This is on a DigitalOcean 1GB instance.
What I've done to fix it (apparently it did) was install docker-engine instead of docker-lxc on the Ubuntu machine I have... I think docker-lxc is probably legacy and shouldn't be used. In the docker-engine installation I don't get
@xarem how did you install docker; did you use the apt repository, or install a static binary? I see you're using devicemapper without Udev sync support; it's strongly discouraged to run without Udev sync, as that will lead to data loss and strange behavior like this. If you install using the procedure in https://docs.docker.com/engine/installation/ubuntulinux/ the default storage driver for Ubuntu, aufs, will be used. You will need to wipe your /var/lib/docker to do a fresh install, though (otherwise the devicemapper directory will still be there).
++ @andrecp, I was about to post the same answer. Installing the latest "docker-engine" from the repository actually fixes this issue. What's important here is that you need to have all the extra Linux kernel features (the so-called extras) installed prior to installing docker-engine. The other key to success is to get the dynamically linked docker binary:
The above results in udev sync support being enabled for devicemapper.
I ran docker info (sorry that I can't publish the kernel and OS):
Is this still a problem? I have:
docker --version
Docker version 1.11.1, build 5604cbe
Getting:
docker build -t mongodb .
Sending build context to Docker daemon 16.9 kB
docker version
Client:
Server:
So pretty fresh, I think.
@davehodg can you show your
docker info
Containers: 32
If I were to get the freshest most up to date docker, where would I get it from? |
OK, got the source from here. Tried make:
make
docker build -t "docker-dev:master" -f "Dockerfile" .
I think I might make a new RHEL7 VMware image...
Not really telling me anything new :( |
@davehodg no, the problem is indeed that it's quite an unpredictable issue. It's known to be problematic on systems without udev sync; running on "loop devices" doesn't help either.
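Both risk factors named here show up in `docker info`, so they can be checked mechanically. The sketch below parses that text output. It assumes the devicemapper section contains lines such as `Udev Sync Supported: true` and, when loopback files are in use, `Data loop file: ...` (as printed by later Docker 1.x releases); a missing udev field is treated as unsupported. `storage_warnings` is a hypothetical helper, not part of Docker.

```python
import subprocess


def storage_warnings(info_text):
    """Return warnings about devicemapper settings found in `docker info` text."""
    fields = {}
    for line in info_text.splitlines():
        # Split on the first colon only; values such as pool names contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    warnings = []
    if fields.get("Storage Driver") == "devicemapper":
        # Assumption: an absent field means udev sync is not supported.
        if fields.get("Udev Sync Supported", "").lower() != "true":
            warnings.append("udev sync not supported")
        if "Data loop file" in fields:
            warnings.append("using loop devices")
    return warnings


if __name__ == "__main__":
    out = subprocess.run(["docker", "info"], capture_output=True, text=True).stdout
    for warning in storage_warnings(out):
        print("warning:", warning)
```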
Thanks. I'm so far down the rabbit hole at this point. |
Register my whine, but I've moved to running Docker on my Mac. Let's see how that transpires. |
I also encountered this issue in our environment, and I worked out a way to reproduce it; I hope it helps resolve the issue.
docker info
Way to reproduce:
1. Make sure all containers are removed.
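2. Get an image ready.
3. Get a script ready. A Python script is needed to eagerly detect and open any Docker-created, non-init dm devices:
import os
import time

# Poll /dev/mapper and hold open the first non-init Docker dm device found.
while True:
    try:
        for line in os.listdir('/dev/mapper'):
            # Skip the "-init" layer devices.
            if 'init' in line:
                continue
            # Container/image devices have long names (pool prefix + 64-char ID).
            if len(line) >= 80:
                filename = os.path.join('/dev/mapper/', line)
                f = open(filename)
                print('got it %s' % filename)
                # Keep the file open (about a year) so the device stays busy.
                time.sleep(3 * 10**7)
    except Exception:
        # The device may vanish between listdir() and open(); just retry.
        pass
4. Make sure all container- or image-related dm devices are removed.
5. Run the script and reproduce the issue. The following is one sample of my reproduction logs:
# docker ps -a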
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# dmsetup info
Name: docker-252:1-58724678-pool
State: ACTIVE
Read Ahead: 8192
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 253, 0
Number of targets: 1
# python opendm.py &
[1] 3757
# cid=`docker create -it busybox sh`
got it /dev/mapper/docker-252:1-58724678-097a341a6a5f17cc86f9d83ebdff35b8da86fd8ab6d614ac8445f3fabe1d6e25
# docker diff $cid
# docker start $cid
Error response from daemon: open /dev/mapper/docker-252:1-58724678-097a341a6a5f17cc86f9d83ebdff35b8da86fd8ab6d614ac8445f3fabe1d6e25: no such file or directory
Error: failed to start containers: d3cbd8f86704136511b5d4cfdafeb1398a46f962ff9af7389ec7368f8472433e
The essential commands are:
python opendm.py &
docker diff $cid
docker start $cid
6. Get the corrupted container back to normal:
# kill %1
# dmsetup remove docker-252:1-58724678-097a341a6a5f17cc86f9d83ebdff35b8da86fd8ab6d614ac8445f3fabe1d6e25
[1]+ Terminated python opendm.py
# ls /dev/dm-*
/dev/dm-0
# ls /dev/mapper/
control docker-252:1-58724678-pool
# docker start $cid
d3cbd8f86704136511b5d4cfdafeb1398a46f962ff9af7389ec7368f8472433e
The essential operations to recover the container include:
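The manual recovery in the log above follows a fixed pattern: the device name appears in the `open /dev/mapper/...: no such file or directory` error, and removing that stale mapping with `dmsetup remove` let `docker start` succeed. The hypothetical helper `recovery_commands` below builds those two commands from the error text. It only returns the commands and does not run them, and it assumes that whatever was holding the device open (here, the `opendm.py` script) has already been stopped.

```python
import re

# Matches the daemon error seen in the log; the device name itself contains colons.
MISSING_NODE_RE = re.compile(r"open /dev/mapper/(\S+): no such file or directory")


def recovery_commands(error_text, container_id):
    """Build (but do not run) the dmsetup/docker commands used in the manual recovery."""
    match = MISSING_NODE_RE.search(error_text)
    if match is None:
        return []
    dm_name = match.group(1)
    return [
        ["dmsetup", "remove", dm_name],
        ["docker", "start", container_id],
    ]
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` as root, after confirming nothing else still has the device open.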
@jizhilong can you please open a new issue? The original issue reported here was resolved and related to
I'm locking this issue, and providing a link to and quote of the resolution for the issue that was originally reported: #4036 (comment). If you still encounter this, please open a new issue with as much information as possible and steps to reproduce. (I modified some outdated information in the description below.)
This issue is resolved
The key item to observe before commenting on this issue is whether Udev sync
The devicemapper storage driver expects to be synchronized with udev. When this info reports
Causes
There are a couple of causes for Udev sync to not be supported:
Solutions
Docker 1.11 and up will refuse to start the daemon if Udev sync is not supported (see #21097)
When running something like
for i in {0..100}; do docker run busybox echo test; done
with Docker running on devicemapper, errors are thrown and containers fail to run:
Fedora 20 with kernel 3.12.9 doesn't seem to be affected.
kernel version, distribution, docker info and docker version:
The Docker binary is actually master with PR #4017 merged.
/cc @alexlarsson