
Docker ApplyLayer fails while attempting to pull an image #9624

Closed
soltysh opened this issue Jun 29, 2016 · 22 comments
Labels
component/containers kind/test-flake priority/P2

Comments

soltysh commented Jun 29, 2016

FAILURE after 18.659s: test/end-to-end/core.sh:74: executing 'docker pull centos/ruby-22-centos7:latest' expecting success: the command returned the wrong error code
Standard output from the command:
Trying to pull repository docker.io/centos/ruby-22-centos7 ... 
latest: Pulling from docker.io/centos/ruby-22-centos7
a3ed95caeb02: Already exists
da71393503ec: Already exists
a3ed95caeb02: Already exists
a3ed95caeb02: Already exists
a3ed95caeb02: Pulling fs layer
0ae955058062: Pulling fs layer
96f18d0972cc: Pulling fs layer
fe0dcb307ca7: Pulling fs layer
49808d1fea6f: Pulling fs layer
eaa692e177a7: Pulling fs layer
9df7bf51dad8: Pulling fs layer
bfdc90607585: Pulling fs layer
fe0dcb307ca7: Waiting
49808d1fea6f: Waiting
eaa692e177a7: Waiting
9df7bf51dad8: Waiting
bfdc90607585: Waiting
0ae955058062: Verifying Checksum
0ae955058062: Download complete
a3ed95caeb02: Verifying Checksum
a3ed95caeb02: Download complete
fe0dcb307ca7: Verifying Checksum
fe0dcb307ca7: Download complete
a3ed95caeb02: Pull complete
a3ed95caeb02: Pull complete
eaa692e177a7: Verifying Checksum
eaa692e177a7: Download complete
9df7bf51dad8: Verifying Checksum
9df7bf51dad8: Download complete
bfdc90607585: Verifying Checksum
bfdc90607585: Download complete
49808d1fea6f: Verifying Checksum
49808d1fea6f: Download complete
0ae955058062: Pull complete
0ae955058062: Pull complete
96f18d0972cc: Verifying Checksum
96f18d0972cc: Download complete
96f18d0972cc: Pull complete
96f18d0972cc: Pull complete
fe0dcb307ca7: Pull complete
fe0dcb307ca7: Pull complete
49808d1fea6f: Download complete
eaa692e177a7: Download complete
9df7bf51dad8: Download complete
bfdc90607585: Download complete
failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-117588014-6b4a98e0d8608315d4875e5e207b1a4eaacc430535adce611956cdd54155584d: no such file or directory stdout: {"layerSize":0}
 stderr: 

Standard error from the command:
failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-117588014-6b4a98e0d8608315d4875e5e207b1a4eaacc430535adce611956cdd54155584d: no such file or directory stdout: {"layerSize":0}
 stderr: 
[ERROR] PID 32141: hack/cmd_util.sh:230: `return "${return_code}"` exited with status 1.
[ERROR] Command with variables substituted is: 
[ERROR]     return 1
[INFO]      Stack Trace: 
[INFO]        1: hack/cmd_util.sh:230: `return "${return_code}"`
[INFO]        2: test/end-to-end/core.sh:74: os::cmd::expect_success
[INFO]   Exiting with code 1.

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_integration/2613/console

@soltysh added priority/P2 kind/test-flake labels Jun 29, 2016

soltysh commented Jun 29, 2016

Similarly in #9597:

FAILURE after 26.396s: test/end-to-end/core.sh:74: executing 'docker pull centos/ruby-22-centos7:latest' expecting success: the command returned the wrong error code
Standard output from the command:
Trying to pull repository docker.io/centos/ruby-22-centos7 ... 
latest: Pulling from docker.io/centos/ruby-22-centos7
a3ed95caeb02: Already exists
da71393503ec: Already exists
a3ed95caeb02: Already exists
a3ed95caeb02: Already exists
a3ed95caeb02: Pulling fs layer
0ae955058062: Pulling fs layer
96f18d0972cc: Pulling fs layer
fe0dcb307ca7: Pulling fs layer
49808d1fea6f: Pulling fs layer
eaa692e177a7: Pulling fs layer
9df7bf51dad8: Pulling fs layer
bfdc90607585: Pulling fs layer
fe0dcb307ca7: Waiting
49808d1fea6f: Waiting
eaa692e177a7: Waiting
9df7bf51dad8: Waiting
bfdc90607585: Waiting
a3ed95caeb02: Verifying Checksum
a3ed95caeb02: Download complete
0ae955058062: Verifying Checksum
0ae955058062: Download complete
fe0dcb307ca7: Verifying Checksum
fe0dcb307ca7: Download complete
a3ed95caeb02: Pull complete
a3ed95caeb02: Pull complete
eaa692e177a7: Verifying Checksum
eaa692e177a7: Download complete
9df7bf51dad8: Verifying Checksum
9df7bf51dad8: Download complete
bfdc90607585: Verifying Checksum
bfdc90607585: Download complete
49808d1fea6f: Verifying Checksum
49808d1fea6f: Download complete
0ae955058062: Pull complete
0ae955058062: Pull complete
96f18d0972cc: Verifying Checksum
96f18d0972cc: Download complete
96f18d0972cc: Pull complete
96f18d0972cc: Pull complete
fe0dcb307ca7: Pull complete
fe0dcb307ca7: Pull complete
49808d1fea6f: Pull complete
49808d1fea6f: Pull complete
9df7bf51dad8: Download complete
bfdc90607585: Download complete
failed to register layer: devicemapper: Error running deviceResume dm_task_run failed

Standard error from the command:
failed to register layer: devicemapper: Error running deviceResume dm_task_run failed
[ERROR] Command with variables substituted is: 
[ERROR]     return 1
[INFO] Stacked plot for log subset "iops" written to /tmp/openshift/test-end-to-end-docker//logs/iops.pdf
[INFO] Stacked plot for log subset "paging" written to /tmp/openshift/test-end-to-end-docker//logs/paging.pdf
[INFO] Stacked plot for log subset "cpu" written to /tmp/openshift/test-end-to-end-docker//logs/cpu.pdf
[INFO] Stacked plot for log subset "queue" written to /tmp/openshift/test-end-to-end-docker//logs/queue.pdf
[INFO] Stacked plot for log subset "memory" written to /tmp/openshift/test-end-to-end-docker//logs/memory.pdf


mrunalp commented Jun 30, 2016

@rhvgoyal could you take a look?


liggitt commented Jul 7, 2016

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_integration/2950/:

Running test/end-to-end/core.sh:76: executing 'docker pull centos/ruby-22-centos7:latest' expecting success...
FAILURE after 4.100s: test/end-to-end/core.sh:76: executing 'docker pull centos/ruby-22-centos7:latest' expecting success: the command returned the wrong error code
Standard output from the command:
Trying to pull repository docker.io/centos/ruby-22-centos7 ... 
latest: Pulling from docker.io/centos/ruby-22-centos7
0653bff3c5cf: Already exists
a3ed95caeb02: Pulling fs layer
1da9c0484401: Pulling fs layer
88905dd8ffe4: Pulling fs layer
e9c3689754d9: Pulling fs layer
91ed7dc53855: Pulling fs layer
6bfd35e78c63: Pulling fs layer
d06f194b368d: Pulling fs layer
8afc589cb07e: Pulling fs layer
e9c3689754d9: Waiting
91ed7dc53855: Waiting
6bfd35e78c63: Waiting
d06f194b368d: Waiting
8afc589cb07e: Waiting
1da9c0484401: Verifying Checksum
1da9c0484401: Download complete
a3ed95caeb02: Verifying Checksum
a3ed95caeb02: Download complete
e9c3689754d9: Verifying Checksum
e9c3689754d9: Download complete
6bfd35e78c63: Verifying Checksum
6bfd35e78c63: Download complete
d06f194b368d: Verifying Checksum
d06f194b368d: Download complete
8afc589cb07e: Verifying Checksum
8afc589cb07e: Download complete
91ed7dc53855: Verifying Checksum
91ed7dc53855: Download complete
a3ed95caeb02: Pull complete
a3ed95caeb02: Pull complete
88905dd8ffe4: Verifying Checksum
88905dd8ffe4: Download complete
1da9c0484401: Download complete
88905dd8ffe4: Download complete
e9c3689754d9: Download complete
91ed7dc53855: Download complete
6bfd35e78c63: Download complete
d06f194b368d: Download complete
8afc589cb07e: Download complete
failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-117757614-99fb723441468f7fa037e8bf2b0aaf69ab838b937c6b11c3585915d55d6429ed: no such file or directory stdout: {"layerSize":0}
 stderr: 

Standard error from the command:
failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-117757614-99fb723441468f7fa037e8bf2b0aaf69ab838b937c6b11c3585915d55d6429ed: no such file or directory stdout: {"layerSize":0}
 stderr: 
[ERROR] PID 32415: hack/lib/cmd.sh:230: `return "${return_code}"` exited with status 1.


bparees commented Jul 14, 2016

@smarterclayton

Saw this again

failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-50501374-69a158cd976fb09763ffdc84a498aa2bb2e34b4da4f6b069b2cad6f8e7b56ed7: no such file or directory stdout: {"layerSize":0}
 stderr: 
F0803 08:16:56.435014       1 helpers.go:108] error: build error: failed to register layer: ApplyLayer open /dev/mapper/docker-202:2-50501374-69a158cd976fb09763ffdc84a498aa2bb2e34b4da4f6b069b2cad6f8e7b56ed7: no such file or directory stdout: {"layerSize":0}
 stderr:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/4623/consoleFull

@smarterclayton

@rhvgoyal saw this again on the latest RH Docker 1.10 - can you take a look?


mrunalp commented Aug 3, 2016

@rhvgoyal is on PTO. @shishir-a412ed could probably take a look.


shishir-a412ed commented Aug 3, 2016

@soltysh

  1. What is your host Linux OS?
  2. Can you provide the output of:
    a) docker version
    b) docker info
    c) vgs
    d) lvs

Shishir
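
For convenience, the four outputs above could be captured in one go. A minimal sketch — the `collect_diagnostics` helper and the output file name are hypothetical, not part of any existing tooling:

```shell
# collect_diagnostics [OUTFILE]: run the four diagnostic commands requested
# above, appending each command's output (or a failure marker) to one file.
collect_diagnostics() {
    out="${1:-docker-diagnostics.txt}"
    for cmd in "docker version" "docker info" "vgs" "lvs"; do
        echo "==== $cmd ====" >> "$out"
        # unquoted $cmd so "docker version" splits into command + argument
        $cmd >> "$out" 2>&1 || echo "($cmd failed)" >> "$out"
    done
    echo "$out"
}
```

Run it as `collect_diagnostics /tmp/diag.txt` on the failing host and attach the resulting file.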

@shishir-a412ed

@smarterclayton I have a Red Hat Enterprise Linux (RHEL) VM and I tried it on that. I was able to successfully pull this image: docker.io/centos/ruby-22-centos7.

sh-4.2# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
sh-4.2# docker images
REPOSITORY                              TAG                 IMAGE ID            CREATED             SIZE
docker.io/centos/ruby-22-centos7        latest              4bf7a6c33303        2 days ago          442.4 MB

I am running docker on loopback devices. Here is the output of docker version and docker info.

docker version:

sh-4.2# docker version
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-44.el7.x86_64
 Go version:      go1.4.2
 Git commit:      7ffc8ee-unsupported
 Built:           Fri Jun 17 15:27:21 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-44.el7.x86_64
 Go version:      go1.4.2
 Git commit:      7ffc8ee-unsupported
 Built:           Fri Jun 17 15:27:21 2016
 OS/Arch:         linux/amd64

docker info:

Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 9
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-253:0-69612513-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.917 GB
 Data Space Total: 107.4 GB
 Data Space Available: 11.08 GB
 Metadata Space Used: 3.105 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.144 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.18.2.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 2
Total Memory: 1.954 GiB
Name: localhost.localdomain
ID: XR6T:J7OH:KVBU:ZZCZ:HCYE:FPU4:LCSH:7GMT:R6OO:MI5S:B4O3:DAIW
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.access.redhat.com (secure), docker.io (secure)

I also tried this with upstream Docker 1.12 on Fedora 23; it works fine for me.

Shishir

@smarterclayton

This is a race condition that occurs a few times out of every hundred or thousand pulls. So it's more subtle than "just broken", and it either needs debugging added to Docker (so that when it happens we can figure out why) or an assessment of the code in question to see why it would race.

This is RHEL with the latest Docker 1.10 packages.


@smarterclayton

+ vagrant ssh -c 'sudo docker version'
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

from the CI machines. Judging by the timing, we've only seen this since moving to the RH 1.10 stream.

@smarterclayton smarterclayton changed the title executing 'docker pull centos/ruby-22-centos7:latest' expecting success: the command returned the wrong error code Docker ApplyLayer fails while attempting to pull an image Aug 3, 2016

soltysh commented Aug 4, 2016

@shishir-a412ed as Clayton already mentioned, this is a flake on our test infrastructure and, unfortunately, not something easily reproducible.


mfojtik commented Aug 29, 2016

@mrunalp is this fixed?

@rhvgoyal

We have fixed one devicemapper race which happens when deferred removal is scheduled and then cancelled before the device could go away. In those cases, too, the device node is not present and one gets error messages like "open of device failed". Not sure whether that fix addresses this error or not.

projectatomic/docker#188
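
Since that fix concerns deferred removal, a quick sanity check on a failing host is whether those code paths are even enabled. A small sketch — the `deferred_flags` helper is hypothetical; it reads `docker info` text from stdin so it also works on captured output like the one quoted earlier in this thread:

```shell
# deferred_flags: print the deferred-removal/deletion settings from
# `docker info` output supplied on stdin, stripping leading indentation.
deferred_flags() {
    grep -E 'Deferred (Removal|Deletion) Enabled' | sed 's/^ *//'
}
```

Usage: `docker info | deferred_flags`. On the host whose `docker info` is quoted above, this would print `Deferred Removal Enabled: false` and `Deferred Deletion Enabled: false`.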


mfojtik commented Oct 12, 2016

@rhvgoyal @mrunalp do we have the fixed docker in Jenkins? I can still see this happening...


mrunalp commented Nov 17, 2016

@mfojtik Is this still happening? cc: @rhvgoyal

@jwhonce jwhonce self-assigned this Nov 18, 2016

imcleod commented Nov 18, 2016

How about we whip up a standalone pull stress tester to try to reproduce this outside of OpenShift?
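
A minimal version of such a stress tester might look like this — the image name, iteration count, and the injectable pull/rmi commands are all assumptions for illustration:

```shell
# stress_pull IMAGE ITERATIONS [PULL_CMD] [RMI_CMD]: pull and remove an
# image in a loop, counting failed pulls; prints the failure count.
# The pull/rmi commands are injectable so the loop itself can be exercised
# without a Docker daemon.
stress_pull() {
    image="$1"
    iterations="$2"
    pull="${3:-docker pull}"
    rmi="${4:-docker rmi}"
    failures=0
    i=1
    while [ "$i" -le "$iterations" ]; do
        if ! $pull "$image" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "iteration $i: pull failed" >&2
        fi
        $rmi "$image" >/dev/null 2>&1
        i=$((i + 1))
    done
    echo "$failures"
}
```

e.g. `stress_pull docker.io/centos/ruby-22-centos7:latest 500` to hammer the same image the flaking test pulls.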


mrunalp commented Nov 18, 2016

@imcleod SGTM. @rhvgoyal @runcom WDYT?


runcom commented Nov 18, 2016

sgtm as well 👍

@rhvgoyal

So ApplyLayer fails and complains about the associated device not being present. I am wondering whether the image is still present or not (that is, the underlying layer on which this diff layer is being applied).

@rhvgoyal

Have people noticed this problem in 1.10.3? In 1.10.3 we had fixed a couple of races.

@rhvgoyal

This ApplyLayer failure message seems to be coming from applyLayerHandler() in pkg/chrootarchive/diff_unix.go.

This in turn calls docker-applyLayer with a dest argument, which is supposed to be a directory on top of which the layer has to be applied.

But somebody is trying to open the device /dev/mapper/<foo>, and that does not make much sense. That must have happened in the Get() call before applyLayerHandler() was invoked.
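
To narrow that down when the failure reproduces, one could classify what exactly is missing at the failing path. A sketch — the `device_node_state` helper is hypothetical; the long device name in the usage example is taken from the failure output earlier in this issue:

```shell
# device_node_state PATH: report whether a /dev/mapper node is present,
# whether only the node is missing (device removed or never activated),
# or whether the whole directory is absent.
device_node_state() {
    if [ -e "$1" ]; then
        echo "present"
    elif [ -d "$(dirname "$1")" ]; then
        echo "missing node"
    else
        echo "missing directory"
    fi
}
```

e.g. running `device_node_state /dev/mapper/docker-202:2-117588014-6b4a98e0d8608315d4875e5e207b1a4eaacc430535adce611956cdd54155584d` right after a failed pull, alongside `dmsetup info` output, would show whether the thin device was removed or never activated.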
