Very specific set of circumstances leads to zero-byte (empty) file being created #1015
Comments
I can reproduce this on current Manjaro (
|
Both Arch Linux and Manjaro are distributions that are not officially supported. Packages distributed by those distributions are maintained by those distributions and differ in the build process, build-time dependencies, and versions of (e.g.) containerd and runc. I'd recommend opening a ticket in the corresponding distribution's issue tracker for that reason. I'm closing this issue because of the above, but feel free to continue the conversation |
@pedantic-git have you resolved this? Did you create an issue elsewhere that I can follow? I'm seeing the same issue with the official ruby 2.6 Docker image on an up-to-date Manjaro install (updating an existing install doesn't seem to trigger the error) |
@kthibodeaux I didn't create an issue anywhere else because I didn't have time to test it with the official build of Docker - have you been able to do that? It seems like even though it's happening to everyone on Arch/Manjaro it won't be supported here if it's only the Arch build of Docker. |
Did either of you get anywhere with this @pedantic-git @kthibodeaux? Got another example here, also on Arch. |
Sadly not. I never got past the "Arch isn't supported" message above. I guess taking it up with the maintainers of the Arch package is the way to go. Looks like that's Morten Linderud, if you'd like to raise a case and copy me in (quinn at fishpercolator.co.uk), but I'm not really even sure how one goes about doing that. |
I'm able to reproduce this on Fedora 33 using the package I get the exact same output as @pedantic-git My
|
@thaJeztah Now that this issue can be reproduced in an official build (albeit of Moby Engine rather than CE) is it possible this issue could warrant reopening? |
@thaJeztah any chance to reopen this issue? Same issue here (on Ubuntu 20.04), running images Ubuntu:18.04, Ubuntu:20.04, and Ruby:3.0.0.
|
Same here on Fedora 33 |
I just encountered the same issue and hope that it will be reopened and addressed here, as it was far from easy to debug. In the meantime I'll have to add this (Ruby) monkey patch to my Docker setup:

    require 'fileutils'
    require 'open3'

    module FileUtils
      class Entry_
        # Shell out to cp(1) instead of the built-in stream copy.
        def copy_file(dest)
          Open3.capture2('cp', path(), dest)
        end
      end
    end

|
@makmic There's actually no need to shell out to a separate and heavyweight |
I'm facing the same issue on CentOS.
|
@pedantic-git I don't suppose you have an example no-op chmod or utime that's effective? and do you think they'd be faster than my current workaround?
Could this be an interaction with LUKS Full Disk Encryption? That's the main disk related difference on the affected system I have, when the same project is fine on all others. |
@dwarburt Have you seen the example repo in the original post? The last two tests are the no-op ones. They're basically the same as your |
I tried the reproduce-copystream repo on my old laptop and could not reproduce the bug. Working config:
Failing config:
This was also failing with Docker version 20.10.6, build 370c289 on my new laptop. |
So I thought it might be a regression that appeared after 19.13.1, but a teammate ran the test and does not have the bug with 20.10.6. So it's not only related to the Docker version. Working config:
|
Several of our Bitbucket Pipelines customers are facing the same issue after our FlatCarOS upgrade from 2605.12.0 to 2765.2.2 (kernel move from 5.4.92 directly to 5.10.25)
|
I ran into this same problem with ruby's
I was also able to work around it without changing source or spec code by mounting the
|
Reopening as this is being seen in supported configurations. |
Seeing this on docker for Mac as well. Very strange. I can reproduce on one image, but not another. |
I first came across this issue on Arch on 2020-05-28 in a failing RSpec test using ActiveStorage, returning an @maddymarkovitz and I did much investigation on this in the following months. With The bug probably resides in the overlay2 subsystem, as it was changed in that release, and from my results of trying out different Docker storage drivers:
Enabling/disabling We put together a minimal C file reproducing the bug by executing the

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <fcntl.h>

    int main(int argc, char *argv[]) {
        /* First copy: source file -> temp file under /tmp. */
        int input = open(argv[1], O_RDONLY);
        int temp1 = open("/tmp/copy_file_range_test", O_WRONLY|O_CREAT|O_TRUNC, 0100644);
        syscall(SYS_copy_file_range, input, NULL, temp1, NULL, 6, 0);
        close(input);
        close(temp1);
        /* Second copy: temp file -> "destination" in the working directory. */
        int temp2 = open("/tmp/copy_file_range_test", O_RDONLY);
        int dest = open("destination", O_WRONLY|O_CREAT|O_TRUNC, 0100644);
        syscall(SYS_copy_file_range, temp2, NULL, dest, NULL, 6, 0);
        close(temp2);
        close(dest);
    }

The script that builds and runs it (as test.c):

    #!/bin/bash
    set -Eeuo pipefail
    source=${1-Gemfile}
    # Report whether the "destination" file ended up empty.
    check() {
      if wc -c destination | grep -E '^0 ' > /dev/null; then
        echo 'Copy failed'
      else
        echo OK
      fi
      echo
    }
    printf "FROM debian:10.8-slim\nRUN apt update && apt install -y gcc strace" > /tmp/strace_Dockerfile
    docker build -f /tmp/strace_Dockerfile -t strace .
    echo
    echo Local:
    (
      # set -x
      gcc test.c
      strace ./a.out "$source" &> test-strace
    )
    check
    echo Docker - mounted:
    docker run --rm -v "$(pwd):/work" -w /work strace bash -c "
      #set -x &&
      gcc test.c &&
      strace ./a.out '$source' &> test-strace-docker-mounted
    "
    check
    echo Docker - copied:
    docker run --rm -v "$(pwd):/work" -w /work strace bash -c "
      #set -x &&
      cp '$source' /x &&
      gcc test.c &&
      strace ./a.out /x &> test-strace-docker-copied
    "
    check

Bug output:
We had meant to report the bug to the Linux project, but never quite got around to it. Has anyone else done so already? Running the script today while writing this (on
Our colleagues using Mac were not seeing this issue at the time. Now however, Mac people would be seeing this bug too, due to Docker for Mac updating the Linux kernel version used in the VM (to a version above Linux 5.6.0). In Docker for Mac 3.0.0 the Linux kernel sees a massive upgrade, to 5.10.25. |
Our team also found a comment in Rust stdlib about the issue of This is the related PR: This is the related issue: Maybe it would help to identify the impact of the issue |
Our team, using both Mac and Linux, has been seeing this issue while using test-kitchen inside Docker containers to test our Chef cookbooks. As @ZimbiX mentioned, the bug seems to have been introduced in the 5.6 kernel; it did not, however, seem to be present in kernel versions >= 5.11. Users originally saw the following errors, which I am including below in case anyone else hits them, as this issue was pretty rough to track down.
While test-kitchen was staging the files to be sent to the remote VM, it created a bunch of 0-byte cookbook files and JSON files. Depending on what we were testing, we hit the different errors above. We ended up using the following monkey patch in our testing environment's Docker container to work around the issue.

    # frozen_string_literal: true
    require 'fileutils'

    module FileUtils
      class Entry_
        def copy_file(dest)
          File.open(path()) do |s|
            File.open(dest, 'wb', s.stat.mode) do |f|
              IO.copy_stream(s, f)
              # No-op chmod: re-apply the mode the file already has.
              f.chmod f.lstat.mode
            end
          end
        end
      end
    end

We also put guards around it so it would only be loaded on the problematic kernels, so that if we forgot about it in the future, it wouldn't bite us. |
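A guard like the one described in this comment might look roughly like the sketch below. The method name `affected_kernel?`, the two constants, and the file name `copy_file_patch` are hypothetical. The 5.6 to 5.11 range is only the one reported in this comment; other comments in this thread report distribution kernels outside it (5.4 on Ubuntu, 4.18 on CentOS) that show the same behavior, so treat the range as an assumption to adjust.

```ruby
require 'etc'
require 'rubygems' # Gem::Version; normally loaded already

# Assumed range from the comment: introduced in 5.6, not seen in >= 5.11.
AFFECTED_FROM = Gem::Version.new('5.6')
AFFECTED_BEFORE = Gem::Version.new('5.11')

# Returns true when a kernel release string (e.g. "5.10.25-linuxkit")
# falls inside the assumed affected range.
def affected_kernel?(release = Etc.uname[:release])
  numeric = release[/\A\d+(?:\.\d+)*/] # keep only the leading "5.10.25" part
  return false unless numeric

  version = Gem::Version.new(numeric)
  version >= AFFECTED_FROM && version < AFFECTED_BEFORE
end

# Hypothetical usage: load the monkey patch only where it is needed.
# require_relative 'copy_file_patch' if Etc.uname[:sysname] == 'Linux' && affected_kernel?
```

Comparing parsed versions rather than raw strings matters here, because a plain string comparison would sort "5.10" before "5.6".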
It seems some distributions that run a 5.4 kernel, such as Ubuntu 18.04.6, are also affected because they have pulled in torvalds/linux@1a980b8:

    stanhu@stanhu-ubuntu-18:/tmp$ grep ovl_splice linux-gcp-5.4-5.4.0/fs/overlayfs/file.c
    static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
    ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
    .splice_read = ovl_splice_read,
    .splice_write = ovl_splice_write,

I've reported this to Ubuntu in https://bugs.launchpad.net/ubuntu/+source/linux-base/+bug/1953199. I'd suggest others run the reproduction step in #1015 (comment) and report the bugs to the distribution maintainers. |
…cleCI issues As seen in this issue, certain versions of Docker and certain versions of the Linux kernel manifest a problem where creating tempfiles with `IO.copy_stream` doesn't work: docker/for-linux#1015 CircleCI seems to have upgraded their Linux kernels and now we're seeing this problem. A temporary workaround is to change the ActiveStorage code to not use `IO.copy_stream` by making a custom service (since the Disk service is only for test anyway so we can do whatever we want)
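The idea of avoiding `IO.copy_stream` can be sketched in plain Ruby. This is not the custom ActiveStorage service from the commit, only an illustration of the same approach; the helper name `copy_without_copy_stream` and the chunk size are assumptions. It relies on the original post's observation that copying with `File.write` and `tempfile.read` does not trigger the issue.

```ruby
# Hypothetical helper: copy +src+ to +dest+ with explicit read/write calls
# instead of IO.copy_stream. Returns the size of the destination in bytes.
def copy_without_copy_stream(src, dest, chunk_size: 64 * 1024)
  File.open(src, 'rb') do |input|
    File.open(dest, 'wb') do |output|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk = input.read(chunk_size))
        output.write(chunk)
      end
    end
  end
  File.size(dest)
end
```

Reading in chunks keeps memory use bounded for large files; for small test fixtures a single `File.binwrite(dest, File.binread(src))` would behave the same way.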
CentOS 8 also has this bug even though it's shipping with a 4.18 kernel. I reported it here: https://bugs.centos.org/view.php?id=18370 RedHat Enterprise Linux 8.3 also has this problem: https://bugzilla.redhat.com/show_bug.cgi?id=2028998 |
This commit is a workaround for the issue #537. It allows the Content Publisher and Whitehall test suites to pass in the GOV.UK Docker development environment. Once docker/for-linux#1015 has been fixed, this workaround will no longer be needed.
Linux v5.10.84 has now been tagged with the fixes for the overlay filesystem: |
Thanks for the update, @stanhu - we added an internal ticket to either patch the kernel version as used by Docker Desktop, or to wait for the upstream kernel to include the fix; having it merged in upstream definitely makes it easier 👍 /cc @fredericdalleau @djs55 FYI |
Hi @thaJeztah, I see that an internal ticket was opened to fix this kernel issue. I just wanted to ask if this has been planned and if there's an ETA? I've been holding off on updating Docker for Mac for a long time now (due to this issue, which is caused by the underlying kernel problem), and would love to be able to update without having to find a workaround. Thanks heaps for your help! |
@smartygus Kernel 5.10.105 has been released fixing this bug! |
@emanuelhfarias this is great news!! Thanks for the heads up! :) |
Yes, I think we can close this issue. I see that upgrading Docker for Mac to v4.6.0 upgraded the kernel to |
Other related distribution notes:
|
According to https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8-beta/html/8.6_release_notes/new-features#enhancement_kernel, RedHat 8.6 will be released on May 22, 2022 with |
For some reason, temporary files weren’t created correctly when running tests inside our Docker development environment. This seems to be a Docker issue that can be worked around by mounting the `/tmp` directory. References: alphagov/govuk-docker#539 docker/for-linux#1015
FOSS port of pe-puppetdb's 0886afb401129df40da1d0965b91ee2c08c76e7a

There's a particularly gnarly bug in Linux kernels 5.6 to 5.10 that can result in 0 byte files being written when copying files inside containers in a specific workflow as described in: docker/for-linux#1015

This impacts the packaging gem used by ezbake when it renders ERB templates, resulting in 0 byte files critical to the execution of the build process. Using a different filesystem (like tmpfs) for /tmp as a workaround didn't seem to work.

Since there is no way to explicitly control the kernel version in environments like Travis, the best approach is to monkey-patch the Entry_ class in Ruby that supports FileUtils.cp, such that a no-op mode change is performed on the source and destination files before and after being written to prevent the 0 byte files from being written. In local OSX testing, no-op modifying the source file prior to copy is the solution, but other users reported that no-op modifying the destination file worked for them -- both solutions are therefore employed for completeness.

This is a really hacky solution, but it only impacts two specific scenarios:
* Developer builds against any commit in a branch without using mismatched packages
* Travis CI PR testing
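The "no-op mode change on both files" idea from this commit message can be sketched as a standalone helper. This is not the commit's actual monkey patch; the name `copy_with_noop_chmod` is hypothetical, and whether the chmod calls help on a given kernel is something this thread reports anecdotally rather than guarantees.

```ruby
# Hypothetical helper: copy +src+ to +dest+, re-applying each file's existing
# permission bits around the copy. Returns the destination size in bytes.
def copy_with_noop_chmod(src, dest)
  # No-op on the source: set the mode it already has.
  File.chmod(File.stat(src).mode & 0o7777, src)
  File.open(src, 'rb') do |s|
    File.open(dest, 'wb', s.stat.mode & 0o7777) do |d|
      IO.copy_stream(s, d)
      # No-op on the destination after the write.
      d.chmod(d.stat.mode & 0o7777)
    end
  end
  File.size(dest)
end
```

A standalone function is used here so the sketch does not depend on the internals of `FileUtils`; the commit applies the same steps inside `copy_file` instead.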
… gets broken The bug replaces: ``` docker/for-linux#1015 (comment) ``` with: ``` https://htmlpreview.github.io/?https://github.com/ZimbiX/brendan-weibrecht-resume/blob/master/build/brendan-weibrecht-resume.html#issuecomment-841915668 ``` Issue: htmlpreview/htmlpreview.github.com#133
The story
A few weeks ago, a bunch of tests in Rails code I have started failing in my Docker development environment. My dev environment is Arch so the actual set of libraries and things installed is very much a moving target.
Lots of digging around led me to create a repo that can reproduce the issue using only Ruby (no gems at all), at the lowest level possible. And it's really really weird. More information below.
Expected behavior
When copying a file in Ruby using the `IO.copy_stream` method, the file should be an identical copy.
Actual behavior
Under a very specific set of circumstances, the resulting file is 0 bytes.
I'm reasonably certain this is a Docker issue, because one of the conditions that needs to be true is that the source file is located on a mounted volume.
The conditions that need to be true for the `IO.copy_stream` operation to fail are:
`IO.copy_stream`. Copying it using `File.write` and `tempfile.read` does not cause the issue.
Steps to reproduce the behavior
Please check out this repo: https://github.com/fishpercolator/reproduce-copystream and run the command in the README.
The output of this command on my env is:
As you can see, the file is copied successfully if it is:
If any of the 4 statements above are true, the file copies fine. If all 4 are false, the file is created but it has a 0-byte size.
I've checked to see if the source filesystem of the mounted volume makes any difference but I get the same effect with eCryptfs, ext4 and tmpfs mountpoints.
Likewise I've tried different versions of Ruby to see if it was a regression in Ruby, but I can still reproduce the issue with the official Docker images of much older versions where I was certain it worked.
Does anyone have any ideas? Can anyone else reproduce this using my repo?
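The repo's exact script is not reproduced here. As a rough sketch only, modeled on the two-step copy in the C reproduction posted elsewhere in this thread, the failing pattern could be expressed in Ruby as follows. The method name `double_copy` and the temp file prefix are hypothetical, and on an unaffected system both sizes simply match.

```ruby
require 'tempfile'

# Copy +source+ into a temp file with IO.copy_stream, then copy that temp
# file to +dest+ the same way. Returns [source size, destination size].
def double_copy(source, dest)
  Tempfile.create('copy_stream_test') do |tmp|
    tmp.binmode
    File.open(source, 'rb') { |s| IO.copy_stream(s, tmp) }
    tmp.flush
    # Reopen the temp file by path for the second copy.
    File.open(tmp.path, 'rb') do |t|
      File.open(dest, 'wb') { |d| IO.copy_stream(t, d) }
    end
  end
  [File.size(source), File.size(dest)]
end
```

Inside a container, calling `double_copy('Gemfile', 'destination')` with a non-empty `Gemfile` on a mounted volume and comparing the two sizes would show the problem as a destination size of 0.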
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.)
It's a physical Dell XPS-13 9360 running Arch Linux. Arch doesn't have version numbers but all packages are up to date. Docker was installed from the Arch package, which is built using this PKGBUILD script from the official Docker sources.