fs::copy hangs on docker (Linux) #75446
Update: I reimplemented the Linux version, and the |
Any chance you can |
It's an endless |
According to man, |
This particular case of 0 is not being handled in the current implementation, only the errors. Adding match arms and redirecting 0 to one of the errors to force using fallback mode could work. However, there may be cases where 0 gets returned legitimately? |
The current implementation queries the size of the source file and then does a decrementing loop. Does the file in question get truncated by another thread while the copy is in progress? |
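To illustrate the concern in the two comments above, here is a minimal sketch of a size-driven copy loop that treats a zero-byte result as a signal to fall back rather than retrying forever. The `fast_copy` closure is a hypothetical stand-in for `copy_file_range`, and the names `copy_with_fallback` and `CopyPath` are made up for this example; this is not the standard library's actual code.

```rust
use std::io::{self, Read, Write};

/// Which strategy ended up moving the bytes (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
pub enum CopyPath {
    Fast,
    Fallback,
}

/// Tries to copy `len` bytes with `fast_copy`, a stand-in for copy_file_range:
/// it is given the number of bytes still wanted and returns how many it moved.
/// `reader` and `writer` are only touched if the fast path reports 0 up front.
pub fn copy_with_fallback<R, W, F>(
    reader: &mut R,
    writer: &mut W,
    len: u64,
    mut fast_copy: F,
) -> io::Result<(u64, CopyPath)>
where
    R: Read,
    W: Write,
    F: FnMut(u64) -> io::Result<u64>,
{
    let mut written = 0u64;
    while written < len {
        match fast_copy(len - written)? {
            // Zero bytes before any progress: assume the fast path is unusable
            // for this pair of files and fall back to a plain read/write copy.
            0 if written == 0 => {
                let n = io::copy(reader, writer)?;
                return Ok((n, CopyPath::Fallback));
            }
            // Zero bytes after some progress: the source is shorter than `len`
            // (for example, truncated mid-copy), so stop instead of spinning.
            0 => break,
            n => written += n,
        }
    }
    Ok((written, CopyPath::Fast))
}
```

Without the two zero arms, a fast path that keeps returning 0 would never reduce `len - written`, which matches the endless loop described in this thread. Whether a zero can also be returned legitimately in other situations remains an open question here.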
Can you check which syscalls are used in the direction that works?
Recent kernels support cross device copy_file_range under certain circumstances, e.g. overlayfs can delegate to the underlying device and btrfs can copy between volumes. Bind mounts also should count as the same device. But maybe there's a bug in there in combination with selinux? |
If selinux is non-enforcing (per OP), that should rule it out. |
It still seems really odd. The kernel's copy_file_range implementation itself does a lot of fallbacks internally when the preferred methods return a 0, i.e. 0 bytes copied ultimately lead to a splice operation from file to file which is not all that different than what we're doing in userspace. So 0 bytes being copied is quite unexpected for a non-empty file. There are lots of layers involved though and any of them could be the cause. Adding yet another workaround is possible, but if possible I'd like to identify the root cause so we can report it to the responsible parties. |
Coreutils tests for a zero-byte copy_file_range() and falls back on a read()/write() loop if that happens: coreutils/coreutils@4b04a0c Their code has a comment mentioning that this happens for /proc special files. I tried to reproduce it with fs::copy("/proc/self/cmdline", "./foo"); but ran into another bug: stat() says that file is empty, so fs::copy() creates an empty file and doesn't copy anything. I think trusting st_size is fundamentally broken, for both this reason and the race that @the8472 mentioned. |
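A minimal sketch of the alternative suggested here: copy until `read()` reports end of file instead of trusting a size obtained up front. `ZeroSized` is a hypothetical stand-in for a /proc-style file whose reported size is 0 even though reads return data, and `copy_until_eof` is an illustrative name, not the standard library's implementation.

```rust
use std::io::{self, Read, Write};

/// Hypothetical source that claims a size of 0 yet yields bytes when read,
/// mimicking what the comment above describes for /proc/self/cmdline.
pub struct ZeroSized<'a> {
    data: &'a [u8],
}

impl<'a> ZeroSized<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// What a stat()-style size query would claim for this source.
    pub fn reported_len(&self) -> u64 {
        0
    }
}

impl Read for ZeroSized<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // `&[u8]` implements `Read` and advances the slice as it is consumed.
        self.data.read(buf)
    }
}

/// Copies until `read` returns 0 (end of file), ignoring any size hint.
pub fn copy_until_eof<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

A loop bounded by `reported_len()` would copy nothing from such a source, while the EOF-driven loop copies everything the reads produce. It also sidesteps the truncation race, since the loop ends when the data does.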
@rustbot claim |
@tavianator that's another good reason to change the logic, but it'd still be good to know what's actually causing it so we can add the root cause to comments or report things upstream if they haven't been fixed already. |
This is interesting. The kernel's generic_copy_file_checks sets the length to zero upon return because i_size_read returns 0. That ultimately means the inode's size in the kernel is zero, while the metadata reports a size of 41205. Possible kernel bug? Yet this behaviour only occurs when copying from docker's system over to the overlaid home directory, all technically on the same overlayfs but on different filesystems on the host side. |
Maybe it's getting |
I'm trying to reproduce it from basic pieces, but so far it works just fine:
mkdir direct overlay upper lower work
mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work overlay
echo "foo" > lower/IN
strace -ffe openat,copy_file_range ./copy.rs
#!/usr/bin/env run-cargo-script
fn main() -> std::io::Result<()> {
println!("{}", std::fs::copy("./overlay/IN", "./direct/OUT")?);
Ok(())
}
|
I'm sorry, I made one horrible mistake: the directories are not mounted on the same overlayfs; they seem to be bind mounted. Output on docker container:
I am not sure how one would reproduce this environment without actually using docker/podman. |
Not only that, I noticed that the behaviour is very strange. Some files cause the lockups, some don't. Now, building by using Running fs::copy script with the path causes it to lock up. But if I cp the file elsewhere and copy it back in place, fs::copy no longer locks up. If I append a new line to the file, the lockups stop as well; even changing the file permissions fixes it. Basically, if I modify the file in any shape or form, it seems to update itself and work just fine. This really seems like some serious, hard-to-reproduce kernel bug to me. |
My current kernel version is |
Those are the target directories. But the source also matters and depending on the docker storage driver you're using that might be overlayfs.
Yeah, again if overlayfs is involved that may make a difference whether the file comes from the upper or the lower.
A step by step reduced testcase would help. I grasp the rough outline what is happening but there are many details that might make a difference. I can add a workaround without that, but then I can't verify that the issue is fixed and it would make reporting things upstream more difficult too. |
…triplett Workarounds for copy_file_range issues fixes rust-lang#75387 fixes rust-lang#75446
Running Fedora 32 with selinux set to non-enforcing, building a project with OpenCV.
I am building a rather weird docker setup, but what I ran into was hanging during build stage of opencv-rust, around here.
I expected to see this happen: build succeed rather quickly
Instead, this happened: execution froze on the fs::copy call, and the CPU got stuck running at almost full speed.
My project was mounted using -v $PWD/project:/project:Z (name changed); opencv was manually cloned (for debugging) into the root of the docker image, at /opencv-rust.
Copies from /opencv-rust to /project don't work (example: /opencv-rust/bindings/cpp/opencv_4/aruco.cpp => /project/target/release/build/opencv-50ff47d79816a5ea/out/aruco.cpp).
File copies from /project to /opencv-rust work just fine (example: /project/target/release/build/opencv-50ff47d79816a5ea/out/xobjdetect_types.hpp => /opencv-rust/bindings/cpp/opencv_4/xobjdetect_types.hpp).
It appears to be an issue in the Linux implementation of fs::copy, as implementing the more generic version above does not freeze the operation.
Meta
rustc --version --verbose: