gh-81340: Use `copy_file_range` in `shutil.copyfile` copy functions #93152
@giampaolo I saw that you are marked as a code owner for shutil, but GitHub did not request your review for some reason. Line 138 in 2176898
Would you like to take a look? |
@illia-v to my understanding As for reflink / CoW copy, I also submitted a patch some time ago, but I wanted to add Windows support before merging it. To my understanding reflink / CoW copy on Linux is achieved by using FICLONE, not copy_file_range(). |
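To make the distinction above concrete, here is a minimal sketch of an explicit reflink attempt via the `FICLONE` ioctl, as opposed to `copy_file_range()`. The helper name `try_clone` is hypothetical, the numeric fallback for `FICLONE` is an assumption that holds on common Linux architectures such as x86 and ARM, and treating every `OSError` as "cannot clone" is a simplification; a real implementation would likely inspect `errno`.

```python
import fcntl


# Recent Python versions expose fcntl.FICLONE on Linux. The numeric fallback
# is an assumption (the usual value on x86/ARM Linux), used only if missing.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def try_clone(src_path, dst_path):
    """Try to make dst_path a reflink (CoW clone) of src_path.

    Returns True if the filesystem performed the clone, False otherwise.
    Note: dst_path is created or truncated even when cloning fails.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        try:
            # FICLONE is issued on the destination fd; the argument is the source fd.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError:
            # Simplification: any failure (unsupported filesystem, different
            # devices, ...) is reported as "not cloned".
            return False
    return True
```

On filesystems without reflink support the function simply returns `False`, and the caller is free to fall back to an ordinary copy.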
@giampaolo |
cc @pablogsal |
Would you mind creating a first PR to mention it in copy_file_range() documentation? https://docs.python.org/dev/library/os.html#os.copy_file_range |
In terms of API, I would prefer to change the parameter name to just For example, socket.sendfile() tries to use sendfile() but falls back on send() if sendfile() doesn't work or is not available. |
The Unix cp command has 3 modes for
Here you only propose 2 modes. Do we need to implement the 3rd "always" mode? |
Your PR is quite big. Would it be possible to extract the tests refactoring and start with a first PR just for that?
I see a lot of confusion between which function/syscall is used, server-side copy, and copy-on-write. There are expectations, hopes, and ... disillusions :-) IMO the doc should be completed to better explain differences between server-side copy and copy-on-write, and attempt to document when and how one method is attempted, when it can fail, etc. |
I am glad the issue has received this attention. @vstinner thanks for valid points, especially one about the three modes. I think it will be nice to support all of them in Python too. But
I will create a PR to extend Do you think it is possible to launch the three-mode reflink functionality for Linux first and add support for other OSes in later pull requests? |
If possible, the API should be the same and should be usable on all platforms. For example, "as a user, I would like to call shutil.copyfile() with the same arguments on all platforms and get a similar behavior on all platforms". For example, the API should attempt to use server-side copy and/or use Copy-on-Write, but silently switch to regular read+write copy if no modern copy function is available on the OS or on the source/destination filesystems. For the os module, it's fine to have different functions depending on the OS. But the shutil module is more a high-level module with a (mostly) portable API. |
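The "try the modern call, silently fall back" behavior described above could look roughly like the sketch below. It is not the code from this PR: `copy_contents` and its return value are hypothetical, and catching every `OSError` from the fast path is a deliberate simplification.

```python
import os
import shutil


def copy_contents(src_path, dst_path, chunk=8 * 1024 * 1024):
    """Copy file data with the same call on every platform.

    Returns the name of the strategy that was used (for illustration only).
    """
    if hasattr(os, "copy_file_range"):  # only present on some platforms
        try:
            with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
                # With no explicit offsets, the kernel advances both file
                # positions; a return value of 0 means end of file.
                while os.copy_file_range(fsrc.fileno(), fdst.fileno(), chunk):
                    pass
            return "copy_file_range"
        except OSError:
            # Not supported by the OS or by these filesystems: start over.
            pass
    # Portable path: reopening with "wb" truncates any partial output.
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
    return "read+write"
```

The caller passes the same arguments everywhere and never needs to know which path ran. This sketch does not handle the zero-byte result from special filesystems such as procfs that the PR description mentions.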
Co-authored-by: Victor Stinner <[email protected]>
Done in #93182, please review it. |
I am swaying away from the idea in favor of a separate |
The problem with that is you don't get the automatic free win, which is contrary to what others are doing. e.g. GNOME's glib (not glibc), KDE's kio, and even GCC's libstdc++ upgrade automatically. GCC 14 will automatically use There doesn't appear to be much precedent for carving this out into its own opt-in functionality? EDIT: It's been pointed out to me that Emacs, of all things, unconditionally uses |
@illia-v are you still interested in working on this? I wonder if we even need the |
@barneygale yes, I am interested and can revise the changes this week |
@barneygale I dropped |
LGTM
As I've previously stated in #81338 (comment), personally I consider copy and CoW / clone 2 distinct operations. E.g. one may want to occupy disk space right now instead of later. For this reason I think It appears to me that the tool which solved all these controversies is
We may do:
But if we do this, I think it would make sense to expose My 2 cents and sorry for not replying earlier on this topic. |
There are some compelling arguments and examples from other languages in #81338. Personally I'm persuaded that we should enable it (without an opt-out), rather than second-guessing what the OS provides and what other language runtimes are doing. We have plenty of time til 3.14 ships if we need to back it out. But this is by no means my area of expertise, so I'm happy to be overruled. |
Add a `Path.copy()` method that copies the content of one file to another.

This method is similar to `shutil.copyfile()` but differs in the following ways:

- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.

The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.

Incorporates code from GH-81338 and GH-93152.

Co-authored-by: Eryk Sun <[email protected]>
This seems alright to me, too. We're expecting/hoping Windows will start doing transparent CoW when doing file copies, so that ought to give an idea pretty quickly if there's a genuine need to avoid it, or if we'd be adding the opt-out without justification. |
@barneygale based on your experience of having #119058 present in main for more than a month, can we merge this too? |
There have been no releases from |
@barneygale thank you for the response. Can we risk merging this PR before the alpha to be able to discover issues (if any) in early stages of 3.14? |
This makes `shutil.copyfile` prefer the `copy_file_range` system call on Linux.

`copy_file_range` gives filesystems an opportunity to implement the use of reflinks or server-side copy, but we cannot determine whether any of them are implemented. Therefore, I added an `allow_reflink` argument in case anyone wants to disable copy-on-write.

GNU Coreutils enables copy-on-write by default, which is why I set `allow_reflink` to true by default (unlike what @vstinner proposed in #81338).

Note that there is a known `copy_file_range` bug when copying from special filesystems like procfs and sysfs. It seems not to be fixed yet, so we have to check for a silent `copy_file_range` failure, as Coreutils and Go do.
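A rough sketch of such a check follows. It is not the code from this PR, and the function name is hypothetical. The assumption is that a return value of 0 on the very first `copy_file_range` call is ambiguous (the file may be empty, or the call may have failed silently), so the sketch switches to a plain read/write loop, which gives the right result in both cases.

```python
import os


def copy_checking_silent_failure(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path and return the number of bytes copied."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            total = 0
            while hasattr(os, "copy_file_range"):
                try:
                    n = os.copy_file_range(src, dst, chunk)
                except OSError:
                    break  # unsupported here: continue with read/write below
                if n == 0:
                    if total:
                        return total  # real end of file after real progress
                    break  # zero on the first call: empty file or silent failure
                total += n
            # Plain read/write loop. Raw descriptors are used throughout, so it
            # resumes at the offsets where copy_file_range stopped.
            while True:
                buf = os.read(src, chunk)
                if not buf:
                    return total
                view = memoryview(buf)
                while view:  # os.write may write fewer bytes than requested
                    written = os.write(dst, view)
                    view = view[written:]
                    total += written
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

For a genuinely empty source the fallback loop reads nothing and the function returns 0, so the extra check costs one additional `read` call in that case.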