shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range) #81338

vstinner · 2019-06-04T21:36:41Z

BPO	37157
Nosy	@giampaolo, @albertz, @desbma, @koobs, @pablogsal
Files	cow.diff cow2.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-06-04.21:36:40.594>
labels = ['library', '3.9', 'performance']
title = 'shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)'
updated_at = <Date 2021-10-07.19:23:42.385>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-10-07.19:23:42.385>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-06-04.21:36:40.594>
creator = 'vstinner'
dependencies = []
files = ['48390', '48393']
hgrepos = []
issue_num = 37157
keywords = ['patch']
message_count = 14.0
messages = ['344648', '344651', '344667', '344692', '344694', '344697', '344702', '344709', '350623', '384015', '384105', '403414', '403418', '403423']
nosy_count = 6.0
nosy_names = ['giampaolo.rodola', 'Albert.Zeyer', 'desbma', 'koobs', 'pablogsal', 'dulanic']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue37157'
versions = ['Python 3.9']

vstinner · 2019-06-04T21:36:40Z

bpo-26826 added a new os.copy_file_range() function:
https://docs.python.org/dev/library/os.html#os.copy_file_range

As os.sendfile(), this new Linux syscall avoids memory copies between kernel space and user space. It matters for performance, especially since Meltdown vulnerability required Windows, Linux, FreeBSD, etc. to use a different address space for the kernel (like Linux Kernel page-table isolation, KPTI).

shutil has been modified in Python 3.8 to use os.sendfile() on Linux:
https://docs.python.org/dev/whatsnew/3.8.html#optimizations

But according to Pablo Galindo Salgado, copy_file_range() goes further:
"But copy_file_rane can leverage more filesystem features like deduplication and copy offload stuff."

https://bugs.python.org/issue26826#msg344582

Giampaolo Rodola' added:

"I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). I have a working patch adding CoW copy support for Linux and OSX (but not Windows). I think that should be exposed as a separate shutil.reflink() though, and copyfile() should just do a standard copy."

"Actually "man copy_file_range" claims it can do server-side copy, meaning no network traffic between client and server if *src* and *dst* live on the same network fs. So I agree copy_file_range() should be preferred over sendfile() after all. =)
I have a wrapper for copy_file_range() similar to what I did in shutil in bpo-33671 which I can easily integrate, but I wanted to land this one first:
https://bugs.python.org/issue37096
Also, I suppose we cannot land this in time for 3.8?"

https://bugs.python.org/issue26826#msg344586

--

There was already a discussion about switching shutil to copy-on-write:
https://bugs.python.org/issue33671#msg317989

One problem is that modifying the "copied" file can suddenly become slower if it was copied using "cp --reflink".

It seems like adding a new reflink=False parameter to file copy functions to control clone/CoW copies is required to prevent bad surprises.

vstinner · 2019-06-04T21:51:32Z

Random notes.

Extract of Linux manual page of "cp":

   --reflink[=WHEN]
          control clone/CoW copies. See below


   When --reflink[=always] is specified, perform a lightweight copy, where
   the data blocks are copied only when modified.  If this is not possible
   the copy fails, or if --reflink=auto is specified, fall back to a stan‐
   dard copy.  Use --reflink=never to ensure a standard copy is performed.

--

"Why is cp --reflink=auto not the default behaviour?":
https://unix.stackexchange.com/questions/80351/why-is-cp-reflink-auto-not-the-default-behaviour

--

reflinks are supported by BTRFS and OCFS2.

XFS seems to have an experimental support for reflink, 2 years old article:
https://strugglers.net/~andy/blog/2017/01/10/xfs-reflinks-and-deduplication/

Linux version of ZFS doesn't support reflink yet:
openzfs/zfs#405

--

Python binding using cffi to get reflink:
https://gitlab.com/rubdos/pyreflink
"Btrfs, XFS, OCFS2 reflink support. Btrfs is tested the most.
Apple macOS APFS clonefile support. Little testing, be careful. It might eat data."

--

"reflink for Windows":
https://github.com/0xbadfca11/reflink
"Windows Server 2016 introduce Block Cloning feature."
=> https://docs.microsoft.com/en-us/windows-server/storage/refs/block-cloning

"ReFS v2 is only available in Windows Server 2016 and Windows 10 version 1703 (build 15063) or later.
Windows 10 version 1607 (build 14393) and earlier Windows only can use ReFS v1."

--

Linux has 2 ioctl:

       #include <sys/ioctl.h>
       #include <linux/fs.h>

       int ioctl(int dest_fd, FICLONERANGE, struct file_clone_range *arg);
       int ioctl(int dest_fd, FICLONE, int src_fd);

http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html

giampaolo · 2019-06-05T05:02:24Z

I'm attaching an initial PoC using FICLONE on Linux and clonefile(3) on OSX. It is also possible to support Windows but it requires a ReFS partition to test against which I currently don't have. I opted for exposing reflink() as a separate function, mostly because:

conceptually standard copy and CoW copy are 2 different things
shutil already provides a distinction between copy functions (copy(), copy2(), copyfile()) which can be used as callbacks for copytree() and move(). As such one can follow the same approach and do:

>>> copytree(src, dst, copy_function=reflink).

This initial patch provides a callback=None parameter in case the CoW operation fails because not supported by the underlying filesystems but this is debatable because we can get different errors depending on the platform (which is not good). As such a more generic ReflinkNotSupportedError exception is probably a better choice.

vstinner · 2019-06-05T10:00:25Z

cow.diff: I'm not sure that attempt to call unlink() if FICLONE fails is a good idea. unlink() can raise a new exception which can be confusing. IMHO it's up to the caller to deal with that. Said differently, I dislike the *fallback* parameter of reflink().

Why not exposing clonefile() as os.clonefile() but os._clonefile()?

+#if defined(MAC_OS_X_VERSION_10_12)
+#include <sys/clonefile.h>
+#define HAVE_CLONEFILE
+#endif

Is Python compiled to target macOS 10.12 and newer? Mac/BuildScript/build-installer.py contains:

# $MACOSX_DEPLOYMENT_TARGET -> minimum OS X level
DEPTARGET = '10.5'

But I don't know well macOS. "#if defined(MAC_OS_X_VERSION_10_12)" is a check at build time. Does it depend on DEPTARGET? Would it be possible to use a runtime check?

You might open a dedicated issue to expose clonefile() since it seems like every tiny detail of this issue is very subtle and should be properly discussed ;-) (I like the idea of exposing native functions like clonefile() directly in the os module!)

vstinner · 2019-06-05T10:10:06Z

This initial patch provides a callback=None parameter in case the CoW operation fails because not supported by the underlying filesystems but this is debatable because we can get different errors depending on the platform (which is not good). As such a more generic ReflinkNotSupportedError exception is probably a better choice.

(Oh, my laptop only uses btrfs. Hum, I created a loop device to test an ext4 partition :-))

On an ext4 partition, cp --reflink simply fails with an error: it doesn't fallback on a regular copy.

vstinner@apu$ dd if=/dev/urandom of=urandom bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0123142 s, 85.2 MB/s

vstinner@apu$ cp --reflink urandom urandom2
'urandom' -> 'urandom2'
cp: failed to clone 'urandom2' from 'urandom': Operation not supported

vstinner@apu$ file urandom2
urandom2: empty
vstinner@apu$ stat urandom2
File: urandom2
Size: 0 Blocks: 2 IO Block: 1024 regular empty file
Device: 700h/1792d Inode: 13 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/vstinner) Gid: ( 1000/vstinner)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-06-05 12:08:23.000000000 +0200
Modify: 2019-06-05 12:08:23.000000000 +0200
Change: 2019-06-05 12:08:23.000000000 +0200
Birth: -

Not only it fails, but it leaves an empty file.

I suggest to mimick the Linux cp command: don't automatically fallback (there are too many error conditions, too many risks of raising a new error while handling the previous error) and don't try to remove the created empty file if reflink() fails.

giampaolo · 2019-06-05T10:22:18Z

I'm not sure that attempt to call unlink() if FICLONE fails is a good idea
Agreed.

I dislike the *fallback* parameter of reflink().

Me too. A specific exception is better.

Why not exposing clonefile() as os.clonefile() but os._clonefile()?

Mmm... I'm not sure it's worth it. The only reason one may want to use clonefile() directly is for passing CLONE_NOFOLLOW and CLONE_NOOWNERCOPY flags (the only possible ones):

CLONE_NOFOLLOW can be exposed via "follow_symlinks=True" (like other shutil.* functions) and used internally
CLONE_NOOWNERCOPY should also be passed internally by default because all other functions of shutil do not copy ownership (there's a warning at the top of the doc), so I think it makes sense for reflink() to do the same.

+#if defined(MAC_OS_X_VERSION_10_12): Would it be possible to use a runtime check?

Good point. It should definitively be loaded at runtime. I will look into that (but not soon).

giampaolo · 2019-06-05T10:45:14Z

Adding a new patch (still a PoC, will create a PR when I have something more solid).

vstinner · 2019-06-05T11:58:38Z

I'm curious: is it possible to query the filesystem to check if a copied is copied using CoW? I guess that it's possible, it will be non portable. So I guess that it's better to avoid checking that in unit tests.

vstinner@apu$ dd if=/dev/urandom of=urandom bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0309671 s, 33.9 MB/s
vstinner@apu$ cp --reflink urandom urandom2
'urandom' -> 'urandom2'

vstinner@apu$ stat urandom
File: urandom
Size: 1048576 Blocks: 2048 IO Block: 4096 regular file
Device: 31h/49d Inode: 16265363 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/vstinner) Gid: ( 1000/vstinner)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2019-06-05 13:56:21.381196972 +0200
Modify: 2019-06-05 13:56:21.412197007 +0200
Change: 2019-06-05 13:56:21.412197007 +0200
Birth: 2019-06-05 13:56:21.381196972 +0200

vstinner@apu$ stat urandom2
File: urandom2
Size: 1048576 Blocks: 2048 IO Block: 4096 regular file
Device: 31h/49d Inode: 16265364 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/vstinner) Gid: ( 1000/vstinner)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2019-06-05 13:56:24.487200453 +0200
Modify: 2019-06-05 13:56:24.496200463 +0200
Change: 2019-06-05 13:56:24.496200463 +0200
Birth: 2019-06-05 13:56:24.487200453 +0200

Using stat command line tool, I don't see anything obvious saying that the two files share the same data on disk.

koobs · 2019-08-27T09:40:19Z

See Also: bpo-26826

albertz · 2020-12-29T17:05:38Z

Is FICLONE really needed? Doesn't copy_file_range already supports the same?

I posted the same question here: https://stackoverflow.com/questions/65492932/ficlone-vs-ficlonerange-vs-copy-file-range-for-copy-on-write-support

albertz · 2020-12-31T09:20:13Z

I did some further research (with all details here: https://stackoverflow.com/a/65518879/133374).

See vfs_copy_file_range in the Linux kernel. This first tries to call remap_file_range if possible.

FICLONE calls ioctl_file_clone. ioctl_file_clone calls vfs_clone_file_range. vfs_clone_file_range calls remap_file_range. I.e. FICLONE == remap_file_range.

So using copy_file_range (if available) should be the most generic solution, which includes copy-on-write support, and server-side copy support.

dulanic · 2021-10-07T14:24:26Z

As a note, coreutils 9.0 cp defaults now to reflink=auto.

https://www.phoronix.com/scan.php?page=news_item&px=GNU-Coreutils-9.0

giampaolo · 2021-10-07T14:37:05Z

So using copy_file_range (if available) should be the most generic solution, which includes copy-on-write support, and server-side copy support.

Doesn't this imply to pass some flag to copy_file_range()? "man copy_file_range" says:

The flags argument is provided to allow for future extensions and currently must be set to 0.

How is CoW copy supposed to be done by using copy_file_range() exactly?

albertz · 2021-10-07T15:07:03Z

How is CoW copy supposed to be done by using copy_file_range() exactly?

I think copy_file_range() will just always use copy-on-write and/or server-side-copy when available. You cannot even turn that off.

illia-v · 2022-04-18T19:42:38Z

Since copy_file_range gives filesystems an opportunity to implement the use of reflinks or server-side copy, but we cannot determine whether any of them are implemented, can we add allow_reflink=False instead of reflink=False?
A bare reflink argument makes an impression that reflink is guaranteed for the copy.

Coreutils has allow_reflink in its code.

I've sketched a patch that adds allow_reflink, it also checks for a silent copy_file_range fail as Coreutils and Go do. Please let me know if this is worth opening a pull request and adding tests.

Diff

diff --git a/Lib/shutil.py b/Lib/shutil.py
index de82453aa5..73417bdafb 100644
--- a/Lib/shutil.py
+++ b/Lib/shutil.py
@@ -43,6 +43,7 @@
 # This should never be removed, see rationale in:
 # https://bugs.python.org/issue43743#msg393429
 _USE_CP_SENDFILE = hasattr(os, "sendfile") and sys.platform.startswith("linux")
+_USE_CP_COPY_FILE_RANGE = hasattr(os, "copy_file_range")
 _HAS_FCOPYFILE = posix and hasattr(posix, "_fcopyfile")  # macOS
 
 # CMD defaults in Windows 10
@@ -103,6 +104,66 @@ def _fastcopy_fcopyfile(fsrc, fdst, flags):
         else:
             raise err from None
 
+def _get_linux_fastcopy_blocksize(infd):
+    """Determine blocksize for fastcopying on Linux.
+
+    Hopefully the whole file will be copied in a single call.
+    sendfile() and copy_file_range() are called in a loop 'till EOF is
+    reached (0 return) so a bufsize smaller or bigger than the actual
+    file size should not make any difference, also in case the file
+    content changes while being copied.
+    """
+    try:
+        blocksize = max(os.fstat(infd).st_size, 2 ** 23)  # min 8MiB
+    except OSError:
+        blocksize = 2 ** 27  # 128MiB
+    # On 32-bit architectures truncate to 1GiB to avoid OverflowError,
+    # see bpo-38319.
+    if sys.maxsize < 2 ** 32:
+        blocksize = min(blocksize, 2 ** 30)
+    return blocksize
+
+def _fastcopy_copy_file_range(fsrc, fdst):
+    """Copy data from one regular mmap-like fd to another by using
+    a high-performance copy_file_range(2) syscall that gives filesystems
+    an opportunity to implement the use of reflinks or server-side copy.
+
+    This should work on Linux >= 4.5 only.
+    """
+    try:
+        infd = fsrc.fileno()
+        outfd = fdst.fileno()
+    except Exception as err:
+        raise _GiveupOnFastCopy(err)  # not a regular file
+
+    blocksize = _get_linux_fastcopy_blocksize(infd)
+    offset = 0
+    while True:
+        try:
+            n_copied = os.copy_file_range(infd, outfd, blocksize, offset_dst=offset)
+        except OSError as err:
+            # ...in oder to have a more informative exception.
+            err.filename = fsrc.name
+            err.filename2 = fdst.name
+
+            if err.errno == errno.ENOSPC:  # filesystem is full
+                raise err from None
+
+            # Give up on first call and if no data was copied.
+            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
+                raise _GiveupOnFastCopy(err)
+
+            raise err
+        else:
+            if n_copied == 0:
+                # If no bytes have been copied yet, copy_file_range
+                # might silently fail.
+                # https://lore.kernel.org/linux-fsdevel/[email protected]/T/#m05753578c7f7882f6e9ffe01f981bc223edef2b0
+                if offset == 0:
+                    raise _GiveupOnFastCopy()
+                break
+            offset += n_copied
+
 def _fastcopy_sendfile(fsrc, fdst):
     """Copy data from one regular mmap-like fd to another by using
     high-performance sendfile(2) syscall.
@@ -124,20 +185,7 @@ def _fastcopy_sendfile(fsrc, fdst):
     except Exception as err:
         raise _GiveupOnFastCopy(err)  # not a regular file
 
-    # Hopefully the whole file will be copied in a single call.
-    # sendfile() is called in a loop 'till EOF is reached (0 return)
-    # so a bufsize smaller or bigger than the actual file size
-    # should not make any difference, also in case the file content
-    # changes while being copied.
-    try:
-        blocksize = max(os.fstat(infd).st_size, 2 ** 23)  # min 8MiB
-    except OSError:
-        blocksize = 2 ** 27  # 128MiB
-    # On 32-bit architectures truncate to 1GiB to avoid OverflowError,
-    # see bpo-38319.
-    if sys.maxsize < 2 ** 32:
-        blocksize = min(blocksize, 2 ** 30)
-
+    blocksize = _get_linux_fastcopy_blocksize(infd)
     offset = 0
     while True:
         try:
@@ -223,7 +271,7 @@ def _stat(fn):
 def _islink(fn):
     return fn.is_symlink() if isinstance(fn, os.DirEntry) else os.path.islink(fn)
 
-def copyfile(src, dst, *, follow_symlinks=True):
+def copyfile(src, dst, *, follow_symlinks=True, allow_reflink=False):
     """Copy data from src to dst in the most efficient way possible.
 
     If follow_symlinks is not set and src is a symbolic link, a new
@@ -264,12 +312,20 @@ def copyfile(src, dst, *, follow_symlinks=True):
                         except _GiveupOnFastCopy:
                             pass
                     # Linux
-                    elif _USE_CP_SENDFILE:
-                        try:
-                            _fastcopy_sendfile(fsrc, fdst)
-                            return dst
-                        except _GiveupOnFastCopy:
-                            pass
+                    elif _USE_CP_SENDFILE or _USE_CP_COPY_FILE_RANGE:
+                        # reflink may be implicit in copy_file_range.
+                        if _USE_CP_COPY_FILE_RANGE and allow_reflink:
+                            try:
+                                _fastcopy_copy_file_range(fsrc, fdst)
+                                return dst
+                            except _GiveupOnFastCopy:
+                                pass
+                        if _USE_CP_COPY_FILE_RANGE:
+                            try:
+                                _fastcopy_sendfile(fsrc, fdst)
+                                return dst
+                            except _GiveupOnFastCopy:
+                                pass
                     # Windows, see:
                     # https://github.com/python/cpython/pull/7160#discussion_r195405230
                     elif _WINDOWS and file_size > 0:
@@ -402,7 +458,7 @@ def lookup(name):
             else:
                 raise
 
-def copy(src, dst, *, follow_symlinks=True):
+def copy(src, dst, *, follow_symlinks=True, allow_reflink=False):
     """Copy data and mode bits ("cp src dst"). Return the file's destination.
 
     The destination may be a directory.
@@ -416,11 +472,11 @@ def copy(src, dst, *, follow_symlinks=True):
     """
     if os.path.isdir(dst):
         dst = os.path.join(dst, os.path.basename(src))
-    copyfile(src, dst, follow_symlinks=follow_symlinks)
+    copyfile(src, dst, follow_symlinks=follow_symlinks, allow_reflink=allow_reflink)
     copymode(src, dst, follow_symlinks=follow_symlinks)
     return dst
 
-def copy2(src, dst, *, follow_symlinks=True):
+def copy2(src, dst, *, follow_symlinks=True, allow_reflink=False):
     """Copy data and metadata. Return the file's destination.
 
     Metadata is copied with copystat(). Please see the copystat function
@@ -433,7 +489,7 @@ def copy2(src, dst, *, follow_symlinks=True):
     """
     if os.path.isdir(dst):
         dst = os.path.join(dst, os.path.basename(src))
-    copyfile(src, dst, follow_symlinks=follow_symlinks)
+    copyfile(src, dst, follow_symlinks=follow_symlinks, allow_reflink=allow_reflink)
     copystat(src, dst, follow_symlinks=follow_symlinks)
     return dst
</details>

illia-v · 2022-06-03T20:32:54Z

Since copy_file_range gives filesystems an opportunity to implement the use of reflinks or server-side copy, but we cannot determine whether any of them are implemented, can we add allow_reflink=False instead of reflink=False? A bare reflink argument makes an impression that reflink is guaranteed for the copy.

On the second thought, I agree with @giampaolo about exposing reflink as a separate function instead of adding any of the arguments to copyfile. There are copytree and move functions that take a copy_function argument, and transmitting a reflink argument to copyfile through them may be inconvenient.

Note, if we choose to have a separate reflink function and not to add reflink support to copyfile, we will not be able to call copy_file_range neither in reflink nor in copyfile because it may or may not create a reflink implicitly.
Also, then shutil will cover the always and never options of the Linux cp --reflink= command. I think that covering the remaining auto option is very important for cross-platform code, this may be achived by adding new reflink_or_copy and reflink_or_copy2 functions that are similar to the existing copy and copy2 ones but call reflink instead of copyfile if the former is supported. And also, between trying reflink and copyfile, they may call copy_file_range to use its features.

@giampaolo can I please play with your patches a bit? I want to try to add Windows block cloning support to cow2.diff.

Also, I am going to create a small pull request to add FICLONE and FICLONERANGE constants to fcntl to have a low-level reflink support there.

vstinner · 2022-06-06T08:20:56Z

Starting by adding a os.reflink() function sounds like a good idea. It sounds less controversial. But again, the most difficult part will be to write the documentation and very clearly describe limitations ;-)

vstinner · 2022-06-06T08:21:08Z

cc @pablogsal

illia-v · 2022-06-06T17:43:23Z

Starting by adding a os.reflink() function sounds like a good idea.

Do you think it should be in os instead of shutil?

But again, the most difficult part will be to write the documentation and very clearly describe limitations ;-)

Right. Also, that is why having something like reflink_or_copy, which can be just used without diving into the documentation, in the end is important 🙂

BTW, please take a look at #93478.

vstinner · 2022-06-06T21:32:51Z

Do you think it should be in os instead of shutil?

It depends on how it's implemented. It depends if it fails if a reflink cannot be used, or if it falls back on a regular (slow) copy.

Usually, os functions are thin wrapper to syscalls / libc functions, whereas shutil implement fallbacks and are more high level.

bbappserver · 2022-07-15T21:01:04Z

@illia-v @vstinner
Well shutil.move() can figure out whether it needs to do a rename() or a copy(), and it's not called move_or_copy(). It also exposes the copy function so you can override it and could just put in None. If move() is flexible, it makes sense that shutil.copy() should be able to specify reflink=False, since it seems the current mode is to use reflink=True. I think because the OS fastcopy mechanism is being used, then on a COW filesystem the OS will use the filesystem's copy operation which it seems is to just fork the file metadata leaving the extents where they are unless explicitly told otherwise. To preserve this behaviour reflink should default to true and should explicitly be set false if desired.

As for where to put reflink(old,new) I think quite sensible it should live next to the other link() family of functions in os even if strictly speaking it is not a thin wrapper because apparently like cp --reflink does weird nonsense like

openat(AT_FDCWD, "old", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=760, ...}) = 0
openat(AT_FDCWD, "new", O_WRONLY|O_TRUNC) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = 0
//FICLONE file descriptor old(3) to file new(4)

https://stackoverflow.com/questions/52766388/how-can-i-use-the-copy-on-write-of-a-btrfs-from-c-code

But the idea in python is that I shouldn't have to care about that. os already wraps platform dependant stuff so it's maybe not as thin as it appears.
As an example os already wraps posix

On Unix, the os module provides a superset of the posix interface. On non-Unix operating systems the posix module is not available, but a subset is always available through the os interface. Once os is imported, there is no performance penalty in using it instead of posix. In addition, os provides some additional functionality, such as automatically calling putenv() when an entry in os.environ is changed.

https://docs.python.org/3/library/os.html#os.link
https://docs.python.org/3/library/os.html#os.symlink

ThomasWaldmann · 2023-02-11T18:58:41Z

Pity that this hasn't proceeded further yet, we could use it for borgbackup (it's problematic if a file changes while it is being backed up).

nanonyme · 2023-02-11T19:12:55Z

@ThomasWaldmann if this is about Linux, I would strongly recommend considering implementing something on top of os.copy_file_range until this moves forward. While it has some quirks (you need to handle ENOSYS and EXDEV correctly), it's not exactly rocket science either. Even if this moves forward into standard library for 3.12 (which is not a given, there is no agreement on spec), it will still not be generally available for a while.

ThomasWaldmann · 2023-02-11T19:24:14Z

I'ld need it for as many OSes as possible (Linux, BSDs, macOS, native Windows, OpenIndiana, ...).

But thanks for the pointer. Having it on Linux and for py38+ at least would be better than nothing.

@nanonyme Hmm, guess os.copy_file_range will make a (potentially very large) full copy if the fs does not support reflinks.

koobs · 2023-02-11T23:34:36Z

Note also that FreeBSD has copy_file_range(2) in 13+ See:

https://reviews.freebsd.org/D20584

https://man.freebsd.org/copy_file_range

calumapplepie · 2023-12-13T05:34:44Z

It's been a few months, and the world of modern filesystems is curious if Python will be joining them?

Also, my two cents on the whole "should we reflink by default" business: on a working COW system, having a reflink is ONLY positive. There is no upside to making a full copy; no performance gains later, no loss of safety in the face of modifications, no increased compatibility with other software. Full copies only provide a benefit if we are attempting to backup against partial on-disk corruption. Most of the time, when I make a copy of a file, I don't care about disk corruption; that's what backups are for. Thus, unless you're a backup software, you don't care if you're making a reflink or a copy; the semantics of the two are identical. Even if you are a backup software, reflinking only applies to within the same filesystem; and it goes without saying that two copies of a file on one disk[1] isn't a real backup.

Having reflinking not be the default in a high-level language is nuts. Even offering the ability to do a non-reflinked copy is somewhat questionable; if a filesystem claims to support reflinking, it is asserting that there is no semantic difference between files that are duplicated on-disk and ones sharing extents (ie, reflinked). You need filesystem-specific tools or side-channel attacks[2] to even tell if two files are reflinked. reflink=false is a pointless footgun; it risks users confusing reflinks with hardlinks and causing excess disk usage for no reason.

[1]: Yes, some filesystems can be on multiple disks. However, a plain copy supplies no guarantee that the file winds up on a different disk, so you're in the same boat.

[2]: Kick both files out of the page cache (either by filling it with junk or with madvise()), then measure the time it takes to access each. If the second one is much faster, then it may have been reflinked. Repeat a few thousand times, varying the order, to gain confidence. In my experience, most applications don't care if their file accesses are faster.

[edit: apologies, on post-posting proofread, my sass may be a bit strong. Thank you for your work on making great software, even if you haven't managed perfection quite yet]

TLDR: Python is a high level langauge, it's insane that it doesn't reflink by default yet.

cburroughs · 2024-04-09T20:18:14Z

An imperfect attempt to survey how other languages have approached this. While of course not dispositive I think it does suggest that the "free win" of having this be the default behavior is compelling under many different constraints.

GCC

Use when available.

Golang

Since 1.15 io.Copy uses copy_file_range when possible.

Emacs Lisp

Used when available.

Java

Does not implement. (Claiming no performance benefit(!?).

Nodejs

Low level support, flags need to be passed along.

PHP

copy_file_range used [unconditionally(?)](https://github.com/php/php-src/pull/8413]

Ruby

Used by IO.copy_stream when possible.

Rust

std::fs::copy and std::io::copy will transparently use facilities like copy_file_range (Linux) and fclonefileat (MacOS)

bronger · 2024-04-11T06:21:48Z

Java

Does not implement. (Claiming no performance benefit(!?).

Maybe @bplb has not tested on a CoW filesystem?

Hi-Angel · 2024-04-11T06:33:31Z

Java

Does not implement. (Claiming no performance benefit(!?).

Maybe @bplb has not tested on a CoW filesystem?

Most likely. The Java report has no mention of CoW at all, so if @bplb didn't know that the main benefit of copy_file_range is on FSes like XFS, BTRFS, BCacheFS, they've probably tested on something like ext4.

bplb · 2024-04-11T16:17:02Z

If memory serves, I tested CoW on BTRFS (for Linux) and on APFS (for macOS) and there was a large performance improvement in both cases.

thesamesam · 2024-04-11T16:18:58Z

@cburroughs Yeah, I did my own survey a while ago in https://social.treehouse.systems/@thesamesam/110626072185161018 too.

nanonyme · 2024-04-11T16:24:27Z

Even not taking CoW into account, copy_file_range typically avoids a lot of memory copies between userspace and kernel so it's usually overall a win.

thesamesam · 2024-04-11T16:26:12Z

Yes, it's especially useful with modern NFS to avoid an unnecessary roundtrip too.

bplb · 2024-04-11T17:03:18Z

Even not taking CoW into account, copy_file_range typically avoids a lot of memory copies between userspace and kernel so it's usually overall a win.

Also, copy_file_range apparently calls ioctl_ficlonerange(2) internally so there's no need for more flow control.

bplb · 2024-04-11T18:17:00Z

An imperfect attempt to survey how other languages have approached this.

Java

Does not implement.

Untrue: implemented in JDK 20, released March 2023 (cf. JDK-8264744).

cburroughs · 2024-04-19T18:43:19Z

Untrue: implemented in JDK 20, released March 2023 (cf. JDK-8264744).

Thank you for the correction! JDK looks like another ecosystem that decided on enabling by default and not with a different api.

Add a `Path.copy()` method that copies the content of one file to another. This method is similar to `shutil.copyfile()` but differs in the following ways: - Uses `fcntl.FICLONE` where available (see GH-81338) - Uses `os.copy_file_range` where available (see GH-81340) - Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`. The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata. Incorporates code from GH-81338 and GH-93152. Co-authored-by: Eryk Sun <[email protected]>

Add a `Path.copy()` method that copies the content of one file to another. This method is similar to `shutil.copyfile()` but differs in the following ways: - Uses `fcntl.FICLONE` where available (see pythonGH-81338) - Uses `os.copy_file_range` where available (see pythonGH-81340) - Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`. The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata. Incorporates code from pythonGH-81338 and pythonGH-93152. Co-authored-by: Eryk Sun <[email protected]>

Use `cp` since `shutil.copy` doesn't do reflinks: python/cpython#81338 reflinks are a lot more efficient. Signed-off-by: Colin Walters <[email protected]>

vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Jun 4, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

dstufft mentioned this issue May 11, 2022

optimize package installation for space and speed by using copy-on-write file clones ("reflinks") and storing wheel cache unpacked pypa/pip#11092

Open

1 task

illia-v mentioned this issue May 23, 2022

gh-81340: Use copy_file_range in shutil.copyfile copy functions #93152

Open

illia-v mentioned this issue Jun 3, 2022

Expose FICLONE and FICLONERANGE constants in fcntl #93475

Closed

iritkatriel added 3.12 bugs and security fixes and removed 3.9 only security fixes labels Sep 12, 2022

ThomasWaldmann mentioned this issue Feb 11, 2023

create: better handling "file changed" warnings, mitigation by retries and/or reflinks borgbackup/borg#6622

Closed

Infernio mentioned this issue Mar 19, 2023

Native Linux Support wrye-bash/wrye-bash#243

Closed

13 tasks

Infernio mentioned this issue Oct 21, 2023

File operations wrye-bash/wrye-bash#241

Open

7 tasks

erlend-aasland added 3.13 bugs and security fixes and removed 3.12 bugs and security fixes labels Jan 5, 2024

eskultety mentioned this issue Mar 7, 2024

Replace reflink dependency with an implementation using Python stdlib primitives containerbuildsystem/cachi2#483

Closed

rcoup mentioned this issue Apr 18, 2024

CI: run Python tests in random order/parallel [WIP] OSGeo/gdal#9266

Closed

3 tasks

thesamesam mentioned this issue May 16, 2024

[WIP] GH-73991: Add os.copy() and friends #119079

Closed

barneygale mentioned this issue Jun 3, 2024

GH-73991: Add pathlib.Path.copy() #119058

Merged

barneygale mentioned this issue Aug 26, 2024

Support moving across filesystems in pathlib.Path, as shutil.move() does #73991

Closed

psss pushed a commit to teemtee/testcloud that referenced this issue Oct 16, 2024

image: Use reflinks if available

e1b320f

Use `cp` since `shutil.copy` doesn't do reflinks: python/cpython#81338 reflinks are a lot more efficient. Signed-off-by: Colin Walters <[email protected]>

shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range) #81338

shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range) #81338

Comments

vstinner commented Jun 4, 2019

vstinner commented Jun 4, 2019

vstinner commented Jun 4, 2019

giampaolo commented Jun 5, 2019

vstinner commented Jun 5, 2019

vstinner commented Jun 5, 2019

giampaolo commented Jun 5, 2019

giampaolo commented Jun 5, 2019

vstinner commented Jun 5, 2019

koobs commented Aug 27, 2019

albertz mannequin commented Dec 29, 2020

albertz mannequin commented Dec 31, 2020

dulanic mannequin commented Oct 7, 2021

giampaolo commented Oct 7, 2021

albertz mannequin commented Oct 7, 2021

illia-v commented Apr 18, 2022

illia-v commented Jun 3, 2022

vstinner commented Jun 6, 2022

vstinner commented Jun 6, 2022

illia-v commented Jun 6, 2022

vstinner commented Jun 6, 2022

bbappserver commented Jul 15, 2022

ThomasWaldmann commented Feb 11, 2023

nanonyme commented Feb 11, 2023 • edited Loading

ThomasWaldmann commented Feb 11, 2023 • edited Loading

koobs commented Feb 11, 2023

calumapplepie commented Dec 13, 2023 • edited Loading

cburroughs commented Apr 9, 2024

GCC

Golang

Emacs Lisp

Java

Nodejs

PHP

Ruby

Rust

bronger commented Apr 11, 2024

Java

Hi-Angel commented Apr 11, 2024

Java

bplb commented Apr 11, 2024

thesamesam commented Apr 11, 2024

nanonyme commented Apr 11, 2024

thesamesam commented Apr 11, 2024

bplb commented Apr 11, 2024

bplb commented Apr 11, 2024

Java

cburroughs commented Apr 19, 2024

nanonyme commented Feb 11, 2023 •

edited

Loading

ThomasWaldmann commented Feb 11, 2023 •

edited

Loading

calumapplepie commented Dec 13, 2023 •

edited

Loading