Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-93180: Update documentation of os.copy_file_range #93182

Merged
merged 10 commits into from
Jun 8, 2022
17 changes: 15 additions & 2 deletions Doc/library/os.rst
Original file line number Diff line number Diff line change
Expand Up @@ -798,18 +798,31 @@ as internal buffering of data.
Copy *count* bytes from file descriptor *src*, starting from offset
*offset_src*, to file descriptor *dst*, starting from offset *offset_dst*.
If *offset_src* is None, then *src* is read from the current position;
respectively for *offset_dst*. The files pointed by *src* and *dst*
respectively for *offset_dst*.

In Linux kernel older than 5.3, the files pointed by *src* and *dst*
must reside in the same filesystem, otherwise an :exc:`OSError` is
raised with :attr:`~OSError.errno` set to :data:`errno.EXDEV`.

This copy is done without the additional cost of transferring data
from the kernel to user space and then back into the kernel. Additionally,
some filesystems could implement extra optimizations. The copy is done as if
some filesystems could implement extra optimizations, such as the use of
reflinks (i.e., two or more inodes that share pointers to the same
copy-on-write disk blocks; supported file systems include btrfs and XFS)
and server-side copy (in the case of NFS). The copy is done as if
both files are opened as binary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function arguments are file descriptors. What does "binary" mean for file descriptors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is not new. I guess it means that bytes written to a destination file will be equal to ones read from the source file.

In contrary, if a file range is read in the text mode in Python and it is written in the text mode to a different file, bytes may differ. For example, line endings may be changed. See https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to rephase the sentence to explain that? Explain that the copy does not change the line endings, for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 069651d.

BTW, shutil.copyfile looks to behave the same, but shutil.copyfileobj seems to respect the text mode.


The return value is the amount of bytes copied. This could be less than the
amount requested.

.. note::

This function should not be used for copying a file from special
filesystems like procfs and sysfs because of
`a known issue <https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link contains a fix. Was the fix merged into Linux?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it wasn't. I tested os.copy_file_range for copying /proc/cpuinfo on Linux 5.17, it returned 0 and the destination file was empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, some Python functions that operate files are prone to similar issues too :(. I'll create a few bug reports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, some Python functions that operate files are prone to similar issues too :(. I'll create a few bug reports.

#93300 and #93296 are two examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not add a link to an email.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about such a note?

:func:os.copy_file_range should not be used for copying a range of a pseudo file from a special filesystem like procfs and sysfs. It will always copy no bytes and return 0 as if the file was empty because of a known Linux kernel issue.

It will copy nothing and return 0 because file sizes there are not known
to the system.

.. availability:: Linux kernel >= 4.5 or glibc >= 2.27.

.. versionadded:: 3.8
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Update documentation of :func:`os.copy_file_range`. Patch by Illia Volochii.
illia-v marked this conversation as resolved.
Show resolved Hide resolved