Update docstring of btrfs_util_set_default_subvolume #924

Closed
wants to merge 36 commits into from
36 commits
79ce9b6
libbtrfsutil: bump btrfsutil version, add release steps
kdave Sep 18, 2024
762c7e8
btrfs-progs: libbtrfsutil/python: use MANIFEST.in for headers
adam900710 Sep 21, 2024
c2c922f
btrfs-progs: libbtrfsutil/python: reuse existing README.md for long d…
adam900710 Sep 21, 2024
5d2ef32
btrfs-progs: receive: make option quiet work for chroot
s-hamann Sep 28, 2024
59f74c2
btrfs-progs: mkfs: rework --subvol CLI option
maharmstone Sep 30, 2024
6a1d4ad
btrfs-progs: mkfs: avoid dynamic allocation when doing --subvol
maharmstone Sep 30, 2024
bb1e846
btrfs-progs: tests: also test --subvol modifiers in mkfs-tests/036-ro…
maharmstone Sep 30, 2024
dbabafa
btrfs-progs: fix usage warning in common/help.c
asj Oct 11, 2024
1cc034e
btrfs-progs: docs: update 6.11 contribution graphs
kdave Jan 9, 2024
e671013
btrfs-progs: docs: add kernel changelogs for 6.9, 6.10 and 6.11
kdave Mar 11, 2024
2595fbf
btrfs-progs: docs: fix typo and dead link
maharmstone Oct 16, 2024
f6dc0e8
btrfs-progs: check: explicit holes in log tree don't get csummed
maharmstone Oct 4, 2024
175cbfc
btrfs-progs: check: handle compressed extents when checking tree log
maharmstone Oct 4, 2024
6c7d2a3
btrfs-progs: check: check main csum root if csum not in log tree
maharmstone Oct 7, 2024
9d533a6
btrfs-progs: fix the typo inside kernel-by-version.rst
adam900710 Oct 20, 2024
48655de
btrfs-progs: docs: Added DEV_ITEM object id.
hanyuwei70 Oct 15, 2024
8312024
btrfs-progs: add an image explicit hole to fsck-tests/065-valid-log-tree
maharmstone Oct 24, 2024
d260910
btrfs-progs: add an image with compressed extent to fsck-tests/065-va…
maharmstone Oct 24, 2024
58dfcf8
btrfs-progs: add an image to fsck-tests/065-valid-log-tree where csum…
maharmstone Oct 24, 2024
d2c40c1
btrfs-progs: print-tree: use ARRAY_SIZE() to replace open-coded ones
adam900710 Oct 4, 2024
010b93f
btrfs-progs: print-tree: cleanup __print_readable_flag()
adam900710 Oct 4, 2024
d72ab68
btrfs-progs: print-tree: use readable_flag_entry for inode flags
adam900710 Oct 4, 2024
5b0452d
btrfs-progs: docs: say where to get help
csnover Aug 6, 2024
b3f6300
btrfs-progs: docs: clarify transid verify error recoverability
csnover Aug 6, 2024
d832a32
btrfs-progs: docs: clarify how btrfs check works with replicas
csnover Aug 6, 2024
18dcd2d
btrfs-procs: docs: add warning about balance to replace failing device
csnover Aug 8, 2024
66f08f9
btrfs-progs: docs: enhance the scrub chapter
csnover Aug 5, 2024
f4b2a33
btrfs-progs: docs: fix double word in intro
silopolis Oct 23, 2024
324bea5
btrfs-progs: docs: fix BTRFS capitalization
silopolis Oct 23, 2024
881f1e3
btrfs-progs: docs: subvolume intro editing
silopolis Oct 23, 2024
499a049
btrfs-progs: docs: auto-repair editing
Oct 23, 2024
b5d8449
btrfs-progs: docs: checksumming editing
Oct 23, 2024
479103c
btrfs-progs: docs: add 6.12 kernel development statistics
kdave Jan 9, 2024
bc574b1
btrfs-progs: device-utils: include libgen.h for musl
martinetd Nov 19, 2024
206baf7
btrfs-progs: utils: ask_user: flush stdout after prompt
martinetd Nov 19, 2024
1b28d8d
Update docstring of btrfs_util_set_default_subvolume
dcermak Nov 19, 2024
24 changes: 13 additions & 11 deletions Documentation/Auto-repair.rst
@@ -1,14 +1,16 @@
Auto-repair on read
===================

Data or metadata that are found to be damaged (e.g. because the checksum does
not match) at the time they're read from a device can be salvaged in case the
filesystem has another valid copy when using block group profile with redundancy
(DUP, RAID1-like, RAID5/6). The correct data are returned to the user application
and the damaged copy is replaced by it. When this happen a message is emitted
to the system log.

If there are more copies of data and one of them is damaged but not read by
user application then this is not detected. To verify all data and metadata
copies there's :doc:`scrub<Scrub>` that needs to be started manually, automatic
repairs happens in that case.
If data or metadata are found to be damaged at the time they are read from a device,
for example because the checksum does not match, they can be salvaged if the filesystem
has another valid copy. This requires a block group profile with redundancy,
such as `DUP`, RAID1-like, or RAID5/6.

The correct data is returned to the user application and the damaged copy is replaced by it.
When this happens, a message is emitted to the system log.

If there are multiple copies of data and one of them is damaged but not read by the user
application, then this is not detected.

To ensure the verification and automatic repair of all data and metadata copies, the
:doc:`scrub<Scrub>` operation must be initiated manually.
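
The read-time repair described in this hunk can be modelled in a few lines. The following is a conceptual sketch only, not the btrfs implementation: the function names are hypothetical, the copies are in-memory buffers, and ``zlib.crc32`` (plain CRC32) stands in for the crc32c checksum that btrfs uses by default.

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in checksum: plain CRC32 from zlib, not the crc32c used by btrfs.
    return zlib.crc32(block)


def read_with_repair(copies: list[bytearray], expected: int, log: list[str]) -> bytes:
    """Return the first copy that matches `expected`.

    Damaged copies that were read before the good one are overwritten
    with the good data, and a message is appended to `log` for each.
    Copies after the good one are never read, so damage there goes unnoticed.
    """
    damaged = []
    for index, copy in enumerate(copies):
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            for bad in damaged:
                copies[bad][:] = good
                log.append(f"repaired copy {bad} from copy {index}")
            return good
        damaged.append(index)
    raise OSError("no valid copy available")


# Hypothetical two-copy example: the first copy has one corrupted byte.
good_data = b"hello btrfs"
mirrors = [bytearray(b"hellX btrfs"), bytearray(good_data)]
messages: list[str] = []
result = read_with_repair(mirrors, checksum(good_data), messages)
```

Because the loop stops at the first valid copy, a damaged copy that is never read stays damaged, which is why the text points to a manual scrub for verifying every copy.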
1 change: 1 addition & 0 deletions Documentation/Contributors.rst
@@ -56,6 +56,7 @@ Statistics for 6.x series
"6.9", "19", "110727", "161231", "147", "+2476 -1323"
"6.10", "21", "110878", "161751", "154", "+2993 -2473"
"6.11", "18", "111848", "163484", "188", "+5776 -4043"
"6.12", "20", "111881", "163548", "148", "+1868 -1804"


Legend:
2 changes: 1 addition & 1 deletion Documentation/Introduction.rst
@@ -34,7 +34,7 @@ Feature overview:
* :doc:`Offline filesystem check<btrfs-check>`
* :doc:`In-place conversion<Convert>` of existing ext2/3/4 and reiserfs filesystems
* :doc:`Seeding device.<Seeding-device>` Create a (readonly) filesystem that
acts as a template to seed other Btrfs filesystems. The original filesystem
acts as a template to seed other BTRFS filesystems. The original filesystem
and devices are included as a readonly starting point for the new filesystem.
Using copy on write, all modifications are stored on different devices; the
original is unchanged.
125 changes: 123 additions & 2 deletions Documentation/Kernel-by-version.rst
@@ -557,16 +557,137 @@ Fixes:
6.9 (May 2024)
^^^^^^^^^^^^^^

TBD
Pull requests:
`v6.9-rc1 (1) <https://git.kernel.org/linus/43a7548e28a6df12a6170421d9d016c576010baa>`__,
`v6.9-rc1 (2) <https://git.kernel.org/linus/7b65c810a1198b91ed6bdc49ddb470978affd122>`__,
`v6.9-rc2 <https://git.kernel.org/linus/400dd456bda8be0b566f2690c51609ea02f85766>`__,
`v6.9-rc3 <https://git.kernel.org/linus/20cb38a7af88dc40095da7c2c9094da3873fea23>`__,
`v6.9-rc5 <https://git.kernel.org/linus/8cd26fd90c1ad7acdcfb9f69ca99d13aa7b24561>`__,
`v6.9-rc6 <https://git.kernel.org/linus/e88c4cfcb7b888ac374916806f86c17d8ecaeb67>`__,
`v6.9-rc7 <https://git.kernel.org/linus/f03359bca01bf4372cf2c118cd9a987a5951b1c8>`__,
`v6.9-rc8 <https://git.kernel.org/linus/dccb07f2914cdab2ac3a5b6c98406f765acab803>`__,

Performance improvements:

- minor speedup in logging when repeatedly allocated structure is preallocated
only once, improves latency and decreases lock contention

- minor throughput increase (+6%), reduced lock contention after clearing
delayed allocation bits, applies to several common workload types

- features under CONFIG_BTRFS_DEBUG:
- sysfs knob for setting how checksums are calculated when submitting IO,
inline or offloaded to a thread, this affects latency and throughput on some
block group profiles

Notable fixes:

- fix device tracking in memory that broke grub-probe

- zoned mode fixes:
- use zone-aware super block access during scrub
- delete zones that are 100% unusable to reclaim space

Other notable changes:

- additional validation of devices by major:minor numbers

6.10 (Jul 2024)
^^^^^^^^^^^^^^^

TBD
Pull requests:
`v6.10-rc1 (1) <https://git.kernel.org/linus/a3d1f54d7aa4c3be2c6a10768d4ffa1dcb620da9>`__,
`v6.10-rc1 (2) <https://git.kernel.org/linus/02c438bbfffeabf8c958108f9cf88cdb1a11a323>`__,
`v6.10-rc3 (1) <https://git.kernel.org/linus/19ca0d8a433ff37018f9429f7e7739e9f3d3d2b4>`__,
`v6.10-rc3 (2) <https://git.kernel.org/linus/07978330e63456a75a6d5c1c5053de24bdc9d16f>`__,
`v6.10-rc5 <https://git.kernel.org/linus/50736169ecc8387247fe6a00932852ce7b057083>`__,
`v6.10-rc6 <https://git.kernel.org/linus/66e55ff12e7391549c4a85a7a96471dcf891cb03>`__,
`v6.10-rc7 (1) <https://git.kernel.org/linus/cfbc0ffea88c764d23f69efe6ecb74918e0f588e>`__,
`v6.10-rc7 (2) <https://git.kernel.org/linus/661e504db04c6b7278737ee3a9116738536b4ed4>`__,
`v6.10-rc8 <https://git.kernel.org/linus/975f3b6da18020f1c8a7667ccb08fa542928ec03>`__,

Performance improvements:

- inline b-tree locking functions, improvement in metadata-heavy changes

- relax locking on a range that's being reflinked, allows read operations to
run in parallel

- speed up NOCOW write checks (throughput +9% on a sample test)

- extent locking ranges have been reduced in several places, namely around
delayed ref processing

Notable fixes or changes:

- add back mount option *norecovery*, deprecated a long time ago and removed in
6.8 but still in use

- fix potential infinite loop when doing block group reclaim

- extent map shrinker, allow memory consumption reduction for direct io loads


6.11 (Sep 2024)
^^^^^^^^^^^^^^^

Pull requests:
`v6.11-rc1 (1) <https://git.kernel.org/linus/a1b547f0f217cfb06af7eb4ce8488b02d83a0370>`__,
`v6.11-rc1 (2) <https://git.kernel.org/linus/53a5182c8a6805d3096336709ba5790d16f8c369>`__,
`v6.11-rc2 <https://git.kernel.org/linus/e4fc196f5ba36eb7b9758cf2c73df49a44199895>`__,
`v6.11-rc3 <https://git.kernel.org/linus/6a0e38264012809afa24113ee2162dc07f4ed22b>`__,
`v6.11-rc4 <https://git.kernel.org/linus/1fb918967b56df3262ee984175816f0acb310501>`__,
`v6.11-rc4 <https://git.kernel.org/linus/57b14823ea68592bd67e4992a2bf0dd67abb68d6>`__,
`v6.11-rc6 <https://git.kernel.org/linus/2840526875c7e3bcfb3364420b70efa203bad428>`__,
`v6.11-rc7 <https://git.kernel.org/linus/1263a7bf8a0e77c6cda8f5a40509d99829216a45>`__,

User visible features:

- dynamic block group reclaim:
- tunable framework to avoid situations where eager data allocations prevent
creating new metadata chunks due to lack of unallocated space
- reuse sysfs knob bg_reclaim_threshold (otherwise used only in zoned mode)
for a fixed value threshold
- new on/off sysfs knob "dynamic_reclaim" calculating the value based on
heuristics, aiming to keep spare working space for relocating chunks but
not to needlessly relocate partially utilized block groups or reclaim newly
allocated ones
- stats are exported in sysfs per block group type, files "reclaim_*"
- this may increase IO load at unexpected times but the corner case of no
allocatable block groups is known to be worse

- automatically remove qgroup of deleted subvolumes:
- adjust qgroup removal conditions, make sure all related subvolume data are
already removed, or return EBUSY, also take into account setting of sysfs
drop_subtree_threshold
- also works in squota mode

- mount option updates: new modes of 'rescue=' that allow mounting images
(read-only) that could have been partially converted by user space tools
- ignoremetacsums - invalid metadata checksums are ignored
- ignoresuperflags - super block flags that track conversion in progress
(like UUID or checksums)

Other notable changes or fixes:

- space cache v1 marked as deprecated (a warning printed in syslog), the
free-space tree (i.e. the v2) has been default in "mkfs.btrfs" since 5.15,
the kernel code will be removed in the future on a conservative schedule

- tree checker improvements:
- validate data reference items
- validate directory item type

- send also detects last extent suitable for cloning (and not a write)

- extent map shrinker (a memory reclaim optimization) added in 6.10 now
available only under CONFIG_BTRFS_DEBUG due to performance problems

- update target inode's ctime on unlink,
`mandated by POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html>`__

- in zoned mode, detect unexpected zone write pointer change

5.x
---

7 changes: 7 additions & 0 deletions Documentation/btrfs-check.rst
@@ -19,6 +19,13 @@ by the option *--readonly*.

:command:`btrfsck` is an alias of :command:`btrfs check` command and is now deprecated.

.. note::
Even though the filesystem checker requires a device argument, it scans for all
devices belonging to the same filesystem, so it makes no difference which
device of the filesystem is given.
Furthermore, `btrfs check` automatically chooses a good mirror, so as long as
there is a good copy of the metadata, a damaged copy is not reported as an error.

.. warning::
Do not use *--repair* unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no *fsck*
44 changes: 22 additions & 22 deletions Documentation/ch-checksumming.rst
@@ -1,32 +1,32 @@
Data and metadata are checksummed by default, the checksum is calculated before
write and verified after reading the blocks from devices. The whole metadata
block has a checksum stored inline in the b-tree node header, each data block
Data and metadata are checksummed by default. The checksum is calculated before
writing and verified after reading the blocks from devices. The whole metadata
block has an inline checksum stored in the b-tree node header. Each data block
has a detached checksum stored in the checksum tree.

There are several checksum algorithms supported. The default and backward
compatible is *crc32c*. Since kernel 5.5 there are three more with different
compatible algorithm is *crc32c*. Since kernel 5.5 there are three more with different
characteristics and trade-offs regarding speed and strength. The following list
may help you to decide which one to select.

CRC32C (32bit digest)
default, best backward compatibility, very fast, modern CPUs have
CRC32C (32-bit digest)
Default, best backward compatibility. Very fast, modern CPUs have
instruction-level support, not collision-resistant but still good error
detection capabilities
detection capabilities.

XXHASH (64bit digest)
can be used as CRC32C successor, very fast, optimized for modern CPUs utilizing
instruction pipelining, good collision resistance and error detection
XXHASH (64-bit digest)
Can be used as CRC32C successor. Very fast, optimized for modern CPUs utilizing
instruction pipelining, good collision resistance and error detection.

SHA256 (256bit digest)
a cryptographic-strength hash, relatively slow but with possible CPU
instruction acceleration or specialized hardware cards, FIPS certified and
in wide use
SHA256 (256-bit digest)
Cryptographic-strength hash. Relatively slow but with possible CPU
instruction acceleration or specialized hardware cards. FIPS certified and
in wide use.

BLAKE2b (256bit digest)
a cryptographic-strength hash, relatively fast with possible CPU acceleration
using SIMD extensions, not standardized but based on BLAKE which was a SHA3
finalist, in wide use, the algorithm used is BLAKE2b-256 that's optimized for
64bit platforms
BLAKE2b (256-bit digest)
Cryptographic-strength hash. Relatively fast, with possible CPU acceleration
using SIMD extensions. Not standardized but based on BLAKE which was a SHA3
finalist, in wide use. The algorithm used is BLAKE2b-256 that's optimized for
64-bit platforms.
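
To make the effect of the digest sizes listed above concrete, here is a rough back-of-the-envelope sketch. It counts only the raw digest bytes stored for data blocks and assumes a 4 KiB data block size; both the block size and the helper name are assumptions for illustration, and the on-disk overhead of the checksum tree itself is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed data block size in bytes (illustrative)

# Digest sizes in bytes, matching the list above. The two cryptographic
# hashes are taken from hashlib; crc32c and xxhash are written out by hand
# because they are not part of the Python standard library.
DIGEST_BYTES = {
    "crc32c": 4,
    "xxhash": 8,
    "sha256": hashlib.sha256().digest_size,
    "blake2b": hashlib.blake2b(digest_size=32).digest_size,
}


def checksum_bytes(data_bytes: int, algorithm: str, block_size: int = BLOCK_SIZE) -> int:
    """Raw digest bytes needed to checksum `data_bytes` of file data."""
    blocks = -(-data_bytes // block_size)  # ceiling division
    return blocks * DIGEST_BYTES[algorithm]


one_tib = 1 << 40
overhead = {name: checksum_bytes(one_tib, name) for name in DIGEST_BYTES}
```

Under these assumptions, 1 TiB of data needs 1 GiB of crc32c digests and 8 GiB of sha256 or blake2b digests.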

The *digest size* affects overall size of data block checksums stored in the
filesystem. The metadata blocks have a fixed area up to 256 bits (32 bytes), so
@@ -61,8 +61,8 @@ The accelerated versions are however provided by the modules and must be loaded
explicitly (:command:`modprobe sha256`) before mounting the filesystem to make use of
them. You can check in :file:`/sys/fs/btrfs/FSID/checksum` which one is used. If you
see *sha256-generic*, then you may want to unmount and mount the filesystem
again, changing that on a mounted filesystem is not possible.
Check the file :file:`/proc/crypto`, when the implementation is built-in, you'd find
again. Changing that on a mounted filesystem is not possible.
Check the file :file:`/proc/crypto`, when the implementation is built-in, you'd find:

.. code-block:: none

@@ -72,7 +72,7 @@ Check the file :file:`/proc/crypto`, when the implementation is built-in, you'd
priority : 100
...

while accelerated implementation is e.g.
While an accelerated implementation is, e.g.:

.. code-block:: none

53 changes: 44 additions & 9 deletions Documentation/ch-scrub-intro.rst
@@ -1,14 +1,49 @@
Scrub is a pass over all filesystem data and metadata and verifying the
checksums. If a valid copy is available (replicated block group profiles) then
the damaged one is repaired. All copies of the replicated profiles are validated.
Scrub is a validation pass over all filesystem data and metadata that detects
data checksum errors, basic super block errors, basic metadata block header errors,
and disk read errors.

Scrub is done on a per-device basis. If a device is specified to `btrfs scrub`, then
only that device will be scrubbed, although btrfs will still try to read other devices
to find a good copy if the mirror on the specified device fails to be read or fails
verification.

If a path on a btrfs filesystem is specified to `btrfs scrub`, btrfs will scrub all devices
in parallel.

On filesystems that use replicated block group profiles (e.g. raid1), read-write
scrub will also automatically repair any damage by copying verified good data
from one of the other replicas.

Such automatic repair is also carried out when reading metadata or data from a
read-write mounted btrfs.
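
The difference between scrubbing a single device and scrubbing by path comes down to the target passed to ``btrfs scrub start``. A minimal sketch of how a wrapper script might build that command is shown below. The helper names and the example targets (``/dev/sdb``, ``/mnt/data``) are hypothetical; ``-B`` (stay in the foreground) and ``-r`` (read-only, do not repair) are options of ``btrfs scrub start``.

```python
import subprocess


def scrub_start_argv(target: str, readonly: bool = False, foreground: bool = True) -> list[str]:
    """Build the argument vector for `btrfs scrub start`.

    `target` is either a device (scrub only that device) or a path on a
    mounted btrfs filesystem (scrub all of its devices).
    """
    argv = ["btrfs", "scrub", "start"]
    if foreground:
        argv.append("-B")  # do not background, print statistics when done
    if readonly:
        argv.append("-r")  # read-only scrub: detect errors but do not repair
    argv.append(target)
    return argv


def run_scrub(target: str, readonly: bool = False) -> subprocess.CompletedProcess:
    # Needs root privileges and an actual btrfs filesystem; not executed here.
    return subprocess.run(scrub_start_argv(target, readonly), check=False)


single_device = scrub_start_argv("/dev/sdb")                # hypothetical device
whole_fs_readonly = scrub_start_argv("/mnt/data", readonly=True)  # hypothetical mount point
```

Keeping the argument construction separate from the ``subprocess.run`` call makes the command easy to inspect or log before anything touches the filesystem.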

.. warning::
Setting the ``No_COW`` (``chattr +C``) attribute on a file implicitly enables
``nodatasum``. This means that while metadata for these files continues to
be validated and corrected by scrub, the actual file data is not.

Furthermore, btrfs does not currently mark missing or failed disks as
unreliable, so it will continue to load-balance reads to potentially damaged
replicas. This is not a problem normally because damage is detected by
checksum validation, but because ``No_COW`` files are
not protected by checksums, btrfs cannot tell which mirror is good and may
return the bad contents to user space.

Detecting and recovering from such failure requires manual intervention.

Notably, `systemd sets +C on journals by default <https://github.com/systemd/systemd/commit/11689d2a021d95a8447d938180e0962cd9439763>`_,
and `libvirt ≥ 6.6 sets +C on storage pool directories by default <https://www.libvirt.org/news.html#v6-6-0-2020-08-02>`_.
Other applications or distributions may also set +C to try to improve
performance.
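
Since files with ``+C`` are not covered by data checksums, it can be useful to find out which files or directories carry the attribute. The sketch below parses the output of ``lsattr``, where the No_COW attribute is shown as an uppercase ``C`` in the flags column. The helper names and the sample lines are illustrative assumptions, not output captured from a real system.

```python
import subprocess


def has_nocow(lsattr_line: str) -> bool:
    """Return True if one line of `lsattr` output shows the No_COW flag."""
    flags, _, _path = lsattr_line.strip().partition(" ")
    # Uppercase 'C' is No_COW; lowercase 'c' (compression) must not match.
    return "C" in flags


def path_has_nocow(path: str) -> bool:
    # `lsattr -d` reports a directory itself rather than its contents.
    # Requires the lsattr tool and a filesystem that supports the flags.
    out = subprocess.run(
        ["lsattr", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    return has_nocow(out.splitlines()[0])


# Illustrative sample lines in lsattr's "flags path" layout.
journal_line = "---------------C------ /var/log/journal/system.journal"
plain_line = "---------------------- /home/user/Crucial.txt"
```

Only the flags column is inspected, so a capital ``C`` in the path itself does not cause a false positive.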

.. note::
Scrub is not a filesystem checker (fsck) and does not verify nor repair
structural damage in the filesystem. It really only checks checksums of data
and tree blocks, it doesn't ensure the content of tree blocks is valid and
consistent. There's some validation performed when metadata blocks are read
from disk (:doc:`Tree-checker`) but it's not extensive and cannot substitute
full :doc:`btrfs-check` run.
Scrub is not a filesystem checker (fsck). It can only detect filesystem damage
using the checksum validation, and it can only repair
filesystem damage by copying from other known good replicas.

:doc:`btrfs-check` performs more exhaustive checking and can sometimes be
used, with expert guidance, to rebuild certain corrupted filesystem structures
in the absence of any good replica.

The user is supposed to run it manually or via a periodic system service. The
recommended period is a month but it could be less. The estimated device bandwidth