zfs_bclone_wait_dirty=1 broken for files with unallocated blocks at the end #15994
Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
amotin added a commit to amotin/zfs that referenced this issue on Mar 18, 2024:

- When reading L0 block pointers, handle buffers without block pointers and without dirty records as holes. Those appear when the dnode size was increased but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there are actual writes.
- When cloning blocks, always set the destination logical birth time to the current TXG. Otherwise, if we are cloning over existing data, newly created holes may not be properly replicated later. While there, stop messing with the physical birth time; it was already copied as part of the pointer and should already be zero for holes.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes: openzfs#15994
amotin added a commit to amotin/zfs that referenced this issue on Mar 19, 2024:

- When reading L0 block pointers, handle buffers without block pointers and without dirty records as holes. Those appear when the dnode size was increased but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there are actual writes.
- When cloning blocks, set the destination hole logical birth time to the current TXG. Otherwise, if we are cloning over existing data, newly created holes may not be properly replicated later. While there, stop messing with the physical birth time; it was already copied as part of the pointer and should already be zero for holes.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes: openzfs#15994
amotin added a commit to amotin/zfs that referenced this issue on Mar 20, 2024:

- When reading L0 block pointers, handle buffers without block pointers and without dirty records as holes. Those appear when the dnode size was increased but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there are actual writes.
- When cloning blocks, set the destination hole logical birth time to the current TXG. Otherwise, if we are cloning over existing data, newly created holes may not be properly replicated later. Use BP_SET_BIRTH() where possible instead of replicating its logic.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes: openzfs#15994
amotin added a commit to amotin/zfs that referenced this issue on Mar 25, 2024, with the same commit message as the Mar 20, 2024 revision.
amotin added a commit to amotin/zfs that referenced this issue on Mar 26, 2024, with the same commit message as the Mar 20, 2024 revision.
amotin added a commit to amotin/zfs that referenced this issue on Apr 17, 2024:

- When reading L0 block pointers, handle buffers without block pointers and without dirty records as holes. Those appear when the dnode size was increased but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there are actual writes.
- When cloning blocks, set the destination hole logical birth time to the current TXG. Otherwise, if we are cloning over existing data, newly created holes may not be properly replicated later. Use BP_SET_BIRTH() where possible instead of replicating its logic.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15994
Closes openzfs#16007
behlendorf pushed a commit that referenced this issue on Apr 19, 2024:

- When reading L0 block pointers, handle buffers without block pointers and without dirty records as holes. Those appear when the dnode size was increased but the end was never written, so there are no new indirection levels to store the pointers. It makes no sense to return EAGAIN here, since sync won't create new indirection levels until there are actual writes.
- When cloning blocks, set the destination hole logical birth time to the current TXG. Otherwise, if we are cloning over existing data, newly created holes may not be properly replicated later. Use BP_SET_BIRTH() where possible instead of replicating its logic.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue on Sep 4, 2024, with the same commit message as the Apr 17, 2024 revision.
System information
Describe the problem you're observing
With zfs_bclone_enabled=1 and zfs_bclone_wait_dirty=1, copying a file with unallocated blocks at the end gets stuck in the kernel forever. During this time, the kernel also forces txg syncs in an infinite loop. This behavior is also observed in #15933.
(The same underlying issue also means that bclone always fails for files with unallocated blocks at the end if zfs_bclone_wait_dirty=0.)

Describe how to reproduce the problem
cp never finishes and is stuck in an uninterruptible state, unresponsive to both SIGINT and SIGQUIT. Setting zfs_bclone_wait_dirty=0 while cp is still running causes cp to finish immediately with the error:

cp: failed to clone 'dst' from 'src': Resource temporarily unavailable
Include any warning/errors/backtraces from the system logs

Nothing immediately in dmesg or dbgmsg, but during the failure, /proc/spl/kstat/zfs/testpool/txgs shows that ZFS is generating a lot of empty txgs:

Root cause
#15842 adds logic to wait for a sync when encountering dirty blocks, implemented as forcing a txg sync whenever dmu_read_l0_bps returns EAGAIN, but that logic is broken. cc @behlendorf

Normally, dmu_read_l0_bps returns EAGAIN for dirty blocks. However, it also returns EAGAIN whenever db->db_blkptr == NULL. This normally occurs for newly written blocks that are not yet allocated, but it also occurs for sparse, unallocated blocks beyond the end of a fully synced object. (More specifically, it occurs for any of the conditions that cause dbuf_findbp to return ENOENT when holding the dbuf.)

In this situation, zfs_clone_range tries to force a sync when zfs_bclone_wait_dirty=1, but syncing does not allocate any blocks since none are actually dirty. The next attempt then runs into the same condition and syncs again, in an infinite loop. Setting zfs_bclone_wait_dirty=0 breaks the loop and returns an error to cp.

This is trivially reproducible by creating an empty sparse file, as seen by zdb:

Note that despite the file being 256 MiB according to ZPL metadata, the actual on-disk object is still a 1-level object with dn_maxblkid == 0 and no indirect blocks, which is sufficient to trigger the db_blkptr == NULL case upon dbuf_hold.