forked from openzfs/zfs
linux: overlayfs support #101
Merged
Open files that aren't present in the snapshot being rolled back to need to disappear from the visible VFS image of the dataset. The kernel provides the d_drop function to drop an invalid entry from the dcache, but an inode can be referenced by multiple dentries. The introduced zpl_d_drop_aliases function walks and invalidates all aliases of an inode. Signed-off-by: Pavel Snajdr <[email protected]>
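The alias walk can be shown with a userspace model of what zpl_d_drop_aliases does. This is a sketch, not the ZFS code. The struct dentry, struct inode, model_d_drop and model_d_drop_aliases names are simplified, hypothetical stand-ins for the kernel types and for d_drop(). Locking is also omitted: a kernel implementation would need to hold the inode's i_lock while walking its alias list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct dentry and struct inode; not the real kernel layouts. */
struct dentry {
	const char *d_name;
	bool d_hashed;               /* visible to path lookup while true */
	struct dentry *d_alias_next; /* next dentry referencing the same inode */
};

struct inode {
	struct dentry *i_dentry;     /* head of the alias list */
};

/* Model of d_drop(): unhash one dentry so lookups no longer find it. */
static void model_d_drop(struct dentry *d)
{
	d->d_hashed = false;
}

/* Model of zpl_d_drop_aliases(): drop every alias of the inode, not just
 * the one dentry the caller happens to hold. Returns how many were dropped.
 * (Locking omitted; see the lead-in.) */
static size_t model_d_drop_aliases(struct inode *ip)
{
	size_t dropped = 0;

	for (struct dentry *d = ip->i_dentry; d != NULL; d = d->d_alias_next) {
		model_d_drop(d);
		dropped++;
	}
	return dropped;
}
```

A hard-linked file is the simplest case this covers. Each of its names is a separate dentry aliasing the same inode. Dropping only the dentry at hand would leave the other name visible after the rollback.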
This allows for much cleaner VERIFY-level assertions. Signed-off-by: Aleksa Sarai <[email protected]>
This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support in ZoL. The changes here also allow for far nicer fallbacks than the previous implementation: the source and target are re-linked if the final link fails. In addition, the "target exists but is a different type" codepath was cleaned up slightly so that it is easier to understand. Signed-off-by: Aleksa Sarai <[email protected]>
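The re-link fallback can be illustrated with a small in-memory model. This is a hypothetical sketch, not zfs_rename(). The names struct dir, dir_find, dir_link, dir_unlink and model_rename are invented for illustration, and the fail_final_link parameter simply simulates an error in the last step. The point is the ordering: both old links are removed first, and if creating the new link fails, the source and the old target are linked back in.

```c
#include <stdbool.h>
#include <string.h>

#define DIR_SLOTS 8

/* Hypothetical in-memory directory: name -> object id (0 = free slot). */
struct dir {
	const char *name[DIR_SLOTS];
	int obj[DIR_SLOTS];
};

static int dir_find(const struct dir *d, const char *name)
{
	for (int i = 0; i < DIR_SLOTS; i++)
		if (d->obj[i] != 0 && strcmp(d->name[i], name) == 0)
			return i;
	return -1;
}

/* Returns 0 on success, -1 if the directory is full or fail_link is set
 * (fail_link simulates an error while creating the link). */
static int dir_link(struct dir *d, const char *name, int obj, bool fail_link)
{
	if (fail_link)
		return -1;
	for (int i = 0; i < DIR_SLOTS; i++) {
		if (d->obj[i] == 0) {
			d->name[i] = name;
			d->obj[i] = obj;
			return 0;
		}
	}
	return -1;
}

static void dir_unlink(struct dir *d, int slot)
{
	d->obj[slot] = 0;
	d->name[slot] = NULL;
}

/* Rename src -> dst within one directory, replacing dst if it exists.
 * If the final link fails, re-link the source and the old target so the
 * directory ends up exactly as it started. */
static int model_rename(struct dir *d, const char *src, const char *dst,
    bool fail_final_link)
{
	int s = dir_find(d, src);
	if (s < 0)
		return -1;
	int src_obj = d->obj[s];
	int t = dir_find(d, dst);
	int dst_obj = (t >= 0) ? d->obj[t] : 0;

	if (t >= 0)
		dir_unlink(d, t);
	dir_unlink(d, s);

	if (dir_link(d, dst, src_obj, fail_final_link) != 0) {
		/* Fallback: restore both original links. */
		(void) dir_link(d, src, src_obj, false);
		if (dst_obj != 0)
			(void) dir_link(d, dst, dst_obj, false);
		return -1;
	}
	return 0;
}
```

In this model the restore step cannot fail. It only re-links entries that were just removed, so the slots it needs are guaranteed to be free.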
ghost changed the title from "linux: renameat2 flags support" to "linux: renameat2 flags support and zpl_revalidate removal" (Oct 19, 2022)
ghost changed the title from "linux: renameat2 flags support and zpl_revalidate removal" to "linux: overlayfs support" (Oct 19, 2022)
ghost force-pushed the rename2-backport branch from 52d3842 to fbb9b45 (October 19, 2022 19:57)
ixhamza reviewed (Oct 19, 2022)
alek-p reviewed (Oct 19, 2022)
ghost commented (Oct 19, 2022)
ghost commented (Oct 19, 2022)
ghost force-pushed the rename2-backport branch 2 times, most recently from e881e8a to fe2c0e5 (October 20, 2022 17:35)
I reworked the backport of the commit adding the new TX types so that their numbers line up with master (because we don't have TX_SETSAXATTR backported). I also adjusted the replay vectors to include an error function in the missing slots.
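The "hole" idea can be sketched under the assumption that replay is a table indexed by TX type. The numbering keeps a reserved slot for a type that was not backported, and every slot without a handler points at an error function instead of NULL. TX_A, TX_B, TX_C, TX_RESERVED, replay_vector, replay_error and replay_record are hypothetical names with illustrative values. They are not the definitions in zil.h or zfs_replay.c.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative numbering with a hole: slot 3 is reserved for a type this
 * tree does not implement (as with TX_SETSAXATTR in the backport), so the
 * later types keep the same numbers as upstream. Values are hypothetical. */
enum {
	TX_A = 1,
	TX_B = 2,
	TX_RESERVED = 3,
	TX_C = 4,
	TX_MAX_TYPE = 5
};

typedef int replay_func_t(void *arg);

static int replay_a(void *arg) { (void) arg; return 0; }
static int replay_b(void *arg) { (void) arg; return 0; }
static int replay_c(void *arg) { (void) arg; return 0; }

/* Every slot without a real handler points here, so dispatch never
 * calls through a NULL pointer. */
static int replay_error(void *arg)
{
	(void) arg;
	return ENOTSUP;
}

static replay_func_t *const replay_vector[TX_MAX_TYPE] = {
	replay_error, /* 0: unused */
	replay_a,     /* TX_A */
	replay_b,     /* TX_B */
	replay_error, /* TX_RESERVED: the hole */
	replay_c,     /* TX_C */
};

static int replay_record(int txtype, void *arg)
{
	if (txtype <= 0 || txtype >= TX_MAX_TYPE)
		return EINVAL;
	return replay_vector[txtype](arg);
}
```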
Implement support for Linux's RENAME_* flags (for renameat2). Aside from being quite useful for userspace (providing race-free ways to exchange paths and implement mv --no-clobber), they are used by overlayfs and are thus required in order to use overlayfs-on-ZFS. To represent the new renameat2(2) flags in the ZIL, we create two new transaction types for the two flags that need transaction-level support (RENAME_EXCHANGE and RENAME_WHITEOUT). RENAME_NOREPLACE does not need any ZIL support: if the operation succeeded before the ZIL entry was created, there was no file to be clobbered, so it can be treated as a regular TX_RENAME. As these new transaction types modify the on-disk format, it is necessary to add feature flags (rename_exchange and rename_whiteout). There are two flags rather than one so that any future extension to renameat2 can be handled consistently, with another feature flag for the new feature. To reduce compatibility issues for pools moved between Linux (which supports these flags) and other operating systems (which currently don't), these feature flags are only activated once a new TX_RENAME_{EXCHANGE,WHITEOUT} is added to the ZIL, and they are deactivated when the ZIL is destroyed. This means that a cleanly exported pool can be imported onto a non-Linux system without issue. A pool whose ZIL contains a new TX_RENAME_* entry can also be imported on such systems, but only in read-only mode. This design is very similar to zilsaxattr, with one difference. zilsaxattr is activated whenever the ZIL is instantiated, even if the new transaction type is never used, whereas the rename_* feature flags are only activated when a new rename transaction is being added to the ZIL. Pavel helped with carrying this PR and reworking the ZIL bits after I originally wrote it. The final version was effectively a complete rewrite, however, because the ZIL handling had to be reworked entirely to be crash-safe and to use feature flags.
Cc: Pavel Snajdr <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]>
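The type selection and lazy feature activation described above can be modeled in a few lines. This sketch covers the upstream design in this commit message; the backport in this PR later removed the new feature flags. The flag values match Linux's uapi <linux/fs.h>. The names tx_kind, zil_model, rename_tx_kind, zil_log_rename and zil_destroy_model are hypothetical. Invalid flag combinations, such as RENAME_EXCHANGE together with RENAME_NOREPLACE, are assumed to have been rejected by the kernel before this point.

```c
#include <stdbool.h>

/* Flag values as in Linux's uapi <linux/fs.h>; prefixed to avoid clashes. */
#define MODEL_RENAME_NOREPLACE (1u << 0)
#define MODEL_RENAME_EXCHANGE  (1u << 1)
#define MODEL_RENAME_WHITEOUT  (1u << 2)

/* Hypothetical names for the log record kinds and per-ZIL feature state. */
enum tx_kind {
	TX_KIND_RENAME,
	TX_KIND_RENAME_EXCHANGE,
	TX_KIND_RENAME_WHITEOUT
};

struct zil_model {
	bool feat_exchange_active;
	bool feat_whiteout_active;
};

/* NOREPLACE needs no record of its own: if the rename succeeded there was
 * nothing to clobber, so replaying a plain rename gives the same result. */
static enum tx_kind rename_tx_kind(unsigned int flags)
{
	if (flags & MODEL_RENAME_EXCHANGE)
		return TX_KIND_RENAME_EXCHANGE;
	if (flags & MODEL_RENAME_WHITEOUT)
		return TX_KIND_RENAME_WHITEOUT;
	return TX_KIND_RENAME;
}

/* Features are activated lazily, only when a new-style record is logged... */
static enum tx_kind zil_log_rename(struct zil_model *zl, unsigned int flags)
{
	enum tx_kind k = rename_tx_kind(flags);

	if (k == TX_KIND_RENAME_EXCHANGE)
		zl->feat_exchange_active = true;
	else if (k == TX_KIND_RENAME_WHITEOUT)
		zl->feat_whiteout_active = true;
	return k;
}

/* ...and deactivated again when the ZIL is destroyed. */
static void zil_destroy_model(struct zil_model *zl)
{
	zl->feat_exchange_active = false;
	zl->feat_whiteout_active = false;
}
```

The contrast with zilsaxattr is the placement of the activation: here it lives in the logging path rather than in ZIL creation. A dataset that never performs an exchange or whiteout rename therefore never activates either feature.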
EL7 has backported renameat2(2) support, but this was done by creating a wrapper struct around the inode_operations struct and adding the rename2 callback to the wrapping struct. The semantics are the same as the upstream callback, so just detect and use the wrapper. Since we only need this wrapper for rename2 (and renaming is a directory operation) we only use the wrapper for the directory iops struct. Signed-off-by: Pavel Snajdr <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]>
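The wrapper-struct approach can be sketched in plain C. This is an illustration only. The names struct ops, struct ops_wrapper, wrapper_of and dispatch_rename are hypothetical. The is_wrapped argument stands in for however the kernel decides that an ops pointer is embedded in a wrapper; that mechanism is not shown here. The key property is that the original struct is embedded unchanged, so code that only knows about the original struct keeps working.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for an ops table such as inode_operations. */
struct ops {
	int (*rename)(const char *from, const char *to);
};

/* Hypothetical wrapper: embeds the original struct unchanged and appends
 * the new callback, so the original layout is not altered. */
struct ops_wrapper {
	struct ops ops;
	int (*rename2)(const char *from, const char *to, unsigned int flags);
};

/* Recover the enclosing wrapper from a pointer to its embedded ops
 * (the usual container_of pattern). */
#define wrapper_of(p) \
	((const struct ops_wrapper *)((const char *)(p) - \
	    offsetof(struct ops_wrapper, ops)))

/* Return distinct values so callers can tell which callback ran. */
static int plain_rename(const char *from, const char *to)
{
	(void) from; (void) to;
	return 1;
}

static int flagged_rename(const char *from, const char *to, unsigned int flags)
{
	(void) from; (void) to; (void) flags;
	return 2;
}

/* Only the directory ops need rename2, so only they use the wrapper. */
static const struct ops_wrapper dir_iops = {
	.ops = { .rename = plain_rename },
	.rename2 = flagged_rename,
};

static int dispatch_rename(const struct ops *op, int is_wrapped,
    const char *from, const char *to, unsigned int flags)
{
	if (is_wrapped) {
		const struct ops_wrapper *w = wrapper_of(op);
		if (w->rename2 != NULL)
			return w->rename2(from, to, flags);
	}
	/* Without rename2, only flag-less renames can be honored. */
	if (flags != 0)
		return -EINVAL;
	return op->rename(from, to);
}
```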
Since mv(1) doesn't expose the RENAME_* flags we need to have our own variation in the tests/ tree. The tests are fairly obvious functional tests, though in the future (once we solve the d_revalidate issue) we might also add a full-stack overlayfs integration test. Signed-off-by: Aleksa Sarai <[email protected]>
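A test helper along these lines only needs to pass the flags through to renameat2(2). The sketch below is hypothetical and is not the helper added to the tests/ tree. do_renameat2 is an invented name. It calls the system call via syscall(2) so that it also builds against a libc without a renameat2() wrapper. The RENAME_* fallback values are the ones from Linux's uapi <linux/fs.h>; newer glibc also defines them in <stdio.h> under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* AT_FDCWD */
#include <stdio.h>       /* RENAME_* on newer glibc */
#include <sys/syscall.h> /* SYS_renameat2 */
#include <unistd.h>      /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0) /* fail with EEXIST if the target exists */
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)  /* atomically swap source and target */
#endif
#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT (1 << 2)  /* leave a whiteout at the source */
#endif

/* Invoke renameat2(2) relative to the current directory.
 * Returns 0 on success, or -1 with errno set. */
static int do_renameat2(const char *from, const char *to, unsigned int flags)
{
	return (int)syscall(SYS_renameat2, AT_FDCWD, from, AT_FDCWD, to, flags);
}
```

For example, do_renameat2("a", "b", RENAME_NOREPLACE) fails with EEXIST if b already exists. This is the primitive that makes a race-free mv --no-clobber possible.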
Signed-off-by: Aleksa Sarai <[email protected]>
ghost force-pushed the rename2-backport branch from fe2c0e5 to 75a9df9 (October 20, 2022 18:09)
Fixed zdb to handle the hole for setsaxattr, and dropped manpage changes for the features.
This means that a pool may lose data if it is not cleanly exported and is then imported by a release build of ZFS that does not recognize the new TX types. Any unflushed TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT transactions are lost, along with all transactions that follow them in that dataset's ZIL. A warning message is logged to dmesg and the pool is fully usable, just missing the last changes. Debug builds will panic when they fail to replay the full log. It's also possible that a different new TX type lands upstream before this one, in which case similar results are likely. The alternative is a pool that cannot be imported on any version of ZFS lacking support for TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT replay when uncleanly exported. That would stop people from upgrading their pool to use overlayfs features, or from going back to a previous boot environment after upgrading the pool. Signed-off-by: Ryan Moeller <[email protected]>
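The behavior described above can be sketched as follows. Replay stops at the first unrecognized record, the rest of that dataset's log is discarded with a warning, and debug builds panic. This is a userspace model under stated assumptions. The names struct lr and replay_log are hypothetical, and the KNOWN_TX_MAX cutoff of 22 is an illustrative value. The warning goes to stderr rather than dmesg, and assert() stands in for a debug-build panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cutoff: record types >= KNOWN_TX_MAX are unknown to this
 * (older) release. The value is illustrative only. */
#define KNOWN_TX_MAX 22

struct lr {
	int txtype;
};

/* Replay records in order. On the first unknown type, warn and stop:
 * that record and every later one for this dataset are dropped, since
 * later records may build on it. Returns the number of records replayed. */
static size_t replay_log(const struct lr *recs, size_t n, bool debug_build)
{
	for (size_t i = 0; i < n; i++) {
		if (recs[i].txtype < 0 || recs[i].txtype >= KNOWN_TX_MAX) {
			fprintf(stderr, "WARNING: unknown TX type %d; "
			    "dropping %zu log record(s)\n", recs[i].txtype, n - i);
			/* Stand-in for the debug-build panic. */
			assert(!debug_build && "failed to replay full log");
			return i;
		}
		/* ... apply recs[i] to the dataset here ... */
	}
	return n;
}
```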
ghost force-pushed the rename2-backport branch from 75a9df9 to 6ac99af (October 20, 2022 19:14)
Removed unused slog_017_neg.ksh test related to features.
amotin approved these changes (Oct 20, 2022)
ghost deleted the rename2-backport branch (October 21, 2022 15:13)
This pull request was closed.
Backported openzfs#9600 and openzfs#12209 with new feature flags removed.
Jira: https://ixsystems.atlassian.net/browse/NAS-109036