linux: overlayfs support #101

Merged
8 commits merged into from
Oct 21, 2022
Conversation


@ghost ghost commented Oct 19, 2022

Backported openzfs#9600 and openzfs#12209 with new feature flags removed.

This means in cases where a pool is not cleanly exported you may end up
with loss of unflushed TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT
transactions and all transactions that follow them in the ZIL for that
dataset if the pool is then imported by a release build of ZFS that
does not recognize the new TX types. A warning message is logged to
dmesg and the pool is fully usable, just missing the last changes.
Debug builds will panic when they fail to replay the full log.

The alternative is a pool that cannot be imported on any version of ZFS
lacking support for TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT replay
when uncleanly exported. That prevents people from either upgrading
their pool for the ability to use overlayfs features or going back to a
previous boot environment after having upgraded the pool.

Jira: https://ixsystems.atlassian.net/browse/NAS-109036

snajpa and others added 3 commits October 19, 2022 17:30
Open files that are not present in the snapshot being rolled back to
need to disappear from the visible VFS image of the dataset.

The kernel provides the d_drop function to drop an invalid entry from
the dcache, but an inode can be referenced by multiple dentries.

The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.

Signed-off-by: Pavel Snajdr <[email protected]>
This allows for much cleaner VERIFY-level assertions.

Signed-off-by: Aleksa Sarai <[email protected]>
This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support
for ZoL, but the changes here allow for far nicer fallbacks than the
previous implementation (the source and target are re-linked in case of
the final link failing).

In addition, a small cleanup was done for the "target exists but is a
different type" codepath so that it's more understandable.

Signed-off-by: Aleksa Sarai <[email protected]>
@ghost ghost requested review from amotin and ixhamza October 19, 2022 19:37
@ghost ghost changed the title linux: renameat2 flags support linux: renameat2 flags support and zpl_revalidate removal Oct 19, 2022
@ghost ghost changed the title linux: renameat2 flags support and zpl_revalidate removal linux: overlayfs support Oct 19, 2022
@ghost ghost force-pushed the rename2-backport branch from 52d3842 to fbb9b45 Compare October 19, 2022 19:57
@ghost ghost force-pushed the rename2-backport branch 2 times, most recently from e881e8a to fe2c0e5 Compare October 20, 2022 17:35

ghost commented Oct 20, 2022

I reworked the backport of the commit adding the new TX types so they line up with the numbers in master (because we don't have TX_SETSAXATTR backported), and I adjusted the replay vectors to include an error function in the missing locations.
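
For readers unfamiliar with the replay-vector idea, here is a minimal user-space sketch of it. It is not the code from module/zfs/zfs_replay.c: every `demo_*` name and the record numbering are hypothetical and chosen only for illustration. It shows a table of replay functions indexed by record type, with an error function filling the slots that have no handler (such as a number reserved for a type that was not backported), and a replay loop that stops at the first record it cannot handle, so later records are not applied.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical record types; the numbering is illustrative, not the ZFS values. */
enum demo_txtype {
	DEMO_TX_NONE = 0,		/* slot 0 is never a valid record */
	DEMO_TX_RENAME,
	DEMO_TX_RESERVED,		/* hole: number reserved, no handler here */
	DEMO_TX_RENAME_EXCHANGE,
	DEMO_TX_RENAME_WHITEOUT,
	DEMO_TX_MAX_TYPE
};

typedef int (*demo_replay_fn)(int *applied);

/* Stand-in for a real replay handler: just counts the record as applied. */
static int
demo_replay_ok(int *applied)
{
	(*applied)++;
	return (0);
}

/* Fills every slot without a handler so the table has no NULL entries. */
static int
demo_replay_error(int *applied)
{
	(void) applied;
	return (ENOTSUP);
}

static demo_replay_fn demo_replay_vector[DEMO_TX_MAX_TYPE] = {
	demo_replay_error,	/* DEMO_TX_NONE */
	demo_replay_ok,		/* DEMO_TX_RENAME */
	demo_replay_error,	/* DEMO_TX_RESERVED (the hole) */
	demo_replay_ok,		/* DEMO_TX_RENAME_EXCHANGE */
	demo_replay_ok,		/* DEMO_TX_RENAME_WHITEOUT */
};

/*
 * Replay records in order and stop at the first failure. Returns the
 * number of records replayed; *errp holds 0 or the error that stopped
 * the replay. Records after the failing one are never applied.
 */
static size_t
demo_replay_log(const int *log, size_t n, int *applied, int *errp)
{
	size_t i;

	*errp = 0;
	for (i = 0; i < n; i++) {
		int type = log[i];
		demo_replay_fn fn = demo_replay_error;

		/* Types outside the table are treated like holes. */
		if (type >= 0 && type < DEMO_TX_MAX_TYPE)
			fn = demo_replay_vector[type];

		int err = fn(applied);
		if (err != 0) {
			*errp = err;
			break;
		}
	}
	return (i);
}
```

In this sketch a log of `{ DEMO_TX_RENAME, DEMO_TX_RESERVED, DEMO_TX_RENAME }` replays only the first record, which mirrors the behavior described in the PR description: the unrecognized record and everything after it in that log are dropped.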

cyphar and others added 4 commits October 20, 2022 18:07
Implement support for Linux's RENAME_* flags (for renameat2). Aside from
being quite useful for userspace (providing race-free ways to exchange
paths and implement mv --no-clobber), they are used by overlayfs and are
thus required in order to use overlayfs-on-ZFS.

In order for us to represent the new renameat2(2) flags in the ZIL, we
create two new transaction types for the two flags which need
transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT).
RENAME_NOREPLACE does not need any ZIL support because we know that if
the operation succeeded before creating the ZIL entry, there was no file
to be clobbered and thus it can be treated as a regular TX_RENAME.

As these new transaction flags modify the on-disk format, it is
necessary to add feature flags (rename_exchange and rename_whiteout) --
two flags rather than one so that any future extensions to renameat2 can
be handled consistently with another feature flag for the new feature.
In order to reduce compatibility issues of pools between Linux (which
supports these flags) and other operating systems (which currently
don't), these feature flags are only activated once a new
TX_RENAME_{EXCHANGE,WHITEOUT} is added to the ZIL and are deactivated
when the ZIL is destroyed. This means that a cleanly exported pool can
be imported onto a non-Linux system without issue (a pool with a ZIL log
containing a new TX_RENAME_* entry can also be imported on such systems,
but only in readonly mode).

While this design is very similar to zilsaxattr, zilsaxattr is activated
whenever the ZIL is instantiated (even if the new transaction type is
never used) -- the rename_* feature flags are only activated when a new
rename transaction is being added to the ZIL.

Pavel helped with carrying this PR and reworking the ZIL bits after I
originally wrote it, but the final version was effectively rewritten
completely because the ZIL handling needed to be reworked entirely to be
crash safe and to use feature flags.

Cc: Pavel Snajdr <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
EL7 has backported renameat2(2) support, but this was done by creating a
wrapper struct around the inode_operations struct and adding the rename2
callback to the wrapping struct. The semantics are the same as the
upstream callback, so just detect and use the wrapper.

Since we only need this wrapper for rename2 (and renaming is a directory
operation) we only use the wrapper for the directory iops struct.

Signed-off-by: Pavel Snajdr <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Since mv(1) doesn't expose the RENAME_* flags we need to have our own
variation in the tests/ tree. The tests are fairly obvious functional
tests, though in the future (once we solve the d_revalidate issue) we
might also add a full-stack overlayfs integration test.

Signed-off-by: Aleksa Sarai <[email protected]>
@ghost ghost force-pushed the rename2-backport branch from fe2c0e5 to 75a9df9 Compare October 20, 2022 18:09

ghost commented Oct 20, 2022

Fixed zdb to handle the hole for setsaxattr, and dropped manpage changes for the features.

This means in cases where a pool is not cleanly exported you may end up
with loss of unflushed TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT
transactions and all transactions that follow them in the ZIL for that
dataset if the pool is then imported by a release build of ZFS that
does not recognize the new TX types.  A warning message is logged to
dmesg and the pool is fully usable, just missing the last changes.
Debug builds will panic when they fail to replay the full log.

It's also possible a different new TX type might land upstream before
this, in which case similar results are likely to be encountered.

The alternative is a pool that cannot be imported on any version of ZFS
lacking support for TX_RENAME_EXCHANGE and TX_RENAME_WHITEOUT replay
when uncleanly exported.  That prevents people from either upgrading
their pool for the ability to use overlayfs features or going back to a
previous boot environment after having upgraded the pool.

Signed-off-by: Ryan Moeller <[email protected]>
@ghost ghost force-pushed the rename2-backport branch from 75a9df9 to 6ac99af Compare October 20, 2022 19:14

ghost commented Oct 20, 2022

Removed unused slog_017_neg.ksh test related to features.

@ghost ghost merged commit a250721 into truenas/zfs-2.1-release Oct 21, 2022
@ghost ghost deleted the rename2-backport branch October 21, 2022 15:13
This pull request was closed.
5 participants