Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

tstabrawa · 2024-11-05T06:31:17Z

Motivation and Context

Starting in kernel version 6.3.0, Linux page migration is done in a batched manner, where several pages are first unmapped, and then the pages moved. See @RodoMa92's bisection test results in #15140 for details on the kernel change. As a result of this change, and due to ZFS's implicit use of fallback_migrate_folio (by not specifying a .migrate_folio function in its address_space_operations), when dirty pages are migrated, they are written back during the "move" step. Then, when the "move" step is retried, fallback_migrate_folio calls into migrate_folio, which in turn calls into migrate_folio_extra, and migrate_folio_extra BUG's out due to the page still being under writeback.

Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if fallback_migrate_folio called writeout for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started by fallback_migrate_folio.

A previous attempt at fixing this problem was merged as part of PR #16568, but this turned out not to be a complete fix. @RodoMa92 re-raised the problem and explained that, while the problem occurred less frequently, it still occurred approximately once per 20 attempts.

Description

Rather than ZFS using the default fallback_migrate_folio function as a result of not specifying a .migrate_folio function, this PR causes it to use the simple folio migration function (migrate_folio) instead. Notably, migrate_folio (since it's intended for filesystems that don't use buffers / private data) doesn't start writeback like fallback_migrate_folio does, and therefore migrate_folio_extra wouldn't BUG() on pages still under writeback, even without us setting PagePrivate on every page.

Additionally, this PR reverts the changes merged as part of PR #16568, as these changes are no longer necessary when using migrate_folio instead of fallback_migrate_folio.

A side effect of this change could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).

How Has This Been Tested?

Tested by user @RodoMa92 - results in the following comments on #16568:

Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-44-generic. No crashes were observed, though several tests were skipped/failed/killed. I don't expect these errors were caused by my changes but would defer to your expertise if you have any concerns:

zfs-tests - 47700cf35384909b88fb70893a9395dc28e4344d.txt

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

This reverts commit b052035. Signed-off-by: tstabrawa <[email protected]>

Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Signed-off-by: tstabrawa <[email protected]>

tstabrawa · 2024-11-05T15:59:58Z

Looks like I had a typo in the code for older kernel APIs (.migrate_page should be .migratepage). Amending my commit to fix.

diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
index f7ae381..f6e0143 100644
--- a/module/os/linux/zfs/zpl_file.c
+++ b/module/os/linux/zfs/zpl_file.c
@@ -1094,7 +1094,7 @@ const struct address_space_operations zpl_address_space_operations = {
 #ifdef HAVE_VFS_MIGRATE_FOLIO
 	.migrate_folio	= migrate_folio,
 #else
-	.migrate_page	= migrate_page,
+	.migratepage	= migrate_page,
 #endif
 };

behlendorf

Thanks for running this down so quickly. This is a much cleaner fix and clearly the intended way to handle this.

Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes #16568 Closes #16723

This reverts commit b052035. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723

Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723

RodoMa92 · 2024-11-09T19:16:44Z

Thanks for this, @tstabrawa. Still held up fine as of today. I feel much more confident now to say it's finally closed, and allow me again to use the amazing features of ZFS.

Again, thanks a lot :)

Marco

Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723

Revert "Avoid BUG in migrate_folio_extra"

669e3e6

This reverts commit b052035. Signed-off-by: tstabrawa <[email protected]>

AllKind mentioned this pull request Nov 5, 2024

zfs-2.2.7 patchset #16720

Open

13 tasks

Use simple folio migration function

e784c6d

Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Signed-off-by: tstabrawa <[email protected]>

tstabrawa force-pushed the fix_15140_take2 branch from 47700cf to e784c6d Compare November 5, 2024 16:00

amotin added the Status: Code Review Needed Ready for review and testing label Nov 5, 2024

behlendorf requested a review from bwatkinson November 6, 2024 01:22

behlendorf approved these changes Nov 6, 2024

View reviewed changes

bwatkinson approved these changes Nov 6, 2024

View reviewed changes

tonyhutter approved these changes Nov 6, 2024

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 6, 2024

behlendorf closed this in f38e2d2 Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

tstabrawa commented Nov 5, 2024 •

edited

Loading

tstabrawa commented Nov 5, 2024

behlendorf left a comment

RodoMa92 commented Nov 9, 2024

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

Conversation

tstabrawa commented Nov 5, 2024 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

tstabrawa commented Nov 5, 2024

behlendorf left a comment

Choose a reason for hiding this comment

RodoMa92 commented Nov 9, 2024

tstabrawa commented Nov 5, 2024 •

edited

Loading