-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use simple folio migration function to avoid BUG in migrate_folio_extra #16723
Conversation
This reverts commit b052035. Signed-off-by: tstabrawa <[email protected]>
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Signed-off-by: tstabrawa <[email protected]>
Looks like I had a typo in the code for older kernel APIs ( diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
index f7ae381..f6e0143 100644
--- a/module/os/linux/zfs/zpl_file.c
+++ b/module/os/linux/zfs/zpl_file.c
@@ -1094,7 +1094,7 @@ const struct address_space_operations zpl_address_space_operations = {
#ifdef HAVE_VFS_MIGRATE_FOLIO
.migrate_folio = migrate_folio,
#else
- .migrate_page = migrate_page,
+ .migratepage = migrate_page,
#endif
}; |
47700cf
to
e784c6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for running this down so quickly. This is a much cleaner fix and clearly the intended way to handle this.
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes #16568 Closes #16723
This reverts commit b052035. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723
Thanks for this, @tstabrawa. Still held up fine as of today. I feel much more confident now to say it's finally closed, and allow me again to use the amazing features of ZFS. Again, thanks a lot :) Marco |
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723
Motivation and Context
Starting in kernel version 6.3.0, Linux page migration is done in a batched manner, where several pages are first unmapped, and then the pages moved. See @RodoMa92's bisection test results in #15140 for details on the kernel change. As a result of this change, and due to ZFS's implicit use of
fallback_migrate_folio
(by not specifying a.migrate_folio
function in itsaddress_space_operations
), when dirty pages are migrated, they are written back during the "move" step. Then, when the "move" step is retried,fallback_migrate_folio
calls into migrate_folio, which in turn calls into migrate_folio_extra, andmigrate_folio_extra
BUG's out due to the page still being under writeback.Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if
fallback_migrate_folio
calledwriteout
for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started byfallback_migrate_folio
.A previous attempt at fixing this problem was merged as part of PR #16568, but this turned out not to be a complete fix. @RodoMa92 re-raised the problem and explained that, while the problem occurred less frequently, it still occurred approximately once per 20 attempts.
Description
Rather than ZFS using the default
fallback_migrate_folio
function as a result of not specifying a.migrate_folio
function, this PR causes it to use the simple folio migration function (migrate_folio
) instead. Notably,migrate_folio
(since it's intended for filesystems that don't use buffers / private data) doesn't start writeback likefallback_migrate_folio
does, and thereforemigrate_folio_extra
wouldn'tBUG()
on pages still under writeback, even without us settingPagePrivate
on every page.Additionally, this PR reverts the changes merged as part of PR #16568, as these changes are no longer necessary when using
migrate_folio
instead offallback_migrate_folio
.A side effect of this change could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).
How Has This Been Tested?
Tested by user @RodoMa92 - results in the following comments on #16568:
Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-44-generic. No crashes were observed, though several tests were skipped/failed/killed. I don't expect these errors were caused by my changes but would defer to your expertise if you have any concerns:
Types of changes
Checklist:
Signed-off-by
.