
Avoid BUG in migrate_folio_extra #16568

Merged: 1 commit into openzfs:master on Sep 26, 2024
Conversation

@tstabrawa (Contributor)

Linux page migration code won't wait for writeback to complete unless it needs to call release_folio. Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for us.

Motivation and Context

Thanks for considering this PR.

I came across issue #15140 from the Proxmox VE 8.1 release notes, and gave it a good long look over. As far as I can tell, what's happening is that the Linux kernel page migration code is starting writeback on some pages, not waiting for writeback to complete, and then throwing a BUG when it finds that pages are still under writeback.

Pretty much all of the interesting action happens in fallback_migrate_folio(), which doesn't show up in the stack traces listed in #15140, but suffice it to say that it's called from move_to_new_folio(), which does appear in the stack traces. What appears to be happening in the case of the crashes described in #15140 is that fallback_migrate_folio() is being called upon dirty ZFS page-cache pages, so it's starting writeback by calling writeout(). Then, since ZFS doesn't store private data in any page cache pages, it skips the call to filemap_release_folio() (because folio_test_private() returns false), and immediately calls migrate_folio(), which in turn calls migrate_folio_extra(). Then, at the beginning of migrate_folio_extra(), it BUGs out because the page is still under writeback (folio_test_writeback() returns true).

Notably, if the page did have private data, then fallback_migrate_folio() would call into filemap_release_folio(), which would return false for pages under writeback, causing fallback_migrate_folio() to exit before calling migrate_folio().
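For reference, a simplified paraphrase of the pre-6.5 fallback logic described above; this is a sketch, not verbatim kernel source:

```c
/* Simplified sketch of pre-6.5 fallback_migrate_folio() (mm/migrate.c);
 * error handling and migration modes are abbreviated. */
static int fallback_migrate_folio(struct address_space *mapping,
		struct folio *dst, struct folio *src, enum migrate_mode mode)
{
	if (folio_test_dirty(src)) {
		if (mode != MIGRATE_SYNC && mode != MIGRATE_SYNC_NO_COPY)
			return -EBUSY;
		return writeout(mapping, src);	/* starts writeback, migration retries later */
	}

	/* Without PagePrivate, this release check is skipped entirely... */
	if (folio_test_private(src) &&
	    !filemap_release_folio(src, GFP_KERNEL))
		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;

	/* ...so a folio still under writeback falls through to
	 * migrate_folio() -> migrate_folio_extra(), which BUGs on
	 * folio_test_writeback(). */
	return migrate_folio(mapping, dst, src, mode);
}
```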

So, in summary, in order for the BUG to happen, a few things need to be true:

  • The filesystem must not store anything in page_private for its page-cache pages (so folio_test_private() returns false and the release check is skipped).
  • The filesystem must do asynchronous writeback (so pages can still be under writeback when migration moves on).
  • Dirty pages belonging to that filesystem must be undergoing migration (so fallback_migrate_folio() starts writeback on them via writeout()).

I went through the code for all of the filesystems in the Linux kernel and didn't see any that met all three conditions. Notably, pretty much all traditional filesystems store buffers in page private data. Those filesystems that don't store buffers either store something else in page_private (e.g. shmem/tmpfs, iomap), or don't do asynchronous writeback (e.g. ecryptfs, fuse, romfs, squashfs). So it would appear as if ZFS may be the only filesystem that experiences this particular behavior. As far as I can tell, the above-described behavior goes back all the way to when page migration was first implemented in kernel 2.6.16.

The way I see it, there are two ways to make the problem go away:

  • Change the Linux kernel so that fallback_migrate_folio() won't call migrate_folio() if the page is under writeback, even for pages without private data.
  • Change ZFS so that it stores some private data in its page-cache pages (or at least indicates that it does).

I assume the latter may be preferable (even if only temporarily) so that ZFS can avoid this crash for any/all kernel versions, but I'm happy to defer to the ZFS devs on which option(s) you choose to pursue.

The latter is the approach I took in the patch proposed here.

Description

Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for writeback to complete.
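For illustration, a minimal sketch of the approach, with hypothetical helper names; the actual patch modifies the existing ZFS paths that call SetPageUptodate and the zpl address_space_operations table:

```c
/* Sketch only (hypothetical helper): pair the private flag with the
 * uptodate flag so the kernel sees the page as having releasable
 * private state. */
static inline void
zpl_mark_page_uptodate(struct page *pp)
{
	if (!PagePrivate(pp))
		SetPagePrivate(pp);
	SetPageUptodate(pp);
}

/* Sketch of a .release_folio hook.  filemap_release_folio() already
 * refuses folios under writeback before calling this, which is what
 * makes fallback_migrate_folio() back off instead of BUGing later. */
static bool
zpl_release_folio(struct folio *folio, gfp_t gfp)
{
	folio_clear_private(folio);
	return true;
}
```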

How Has This Been Tested?

Tested by user @JKDingwall; results are reported in comments on #15140.

Also, regression-tested by running the ZFS Test Suite (on Ubuntu 23.10, running kernel version 6.5.0-35-generic). No new test failures were observed. See attached files.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


Linux page migration code won't wait for writeback to complete unless
it needs to call release_folio.  Call SetPagePrivate wherever
PageUptodate is set and define .release_folio, to cause
fallback_migrate_folio to wait for us.

Signed-off-by: tstabrawa <[email protected]>
@behlendorf added the Status: Code Review Needed (Ready for review and testing) label on Sep 25, 2024
@behlendorf (Contributor)

@tstabrawa thanks for digging into this and the clear analysis of the bug! I agree, setting PagePrivate() and registering .releasepage and .invalidatepage sure does look like the best way to handle this. Plus, it has the advantage of aligning the ZFS code a bit more closely with the other kernel filesystems, which we can hope will help us avoid this kind of bug in the future.

Interestingly, I see that the private page check in fallback_migrate_folio() was already dropped by torvalds/linux@0201ebf for the 6.5 kernel, so filemap_release_folio() is now called unconditionally.

Unrelated to the fix, it looks like the CI had an infrastructure glitch cloning the repository on some of the builders. I'll go ahead and resubmit those builds once the running tests wrap up.

@tstabrawa (Contributor Author)

Thanks for the quick PR approval!

> Interestingly, I see that the private page check in fallback_migrate_folio() was already dropped by torvalds/linux@0201ebf for the 6.5 kernel, so filemap_release_folio() is now called unconditionally.

I don't think it matters with regard to this pull request, but it may be worth being aware of anyway. The patch you reference didn't really remove the private-page check; it just moved it to folio_needs_release(), which is now called by filemap_release_folio(). So the BUG should still be reproducible on 6.5 kernels as well (and was reproduced on 6.5.1 by @RodoMa92).
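For reference, the relocated check looks roughly like this in 6.5+, paraphrased from mm/internal.h:

```c
/* Paraphrase of the 6.5+ helper: the private-page test now lives here
 * and is consulted by filemap_release_folio() itself. */
static inline bool folio_needs_release(struct folio *folio)
{
	struct address_space *mapping = folio_mapping(folio);

	return folio_has_private(folio) ||
		(mapping && mapping_release_always(mapping));
}
```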

@behlendorf added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label on Sep 26, 2024
@behlendorf merged commit b052035 into openzfs:master on Sep 26, 2024
20 checks passed
@tstabrawa deleted the fix_15140 branch on September 26, 2024 16:17
@tstabrawa (Contributor Author)

Thanks again for the PR approval and merge! Let me know if you run into further trouble related to this problem and/or change.

@RodoMa92 commented on Oct 30, 2024

Yeah, sadly this hasn't fixed anything on my end. I just switched back to ZFS yesterday, and the same issue occurs. The stack trace is in the original issue, which will need to be reopened, I guess.

@behlendorf (Contributor)

@RodoMa92 what version of ZFS and kernel did you manage to hit this with again?

@tstabrawa (Contributor Author) commented on Nov 1, 2024

@behlendorf From this comment on #15140, it sounds like @RodoMa92 was using ZFS 6.10.14 and kernel 6.10.14-273-tkg-eevdf.

@RodoMa92 commented on Nov 1, 2024

> @behlendorf From this comment on #15140, it sounds like @RodoMa92 was using ZFS 6.10.14 and kernel 6.10.14-273-tkg-eevdf.

ZFS-dkms 2.2.6, and both latest LTS + latest supported kernel.

@RodoMa92 commented on Nov 2, 2024

I can easily reproduce this, so if any of you have an idea of what else might trigger it, feel free to feed me patches to test; I should be able to test them quickly. I seriously want this fixed, but after taking a deeper look at it a couple of days ago, I don't think I'll be able to do it by myself.

@RodoMa92 commented on Nov 3, 2024

Oh, god damn it. I checked the module sources and this change is not there? I'll triple-check my module install now, but I don't understand how I can be on 2.2.6 and not have this patch included, unless it hasn't been merged to stable yet.

@RodoMa92 commented on Nov 3, 2024

Yep, it wasn't clear to me that this change wasn't included in 2.2.6. I updated to git, checked that the change was actually there this time, and now my VMs work 100% without any crashes.

Sorry for the noise, but now I can also confirm that's fixed.

Thanks a lot for this! :)

@RodoMa92 commented on Nov 3, 2024

Spoke too soon; it seems that zvols are still affected. I'll revert back to qcow2 and test a lot more with that, to see whether it's just zvols that cause this now or whether it was just luck of the draw with this patch enabled.

@RodoMa92 commented on Nov 3, 2024

I might have a clue about what is going on.
Since I could still reproduce the same issue sometimes, I suspect that this is the correct approach but that it introduces a race condition, since SetPageUptodate() is called before the private page flag is set.
I'm now testing the same patch, but with the private-flag call moved before the SetPageUptodate() call.

For now I don't seem to be able to reproduce it anymore; I might write a small script to exercise this automatically for a while, to triple-check whether this is my remaining issue.
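Roughly, the reordering under test, sketched with a hypothetical helper:

```c
/* Sketch of the reordering being tested: publish PagePrivate before
 * PageUptodate, so migration can never observe an uptodate page that
 * does not yet carry the private flag.  Helper name is illustrative. */
static inline void
zpl_page_ready(struct page *pp)
{
	if (!PagePrivate(pp))
		SetPagePrivate(pp);	/* first: private... */
	SetPageUptodate(pp);		/* ...then uptodate */
}
```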

@RodoMa92 commented on Nov 3, 2024

Nah, it's still present. It's just harder to hit now, but I can still reproduce the original bug roughly every 20 tries on my end. @behlendorf, if you can reopen #15140, that would be lovely. This hasn't completely fixed the issue, sadly.

@tstabrawa (Contributor Author)

@RodoMa92 Thanks for your persistence (and patience!) in helping get to the bottom of this problem. I think I was able to identify one more way that PagePrivate can be unset on some pages, which could explain why you're still seeing the problem (though less frequently). Ironically, it's actually on previously-migrated pages, as the default page migration handler doesn't transfer PagePrivate flags to destination pages.

Upon looking into writing a .migrate_folio function that would transfer the PagePrivate flag, I may have identified a much simpler way to address this problem. That is, rather than ZFS using the default fallback_migrate_folio function as a result of not specifying a .migrate_folio function, it could use the simple folio migration function (migrate_folio) instead. Notably, migrate_folio (since it's intended for filesystems that don't use buffers / private data) doesn't start writeback like fallback_migrate_folio does, and therefore migrate_folio_extra wouldn't BUG() on pages still under writeback, even without us setting PagePrivate on every page.
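A hedged sketch of what that registration might look like; the configure-guard name here is an assumption, not taken from the actual patch:

```c
/* Sketch: point the ZFS address_space_operations at the kernel's
 * generic migrate_folio(), which moves folios without starting
 * writeback, so the fallback path is never taken.
 * HAVE_VFS_MIGRATE_FOLIO is an illustrative configure guard. */
const struct address_space_operations zpl_aops = {
	/* ... existing ZFS a_ops entries ... */
#ifdef HAVE_VFS_MIGRATE_FOLIO
	.migrate_folio	= migrate_folio,
#else
	.migratepage	= migrate_page,		/* pre-folio kernels */
#endif
};
```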

Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if fallback_migrate_folio called writeout for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started by fallback_migrate_folio.

A side effect of my new patch could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).

When you have the chance, would you please try the new patch (commit 47700cf on my fix_15140_take2 branch) and let us know if the problem remains? If you'd like to be sure you're running the correct patch, you could follow the instructions in my May 31 comment on #15140 (except replace the old fix_15140 branch name with fix_15140_take2). In this case, you would expect the output of git describe to be zfs-2.3.99-52-g47700cf35.

@RodoMa92 commented on Nov 4, 2024

> When you have the chance, would you please try the new patch (commit 47700cf on my fix_15140_take2 branch) and let us know if the problem remains? [...]

Loaded it half an hour ago on my system, and for now it seems to work like your initial patch; no crash yet. I haven't seen a huge difference in boot times (maybe a second less on dropping caches?), but not having any kernel oops is far more important to me :P

I'll keep you updated, but as an initial impression it looks quite good :)

@RodoMa92 commented on Nov 4, 2024

After 20 runs I still haven't been able to reproduce it. I'll probably test it further, but to me this looks like a proper workaround for my original bug.

Thanks a lot for digging and fixing this issue!

Marco

@tstabrawa (Contributor Author)

> After 20 runs I still haven't been able to reproduce it. I'll probably test it further, but to me this looks like a proper workaround for my original bug.
>
> Thanks a lot for digging and fixing this issue!

Thanks for testing the proposed fix. Assuming I don't hear of any further problems, I plan to file a new PR later tonight.

@behlendorf (Contributor)

@tstabrawa I'm looking forward to the PR. That is a much cleaner fix.

behlendorf pushed a commit that referenced this pull request Nov 6, 2024
This reverts commit b052035.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes #16568
Closes #16723
behlendorf pushed a commit that referenced this pull request Nov 6, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes #16568
Closes #16723
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 6, 2024
This reverts commit b052035.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 6, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 11, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723