RCU stalls on copy_to_user actions in kernel #11274
Please refer to other issues for logs of this problem, and to a thread in the RCU kernel discussion that points toward improper fault handling of the |
An update after a few more Patch Tuesdays and WSL pre-releases: after updating to WSL 2.2.2 today, the stack traces changed somewhat. Now, instead of the previous
The symptoms, however, are still exactly the same as before, and just as crippling. As you can see, I am not running the stock WSL kernel but a custom 6.7.7 kernel built with
|
And after another Patch Tuesday upgrade, I'm back to getting the same old
@craigloewen-msft, @benhillis and @pmartincic, can any of you please provide an update on this (even if it is just to say that this won't be getting attention any time soon)? Has the bug fix mentioned in #10667 (comment) been released? Have you had a chance to look at the logs that have been provided in all the GitHub issues regarding this problem? |
@maxboone: Do you see the same behavior with the stock WSL kernel? If so, can you share /logs of a repro? |
Yes, please refer to the other issues that went stale; there are logs in there.
I will collect logs again over the weekend. An exact repro is hard as this just happens over time. |
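Because the stalls build up over time rather than appearing as one reproducible event, one way to gather evidence is to watch the kernel log continuously and keep only the stall reports, each stamped with the time it was seen. The sketch below is a hypothetical helper, not part of any WSL tooling; its regex is an assumption aimed at the common "rcu_sched self-detected stall on CPU" and "rcu_preempt detected stalls on CPUs/tasks" message formats and may miss other variants.

```bash
# Hypothetical helper: read kernel log lines on stdin and print only RCU stall
# reports, each prefixed with the current time in ISO 8601 format.
filter_rcu_stalls() {
  # Assumed pattern: matches "rcu_sched self-detected stall" and
  # "rcu_preempt detected stalls" style messages.
  local re='rcu_[a-z_]+ (self-)?detected stall'
  local line
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s %s\n' "$(date --iso-8601=seconds)" "$line"
    fi
  done
}
```

After sourcing the function, something like `sudo dmesg --follow | filter_rcu_stalls >> ~/rcu-stalls.log` can run in the background while working, so the log shows when each stall happened relative to the slowdowns.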
Here are some logs for you, @OneBlue. As @maxboone says, it's really hard to collect logs. Not because the problem is hard to reproduce (I can do it within one minute; see #10667 for a detailed description of my use case), but because it isn't a single "specific event" but rather a case of intermittent stalls and slowdowns that get increasingly worse over time (for example, I get a bunch of
I have, however, tried to collect logs at three points of slowdown/hang. I really hope they help!
EDIT: This was done using the WSL2 pre-release (but not with any custom kernel):

    PS C:\Users\DavidNordvall> wsl --version
    WSL-version: 2.2.2.0
    Kernelversion: 5.15.150.1-2
    WSLg-version: 1.0.61
    MSRDC-version: 1.2.5105
    Direct3D-version: 1.611.1-81528511
    DXCore-version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows-version: 10.0.22631.3447

    david@DavidsSrfcPro9:~$ uname -a
    Linux DavidsSrfcPro9 5.15.150.1-microsoft-standard-WSL2 #1 SMP Thu Mar 7 03:23:44 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
|
@OneBlue can you confirm these logs (from @david-nordvall) are sufficient to continue? |
I recently encountered the RCU stall again, but the stack trace looks different. Here it is:
The kernel is the latest official one compiled with RSEQ disabled. |
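When comparing kernels built with and without RSEQ, it helps to confirm how the kernel actually running was configured. The sketch below uses a hypothetical helper, kconfig_value; defaulting to /proc/config.gz assumes the running kernel was built with CONFIG_IKCONFIG_PROC, which may not hold for every custom build.

```bash
# Hypothetical helper: print how CONFIG_<NAME> is set in a kernel config file.
#   "y", "m" or the literal value -> option is set
#   "n"                           -> file contains "# CONFIG_<NAME> is not set"
#   (nothing)                     -> option not mentioned at all
# Defaults to /proc/config.gz, which exists only if the running kernel was
# built with CONFIG_IKCONFIG_PROC (an assumption for any given kernel).
kconfig_value() {
  local opt="CONFIG_$1" file=${2:-/proc/config.gz}
  local reader=cat
  if [[ $file == *.gz ]]; then
    reader=zcat   # /proc/config.gz and saved configs may be gzip-compressed
  fi
  "$reader" -- "$file" | awk -v opt="$opt" '
    index($0, opt "=") == 1      { print substr($0, length(opt) + 2); exit }
    $0 == "# " opt " is not set" { print "n"; exit }
  '
}
```

On a running WSL instance, `kconfig_value RSEQ` inspects the live kernel (with `uname -r` identifying which one it is); for a kernel source tree, the config path can be passed explicitly, e.g. `kconfig_value RSEQ .config`.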
@OneBlue, just curious if you've had the chance to have a look at the logs I provided? Are they any help? |
@OneBlue any update? |
Still happening with kernel v6.9 and WSL version 2.2.4.0:
|
@OneBlue @pmartincic, with the announcement of the new Surface Copilot+ PC series, any update? |
Not to mention with the release of Docker Desktop for Windows on ARM. I'm traveling and won't be able to test the official Docker Desktop release on my Surface Pro 9 5G for a couple of weeks. Has anyone else tested it? |
Still happening on kernel 6.9.5 under WSL version: 2.2.4.0:
|
@jhovold does this problem look like anything you've come across building the kernel for the ThinkPads with SQ3? |
@pmartincic @OneBlue checking in |
@kelsey-steele, tagging you as the releaser of the kernel sources: is there anything related to this on the roadmap or in the current out-of-tree patches? |
I was plagued by extremely frequent RCU stalls resulting in high CPU usage on Win11 23H2 arm64 (Lenovo X13s), but after updating to 24H2 (26100.1150) the problem has completely disappeared. WSL is finally usable on this machine now. The WSL/WSLg/kernel versions appear the same as on 23H2, so maybe some underlying OS/Hyper-V bugs were fixed?
|
I'm starting to believe this is true! (Too many times I've been told it was fixed when it wasn't) I upgraded to 24H2 and am now able to use WSL for at least a few hours and have seen no instability at all. Could it be true that it's FINALLY fixed!!!??? I'm happy as a clam now with my Robo & Kala tablet. |
How were you able to install 24H2? Have you joined an Insider channel (Release Preview?)? I'd rather not, since my Surface Pro 9 is my daily driver. Really encouraging news either way! |
I'm going to update today and will report back; I hope it'll work! @craigloewen-msft @benhillis, could you confirm whether something in 24H2 would fix this? |
Yes, indeed you do have to follow Release Preview. For what it's worth, the language used to describe what that means makes it sound like a very low-risk channel to follow. |
Same on the Surface Pro X (SQ2): since the 24H2 update I haven't run into any RSEQ/RCU-related stalls. It's a shame that I can't mount the Hyper-V disks to switch back to WSL, but that works fine through:

    sudo -i
    modprobe nbd max_part=16
    qemu-nbd -c /dev/nbd0 /mnt/c/ProgramData/Microsoft/Windows/Virtual\ Hard\ Disks/ubuntu0.vhdx
    partprobe /dev/nbd0
    mount /dev/ubuntu-vg/ubuntu-lv /mnt/ubuntu0

I'll keep monitoring whether the stalls really stay away, but it looks like this issue has been fixed! |
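The qemu-nbd mount commands above leave the VHDX attached to /dev/nbd0, so it has to be released again before Hyper-V or WSL opens that disk. A possible teardown is sketched below; the function names are hypothetical, the defaults simply mirror the mount point, volume group and device from those commands, and deactivating the volume group with vgchange is an assumption about how LVM was activated on this setup.

```bash
# Hypothetical teardown for the qemu-nbd mount shown above. Defaults mirror
# those commands (mount point /mnt/ubuntu0, volume group ubuntu-vg,
# device /dev/nbd0); run as root. With DRY_RUN=1 the commands are only printed.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

unmount_vhdx() {
  local mnt=${1:-/mnt/ubuntu0} vg=${2:-ubuntu-vg} dev=${3:-/dev/nbd0}
  run umount "$mnt" || return          # release the filesystem first
  run vgchange -an "$vg" || return     # deactivate the LVM volume group on the disk
  run qemu-nbd -d "$dev"               # finally detach the NBD device from the VHDX
}
```

Running `DRY_RUN=1 unmount_vhdx` first prints the three commands for review; each step stops the sequence on failure so the device is never detached while still mounted.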
After running the 24H2 update for two days, I can gladly say that I am not running into these stalls anymore. Closing this issue. |
@maxboone, what's the availability of 24H2? The latest preview-channel update I'm able to pull is the 2024-07 Cumulative Preview for 23H2. |
After I pulled that one I got the update for 24H2 over the Release Preview. |
Windows Version
Microsoft Windows [Version 10.0.22631.3155]
WSL Version
2.1.1.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.146.1-2
Distro Version
Ubuntu 22.04
Other Software
No response
Repro Steps
Expected Behavior
No RCU stalls; copy_to_user calls fault correctly in the kernel and free up the CPU again.
Actual Behavior
System locks up.
Diagnostic Logs
When the kernel is built with CONFIG_RSEQ:
When the kernel is built without CONFIG_RSEQ (performance and time until the stall are significantly better):
And