
RCU stalls on copy_to_user actions in kernel #11274

Closed

maxboone opened this issue Mar 8, 2024 · 26 comments

maxboone commented Mar 8, 2024

Windows Version

Microsoft Windows [Version 10.0.22631.3155]

WSL Version

2.1.1.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.146.1-2

Distro Version

Ubuntu 22.04

Other Software

No response

Repro Steps

  • Start a Linux distro under WSL 2
  • Work on the system for a while
  • It'll start dumping RCU stall warnings in the kernel logs

Expected Behavior

No RCU stalls; copy_to_user calls fault correctly in the kernel and free up the CPU again

Actual Behavior

System locks up.

Diagnostic Logs

When the kernel is built with CONFIG_RSEQ:

[ 1301.906019] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1301.907897] rcu:     1-....: (104816 ticks this GP) idle=9b9/1/0x4000000000000002 softirq=5231/5231 fqs=51500
[ 1301.909720]  (t=105011 jiffies g=8973 q=3510)
[ 1301.910689] Task dump for CPU 1:
[ 1301.911380] task:weston          state:R  running task     stack:    0 pid:  137 ppid:   133 flags:0x0000000a
[ 1301.913333] Call trace:
[ 1301.913785]  dump_backtrace+0x0/0x1b0
[ 1301.914694]  show_stack+0x1c/0x24
[ 1301.915494]  sched_show_task+0x164/0x190
[ 1301.916105]  dump_cpu_task+0x48/0x54
[ 1301.916837]  rcu_dump_cpu_stacks+0xec/0x130
[ 1301.917522]  rcu_sched_clock_irq+0x908/0xa40
[ 1301.918803]  update_process_times+0xa0/0x190
[ 1301.919619]  tick_sched_timer+0x5c/0xd0
[ 1301.920287]  __hrtimer_run_queues+0x140/0x32c
[ 1301.921390]  hrtimer_interrupt+0xf4/0x240
[ 1301.922079]  hv_stimer0_isr+0x28/0x30
[ 1301.922699]  hv_stimer0_percpu_isr+0x14/0x20
[ 1301.923514]  handle_percpu_devid_irq+0x8c/0x1b4
[ 1301.924360]  handle_domain_irq+0x64/0x90
[ 1301.924952]  gic_handle_irq+0x58/0x128
[ 1301.925609]  call_on_irq_stack+0x20/0x38
[ 1301.926268]  do_interrupt_handler+0x54/0x5c
[ 1301.926894]  el1_interrupt+0x2c/0x4c
[ 1301.927543]  el1h_64_irq_handler+0x14/0x20
[ 1301.928252]  el1h_64_irq+0x74/0x78
[ 1301.928911]  clear_rseq_cs.isra.0+0x4c/0x60
[ 1301.929527]  do_notify_resume+0xf8/0xeb0
[ 1301.931746]  el0_svc+0x3c/0x50
[ 1301.932741]  el0t_64_sync_handler+0x9c/0x120
[ 1301.933676]  el0t_64_sync+0x158/0x15c
[ 1468.089387] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1468.090681] rcu:     1-....: (149813 ticks this GP) idle=9b9/1/0x4000000000000002 softirq=5231/5231 fqs=73593
[ 1468.092491]  (t=150019 jiffies g=8973 q=4968)
[ 1468.093259] Task dump for CPU 1:
[ 1468.093982] task:weston          state:R  running task     stack:    0 pid:  137 ppid:   133 flags:0x0000000a
[ 1468.095831] Call trace:
[ 1468.096253]  dump_backtrace+0x0/0x1b0
[ 1468.097658]  show_stack+0x1c/0x24
[ 1468.098868]  sched_show_task+0x164/0x190
[ 1468.099487]  dump_cpu_task+0x48/0x54
[ 1468.100903]  rcu_dump_cpu_stacks+0xec/0x130
[ 1468.101611]  rcu_sched_clock_irq+0x908/0xa40
[ 1468.102838]  update_process_times+0xa0/0x190
[ 1468.103719]  tick_sched_timer+0x5c/0xd0
[ 1468.104406]  __hrtimer_run_queues+0x140/0x32c
[ 1468.105287]  hrtimer_interrupt+0xf4/0x240
[ 1468.105953]  hv_stimer0_isr+0x28/0x30
[ 1468.106588]  hv_stimer0_percpu_isr+0x14/0x20
[ 1468.107497]  handle_percpu_devid_irq+0x8c/0x1b4
[ 1468.109079]  handle_domain_irq+0x64/0x90
[ 1468.109684]  gic_handle_irq+0x58/0x128
[ 1468.110504]  call_on_irq_stack+0x20/0x38
[ 1468.111322]  do_interrupt_handler+0x54/0x5c
[ 1468.111945]  el1_interrupt+0x2c/0x4c
[ 1468.112568]  el1h_64_irq_handler+0x14/0x20
[ 1468.114268]  el1h_64_irq+0x74/0x78
[ 1468.115010]  clear_rseq_cs.isra.0+0x4c/0x60
[ 1468.115848]  do_notify_resume+0xf8/0xeb0
[ 1468.116700]  el0_svc+0x3c/0x50
[ 1468.117973]  el0t_64_sync_handler+0x9c/0x120
[ 1468.118875]  el0t_64_sync+0x158/0x15c
[ 1634.272774] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1634.273960] rcu:     1-....: (194688 ticks this GP) idle=9b9/1/0x4000000000000002 softirq=5231/5231 fqs=95152
[ 1634.275462]  (t=195027 jiffies g=8973 q=6426)
[ 1634.276232] Task dump for CPU 1:
[ 1634.276900] task:weston          state:R  running task     stack:    0 pid:  137 ppid:   133 flags:0x0000000a
[ 1634.278492] Call trace:
[ 1634.278904]  dump_backtrace+0x0/0x1b0
[ 1634.279530]  show_stack+0x1c/0x24
[ 1634.280323]  sched_show_task+0x164/0x190
[ 1634.281025]  dump_cpu_task+0x48/0x54
[ 1634.281731]  rcu_dump_cpu_stacks+0xec/0x130
[ 1634.282321]  rcu_sched_clock_irq+0x908/0xa40
[ 1634.283189]  update_process_times+0xa0/0x190
[ 1634.284083]  tick_sched_timer+0x5c/0xd0
[ 1634.284730]  __hrtimer_run_queues+0x140/0x32c
[ 1634.285560]  hrtimer_interrupt+0xf4/0x240
[ 1634.286247]  hv_stimer0_isr+0x28/0x30
[ 1634.286878]  hv_stimer0_percpu_isr+0x14/0x20
[ 1634.287682]  handle_percpu_devid_irq+0x8c/0x1b4
[ 1634.288558]  handle_domain_irq+0x64/0x90
[ 1634.289195]  gic_handle_irq+0x58/0x128
[ 1634.289787]  call_on_irq_stack+0x20/0x38
[ 1634.290395]  do_interrupt_handler+0x54/0x5c
[ 1634.291090]  el1_interrupt+0x2c/0x4c
[ 1634.291835]  el1h_64_irq_handler+0x14/0x20
[ 1634.292476]  el1h_64_irq+0x74/0x78
[ 1634.293174]  clear_rseq_cs.isra.0+0x4c/0x60
[ 1634.293958]  do_notify_resume+0xf8/0xeb0
[ 1634.294543]  el0_svc+0x3c/0x50
[ 1634.295654]  el0t_64_sync_handler+0x9c/0x120
[ 1634.297186]  el0t_64_sync+0x158/0x15c
[ 1800.463929] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1800.539655] rcu:     1-....: (238052 ticks this GP) idle=9b9/1/0x4000000000000002 softirq=5231/5231 fqs=115916
[ 1800.691505]  (t=240096 jiffies g=8973 q=7884)
[ 1800.692592] Task dump for CPU 1:
[ 1800.725924] task:weston          state:R  running task     stack:    0 pid:  137 ppid:   133 flags:0x0000000a
[ 1800.789528] Call trace:
[ 1800.791436]  dump_backtrace+0x0/0x1b0
[ 1800.899671]  show_stack+0x1c/0x24
[ 1800.913866]  sched_show_task+0x164/0x190
[ 1800.943585]  dump_cpu_task+0x48/0x54
[ 1800.960038]  rcu_dump_cpu_stacks+0xec/0x130
[ 1800.970630]  rcu_sched_clock_irq+0x908/0xa40
[ 1801.002321]  update_process_times+0xa0/0x190
[ 1801.010376]  tick_sched_timer+0x5c/0xd0
[ 1801.016018]  __hrtimer_run_queues+0x140/0x32c
[ 1801.039996]  hrtimer_interrupt+0xf4/0x240
[ 1801.132611]  hv_stimer0_isr+0x28/0x30
[ 1801.177669]  hv_stimer0_percpu_isr+0x14/0x20
[ 1801.233281]  handle_percpu_devid_irq+0x8c/0x1b4
[ 1801.234490]  handle_domain_irq+0x64/0x90
[ 1801.335334]  gic_handle_irq+0x58/0x128
[ 1801.336804]  call_on_irq_stack+0x20/0x38
[ 1801.338003]  do_interrupt_handler+0x54/0x5c
[ 1801.353157]  el1_interrupt+0x2c/0x4c
[ 1801.403627]  el1h_64_irq_handler+0x14/0x20
[ 1801.467756]  el1h_64_irq+0x74/0x78
[ 1801.491109]  clear_rseq_cs.isra.0+0x4c/0x60
[ 1801.519716]  do_notify_resume+0xf8/0xeb0
[ 1801.570939]  el0_svc+0x3c/0x50
[ 1801.587231]  el0t_64_sync_handler+0x9c/0x120
[ 1801.598718]  el0t_64_sync+0x158/0x15c
[ 1836.057995] hrtimer: interrupt took 44005651 ns
[ 1962.769498] Exception:
[ 1962.770161] Operation canceled @p9io.cpp:258 (AcceptAsync)
[ 1962.944713]
[ 1967.784952] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1967.856279] rcu:     1-....: (276088 ticks this GP) idle=9b9/1/0x4000000000000002 softirq=5231/5231 fqs=132893
[ 1968.000378]  (t=285406 jiffies g=8973 q=9331)
[ 1968.103228] rcu: rcu_sched kthread starved for 1379 jiffies! g8973 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=2
[ 1968.320482] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1968.600515] rcu: RCU grace-period kthread stack dump:
[ 1968.683365] task:rcu_sched       state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000008
[ 1968.852308] Call trace:
[ 1968.888048]  __switch_to+0xb4/0xec
[ 1968.999212]  __schedule+0x2cc/0x800
[ 1969.061683]  schedule+0x64/0x100
[ 1969.144007]  schedule_timeout+0x9c/0x184
[ 1969.220361]  rcu_gp_fqs_loop+0x100/0x364
[ 1969.296220]  rcu_gp_kthread+0x108/0x140
[ 1969.388705]  kthread+0x124/0x130
[ 1969.468734]  ret_from_fork+0x10/0x20
[ 1969.557124] rcu: Stack dump where RCU GP kthread last ran:
[ 1969.689180] Task dump for CPU 2:
[ 1969.770725] task:weston          state:R  running task     stack:    0 pid:  145 ppid:   133 flags:0x0000000f
[ 1970.150921] Call trace:
[ 1970.246591]  __switch_to+0xb4/0xec
[ 1970.344106]  0xffff000002cb1d00
[ 1970.437906] Task dump for CPU 1:
[ 1970.528765] task:weston          state:R  running task     stack:    0 pid:  137 ppid:   133 flags:0x0000000b
[ 1971.000258] Call trace:
[ 1971.129089]  dump_backtrace+0x0/0x1b0
[ 1971.277596]  show_stack+0x1c/0x24
[ 1971.362667]  sched_show_task+0x164/0x190
[ 1971.434626]  dump_cpu_task+0x48/0x54
[ 1971.554021]  rcu_dump_cpu_stacks+0xec/0x130
[ 1971.691116]  rcu_sched_clock_irq+0x908/0xa40
[ 1971.826492]  update_process_times+0xa0/0x190
[ 1971.899642]  tick_sched_timer+0x5c/0xd0
[ 1971.964201]  __hrtimer_run_queues+0x140/0x32c
[ 1972.133174]  hrtimer_interrupt+0xf4/0x240
[ 1972.185289]  hv_stimer0_isr+0x28/0x30
[ 1972.278692]  hv_stimer0_percpu_isr+0x14/0x20
[ 1972.415058]  handle_percpu_devid_irq+0x8c/0x1b4
[ 1972.495565]  handle_domain_irq+0x64/0x90
[ 1972.577916]  gic_handle_irq+0x58/0x128
[ 1972.695828]  call_on_irq_stack+0x20/0x38
[ 1972.740457]  do_interrupt_handler+0x54/0x5c
[ 1972.783890]  el1_interrupt+0x2c/0x4c
[ 1972.887542]  el1h_64_irq_handler+0x14/0x20
[ 1972.939049]  el1h_64_irq+0x74/0x78
[ 1972.993798]  clear_rseq_cs.isra.0+0x4c/0x60
[ 1973.103571]  do_notify_resume+0xf8/0xeb0
[ 1973.162704]  el0_svc+0x3c/0x50
[ 1973.203463]  el0t_64_sync_handler+0x9c/0x120
[ 1973.282090]  el0t_64_sync+0x158/0x15c

When the kernel is built without CONFIG_RSEQ (performance and time until the first stall are significantly better):

[  675.812339] rcu: INFO: rcu_sched self-detected stall on CPU
[  675.814587] rcu:     3-....: (14893 ticks this GP) idle=762c/1/0x4000000000000000 softirq=6920/6920 fqs=6610
[  675.815606] rcu:     (t=15001 jiffies g=50497 q=1304 ncpus=8)
[  675.816520] CPU: 3 PID: 232 Comm: snapfuse Not tainted 6.7.7-WSL2-STABLE+ #2
[  675.816550] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  675.816553] pc : __arch_copy_to_user+0x1a0/0x240
[  675.817689] lr : _copy_to_iter+0xf0/0x560
[  675.818069] sp : ffff800082ceba80
[  675.818070] x29: ffff800082cebac0 x28: 0000000001b2c000 x27: 0000000000000005
[  675.818074] x26: 0000000000000000 x25: ffff00004c491000 x24: 0000000000000000
[  675.818076] x23: 0000000000001000 x22: 0000040000000000 x21: ffff800082cebd30
[  675.818079] x20: ffff800082cebd30 x19: 0000000000001000 x18: 0000000000000000
[  675.818081] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00004c491000
[  675.818083] x14: 9887db4ae914c054 x13: 6bcd444ce14effe5 x12: 0b22b481c6001041
[  675.818086] x11: 7513c0250d7df247 x10: b85affa4063b12c7 x9 : 368beb85bc648557
[  675.818088] x8 : 217c88df9795370e x7 : a16d77942052b4ab x6 : 0000aaf844516fff
[  675.818090] x5 : 0000aaf844517e2f x4 : 0000000000000000 x3 : 0000000000003daf
[  675.818092] x2 : 0000000000000dc0 x1 : ffff00004c491210 x0 : 0000aaf844516e2f
[  675.818096] Call trace:
[  675.818143]  __arch_copy_to_user+0x1a0/0x240
[  675.818147]  copy_page_to_iter+0xbc/0x140
[  675.818150]  filemap_read+0x1b0/0x398
[  675.818427]  generic_file_read_iter+0x48/0x168
[  675.818429]  ext4_file_read_iter+0x58/0x288
[  675.818681]  vfs_read+0x1e8/0x280
[  675.818804]  ksys_pread64+0x90/0xf0
[  675.818806]  __arm64_sys_pread64+0x24/0x48
[  675.818807]  invoke_syscall.constprop.0+0x54/0x128
[  675.818912]  do_el0_svc+0x44/0xf0
[  675.818914]  el0_svc+0x24/0xb0
[  675.819041]  el0t_64_sync_handler+0x138/0x148
[  675.819043]  el0t_64_sync+0x14c/0x150
[  681.501178] block sda: the capability attribute has been deprecated.
[  741.700330] rcu: INFO: rcu_sched self-detected stall on CPU
[  741.701707] rcu:     4-....: (14940 ticks this GP) idle=8074/1/0x4000000000000000 softirq=13021/13037 fqs=6400
[  741.703152] rcu:     (t=15001 jiffies g=50713 q=5093 ncpus=8)
[  741.704017] CPU: 4 PID: 194 Comm: systemd-journal Not tainted 6.7.7-WSL2-STABLE+ #2
[  741.704047] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  741.704050] pc : __arch_copy_to_user+0x190/0x240
[  741.704424] lr : _copy_to_iter+0xf0/0x560
[  741.704565] sp : ffff800082cfb870
[  741.704566] x29: ffff800082cfb8b0 x28: ffff00000fe96b18 x27: 0000000000000085
[  741.704569] x26: 0000000000000000 x25: ffff0000061e1c00 x24: 0000000000000000
[  741.704571] x23: 0000000000000085 x22: ffff000117d2a600 x21: ffff800082cfbd90
[  741.704574] x20: ffff0000061e1c00 x19: 0000000000000085 x18: 0000000000000000
[  741.704608] x17: 0000000000000000 x16: 0000000000000000 x15: ffff0000061e1c00
[  741.704610] x14: 62616c6961766120 x13: 7365746164707520 x12: 6f6e207361682070
[  741.704612] x11: 616e73203a687365 x10: 7266657220746f6e x9 : 6e6163203a313937
[  741.704614] x8 : 3a6f672e73726570 x7 : 6c656865726f7473 x6 : 0000ab58c96fd6f0
[  741.704617] x5 : 0000ab58c96fd775 x4 : 0000000000000000 x3 : 0000000000000000
[  741.704619] x2 : 0000000000000005 x1 : ffff0000061e1c40 x0 : 0000ab58c96fd6f0
[  741.704621] Call trace:
[  741.704647]  __arch_copy_to_user+0x190/0x240
[  741.704651]  simple_copy_to_iter+0x48/0x98
[  741.704939]  __skb_datagram_iter+0x7c/0x280
[  741.704941]  skb_copy_datagram_iter+0x48/0xc8
[  741.704943]  unix_stream_read_actor+0x30/0x68
[  741.705137]  unix_stream_read_generic+0x304/0xb70
[  741.705139]  unix_stream_recvmsg+0xc0/0xd0
[  741.705140]  sock_recvmsg+0x88/0x108
[  741.705170]  ____sys_recvmsg+0x78/0x198
[  741.705171]  ___sys_recvmsg+0x80/0xf0
[  741.705173]  __sys_recvmsg+0x5c/0xd0
[  741.705175]  __arm64_sys_recvmsg+0x28/0x50
[  741.705177]  invoke_syscall.constprop.0+0x54/0x128
[  741.705316]  do_el0_svc+0xcc/0xf0
[  741.705317]  el0_svc+0x24/0xb0
[  741.705369]  el0t_64_sync_handler+0x138/0x148
[  741.705371]  el0t_64_sync+0x14c/0x150
[  743.232431] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-.... } 15347 jiffies s: 721 root: 0x10/.
[  743.234297] rcu: blocking rcu_node structures (internal RCU debug):
[  743.235477] Sending NMI from CPU 1 to CPUs 4:
[  743.235491] NMI backtrace for cpu 4
[  743.235531] CPU: 4 PID: 194 Comm: systemd-journal Not tainted 6.7.7-WSL2-STABLE+ #2
[  743.235535] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  743.235537] pc : __arch_copy_to_user+0x190/0x240
[  743.235598] lr : _copy_to_iter+0xf0/0x560
[  743.235603] sp : ffff800082cfb870
[  743.235604] x29: ffff800082cfb8b0 x28: ffff00000fe96b18 x27: 0000000000000085
[  743.235607] x26: 0000000000000000 x25: ffff0000061e1c00 x24: 0000000000000000
[  743.235610] x23: 0000000000000085 x22: ffff000117d2a600 x21: ffff800082cfbd90
[  743.235612] x20: ffff0000061e1c00 x19: 0000000000000085 x18: 0000000000000000
[  743.235614] x17: 0000000000000000 x16: 0000000000000000 x15: ffff0000061e1c00
[  743.235617] x14: 62616c6961766120 x13: 7365746164707520 x12: 6f6e207361682070
[  743.235619] x11: 616e73203a687365 x10: 7266657220746f6e x9 : 6e6163203a313937
[  743.235621] x8 : 3a6f672e73726570 x7 : 6c656865726f7473 x6 : 0000ab58c96fd6f0
[  743.235623] x5 : 0000ab58c96fd775 x4 : 0000000000000000 x3 : 0000000000000000
[  743.235626] x2 : 0000000000000005 x1 : ffff0000061e1c40 x0 : 0000ab58c96fd6f0
[  743.235628] Call trace:
[  743.235630]  __arch_copy_to_user+0x190/0x240
[  743.235632]  simple_copy_to_iter+0x48/0x98
[  743.235636]  __skb_datagram_iter+0x7c/0x280
[  743.235639]  skb_copy_datagram_iter+0x48/0xc8
[  743.235641]  unix_stream_read_actor+0x30/0x68
[  743.235644]  unix_stream_read_generic+0x304/0xb70
[  743.235646]  unix_stream_recvmsg+0xc0/0xd0
[  743.235647]  sock_recvmsg+0x88/0x108
[  743.235650]  ____sys_recvmsg+0x78/0x198
[  743.235651]  ___sys_recvmsg+0x80/0xf0
[  743.235653]  __sys_recvmsg+0x5c/0xd0
[  743.235655]  __arm64_sys_recvmsg+0x28/0x50
[  743.235657]  invoke_syscall.constprop.0+0x54/0x128
[  743.235661]  do_el0_svc+0xcc/0xf0
[  743.235663]  el0_svc+0x24/0xb0
[  743.235667]  el0t_64_sync_handler+0x138/0x148
[  743.235668]  el0t_64_sync+0x14c/0x150

And

[ 1559.425979] rcu:     7-....: (14977 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1559.431367] rcu:     (t=15002 jiffies g=67965 q=36939 ncpus=8)
[ 1559.432083] rcu: rcu_sched kthread starved for 2866 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1559.433645] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1559.434604] rcu: RCU grace-period kthread stack dump:
[ 1559.435549] rcu: Stack dump where RCU GP kthread last ran:
[ 1739.451891] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1739.452511] rcu:     7-....: (59616 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1739.453498] rcu:     (t=60008 jiffies g=67965 q=36939 ncpus=8)
[ 1739.454053] rcu: rcu_sched kthread starved for 47871 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1739.455135] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1739.456110] rcu: RCU grace-period kthread stack dump:
[ 1739.456687] rcu: Stack dump where RCU GP kthread last ran:
[ 1919.467822] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1919.468776] rcu:     7-....: (104010 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1919.470405] rcu:     (t=105012 jiffies g=67965 q=36941 ncpus=8)
[ 1919.472599] rcu: rcu_sched kthread starved for 92875 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1919.474013] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1919.474977] rcu: RCU grace-period kthread stack dump:
[ 1919.475641] rcu: Stack dump where RCU GP kthread last ran:
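
For anyone comparing the two configurations, whether the running kernel was built with CONFIG_RSEQ can be checked from inside WSL; a minimal sketch, assuming the kernel exposes /proc/config.gz (the stock WSL2 config typically enables CONFIG_IKCONFIG_PROC):

# Check whether the running kernel was built with rseq support
zcat /proc/config.gz | grep -E '^CONFIG_RSEQ'
# Prints "CONFIG_RSEQ=y" when enabled; no output when the symbol is unset.
# When building a custom kernel, the symbol can be disabled with the
# in-tree helper before compiling:
#   scripts/config --file .config --disable RSEQ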
maxboone (Author) commented Mar 8, 2024

Please refer to the other issues about this problem for logs, and to a thread in the RCU kernel discussion that points toward improper fault handling of copy_to_user. Also, note that this doesn't happen on regular Gen2 VMs in Hyper-V, though the CPU virtualization seems very different between a Hyper-V VM and the WSL VM.

@david-nordvall

An update after some more rounds of patch Tuesdays and more WSL pre-releases. After updating to WSL 2.2.2 today, the stack traces changed somewhat: instead of the previous copy_to_user errors, I now get this:

[ 2474.978293] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-.... 4-.... 7-.... } 15058 jiffies s: 497 root: 0x92/.
[ 2474.980247] rcu: blocking rcu_node structures (internal RCU debug):
[ 2474.980704] Sending NMI from CPU 2 to CPUs 1:
[ 2474.980711] NMI backtrace for cpu 1
[ 2474.980715] CPU: 1 PID: 4179 Comm: docker Not tainted 6.7.7-WSL2-STABLE+ #2
[ 2474.980718] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2474.980721] pc : mm_release+0x48/0x110
[ 2474.980730] lr : mm_release+0x20/0x110
[ 2474.980732] sp : ffff800086733bb0
[ 2474.980733] x29: ffff800086733bb0 x28: ffff0000a81a0788 x27: 0000000000000009
[ 2474.980736] x26: 0000000000000000 x25: 0000ab760004f300 x24: 0000000000000008
[ 2474.980739] x23: 0000000000000000 x22: ffff000020828000 x21: ffff000000734840
[ 2474.980741] x20: ffff000000734840 x19: ffff0000a81a0000 x18: 0000000000000000
[ 2474.980744] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 2474.980746] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 2474.980748] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff80008003b750
[ 2474.980749] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 2474.980751] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 2474.980753] x2 : 0000ff29c740f1d0 x1 : 0000000000000000 x0 : 0000ff29c740f1d0
[ 2474.980755] Call trace:
[ 2474.980757]  mm_release+0x48/0x110
[ 2474.980759]  exit_mm_release+0x2c/0x50
[ 2474.980761]  do_exit+0x21c/0x9f0
[ 2474.980763]  do_group_exit+0x38/0xa0
[ 2474.980765]  get_signal+0x978/0x980
[ 2474.980767]  do_notify_resume+0x12c/0xe28
[ 2474.980770]  el0_svc+0x90/0xb0
[ 2474.980774]  el0t_64_sync_handler+0x138/0x148
[ 2474.980776]  el0t_64_sync+0x14c/0x150
.
.
.

The symptoms are still exactly the same as before, however, and just as crippling.

As you can see, I am not running the WSL stock kernel but a custom 6.7.7 kernel built with CONFIG_RSEQ.

PS C:\Users\DavidNordvall> wsl --version
WSL-version: 2.2.2.0
Kernelversion: 5.15.150.1-2
WSLg-version: 1.0.61
MSRDC-version: 1.2.5105
Direct3D-version: 1.611.1-81528511
DXCore-version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-version: 10.0.22631.3374

@david-nordvall

And after another patch Tuesday upgrade I'm back to getting the same old copy_to_user stack traces.

PS C:\Users\DavidNordvall> wsl --version
WSL-version: 2.2.2.0
Kernelversion: 5.15.150.1-2
WSLg-version: 1.0.61
MSRDC-version: 1.2.5105
Direct3D-version: 1.611.1-81528511
DXCore-version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-version: 10.0.22631.3447

@craigloewen-msft, @benhillis and @pmartincic, can any of you please provide an update on this (even if it is just to say that it won't be getting attention any time soon)? Has the bug fix mentioned in #10667 (comment) been released? Have you had a chance to look at the logs that have been provided in all the GitHub issues regarding this problem?

OneBlue (Collaborator) commented Apr 18, 2024

@maxboone: Do you see the same behavior with the stock WSL kernel?

If so, can you share /logs of a repro?
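
(For readers hitting the same issue: WSL logs are normally gathered with the collect-wsl-logs.ps1 script from the microsoft/WSL repository; a sketch of the usual invocation from an elevated PowerShell prompt, assuming the script's path in that repo is unchanged:)

# Download the log-collection script from the microsoft/WSL repo
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
# Allow the unsigned script for this session only, then run it;
# reproduce the stall while it records, then stop it to produce a .zip of logs
Set-ExecutionPolicy -Scope Process Bypass
.\collect-wsl-logs.ps1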

maxboone (Author) commented Apr 18, 2024

@maxboone: Do you see the same behavior with the stock WSL kernel?

If so, can you share /logs of a repro?

Yes, please refer to the other issues that went stale; there are logs in there.

Please refer to the other issues about this problem for logs, and to a thread in the RCU kernel discussion that points toward improper fault handling of copy_to_user. Also, note that this doesn't happen on regular Gen2 VMs in Hyper-V, though the CPU virtualization seems very different between a Hyper-V VM and the WSL VM.

I will collect logs again over the weekend. An exact repro is hard, as this just happens over time.

david-nordvall commented Apr 19, 2024

Here are some logs for you, @OneBlue.

As @maxboone says, it's really hard to collect logs. Not because it's hard to reproduce the problem (I can do it within one minute; see #10667 for a detailed description of my use case), but because it isn't a "specific event" but rather a case of intermittent stalls and slowdowns that get increasingly worse over time (for example, I get a bunch of hrtimer: interrupt took 16343583 ns events in my dmesg -w logs, which indicates to me that things just take longer than expected). So exactly "when" should I collect these logs?

I have, however, tried to collect logs at three points of slowdown/hang. I really hope they help!

EDIT: This is done using the WSL2 pre-release (but not with any custom kernel)

PS C:\Users\DavidNordvall> wsl --version
WSL-version: 2.2.2.0
Kernelversion: 5.15.150.1-2
WSLg-version: 1.0.61
MSRDC-version: 1.2.5105
Direct3D-version: 1.611.1-81528511
DXCore-version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-version: 10.0.22631.3447
david@DavidsSrfcPro9:~$ uname -a
Linux DavidsSrfcPro9 5.15.150.1-microsoft-standard-WSL2 #1 SMP Thu Mar 7 03:23:44 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
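
(A simple way to pinpoint when to grab logs is to follow the kernel ring buffer with human-readable timestamps and filter for the stall and latency messages; a minimal sketch using standard util-linux dmesg options:)

# Follow the kernel log (-w) with readable timestamps (-T),
# keeping only the RCU-stall and hrtimer-latency lines
sudo dmesg -wT | grep -E --line-buffered 'rcu|hrtimer'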

maxboone (Author) commented Apr 24, 2024

@OneBlue can you confirm these logs (from @david-nordvall) are sufficient to continue?

PeronGH commented Apr 28, 2024

I recently encountered the RCU stall again, but the stack trace looks different. Here it is:

[ 6199.460642] rcu: INFO: rcu_sched self-detected stall on CPU
[ 6199.461362] rcu:     1-....: (13402 ticks this GP) idle=3a7/1/0x4000000000000002 softirq=19983/19983 fqs=6423
[ 6199.462090]  (t=15000 jiffies g=57053 q=1668)
[ 6199.462515] Task dump for CPU 1:
[ 6199.462818] task:weston          state:R  running task     stack:    0 pid:  857 ppid:   645 flags:0x0000080b
[ 6199.463667] Call trace:
[ 6199.463913]  dump_backtrace+0x0/0x1b0
[ 6199.464635]  show_stack+0x1c/0x24
[ 6199.464959]  sched_show_task+0x164/0x190
[ 6199.465521]  dump_cpu_task+0x48/0x54
[ 6199.466252]  rcu_dump_cpu_stacks+0xec/0x130
[ 6199.466542]  rcu_sched_clock_irq+0x914/0xa50
[ 6199.467061]  update_process_times+0xa0/0x190
[ 6199.467529]  tick_sched_timer+0x5c/0xd0
[ 6199.467869]  __hrtimer_run_queues+0x13c/0x330
[ 6199.468327]  hrtimer_interrupt+0xf4/0x240
[ 6199.468694]  hv_stimer0_isr+0x28/0x30
[ 6199.469128]  hv_stimer0_percpu_isr+0x14/0x20
[ 6199.469550]  handle_percpu_devid_irq+0x8c/0x1b4
[ 6199.469982]  handle_domain_irq+0x64/0x90
[ 6199.470382]  gic_handle_irq+0x58/0x128
[ 6199.470692]  call_on_irq_stack+0x20/0x38
[ 6199.471010]  do_interrupt_handler+0x54/0x5c
[ 6199.471349]  el1_interrupt+0x2c/0x4c
[ 6199.471699]  el1h_64_irq_handler+0x14/0x20
[ 6199.472039]  el1h_64_irq+0x74/0x78
[ 6199.472500]  mm_release+0xd8/0x140
[ 6199.473073]  exit_mm_release+0x2c/0x40
[ 6199.473630]  do_exit+0x19c/0xa60
[ 6199.474011]  do_group_exit+0x3c/0xa4
[ 6199.474326]  get_signal+0x1e4/0x9b0
[ 6199.474732]  do_notify_resume+0x138/0xe6c
[ 6199.475091]  el0_svc+0x3c/0x4c
[ 6199.475581]  el0t_64_sync_handler+0x9c/0x120
[ 6199.476007]  el0t_64_sync+0x158/0x15c
[ 6199.476323] Task dump for CPU 4:
[ 6199.476673] task:rdp-source      state:R  running task     stack:    0 pid:  861 ppid:   645 flags:0x0000080b
[ 6199.477631] Call trace:
[ 6199.478165]  __switch_to+0xb4/0xec
[ 6199.478831]  0xffff000048ae4880
[ 6200.364438] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-... 4-... } 15226 jiffies s: 357 root: 0x12/.
[ 6200.365645] rcu: blocking rcu_node structures (internal RCU debug):
[ 6200.366292] Task dump for CPU 1:
[ 6200.366673] task:weston          state:R  running task     stack:    0 pid:  857 ppid:   645 flags:0x0000080b
[ 6200.367605] Call trace:
[ 6200.367822]  __switch_to+0xb4/0xec
[ 6200.368293]  0xffff0000026a8e80
[ 6200.368629] Task dump for CPU 4:
[ 6200.368932] task:rdp-source      state:R  running task     stack:    0 pid:  861 ppid:   645 flags:0x0000080b
[ 6200.369669] Call trace:
[ 6200.369834]  __switch_to+0xb4/0xec
[ 6200.370124]  0xffff000048ae4880
[ 6208.996562] hrtimer: interrupt took 28039200 ns
[ 6379.492381] rcu: INFO: rcu_sched self-detected stall on CPU
[ 6379.493137] rcu:     4-....: (59969 ticks this GP) idle=e35/1/0x4000000000000002 softirq=20957/20957 fqs=23939
[ 6379.493761]  (t=60008 jiffies g=57053 q=1696)
[ 6379.494037] Task dump for CPU 1:
[ 6379.494290] task:weston          state:R  running task     stack:    0 pid:  857 ppid:   645 flags:0x0000080b
[ 6379.494897] Call trace:
[ 6379.495030]  __switch_to+0xb4/0xec
[ 6379.495236]  0xffff0000026a8e80
[ 6379.495426] Task dump for CPU 4:
[ 6379.495640] task:rdp-source      state:R  running task     stack:    0 pid:  861 ppid:   645 flags:0x0000080b
[ 6379.496227] Call trace:
[ 6379.496432]  dump_backtrace+0x0/0x1b0
[ 6379.496758]  show_stack+0x1c/0x24
[ 6379.497047]  sched_show_task+0x164/0x190
[ 6379.497323]  dump_cpu_task+0x48/0x54
[ 6379.497707]  rcu_dump_cpu_stacks+0xec/0x130
[ 6379.498010]  rcu_sched_clock_irq+0x914/0xa50
[ 6379.498318]  update_process_times+0xa0/0x190
[ 6379.498588]  tick_sched_timer+0x5c/0xd0
[ 6379.498856]  __hrtimer_run_queues+0x13c/0x330
[ 6379.499170]  hrtimer_interrupt+0xf4/0x240
[ 6379.499381]  hv_stimer0_isr+0x28/0x30
[ 6379.499589]  hv_stimer0_percpu_isr+0x14/0x20
[ 6379.499865]  handle_percpu_devid_irq+0x8c/0x1b4
[ 6379.500236]  handle_domain_irq+0x64/0x90
[ 6379.500536]  gic_handle_irq+0x58/0x128
[ 6379.500789]  call_on_irq_stack+0x20/0x38
[ 6379.501045]  do_interrupt_handler+0x54/0x5c
[ 6379.501352]  el1_interrupt+0x2c/0x4c
[ 6379.501636]  el1h_64_irq_handler+0x14/0x20
[ 6379.501892]  el1h_64_irq+0x74/0x78
[ 6379.502067]  mm_release+0xd8/0x140
[ 6379.502233]  exit_mm_release+0x2c/0x40
[ 6379.502398]  do_exit+0x19c/0xa60
[ 6379.502567]  do_group_exit+0x3c/0xa4
[ 6379.502730]  get_signal+0x1e4/0x9b0
[ 6379.502895]  do_notify_resume+0x138/0xe6c
[ 6379.503057]  el0_svc+0x3c/0x4c
[ 6379.503220]  el0t_64_sync_handler+0x9c/0x120
[ 6379.503430]  el0t_64_sync+0x158/0x15c
[ 6394.924424] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-... 4-... } 63866 jiffies s: 357 root: 0x12/.
[ 6394.925358] rcu: blocking rcu_node structures (internal RCU debug):
[ 6394.925924] Task dump for CPU 1:
[ 6394.926295] task:weston          state:R  running task     stack:    0 pid:  857 ppid:   645 flags:0x0000080b
[ 6394.927296] Call trace:
[ 6394.927445]  __switch_to+0xb4/0xec
[ 6394.927736]  0xffff0000026a8e80
[ 6394.927998] Task dump for CPU 4:
[ 6394.928223] task:rdp-source      state:R  running task     stack:    0 pid:  861 ppid:   645 flags:0x0000080b
[ 6394.928838] Call trace:
[ 6394.928969]  __switch_to+0xb4/0xec
[ 6394.929175]  0xffff000048ae4880
WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22635.3566

The kernel is the latest official one compiled with RSEQ disabled.

@david-nordvall

@OneBlue, just curious if you've had the chance to have a look at the logs I provided? Are they any help?

@maxboone (Author)

@OneBlue any update?

mazugrin commented May 15, 2024

Still happening with kernel v6.9 and WSL version 2.2.4.0:

[12689.226208] rcu: INFO: rcu_sched self-detected stall on CPU
[12689.226969] rcu:     2-....: (20995 ticks this GP) idle=1ae4/1/0x4000000000000000 softirq=717849/717849 fqs=9028
[12689.228228] rcu:     (t=21006 jiffies g=766217 q=139020 ncpus=8)
[12689.228885] CPU: 2 PID: 299 Comm: rsyslogd Not tainted 6.9.0 #1
[12689.228899] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[12689.228905] pc : clear_rseq_cs.isra.0+0x20/0x28
[12689.228921] lr : __rseq_handle_notify_resume+0x6c/0x348
[12689.228927] sp : ffff8000852a3db0
[12689.228930] x29: ffff8000852a3db0 x28: ffff00010ea45b80 x27: 0000000000000000
[12689.228941] x26: 0000000000000000 x25: 0000000000000000 x24: 0000ffffb1fdec2c
[12689.228949] x23: 0000000080000000 x22: ffff8000852a3eb0 x21: ffff8000852a3eb0
[12689.228958] x20: ffff00010ea45b80 x19: 0000000000000000 x18: 0000000000000000
[12689.228966] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000852a3d28
[12689.228974] x14: ffff00010ea45c00 x13: 0000000000000001 x12: ffff800081358cc8
[12689.228982] x11: ffff000102302180 x10: 0000000000000900 x9 : ffff8000852a3700
[12689.228991] x8 : 000000385994105c x7 : 000000006644e87c x6 : 0000ffffd2d94518
[12689.228999] x5 : 0000ffffd2d94518 x4 : 0000000000400100 x3 : 0000000000000000
[12689.229007] x2 : 0000000000000000 x1 : 0000ffffb22c7fe8 x0 : 0000000000000000
[12689.229016] Call trace:
[12689.229020]  clear_rseq_cs.isra.0+0x20/0x28
[12689.229027]  do_notify_resume+0xa8/0x138
[12689.229035]  el0_svc+0xb4/0x11c
[12689.229043]  el0t_64_sync_handler+0x134/0x150
[12689.229048]  el0t_64_sync+0x14c/0x150

@maxboone (Author)

@OneBlue @pmartincic with the announcement of the new Surface Copilot+ PC series, any update?

@david-nordvall

Not to mention the release of Docker Desktop for Windows on ARM. I'm traveling and won't be able to test the official Docker Desktop release on my Surface Pro 9 5G for a couple of weeks. Has anyone else tested it?

@mazugrin

Still happening on kernel 6.9.5 under WSL version 2.2.4.0. This thing is unusable for any real work. Will it ever be fixed, @OneBlue @pmartincic?

[  978.163963] rcu: INFO: rcu_sched self-detected stall on CPU
[  978.165483] rcu:     4-....: (5249 ticks this GP) idle=978c/1/0x4000000000000000 softirq=28554/28554 fqs=2221
[  978.165840] rcu:              hardirqs   softirqs   csw/system
[  978.166034] rcu:      number:     2623         84            0
[  978.166233] rcu:     cputime:        0          0        10488   ==> 10488(ms)
[  978.166469] rcu:     (t=5250 jiffies g=42345 q=334 ncpus=8)
[  978.166678] Sending NMI from CPU 4 to CPUs 0:
[  978.166924] NMI backtrace for cpu 0
[  978.167171] CPU: 0 PID: 3149 Comm: play-dev-mode-p Not tainted 6.9.5 #6
[  978.167459] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  978.167721] pc : clear_rseq_cs.isra.0+0x20/0x28
[  978.168664] lr : __rseq_handle_notify_resume+0x6c/0x348
[  978.168950] sp : ffff80008909bdb0
[  978.169151] x29: ffff80008909bdb0 x28: ffff00026fae4c40 x27: 0000000000000000
[  978.169536] x26: 0000000000000000 x25: 0000000000000000 x24: 0000ffffb0039dfc
[  978.169902] x23: 0000000080001000 x22: ffff80008909beb0 x21: ffff80008909beb0
[  978.170282] x20: ffff00026fae4c40 x19: 0000000000000000 x18: 0000000000000000
[  978.170660] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  978.171051] x14: ffff00026fae4cc0 x13: 0000000000000001 x12: ffff800081328cc0
[  978.171449] x11: ffff000100d26400 x10: 0000000000000900 x9 : ffff80008909bb40
[  978.171879] x8 : ffff00026fae55a0 x7 : ffff0001019bf000 x6 : 0000000000000000
[  978.172225] x5 : 0000000000000000 x4 : 0000000000400040 x3 : 0000000000000000
[  978.172536] x2 : 0000000000000000 x1 : 0000fffe53bff8c8 x0 : 0000000000000000
[  978.172859] Call trace:
[  978.172998]  clear_rseq_cs.isra.0+0x20/0x28
[  978.173151]  do_notify_resume+0xa8/0x138
[  978.173325]  el0_svc+0xb4/0x11c
[  978.173841]  el0t_64_sync_handler+0x134/0x150
[  978.174151]  el0t_64_sync+0x14c/0x150
[  978.174924] CPU: 4 PID: 3148 Comm: play-dev-mode-p Not tainted 6.9.5 #6
[  978.175321] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  978.175703] pc : clear_rseq_cs.isra.0+0x20/0x28
[  978.176155] lr : __rseq_handle_notify_resume+0x6c/0x348
[  978.176354] sp : ffff800089093db0
[  978.176507] x29: ffff800089093db0 x28: ffff00020dd0adc0 x27: 0000000000000000
[  978.176814] x26: 0000000000000000 x25: 0000000000000000 x24: 0000ffffb0039dfc
[  978.177113] x23: 0000000080001000 x22: ffff800089093eb0 x21: ffff800089093eb0
[  978.177412] x20: ffff00020dd0adc0 x19: 0000000000000000 x18: 0000000000000000
[  978.177749] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  978.178047] x14: ffff00020dd0ae40 x13: 0000000000000001 x12: ffff800081328cc0
[  978.178342] x11: ffff000100d26400 x10: 0000000000000900 x9 : ffff800089093b40
[  978.178641] x8 : ffff00020dd0b720 x7 : ffff0001019bf800 x6 : 0000000000000000
[  978.178941] x5 : 0000000000000000 x4 : 0000000000400040 x3 : 0000000000000000
[  978.179236] x2 : 0000000000000000 x1 : 0000fffe541ff8c8 x0 : 0000000000000000
[  978.179533] Call trace:
[  978.179636]  clear_rseq_cs.isra.0+0x20/0x28
[  978.179795]  do_notify_resume+0xa8/0x138
[  978.180002]  el0_svc+0xb4/0x11c
[  978.180151]  el0t_64_sync_handler+0x134/0x150
[  978.180358]  el0t_64_sync+0x14c/0x150

@maxboone (Author)

@jhovold does this problem look like anything you've come across building the kernel for the ThinkPads with SQ3?

maxboone (Author) commented Jul 1, 2024

@pmartincic @OneBlue checking in

maxboone (Author) commented Jul 3, 2024

@kelsey-steele tagging you as the releaser of the kernel sources - is there anything related to this on the roadmap or in the current out-of-tree patches?

@halfmanhalftaco

I was plagued by extremely frequent RCU stalls resulting in high CPU usage on Win11 23H2 arm64 (Lenovo X13s), but after updating to 24H2 (26100.1150) the problem has completely disappeared. WSL is finally usable on this machine now.

WSL/WSLg/kernel versions appear the same as on 23H2, so maybe some underlying OS/Hyper-V bugs were fixed?

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.26100.1150

@mazugrin

I'm starting to believe this is true! (Too many times I've been told it was fixed when it wasn't) I upgraded to 24H2 and am now able to use WSL for at least a few hours and have seen no instability at all. Could it be true that it's FINALLY fixed!!!??? I'm happy as a clam now with my Robo & Kala tablet.

@david-nordvall

How are you able to install 24H2? Have you joined an Insider channel (Release Preview)? I'd rather not, since my Surface Pro 9 is my daily driver. Really encouraging news either way!

@maxboone (Author)

I was plagued by extremely frequent RCU stalls resulting in high CPU usage on Win11 23H2 arm64 (Lenovo X13s), but after updating to 24H2 (26100.1150) the problem has completely disappeared. WSL is finally usable on this machine now.

I'm going to update today and will report back; I hope it'll work!

@craigloewen-msft @benhillis could you confirm whether something went into 24H2 that would fix this?

Nevuly added a commit to Nevuly/WSL2-Linux-Kernel-Rolling that referenced this issue Jul 15, 2024
@mazugrin

How are you able to install 24H2? Have you joined an Insider channel (Release Preview)? I'd rather not, since my Surface Pro 9 is my daily driver. Really encouraging news either way!

Yes, indeed you do have to follow Release Preview. For what it's worth, the language used to describe what that means makes it sound like a very low-risk channel to follow.

@maxboone (Author)

Same on Surface Pro X SQ2, since the 24H2 update I haven't run into any RSEQ / RCU related stalls.

It's a shame that I can't mount the Hyper-V disks directly now that I've switched back to WSL, but that works fine through qemu-nbd:

sudo -i
# Load the network block device driver, allowing up to 16 partitions per device
modprobe nbd max_part=16
# Attach the Hyper-V VHDX as /dev/nbd0
qemu-nbd -c /dev/nbd0 /mnt/c/ProgramData/Microsoft/Windows/Virtual\ Hard\ Disks/ubuntu0.vhdx
# Re-read the partition table
partprobe /dev/nbd0
# Mount the LVM root volume from the attached disk
mount /dev/ubuntu-vg/ubuntu-lv /mnt/ubuntu0
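
(Two footnotes to the snippet above, as a sketch rather than a tested recipe: if the LVM volume node is missing the volume group may need activating first, and the NBD device should be disconnected when done; the mount point and LVM names are taken from the commands above.)

# If /dev/ubuntu-vg/ubuntu-lv doesn't exist yet, activate the volume group:
vgchange -ay
mkdir -p /mnt/ubuntu0   # the mount point must exist before mounting
# When finished, unmount and disconnect the NBD device:
umount /mnt/ubuntu0
qemu-nbd -d /dev/nbd0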

I'll keep monitoring whether the stalls really stay away, but it looks like this issue has been fixed!

@maxboone (Author)

After running the 24H2 update for two days, I can gladly say that I am not running into these stalls anymore. Closing this issue.

f0o commented Jul 27, 2024

@maxboone what's the availability of 24H2? The latest preview-channel update I'm able to pull is the 2024-07 Cumulative Preview for 23H2.

@maxboone (Author)

@maxboone what's the availability of 24H2?

After I pulled that one, I got the 24H2 update over the Release Preview channel.
