Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fedora Image hung / froze when left on for prolonged period of time #34

Closed
warthog9 opened this issue May 12, 2021 · 32 comments
Closed
Assignees

Comments

@warthog9
Copy link

Sadly almost nothing to include with the report, left the current April 19th Fedora image, kernel 5.10.6+ running overnight.

System apparently wandered off and just stopped responding, dropped network and was unresponsive on serial port or keyboard/hdmi console. There was no output on the serial console indicating the the kernel had hung or crashed, it was as I had left it overnight.

Ethernet port was still blinking in response to external network traffic, but was not responding to it's expected IPv4 address.

I had gpios 0, 2, 20 set high to drive an RGB LED, and gpio 46 set for input from the kernel (I've got a button set on it).

@warthog9
Copy link
Author

Happened again overnight, though I got slightly more debugging information this time. It looks like the rcu_scheduler runs out of scheduled time, and OOMs the kernel, which is obviously a very odd state to get into

IMAG5853

Sadly didn't have the serial line attached overnight so I don't have the text dump, but that's a picture of what was on the monitor when I checked it this morning

@pdp7
Copy link
Collaborator

pdp7 commented May 13, 2021

Thanks for the additional information. Could you reword the title of this issue to say that Fedora crashes overnight? I think the meaning of "wander off" does not translate well.

@pdp7
Copy link
Collaborator

pdp7 commented May 13, 2021

Might be related to starfive-tech/linux#18 (comment)

@warthog9 warthog9 changed the title Fedora Image wandered off overnight Fedora Image hung / froze when left on for prolonged period of time May 13, 2021
@oaken-source
Copy link

I'm capturing the kernel output through a serial interface. Will post a full trace here if it happens again.

@oaken-source
Copy link

It is happening again. The first symptom is always this:

[84141.632159] watchdog: BUG: soft lockup - CPU#0 stuck for 31s! [makeinfo:328617]
[84141.639538] Modules linked in:
[84141.642631] CPU: 0 PID: 328617 Comm: makeinfo Not tainted 5.12.0-gnu #2
[84141.649304] Hardware name: sifive,freedom-u74-arty (DT)
[84141.654545] epc : 0000003ff74a7fda ra : 0000003ff6fe6084 sp : 0000003fff88a720
[84141.661791]  gp : 0000002aab5868f0 tp : 0000003ff7211700 t0 : dd0ff03400000000
[84141.669032]  t1 : 0000003ff6fe3a1c t2 : 0000000000635f64 s0 : 0000002ab5732fe8
[84141.676272]  s1 : 0000002aab587260 a0 : 0000002aab587260 a1 : 0000000000000000
[84141.683513]  a2 : 0000000000000000 a3 : 0000002aac541784 a4 : 0000002aac541785
[84141.690754]  a5 : 00000000001d41ad a6 : 0000002ab5489ae0 a7 : 0000002ab573a4f0
[84141.697993]  s2 : 0000002aab587260 s3 : 0000002aab587260 s4 : 0000000000000000
[84141.705231]  s5 : 0000003fff88a814 s6 : 0000003ff7587910 s7 : 0000000000000000
[84141.712470]  s8 : 0000002ad1ce6020 s9 : 0000000000000000 s10: 0000000000000000
[84141.719742]  s11: 0000002ad1ca0eb8 t3 : 0000003ff74a7fc6 t4 : 656e69616c707865
[84141.727014]  t5 : dd0ff034f2ca9d62 t6 : a2e0c47812601962
[84141.732355] status: 8000000200006020 badaddr: 0000000000000000 cause: 8000000000000005

Then, a while later, this follows:

[102740.851394] rcu: INFO: rcu_sched self-detected stall on CPU
[102740.857141] rcu:    0-...!: (83 ticks this GP) idle=c32/1/0x4000000000000004 softirq=1808405/1808407 fqs=0 
[102740.866872]         (t=7363 jiffies g=2712393 q=431)
[102740.871356] Task dump for CPU 0:
[102740.874708] task:makeinfo        state:R  running task     stack:    0 pid:385627 ppid:385624 flags:0x00000008
[102740.884875] Call Trace:
[102740.887443] [<ffffffe000004810>] walk_stackframe+0x0/0xce
[102740.892987] [<ffffffe00060eac4>] dump_backtrace+0x38/0x46
[102740.898529] [<ffffffe00060eae2>] show_stack+0x10/0x18
[102740.903728] [<ffffffe000029b26>] sched_show_task+0x154/0x174
[102740.909526] [<ffffffe00060f2e8>] dump_cpu_task+0x42/0x4c
[102740.914978] [<ffffffe00060fe56>] rcu_dump_cpu_stacks+0xd8/0x116
[102740.921044] [<ffffffe00005e6dc>] rcu_sched_clock_irq+0x50a/0x66c
[102740.927197] [<ffffffe0000646b0>] update_process_times+0xa0/0xc8
[102740.933262] [<ffffffe000070478>] tick_sched_timer+0x78/0x130
[102740.939062] [<ffffffe000064b42>] __hrtimer_run_queues+0x122/0x186
[102740.945311] [<ffffffe000065554>] hrtimer_interrupt+0xcc/0x222
[102740.951209] [<ffffffe000450fde>] riscv_timer_interrupt+0x32/0x3c
[102740.957348] [<ffffffe000054228>] handle_percpu_devid_irq+0x80/0x11a
[102740.963753] [<ffffffe00004f5be>] __handle_domain_irq+0x72/0xcc
[102740.969734] [<ffffffe00032023e>] riscv_intc_irq+0x38/0x60
[102740.975268] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc
[102740.980975] [<ffffffe00061a5ec>] _raw_spin_unlock+0x10/0x22

And then, even later, it locks up. I didn't leave it to lock up completely, because I couldn't afford any file system integrity damage at the moment, but I will leave it to lock next time.

@oaken-source
Copy link

oaken-source commented May 15, 2021

Another example, currently ongoing:

[53021.381908] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cc1plus:171665]
[53021.389191] Modules linked in:
[53021.392281] CPU: 1 PID: 171665 Comm: cc1plus Not tainted 5.12.0-gnu #2
[53021.398828] Hardware name: sifive,freedom-u74-arty (DT)
[53021.404064] epc : 0000000000cc9d6e ra : 0000000000d80428 sp : 0000003fffe6f8a0
[53021.411306]  gp : 0000000001b4b478 tp : 0000003fed692710 t0 : 0000000000000020
[53021.418544]  t1 : 0000000000000000 t2 : 0000000001bbc010 s0 : 0000003fffe6f990
[53021.425785]  s1 : 0000000001f0a4a0 a0 : 0000000001f0a4a0 a1 : 0000003fffe6f958
[53021.433025]  a2 : 0000003fffe6f8a8 a3 : 0000003fffe6f8a8 a4 : 0000003feb556090
[53021.440291]  a5 : 0000003fffe6f958 a6 : 0000000000000002 a7 : 0000003fed2edb28
[53021.447529]  s2 : 0000000000000000 s3 : 0000002adee279f0 s4 : 0000002adeba4500
[53021.454769]  s5 : 0000002adee279f0 s6 : 0000003fff81cdc0 s7 : 0000002adee1edf4
[53021.462009]  s8 : 0000002adddfe020 s9 : 0000002adde00ad4 s10: 0000002adeb0889d
[53021.469270]  s11: 000000000000005c t3 : 0000003fed6f775e t4 : 0000000000000000
[53021.476547]  t5 : 0000000000000001 t6 : 0000000000000000
[53021.481885] status: 0000000200004020 badaddr: 0000000000000000 cause: 8000000000000005

Update:

[122137.609696] rcu: INFO: rcu_sched self-detected stall on CPU
[122137.615439] rcu:    1-...!: (62 ticks this GP) idle=1b2/1/0x4000000000000002 softirq=2281956/2281958 fqs=0 
[122137.625146]         (t=3733 jiffies g=3635957 q=2938)
[122137.629716] Task dump for CPU 1:
[122137.633060] task:as              state:R  running task     stack:    0 pid:376848 ppid:376532 flags:0x00000008
[122137.643214] Call Trace:
[122137.645769] [<ffffffe000004810>] walk_stackframe+0x0/0xce
[122137.651306] [<ffffffe00060eac4>] dump_backtrace+0x38/0x46
[122137.656844] [<ffffffe00060eae2>] show_stack+0x10/0x18
[122137.662019] [<ffffffe000029b26>] sched_show_task+0x154/0x174
[122137.667810] [<ffffffe00060f2e8>] dump_cpu_task+0x42/0x4c
[122137.673256] [<ffffffe00060fe56>] rcu_dump_cpu_stacks+0xd8/0x116
[122137.679322] [<ffffffe00005e6dc>] rcu_sched_clock_irq+0x50a/0x66c
[122137.685470] [<ffffffe0000646b0>] update_process_times+0xa0/0xc8
[122137.691534] [<ffffffe000070478>] tick_sched_timer+0x78/0x130
[122137.697332] [<ffffffe000064b42>] __hrtimer_run_queues+0x122/0x186
[122137.703572] [<ffffffe000065554>] hrtimer_interrupt+0xcc/0x222
[122137.709454] [<ffffffe000450fde>] riscv_timer_interrupt+0x32/0x3c
[122137.715595] [<ffffffe000054228>] handle_percpu_devid_irq+0x80/0x11a
[122137.721997] [<ffffffe00004f5be>] __handle_domain_irq+0x72/0xcc
[122137.727975] [<ffffffe00032023e>] riscv_intc_irq+0x38/0x60
[122137.733516] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc

@oaken-source
Copy link

another one:

[40382.752293] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:                                                                             
[40382.758265] rcu:     1-...!: (1 ticks this GP) idle=ff6/1/0x4000000000000002 softirq=1808567/1808567 fqs=0       
[40382.767747]  (detected by 0, t=2102 jiffies, g=3449541, q=101)                                                   
[40382.773583] Task dump for CPU 1:                                                                                 
[40382.776809] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000                              
[40382.786824] Call Trace:                                                                                          
[40382.789266] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                                                     
[40382.794518] rcu: rcu_sched kthread timer wakeup didn't happen for 2101 jiffies! g3449541 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402              
[40382.805910] rcu:     Possible timer handling issue on cpu=1 timer-softirq=694166                                                
[40382.813035] rcu: rcu_sched kthread starved for 2102 jiffies! g3449541 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1             
[40382.823470] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.                           
[40382.832424] rcu: RCU grace-period kthread stack dump:                                                              
[40382.837467] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000                        
[40382.845827] Call Trace:                                                                                          
[40382.848284] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                            
[40382.853526] [<ffffffe000616f1a>] schedule+0x56/0xca                                                              
[40382.858417] [<ffffffe000619b2a>] schedule_timeout+0x68/0xca                                                        
[40382.864008] [<ffffffe00005cddc>] rcu_gp_kthread+0x510/0x93e                                                      
[40382.869607] [<ffffffe000023bfa>] kthread+0xfe/0x10c                                                                
[40382.874504] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc                                                      
[40382.880091] rcu: Stack dump where RCU GP kthread last ran:                                                       
[40382.885581] Task dump for CPU 1:                                                                                   
[40382.888814] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000   
[40382.898846] Call Trace:                                                                                          
[40382.901298] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                            
[40445.799637] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:                                                  
[40445.805631] rcu:     1-...!: (1 ticks this GP) idle=ff6/1/0x4000000000000002 softirq=1808567/1808567 fqs=0         
[40445.815134]  (detected by 0, t=8407 jiffies, g=3449541, q=151)                                                   
[40445.820980] Task dump for CPU 1:                                                                               
[40445.824215] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000 
[40445.834250] Call Trace:                                                                                        
[40445.836704] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                            
[40445.841965] rcu: rcu_sched kthread timer wakeup didn't happen for 8406 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200              
[40445.853373] rcu:     Possible timer handling issue on cpu=1 timer-softirq=694166                                                
[40445.860509] rcu: rcu_sched kthread starved for 8407 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1         
[40445.870962] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.                           
[40445.879935] rcu: RCU grace-period kthread stack dump:                                                              
[40445.884987] task:rcu_sched       state:R stack:    0 pid:   11 ppid:     2 flags:0x00000000                        
[40445.893364] Call Trace:                                                                                          
[40445.895816] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                                                     
[40445.901060] [<ffffffe000616f1a>] schedule+0x56/0xca                        
[40445.905951] [<ffffffe000619b2a>] schedule_timeout+0x68/0xca                                                                                              
[40445.911545] [<ffffffe00005cddc>] rcu_gp_kthread+0x510/0x93e                                                                                              
[40445.917144] [<ffffffe000023bfa>] kthread+0xfe/0x10c                        
[40445.922041] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc                                                                                              
[40445.927629] rcu: Stack dump where RCU GP kthread last ran:                                                                                               
[40445.933118] Task dump for CPU 1:                                           
[40445.936353] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000                                           
[40445.946386] Call Trace:             
[40445.948837] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6   
[40508.846980] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:                                                                                          
[40508.852971] rcu:     1-...!: (1 ticks this GP) idle=ff6/1/0x4000000000000002 softirq=1808567/1808567 fqs=0                                               
[40508.862477]  (detected by 0, t=14712 jiffies, g=3449541, q=216)                                                                                          
[40508.868410] Task dump for CPU 1:                                                                                                                         
[40508.871645] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000                                           
[40508.881681] Call Trace:                                                                                                                                  
[40508.884135] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                                                                  
[40508.889399] rcu: rcu_sched kthread timer wakeup didn't happen for 14711 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200                          
[40508.900895] rcu:     Possible timer handling issue on cpu=1 timer-softirq=694166                                                                         
[40508.908032] rcu: rcu_sched kthread starved for 14712 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1                                     
[40508.918561] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.                                                    
[40508.927535] rcu: RCU grace-period kthread stack dump:                      
[40508.932586] task:rcu_sched       state:R stack:    0 pid:   11 ppid:     2 flags:0x00000000                                                              
[40508.940964] Call Trace:             
[40508.943417] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                                                                  
[40508.948658] [<ffffffe000616f1a>] schedule+0x56/0xca                        
[40508.953550] [<ffffffe000619b2a>] schedule_timeout+0x68/0xca                                                                                              
[40508.959144] [<ffffffe00005cddc>] rcu_gp_kthread+0x510/0x93e                                                                                              
[40508.964743] [<ffffffe000023bfa>] kthread+0xfe/0x10c                        
[40508.969641] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc                                                                                              
[40508.975228] rcu: Stack dump where RCU GP kthread last ran:                                                                                               
[40508.980718] Task dump for CPU 1:                                           
[40508.983952] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000                                           
[40508.993985] Call Trace:             
[40508.996437] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6                                                                                                  
[40528.208749] systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.                                                              
[40528.217523] systemd[1]: systemd-journald.service: Killing process 131 (systemd-journal) with signal SIGKILL.                                             
[40571.894325] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[40571.900318] rcu:     1-...!: (1 ticks this GP) idle=ff6/1/0x4000000000000002 softirq=1808567/1808567 fqs=0 
[40571.909820]  (detected by 0, t=21017 jiffies, g=3449541, q=312)
[40571.915754] Task dump for CPU 1:
[40571.918989] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000
[40571.929026] Call Trace:
[40571.931483] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6
[40571.936749] rcu: rcu_sched kthread timer wakeup didn't happen for 21016 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[40571.948247] rcu:     Possible timer handling issue on cpu=1 timer-softirq=694166
[40571.955384] rcu: rcu_sched kthread starved for 21017 jiffies! g3449541 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
[40571.965916] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[40571.974890] rcu: RCU grace-period kthread stack dump:
[40571.979944] task:rcu_sched       state:R stack:    0 pid:   11 ppid:     2 flags:0x00000000
[40571.988333] Call Trace:
[40571.990787] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6
[40571.996029] [<ffffffe000616f1a>] schedule+0x56/0xca
[40572.000922] [<ffffffe000619b2a>] schedule_timeout+0x68/0xca
[40572.006514] [<ffffffe00005cddc>] rcu_gp_kthread+0x510/0x93e
[40572.012114] [<ffffffe000023bfa>] kthread+0xfe/0x10c
[40572.017010] [<ffffffe0000032b6>] ret_from_exception+0x0/0xc
[40572.022599] rcu: Stack dump where RCU GP kthread last ran:
[40572.028090] Task dump for CPU 1:
[40572.031322] task:python          state:R  running task     stack:    0 pid:554348 ppid:553527 flags:0x00000000
[40572.041356] Call Trace:
[40572.043807] [<ffffffe000616bdc>] __schedule+0x1be/0x4a6

@pdp7
Copy link
Collaborator

pdp7 commented May 18, 2021

@MichaelZhuxx Have you ever seen error occuring with rcu_sched?

@stffrdhrn
Copy link

I am running the same environment, I will see if anything happens overnight. @warthog9 was there any load on the system? I mean are you running any extra apps or services that might be causing it to run out of memory or cause IRQs timeout? Or, is this a default install?

@oaken-source
Copy link

my theory is that these lockups are related to long durations of high I/O load on the sd-card. I've been compiling software on the board for hours at a time, which would trigger these issues. Since I've started compiling software in a tmpfs instead of on the sd card, I've not seen any lockups.

I would need to verify this theory with some sort of I/O benchmark.

@pdp7
Copy link
Collaborator

pdp7 commented May 20, 2021

@oaken-source have you seen this issue happen again with the new Fedora image from this week? Were you using the SD card for storage when this occurs?
https://github.com/starfive-tech/Fedora_on_StarFive#download-the-latest-image

@ryantong1
Copy link

This problem occurs occasionally during network-related operations:

[root@fedora-starfive ~]# ping baidu.com
PING baidu.com (220.181.38.148) 56(84) bytes of data.
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=1 ttl=51 time=55.7 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=2 ttl=51 time=43.2 ms
�
--- baidu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 43.208/49.455/55.702/6.247 ms
[root@fedora-starfive ~]# [ 6433.856087] dw_mmc 10010000.sdio1: Unexpected interrupt latency
[ 6541.386406] rcu: INFO: rcu_sched self-detected stall on CPU
[ 6541.392002] rcu:     0-...!: (10 ticks this GP) idle=7ca/1/0x4000000000000004 softirq=49747/49749 fqs=1 
[ 6541.401201]  (t=10528 jiffies g=71205 q=66)
[ 6541.405380] Task dump for CPU 0:
[ 6541.408610] task:systemd-journal state:R  running task     stack:    0 pid:  282 ppid:     1 flags:0x00000080
[ 6541.418507] Call Trace:
[ 6541.420974] [<ffffffe0002036bc>] walk_stackframe+0x0/0xcc
[ 6541.426370] [<ffffffe000c5a938>] show_stack+0x40/0x4c
[ 6541.431423] [<ffffffe000239c5e>] sched_show_task+0x146/0x16a
[ 6541.437067] [<ffffffe000c5b172>] dump_cpu_task+0x50/0x5a
[ 6541.442364] [<ffffffe000c5be80>] rcu_dump_cpu_stacks+0xb2/0xee
[ 6541.448182] [<ffffffe000280098>] rcu_sched_clock_irq+0x6da/0x898
[ 6541.454173] [<ffffffe000289e2e>] update_process_times+0x78/0xa8
[ 6541.460080] [<ffffffe00029894a>] tick_sched_handle+0x36/0x68
[ 6541.465723] [<ffffffe000298f7e>] tick_sched_timer+0x62/0xaa
[ 6541.471280] [<ffffffe00028a934>] __hrtimer_run_queues+0x150/0x250
[ 6541.477352] [<ffffffe00028b476>] hrtimer_interrupt+0xe2/0x230
[ 6541.483089] [<ffffffe000a4ad5a>] riscv_timer_interrupt+0x3c/0x46
[ 6541.489075] [<ffffffe000270afc>] handle_percpu_devid_irq+0x90/0x1c2
[ 6541.495327] [<ffffffe00026b2e6>] __handle_domain_irq+0x8a/0xee
[ 6541.501142] [<ffffffe00073300e>] riscv_intc_irq+0x46/0x74
[ 6541.506534] [<ffffffe0002019d2>] ret_from_exception+0x0/0xc
dw_mmc 10010000.sdio1: Unexpected interrupt latency

another one:

[root@fedora-starfive ~]# [ 6090.322527] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6090.342267] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6090.358598] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6116.677701] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6116.687784] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6116.698126] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing
[ 6116.707013] [dhd] CFG80211-ERROR) wl_cfg80211_netdev_notifier_call : wdev null. Do nothing

[root@fedora-starfive ~]# [ 6126.869139] dw_mmc 10010000.sdio1: Unexpected interrupt latency
[ 6185.756613] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 6185.762545] rcu:     0-...0: (55 ticks this GP) idle=6f2/1/0x4000000000000000 softirq=48733/48735 fqs=3000 
[ 6185.771990]  (detected by 1, t=6002 jiffies, g=69577, q=67)
[ 6185.777539] Task dump for CPU 0:
[ 6185.780755] task:midori          state:R  running task     stack:    0 pid: 2751 ppid:     1 flags:0x00000008
[ 6185.790636] Call Trace:
[ 6185.793093] [<ffffffe000c6e652>] __schedule+0x1ea/0x71c
[ 6185.798301] [<ffffffe000c6ebce>] schedule+0x4a/0xb4
[ 6185.803174] [<ffffffe000201918>] skip_context_tracking+0x0/0x5e

[root@fedora-starfive ~]# 

@archanox
Copy link

I appear to be experiencing this issue too, with the latest image.
It wasn't on for long, I was just extracting a tar file to and from the microSD, and it completely halted.

@lorforlinux
Copy link

I can confirm, It's also happening with the latest image (16th May 2021).
Just like @archanox's case, it wasn't ON for long.

@oaken-source
Copy link

otoh, when actively avoiding I/O by writing everything to a tmpfs instead, I have an uptime under heavy load of almost 5 days.

@archanox
Copy link

archanox commented May 22, 2021 via email

@pdp7
Copy link
Collaborator

pdp7 commented May 27, 2021

otoh, when actively avoiding I/O by writing everything to a tmpfs instead, I have an uptime under heavy load of almost 5 days.

This does seem to indicate it might be related SD card activity.

@pdp7
Copy link
Collaborator

pdp7 commented May 27, 2021

@oaken-source @lorforlinux @archanox @stffrdhrn @warthog9

For everyone having issues, please specify:

  1. what brand and model of SD card?
  2. are you using Fedora image? If so, which image date?
  3. what kernel are you runninig? (uname -r)
  4. please include any errors from dmesg
  5. please indicate the uboot version printed on the console at boot

It seems like this issue may be due to the SD card and we are trying to figure out what patterns there may be to understand the root cause.

@oaken-source
Copy link

oaken-source commented May 27, 2021

@pdp7 Answers to your questions above:

  1. I'm using a SanDisk 128GB Ultra, purchased here: https://www.amazon.com/gp/product/B08GYKNCCP
  2. I'm not running the fedora image, I'm running the latest snapshot of parabola GNU/Linux-libre
  3. The traces above were generated on 5.12-gnu, now I'm running 5.12.6-gnu, rebasing the commits from https://github.com/esmil/linux/tree/starlight-5.12.y on top of linux-libre (https://www.fsfla.org/ikiwiki/selibre/linux-libre/), using the default defconfig from esmil with minor modifications (removing firmware blobs, enabling loadable module support, and disabling the broadcom wifi driver)
  4. kernel output from the lock ups is included in my previous posts on this issue. I include a full new trace generated on 5.12.6-gnu below.
  5. bootloader is stock and unchanged: U-Boot 2021.04-rc4-g5b63233bc6-dirty (Apr 08 2021 - 14:09:59 +0800)

new trace, captured on the serial console:

OpenSBI v0.9                                                                                                                                                                    
   ____                    _____ ____ _____                                                                                                                                     
  / __ \                  / ____|  _ \_   _|                                            
 | |  | |_ __   ___ _ __ | (___ | |_) || |                                                                                                                                      
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |                                                                                                                                      
 | |__| | |_) |  __/ | | |____) | |_) || |_                                                                                                                                     
  \____/| .__/ \___|_| |_|_____/|____/_____|                                            
        | |                                                                                                                                                                     
        |_|                                                                                                                                                                     
                                                                                        
Platform Name             : StarFive VIC7100                                                                                                                                    
Platform Features         : timer,mfdeleg                                                                                                                                       
Platform HART Count       : 2                                                           
Firmware Base             : 0x80000000                                                                                                                                          
Firmware Size             : 92 KB                                                                                                                                               
Runtime SBI Version       : 0.2                                                                                                                                                 
                                                                                                                                                                                
Domain0 Name              : root                                                                                                                                                
Domain0 Boot HART         : 1                                                           
Domain0 HARTs             : 0*,1*                                                                                                                                               
Domain0 Region00          : 0x0000000080000000-0x000000008001ffff ()                                                                                                            
Domain0 Region01          : 0x0000000000000000-0x0000007fffffffff (R,W,X)                                                                                                       
Domain0 Next Address      : 0x0000000080020000                                          
Domain0 Next Arg1         : 0x0000000088000000                                          
Domain0 Next Mode         : S-mode                                                                                                                                              
Domain0 SysReset          : yes                         
                                                                                                                                                                                
Boot HART ID              : 1                                                                                                                                                   
Boot HART Domain          : root                                                                                                                                                
Boot HART ISA             : rv64imafdcsux                                                                                                                                       
Boot HART Features        : scounteren,mcounteren                                       
Boot HART PMP Count       : 16              
Boot HART PMP Granularity : 4096            
Boot HART PMP Address Bits: 36              
Boot HART MHPM Count      : 0               
Boot HART MHPM Count      : 0               
Boot HART MIDELEG         : 0x0000000000000222                                          
Boot HART MEDELEG         : 0x000000000000b109                                          


U-Boot 2021.04-rc4-g5b63233bc6-dirty (Apr 08 2021 - 14:09:59 +0800)                     

CPU:   rv64imafdc                           
DRAM:  8 GiB                                
MMC:   sdio0@10000000: 0, sdio1@10010000: 1                                             
Net:   dwmac.10020000                       
Autoboot in 2 seconds                       
MMC CD is 0x1, force to True.               
MMC CD is 0x1, force to True.               
switch to partitions #0, OK                 
mmc0 is current device                      
Scanning mmc 0:1...                         
Found /extlinux/extlinux.conf               
Retrieving file: /extlinux/extlinux.conf    
2290 bytes read in 8 ms (279.3 KiB/s)       
parabola GNU/Linux-libre riscv64 Boot Options                                           
1:      parabola GNU/Linux-libre 5.12.6-gnu                                             
2:      parabola GNU/Linux-libre 5.12.6-gnu (fallback)                                  
3:      parabola GNU/Linux-libre 5.12.4-gnu                                             
4:      parabola GNU/Linux-libre 5.12.4-gnu (fallback)                                  
5:      parabola GNU/Linux-libre 5.12.0-gnu                                             
6:      parabola GNU/Linux-libre 5.12.0-gnu (fallback)                                  
7:      parabola GNU/Linux-libre 5.12.0-gnu (tracing)                                   
Enter choice: 1:        parabola GNU/Linux-libre 5.12.6-gnu                             
Retrieving file: /initramfs-linux-libre.img                                             
6289924 bytes read in 1304 ms (4.6 MiB/s)   
Retrieving file: /vmlinuz-linux-libre-5.12.6                                            
17600512 bytes read in 3641 ms (4.6 MiB/s)                                                                                                                            [481/5267]
append: ro root=UUID=6e17984d-acf3-41f5-856f-ea15639913ea rhgb console=tty0 console=ttyS0,115200 earlycon=sbi rootwait stmmaceth=chain_mode:1 selinux=0 LANG=en_US.UTF-8        
Moving Image from 0x84000000 to 0x80200000, end=8131a000                                
## Flattened Device Tree blob at fffd0d40   
   Booting using the fdt blob at 0xfffd0d40                                             
   Using Device Tree in place at 00000000fffd0d40, end 00000000fffd9e0d                 

Starting kernel ...                         

Linux version 5.12.6-gnu (andi@parabolabook-BM15) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #2 SMP Tue May 25 12:38:17 CEST 2021               
OF: fdt: Ignoring memory range 0x80000000 - 0x80200000                                  
Machine model: sifive,freedom-u74-arty      
efi: UEFI not found.                        
Reserved memory: created CMA memory pool at 0x00000000a0000000, size 640 MiB            
OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool             
Reserved memory: created DMA memory pool at 0x00000000f9000000, size 16 MiB             
OF: reserved mem: initialized node framebuffer@f9000000, compatible id shared-dma-pool                                                                                          
Reserved memory: created DMA memory pool at 0x00000000fb000000, size 32 MiB             
OF: reserved mem: initialized node framebuffer@fb000000, compatible id shared-dma-pool                                                                                          
Zone ranges:                                
  DMA32    [mem 0x0000000080200000-0x00000000ffffffff]                                  
  Normal   [mem 0x0000000100000000-0x000000027fffffff]                                  
Movable zone start for each node            
Early memory node ranges                    
  node   0: [mem 0x0000000080200000-0x00000000f8ffffff]                                 
  node   0: [mem 0x00000000f9000000-0x00000000f9ffffff]                                 
  node   0: [mem 0x00000000fa000000-0x00000000faffffff]                                 
  node   0: [mem 0x00000000fb000000-0x00000000fcffffff]                                 
  node   0: [mem 0x00000000fd000000-0x000000027fffffff]                                 
Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]                        
SBI specification v0.2 detected             
SBI implementation ID=0x1 Version=0x9       
SBI v0.2 TIME extension detected            
SBI v0.2 IPI extension detected             
SBI v0.2 RFENCE extension detected          
software IO TLB: mapped [mem 0x000000009c000000-0x00000000a0000000] (64MB)              
SBI v0.2 HSM extension detected             
riscv: ISA extensions acdfim                
riscv: ELF capabilities acdfim              
percpu: Embedded 18 pages/cpu s33432 r8192 d32104 u73728                                
Built 1 zonelists, mobility grouping on.  Total pages: 2067975                          
Kernel command line: ro root=UUID=6e17984d-acf3-41f5-856f-ea15639913ea rhgb console=tty0 console=ttyS0,115200 earlycon=sbi rootwait stmmaceth=chain_mode:1 selinux=0 L[440/5267]
UTF-8
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)             
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)               
Sorting __ex_table...                       
mem auto-init: stack:off, heap alloc:off, heap free:off                                 
Memory: 6744036K/8386560K available (6609K kernel code, 4811K rwdata, 2048K rodata, 2136K init, 321K bss, 987164K reserved, 655360K cma-reserved)                               
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1                              
rcu: Hierarchical RCU implementation.       
rcu:    RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.                            
        Tracing variant of Tasks RCU enabled.                                           
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.                  
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2                            
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0                                          
riscv-intc: 64 local interrupts mapped      
plic: disable L2 cache irq 128 in plic      
plic: disable L2 cache irq 131 in plic      
plic: disable L2 cache irq 129 in plic      
plic: disable L2 cache irq 130 in plic      
plic: disable L2 cache irq 128 in plic      
plic: disable L2 cache irq 131 in plic      
plic: disable L2 cache irq 129 in plic      
plic: disable L2 cache irq 130 in plic      
plic: plic@c000000: mapped 127 interrupts with 2 handlers for 4 contexts.               
riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [1]                       
clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x171024e6b, max_idle_ns: 440795202301 ns                                                                  
sched_clock: 64 bits at 6MHz, resolution 160ns, wraps every 4398046511040ns             
Console: colour dummy device 80x25          
printk: console [tty0] enabled              
Calibrating delay loop (skipped), value calculated using timer frequency.. 12.50 BogoMIPS (lpj=25000)                                                                           
pid_max: default: 32768 minimum: 301        
Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)                  
Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)             
ASID allocator disabled                     
rcu: Hierarchical SRCU implementation.      
EFI services will not be available.         
smp: Bringing up secondary CPUs ...         
smp: Brought up 1 node, 2 CPUs              
devtmpfs: initialized                       
random: get_random_u32 called from bucket_table_alloc.isra.0+0x50/0x144 with crng_init=0                                                                                        
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns                                                                                 
futex hash table entries: 512 (order: 3, 32768 bytes, linear)                           
NET: Registered protocol family 16          
OF: /soc/i2c@118b0000: could not find phandle 49500000                                  
OF: /soc/i2c@118c0000: could not find phandle 49500000                                  
OF: /soc/i2c@12450000: could not find phandle 50000000                                  
raid6: skip pq benchmark and using algorithm int64x8                                    
raid6: using intx1 recovery algorithm       
starfive_gpio 11910000.gpio: StarFive GPIO chip registered 0 GPIOs                      
SCSI subsystem initialized                  
usbcore: registered new interface driver usbfs                                          
usbcore: registered new interface driver hub                                            
usbcore: registered new device driver usb   
OF: /soc/i2c@118b0000: could not find phandle 49500000                                  
i2c_designware 118b0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118b0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118b0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118b0000.i2c: running with gpio recovery mode! scl,sda                   
OF: /soc/i2c@118c0000: could not find phandle 49500000                                  
i2c_designware 118c0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118c0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118c0000.i2c: Using 'clocks' : 49500000 / 1000                           
i2c_designware 118c0000.i2c: running with gpio recovery mode! scl,sda                   
OF: /soc/i2c@12450000: could not find phandle 50000000                                  
i2c_designware 12450000.i2c: Using 'clocks' : 50000000 / 1000                           
i2c_designware 12450000.i2c: Using 'clocks' : 50000000 / 1000                           
i2c_designware 12450000.i2c: Using 'clocks' : 50000000 / 1000                           
i2c_designware 12450000.i2c: running with gpio recovery mode! scl,sda                   
clocksource: Switched to clocksource riscv_clocksource                                  
NET: Registered protocol family 2           
IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)                  
tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)       
TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)              
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)                    
TCP: Hash tables configured (established 65536 bind 65536)                              
UDP hash table entries: 4096 (order: 5, 131072 bytes, linear)                           
UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear)                      
NET: Registered protocol family 1           
RPC: Registered named UNIX socket transport module.                                     
RPC: Registered udp transport module.       
RPC: Registered tcp transport module.       
RPC: Registered tcp NFSv4.1 backchannel transport module.        
Unpacking initramfs...                                                                                                                                                [358/5267]
Freeing initrd memory: 6140K                
Initialise system trusted keyrings          
workingset: timestamp_bits=62 max_order=21 bucket_order=0                               
xor: measuring software checksum speed      
   8regs           :  2186 MB/sec           
   8regs_prefetch  :  2182 MB/sec           
   32regs          :  2346 MB/sec           
   32regs_prefetch :  2339 MB/sec           
xor: using function: 32regs (2346 MB/sec)   
Key type asymmetric registered              
Asymmetric key parser 'x509' registered     
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)                    
io scheduler bfq registered                 
starfive,vpp-lcdc 12000000.sfivefb: starfive_fb_probe                                   
panel_name: tda_998x_1080p                  
of_parse_rd_cmd_info: failed, key=id_read_cmd_info                                      
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info                                     
of_parse_wr_cmd: failed, key=panel-on-command                                           
of_parse_wr_cmd: failed, key=panel-off-command                                          
of_parse_reset_seq:707, error reading reset-sequence, length found = 0                  
panel_name: seeed_5_inch                    
of_parse_rd_cmd_info: failed, key=id_read_cmd_info                                      
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info                                     
of_parse_wr_cmd: failed, key=panel-on-command                                           
of_parse_wr_cmd: failed, key=panel-off-command                                          
of_parse_reset_seq:707, error reading reset-sequence, length found = 0                  
starfive,vpp-lcdc 12000000.sfivefb: select displayer: tda_998x_1080p                    
starfive,vpp-lcdc 12000000.sfivefb: DDR to LCDC                                         
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx                           
Console: switching to colour frame buffer device 240x67                                 
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx                           
select displayer: tda_998x_1080p            
tda998x 0-0070: found TDA19988              
random: fast init done                      
dw_axi_dmac_platform 100b0000.sgdma2p: DesignWare AXI DMA Controller, 4 channels        
dw_axi_dmac_platform 10500000.sgdma1p: DesignWare AXI DMA Controller, 16 channels       
L2CACHE: No. of Banks in the cache: 2       
L2CACHE: No. of ways per bank: 16           
L2CACHE: Sets per bank: 1024                
L2CACHE: Bytes per cache block: 64          
L2CACHE: Index of the largest way enabled: 0                                            
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled                                
printk: console [ttyS0] disabled            
12440000.serial: ttyS0 at MMIO 0x12440000 (irq = 6, base_baud = 6250000) is a 16550A    
printk: console [ttyS0] enabled             
dw-apb-uart 11870000.hs_serial: detected caps 00000000 should be 00000100               
11870000.hs_serial: ttyS1 at MMIO 0x11870000 (irq = 7, base_baud = 4640625) is a 16550                                                                                          
vic-rng 118d0000.trng: Initialized          
loop: module loaded                         
zram: Added device: zram0                   
dw_spi_mmio 12410000.spi2: DMA init failed  
wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.               
wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <[email protected]>. All Rights Reserved.                                                                                   
libphy: Fixed MDIO Bus: probed              
tun: Universal TUN/TAP device driver, 1.6   
stmmaceth 10020000.gmac: IRQ eth_lpi not found                                          
stmmaceth 10020000.gmac: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.                                                                                          
stmmaceth 10020000.gmac: PTP uses main clock                                            
stmmaceth 10020000.gmac: User ID: 0x59, Synopsys ID: 0x37                               
stmmaceth 10020000.gmac:        DWMAC1000   
stmmaceth 10020000.gmac: DMA HW capability register supported                           
stmmaceth 10020000.gmac: RX Checksum Offload Engine supported                           
stmmaceth 10020000.gmac: COE Type 2         
stmmaceth 10020000.gmac: Wake-Up On Lan supported                                       
stmmaceth 10020000.gmac: Enhanced/Alternate descriptors                                 
stmmaceth 10020000.gmac: Enabled extended descriptors                                   
stmmaceth 10020000.gmac: Chain mode enabled                                             
stmmaceth 10020000.gmac: Enable RX Mitigation via HW Watchdog Timer                     
stmmaceth 10020000.gmac: device MAC address 2c:f7:f1:1b:e3:34                           
libphy: stmmac: probed                      
Micrel KSZ9031 Gigabit PHY stmmac-0:07: attached PHY driver (mii_bus:phy_addr=stmmac-0:07, irq=POLL)                                                                            
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller                                          
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1                 
xhci-hcd xhci-hcd.0.auto: hcc params 0x200073c8 hci version 0x100 quirks 0x0000002000018010                                                                                     
xhci-hcd xhci-hcd.0.auto: irq 12, io mem 0x104d0000                                     
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.12          
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1                      
usb usb1: Product: xHCI Host Controller     
usb usb1: Manufacturer: Linux 5.12.6-gnu xhci-hcd                                       
usb usb1: SerialNumber: xhci-hcd.0.auto     
hub 1-0:1.0: USB hub found                  
hub 1-0:1.0: 1 port detected                
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller                                          
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2                 
xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed                              
usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.            
usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.12          
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1                      
usb usb2: Product: xHCI Host Controller     
usb usb2: Manufacturer: Linux 5.12.6-gnu xhci-hcd                                       
usb usb2: SerialNumber: xhci-hcd.0.auto     
hub 2-0:1.0: USB hub found                  
hub 2-0:1.0: 1 port detected                
usbcore: registered new interface driver uas                                            
usbcore: registered new interface driver usb-storage                                    
i2c /dev entries driver                     
sfc_temp_sensor 124a0000.tmpsensor: probe   
temp(c): 33.82                              
usb 1-1: new high-speed USB device number 2 using xhci-hcd                              
Synopsys Designware Multimedia Card Interface Driver                                    
usbcore: registered new interface driver usbhid                                         
dw_mmc 10000000.sdio0: IDMAC supports 32-bit address mode.                              
usbhid: USB HID core driver                 
NET: Registered protocol family 10          
dw_mmc 10000000.sdio0: Using internal DMA controller.                                   
Segment Routing with IPv6                   
dw_mmc 10000000.sdio0: Version ID is 290a   
NET: Registered protocol family 17          
dw_mmc 10000000.sdio0: DW MMC controller at irq 30,32 bit host data width,32 deep fifo                                                                                          
Loading compiled-in X.509 certificates      
mmc_host mmc0: card is polling.             
Btrfs loaded, crc32c=crc32c-generic, zoned=no                                           
dw_mmc 10010000.sdio1: IDMAC supports 32-bit address mode.                              
cfg80211: Loading compiled-in X.509 certificates for regulatory database                
dw_mmc 10010000.sdio1: Using internal DMA controller.                                   
dw_mmc 10010000.sdio1: Version ID is 290a   
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'                              
dw_mmc 10010000.sdio1: DW MMC controller at irq 31,32 bit host data width,32 deep fifo                                                                                          
platform regulatory.0: Direct firmware load for regulatory.db failed with error -2      
mmc_host mmc1: card is non-removable.       
cfg80211: failed to load regulatory.db                                                                                                                                [237/5267]
dw-apb-uart 12440000.serial: forbid DMA for kernel console                                                                                                                      
usb 1-1: New USB device found, idVendor=05e3, idProduct=0610, bcdDevice= 4.02           
mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)                                                                                  
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0                       
usb 1-1: Product: USB2.0 Hub                
usb 1-1: Manufacturer: GenesysLogic         
mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)                                                                                  
hub 1-1:1.0: USB hub found                  
usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd                        
hub 1-1:1.0: 4 ports detected               
usb 2-1: New USB device found, idVendor=05e3, idProduct=0616, bcdDevice= 4.02           
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0                       
usb 2-1: Product: USB3.0 Hub                
usb 2-1: Manufacturer: GenesysLogic         
hub 2-1:1.0: USB hub found                  
Freeing unused kernel memory: 2136K         
hub 2-1:1.0: 4 ports detected               
dw_mmc 10000000.sdio0: Unexpected interrupt latency                                     
mmc1: queuing unknown CIS tuple 0x80 (2 bytes)                                          
Checked W+X mappings: passed, no W+X pages found                                        
mmc1: queuing unknown CIS tuple 0x80 (3 bytes)                                          
rodata_test: all tests were successful      
mmc1: queuing unknown CIS tuple 0x80 (3 bytes)                                          
Run /init as init process                   
mmc1: queuing unknown CIS tuple 0x80 (7 bytes)                                          
dw_mmc 10000000.sdio0: Unexpected interrupt latency                                     
mmc1: queuing unknown CIS tuple 0x81 (9 bytes)                                          
dw_mmc 10000000.sdio0: Unexpected interrupt latency                                     
mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 10000000Hz, actual 10000000HZ div = 5)                                                                                
:: running early hook [udev]                
mmc0: new high speed SDXC card at address aaaa                                          
Warning: /lib/modules/5.12.6-gnu/modules.devname not found - ignoring                   
mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 10000000Hz, actual 10000000HZ div = 5)                                                                                
mmcblk0: mmc0:aaaa SD128 119 GiB            
mmc1: new high speed SDIO card at address 0001                                          
 mmcblk0: p1 p2                             
Starting version 248-5.parabola1-parabola   
:: running hook [udev]                      
:: Triggering uevents...                    
:: performing fsck on '/dev/mmcblk0p2'
/dev/mmcblk0p2: clean, 164400/7728656 files, 3819974/31091200 blocks                    
********************** WARNING **********************                                   
*                                                   *                                   
*  The root device is not configured to be mounted  *                                   
*  read-write! It may be fsck'd again later.        *                                   
*                                                   *                                   
*****************************************************                                   
:: mounting '/dev/mmcblk0p2' on real root   
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.                                                                             
:: running cleanup hook [udev]              
systemd[1]: System time before build time, advancing clock.                             
systemd[1]: systemd 248-5.parabola1-parabola running in system mode. (+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -
FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB -ZSTD +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified)
systemd[1]: Detected architecture riscv64.  

Welcome to Parabola GNU/Linux-libre!        

systemd[1]: Hostname set to <parabola-starlight>.                                       
systemd[1]: Queued start job for default target Graphical Interface.                    
random: systemd: uninitialized urandom read (16 bytes read)                             
systemd[1]: Created slice system-dhcpcd.slice.                                          
[  OK  ] Created slice system-dhcpcd.slice.                                             
random: systemd: uninitialized urandom read (16 bytes read)                             
systemd[1]: Created slice system-getty.slice.                                           
[  OK  ] Created slice system-getty.slice.  
random: systemd: uninitialized urandom read (16 bytes read)                             
systemd[1]: Created slice system-modprobe.slice.                                        
[  OK  ] Created slice system-modprobe.slice.                                           
systemd[1]: Created slice system-serial\x2dgetty.slice.                                 
[  OK  ] Created slice system-serial\x2dgetty.slice.                                    
systemd[1]: Created slice User and Session Slice.                                       
[  OK  ] Created slice User and Session Slice.                                          
systemd[1]: Started Dispatch Password Requests to Console Directory Watch.              
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.                      
systemd[1]: Started Forward Password Requests to Wall Directory Watch.                  
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.                      
systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.                                                            
systemd[1]: Reached target Local Encrypted Volumes.                                     
[  OK  ] Reached target Local Encrypted Volumes.                                        
systemd[1]: Reached target Paths.           
systemd[1]: Reached target Remote File Systems.                                         
[  OK  ] Reached target Remote File Systems.                                            
systemd[1]: Reached target Slices.          
[  OK  ] Reached target Slices.             
systemd[1]: Reached target Swap.            
[  OK  ] Reached target Swap.               
systemd[1]: Reached target Local Verity Integrity Protected Volumes.                    
[  OK  ] Reached target Local Verity Integrity Protected Volumes.                       
systemd[1]: Listening on Device-mapper event daemon FIFOs.                              
[  OK  ] Listening on Device-mapper event daemon FIFOs.                                 
systemd[1]: Listening on Process Core Dump Socket.                                      
[  OK  ] Listening on Process Core Dump Socket.                                         
systemd[1]: Listening on Journal Socket (/dev/log).                                     
[  OK  ] Listening on Journal Socket (/dev/log).                                        
systemd[1]: Listening on Journal Socket.    
[  OK  ] Listening on Journal Socket.       
systemd[1]: Listening on udev Control Socket.                                           
[  OK  ] Listening on udev Control Socket.  
systemd[1]: Listening on udev Kernel Socket.                                            
[  OK  ] Listening on udev Kernel Socket.   
systemd[1]: Condition check resulted in Huge Pages File System being skipped.           
systemd[1]: Mounting POSIX Message Queue File System...                                 
         Mounting POSIX Message Queue File System...                                    
systemd[1]: Mounting Kernel Debug File System...                                        
         Mounting Kernel Debug File System...                                           
systemd[1]: Condition check resulted in Kernel Trace File System being skipped.         
systemd[1]: Mounting Temporary Directory (/tmp)...                                      
         Mounting Temporary Directory (/tmp)...                                         
systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.                                                                
systemd[1]: Starting Load Kernel Module configfs...                                     
         Starting Load Kernel Module configfs...                                        
systemd[1]: Starting Load Kernel Module drm...                                          
         Starting Load Kernel Module drm...                                             
systemd[1]: Starting Load Kernel Module fuse...                                         
         Starting Load Kernel Module fuse...                                            
systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.                                                                                         
systemd[1]: Starting File System Check on Root Device...                                
         Starting File System Check on Root Device...                                   
systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.                                             
systemd[1]: (This warning is only shown for the first unit using IP firewalling.)       
systemd[1]: Starting Journal Service...     
         Starting Journal Service...                                                                                                                                  [114/5267]
systemd[1]: Condition check resulted in Load Kernel Modules being skipped.              
systemd[1]: Starting Apply Kernel Variables...                                          
         Starting Apply Kernel Variables...                                             
systemd[1]: Starting Coldplug All udev Devices...                                       
         Starting Coldplug All udev Devices...                                          
systemd[1]: Mounted POSIX Message Queue File System.                                    
[  OK  ] Mounted POSIX Message Queue File Sysystemd[1]: Mounted Kernel Debug File System.                                                                                       
stem.                                       
[  OK  ] Mounted Kernel Debug File Systemystemd[1]: Mounted Temporary Directory (/tmp).                                                                                         
m.                                          
[  OK  ] Mounted Temporary Directory (/tmp)ystemd[1]: [email protected]: Deactivated successfully.                                                                      
[0m.                                        
systemd[1]: Finished Load Kernel Module configfs.                                       
[  OK  ] Finished Load Kernel Module configfs.                                          
systemd[1]: [email protected]: Deactivated successfully.                             
systemd[1]: Finished Load Kernel Module drm.                                            
[  OK  ] Finished Load Kernel Module drmsystemd[1]: [email protected]: Deactivated successfully.                                                                            
.                                           
systemd[1]: Finished Load Kernel Module fuse.                                           
[  OK  ] Finished Load Kernel Module fuseystemd[1]: Condition check resulted in FUSE Control File System being skipped.                                                         
m.                                          
systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.                                                                                         
systemd[1]: Started Journal Service.        
[  OK  ] Started Journal Service.           
[  OK  ] Finished Apply Kernel Variables.   
[  OK  ] Finished File System Check on Root Device.                                     
         Starting Remount Root and Kernel File Systems...                               
EXT4-fs (mmcblk0p2): re-mounted. Opts: (null). Quota mode: disabled.                    
[  OK  ] Finished Remount Root and Kernel File Systems.                                 
         Starting Flush Journal to Persistent Storage...                                
         Starting Load/Save Random Seed...  
         Starting Create System Users...    
[  OK  ] Finished Coldplug All udev Devices.                                            
systemd-journald[125]: Received client request to flush runtime journal.                
dw_mmc 10000000.sdio0: Unexpected interrupt latency                                     
systemd-journald[125]: File /var/log/journal/2c98a39475ea48718d21505dc85ad17e/system.journal corrupted or uncleanly shut down, renaming and replacing.                          
[  OK  ] Finished Create System Users.      
         Starting Create Static Device Nodes in /dev...                                 
[  OK  ] Finished Create Static Device Nodes in /dev.                                   
[  OK  ] Reached target Local File Systems (Pre).                                       
         Mounting /home/andi/parabola/build/ramfs/andi... 
         Starting Rule-based Manage…for Device Events and Files...                      
[  OK  ] Mounted /home/andi/parabola/build/ramfs/andi.                                  
random: crng init done                      
random: 7 urandom warning(s) missed due to ratelimiting                                 
[  OK  ] Finished Load/Save Random Seed.    
[  OK  ] Started Rule-based Manager for Device Events and Files.                        
[  OK  ] Found device /dev/ttyS0.           
[  OK  ] Found device /sys/subsystem/net/devices/eth0.                                  
[  OK  ] Listening on Load/Save RF …itch Status /dev/rfkill Watch.                      
[  OK  ] Finished Flush Journal to Persistent Storage.                                  
[  OK  ] Found device /dev/disk/by-…8-5e6e-4fe0-b657-a5072a6bc68c.                      
         Mounting /boot...                  
EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.                                                                             
ext4 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)      
[  OK  ] Mounted /boot.                     
[  OK  ] Reached target Local File Systems.                                             
         Starting Rebuild Dynamic Linker Cache...                                       
         Starting Create Volatile Files and Directories...                              
[  OK  ] Finished Rebuild Dynamic Linker Cache.                                         
[  OK  ] Finished Create Volatile Files and Directories.                                
         Starting Rebuild Journal Catalog...                                            
         Starting Network Time Synchronization...                                       
         Starting Update UTMP about System Boot/Shutdown...                             
[  OK  ] Finished Update UTMP about System Boot/Shutdown.                               
[  OK  ] Finished Rebuild Journal Catalog.  
         Starting Update is Completed...    
[  OK  ] Finished Update is Completed.      
[  OK  ] Started Network Time Synchronization.                                          
[  OK  ] Reached target System Initialization.                                          
[  OK  ] Started Daily Cleanup of Temporary Directories.                                
[  OK  ] Reached target System Time Set.    
[  OK  ] Started Daily rotation of log files.                                           
[  OK  ] Started Daily man-db cache update.                                             
[  OK  ] Started Daily verification of password and group files.                        
[  OK  ] Started Daily locate database update.                                          
[  OK  ] Reached target Timers.             
[  OK  ] Listening on D-Bus System Message Bus Socket.                                  
[  OK  ] Reached target Sockets.            
[  OK  ] Reached target Basic System.       
[  OK  ] Started D-Bus System Message Bus.  
         Starting dhcpcd on eth0...         
         Starting User Login Management...  
[  OK  ] Started User Login Management.     
stmmaceth 10020000.gmac eth0: PHY [stmmac-0:07] driver [Micrel KSZ9031 Gigabit PHY] (irq=POLL)                                                                                  
dwmac1000: Master AXI performs fixed burst length                                       
stmmaceth 10020000.gmac eth0: No Safety Features support found                          
stmmaceth 10020000.gmac eth0: No MAC Management Counters available                      
stmmaceth 10020000.gmac eth0: IEEE 1588-2008 Advanced Timestamp supported               
stmmaceth 10020000.gmac eth0: configuring for phy/rgmii-txid link mode                  
[*     ] A start job is running for dhcpcd on eth0 (11s / 1min 39s)                     
stmmaceth 10020000.gmac eth0: Link is Up - 100Mbps/Full - flow control rx/tx            
[  OK  ] Started dhcpcd on eth0.            
[  OK  ] Reached target Network.            
[  OK  ] Started OpenSSH Daemon.            
         Starting Permit User Sessions...   
[  OK  ] Finished Permit User Sessions.     
[  OK  ] Started Getty on tty1.             
[  OK  ] Started Serial Getty on ttyS0.     
[  OK  ] Reached target Login Prompts.      
[  OK  ] Reached target Multi-User System.  
[  OK  ] Reached target Graphical Interface.                                            
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx                           

Parabola GNU/Linux-libre 5.12.6-gnu  (parabola-starlight) (ttyS0)                       

parabola-starlight login: Adding 16777212k swap on /home/andi/parabola/build/swapfile.  Priority:-2 extents:11 across:17965052k SS
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
        (detected by 1, t=32120 jiffies, g=12645089, q=32694)
rcu: All QSes seen, last rcu_sched kthread activity 32120 (4312756120-4312724000), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_sched kthread starved for 32120 jiffies! g12645089 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu:    Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00000008
Call Trace:
[<ffffffe0000045b0>] walk_stackframe+0x0/0xce
[<ffffffe000666dbe>] dump_backtrace+0x38/0x46
[<ffffffe000666ddc>] show_stack+0x10/0x18
[<ffffffe000028476>] sched_show_task+0x140/0x160
[<ffffffe000667e8c>] rcu_check_gp_kthread_starvation+0xb6/0x10e
[<ffffffe00005963e>] rcu_sched_clock_irq+0x5fa/0x668
[<ffffffe00005f03e>] update_process_times+0xa2/0xca
[<ffffffe000069d42>] tick_sched_timer+0x76/0xec
[<ffffffe00005f4ce>] __hrtimer_run_queues+0x122/0x186
[<ffffffe00005ff54>] hrtimer_interrupt+0xca/0x21e
[<ffffffe000508c7c>] riscv_timer_interrupt+0x32/0x3c
[<ffffffe00004f06e>] handle_percpu_devid_irq+0x80/0x11a
[<ffffffe00004a5a0>] __handle_domain_irq+0x72/0xcc
[<ffffffe00038630a>] riscv_intc_irq+0x38/0x60
[<ffffffe000003182>] ret_from_exception+0x0/0xc
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 1:
task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00000008
Call Trace:
[<ffffffe0000045b0>] walk_stackframe+0x0/0xce
[<ffffffe000666dbe>] dump_backtrace+0x38/0x46
[<ffffffe000666ddc>] show_stack+0x10/0x18
[<ffffffe000028476>] sched_show_task+0x140/0x160
[<ffffffe000667630>] dump_cpu_task+0x42/0x4c
[<ffffffe000667ecc>] rcu_check_gp_kthread_starvation+0xf6/0x10e
[<ffffffe00005963e>] rcu_sched_clock_irq+0x5fa/0x668
[<ffffffe00005f03e>] update_process_times+0xa2/0xca
[<ffffffe000069d42>] tick_sched_timer+0x76/0xec
[<ffffffe00005f4ce>] __hrtimer_run_queues+0x122/0x186
[<ffffffe00005ff54>] hrtimer_interrupt+0xca/0x21e
[<ffffffe000508c7c>] riscv_timer_interrupt+0x32/0x3c
[<ffffffe00004f06e>] handle_percpu_devid_irq+0x80/0x11a
[<ffffffe00004a5a0>] __handle_domain_irq+0x72/0xcc
[<ffffffe00038630a>] riscv_intc_irq+0x38/0x60
[<ffffffe000003182>] ret_from_exception+0x0/0xc

At this point, the board is unresponsive.

@lorforlinux
Copy link

Hey @pdp7, these are the answers for my setup

  1. Sandisk Ultra 16GB
  2. Fedora image = Fedora-riscv64-vic7100-xfce-dev-Rawhide-20210516233526.n.0-sda i.e. 16th May 2021 image.
  3. uname -r = 5.10.6-210.0.riscv64.fc33.riscv64
  4. No error log.
  5. bootloader version : 210209-4547a8d

@oaken-source
Copy link

it's been pointed out to me that I'm using the u-boot device trees instead of the ones built with the linux kernel. I'll test if using the correct dtb's fixes the issue and report back.

@oaken-source
Copy link

with the updated dtb's, the system still locks up, but now it does so quietly, without any obvious related kernel output.

@oaken-source
Copy link

I experienced two more lockups after 27 and 22 hours of runtime. the first one was silent, the second one produced

[84359.606936] dw_mmc 10000000.mmc: Unexpected interrupt latency

shortly before locking up.

@oaken-source
Copy link

here is another full output: https://pastebin.com/6b2BgmFd

@pdp7
Copy link
Collaborator

pdp7 commented Jun 11, 2021

@warthog9 @lorforlinux @archanox @oaken-source
We think this issue might be resolved with this new fw_payload.bin.out that @mcd500 has linked in this post:
https://forum.beagleboard.org/t/new-firmware-boot-hang-is-fixed/29986/3

Please try flashing it by selecting "update uboot" in the boot menu.

@pdp7
Copy link
Collaborator

pdp7 commented Jul 7, 2021

@warthog9 @lorforlinux @archanox @oaken-source please try the new Fedora July 7th image

New Fedora image (July 7th) from @tekkamanninja:
https://beagleboard.org/~pdp7/Fedora-riscv64-vic7100-xfce-dev-Rawhide-20210707.n.0-sda.raw.zst 1

sha256sum:
239f439125afce187c34b530490e6e3d095bf98ee6ef03b6d212842f1a4dba32

Forum post: https://forum.beagleboard.org/t/new-fedora-image-july-7th/30217

@pdp7
Copy link
Collaborator

pdp7 commented Jul 15, 2021

@warthog9 @lorforlinux @archanox @oaken-source any problems once using the new Fedora July 7th image?
https://forum.beagleboard.org/t/new-fedora-image-july-7th/30217

@lorforlinux
Copy link

@warthog9 @lorforlinux @archanox @oaken-source any problems once using the new Fedora July 7th image?
https://forum.beagleboard.org/t/new-fedora-image-july-7th/30217

@pdp7 trying it right now will let you know very soon :)

@warthog9
Copy link
Author

I've got 22hrs of uptime right now, I'm going to tentatively say it's likely fixed. I'll let it sit here tonight again and see what it looks like in the morning. If it's fine then I'll do something crazy and throw something CPU intensive at it and let it chug away

@MichaelZhuxx
Copy link
Collaborator

@warthog9 @lorforlinux @archanox @oaken-source any update on this? if it's ok, I will close this issue

@oaken-source
Copy link

it's been ok for me.

@warthog9
Copy link
Author

[root@fedora-starfive ~]# uptime
23:50:54 up 12 days, 8:52, 1 user, load average: 0.00, 0.00, 0.00
[root@fedora-starfive ~]#

I believe it's safe to close this, I think we've got this issue sorted, and if it crops back up it'll likely be something new at this point.

Closing: seems resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants