Fedora Image hung / froze when left on for prolonged period of time #34
Comments
Happened again overnight, though I got slightly more debugging information this time. It looks like the rcu_scheduler runs out of scheduled time and OOMs the kernel, which is obviously a very odd state to get into. Sadly, I didn't have the serial line attached overnight, so I don't have the text dump, but that's a picture of what was on the monitor when I checked it this morning. |
Thanks for the additional information. Could you reword the title of this issue to say that Fedora crashes overnight? I think the meaning of "wander off" does not translate well. |
Might be related to starfive-tech/linux#18 (comment) |
I'm capturing the kernel output through a serial interface. Will post a full trace here if it happens again. |
It is happening again. The first symptom is always this:
Then, a while later, this follows:
And then, even later, it locks up. I didn't leave it to lock up completely, because I couldn't afford any file system integrity damage at the moment, but I will leave it to lock next time. |
Another example, currently ongoing:
Update:
|
another one:
|
@MichaelZhuxx Have you ever seen errors occurring with rcu_sched? |
I am running the same environment; I will see if anything happens overnight. @warthog9 was there any load on the system? I mean, are you running any extra apps or services that might be causing it to run out of memory or causing IRQ timeouts? Or is this a default install? |
my theory is that these lockups are related to long durations of high I/O load on the sd-card. I've been compiling software on the board for hours at a time, which would trigger these issues. Since I've started compiling software in a tmpfs instead of on the sd card, I've not seen any lockups. I would need to verify this theory with some sort of I/O benchmark. |
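One way to test the SD-card theory above would be a sustained write load that forces every block out to the device. The following is only a minimal sketch, not something anyone in this thread ran: the function name `sustained_write`, its parameters, and its defaults are made up for illustration. Running it against a directory on the SD card and then against a tmpfs mount for the same duration would give a rough comparison.

```python
import os
import tempfile
import time


def sustained_write(target_dir, duration_s=60.0, block_size=1 << 20,
                    file_limit=256 << 20):
    """Write and fsync random blocks under target_dir for about duration_s seconds.

    Hypothetical helper for illustration. Returns (bytes_written, elapsed_seconds).
    """
    block = os.urandom(block_size)
    written = 0
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=target_dir, prefix="iostress-")
    try:
        # buffering=0 gives a raw file object, so each write goes straight
        # to the kernel instead of sitting in a Python-level buffer.
        with os.fdopen(fd, "wb", buffering=0) as f:
            while time.monotonic() - start < duration_s:
                if f.tell() + block_size > file_limit:
                    f.seek(0)  # wrap around so the file never exceeds file_limit
                written += f.write(block)
                os.fsync(f.fileno())  # force the block out to the storage device
    finally:
        os.unlink(path)  # remove the scratch file even if a write fails
    return written, time.monotonic() - start
```

For example, `sustained_write("/var/tmp", duration_s=3600)` would keep writing for an hour, assuming `/var/tmp` actually lives on the SD card on the system under test.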
@oaken-source have you seen this issue happen again with the new Fedora image from this week? Were you using the SD card for storage when this occurs? |
This problem occurs occasionally during network-related operations:
another one:
|
I appear to be experiencing this issue too, with the latest image. |
I can confirm it's also happening with the latest image (16th May 2021). |
otoh, when actively avoiding I/O by writing everything to a tmpfs instead, I have an uptime under heavy load of almost 5 days. |
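When comparing tmpfs against SD-card runs, it helps to confirm that a build directory really is on a tmpfs. A small sketch of how that could be checked from the contents of `/proc/mounts` follows. The helper name `fs_type_for` is hypothetical, the sample device name in the usage note is made up, and the parsing ignores the `\040` escaping that `/proc/mounts` uses for spaces in mount points.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text is the text of /proc/mounts (device, mount point, type, ...).
    The longest matching mount point wins; later entries override earlier
    ones for the same mount point. Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type
```

On a running system this could be called as `fs_type_for("/tmp/build", open("/proc/mounts").read())`, where `/tmp/build` stands in for whatever directory the compile jobs use.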
The other thing I'm encountering is that it soft bricks the image, and I have to reflash the Fedora image onto it.
I'm currently testing the previous Fedora image to see if that has the same issue.
|
This does seem to indicate it might be related to SD card activity. |
@oaken-source @lorforlinux @archanox @stffrdhrn @warthog9 For everyone having issues, please specify:
It seems like this issue may be due to the SD card and we are trying to figure out what patterns there may be to understand the root cause. |
@pdp7 Answers to your questions above:
new trace, captured on the serial console:
At this point, the board is unresponsive. |
Hey @pdp7, these are the answers for my setup
|
it's been pointed out to me that I'm using the u-boot device trees instead of the ones built with the linux kernel. I'll test if using the correct dtb's fixes the issue and report back. |
with the updated dtb's, the system still locks up, but now it does so quietly, without any obvious related kernel output. |
I experienced two more lockups after 27 and 22 hours of runtime. the first one was silent, the second one produced
shortly before locking up. |
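Because some of these lockups are silent, a periodic heartbeat can at least narrow down when the board stopped. The sketch below is an assumption-laden illustration, not a tool used in this thread: the names `read_uptime_seconds`, `heartbeat_line`, and `run_heartbeat` are hypothetical, and the log format is arbitrary. Note that appending to a file on the SD card adds a small amount of I/O of its own, so sending the lines to the serial console or a remote host may be preferable.

```python
import os
import time
from datetime import datetime, timezone


def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds (first field of /proc/uptime on Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def heartbeat_line(uptime_s, now=None):
    """Format one heartbeat record: UTC timestamp plus uptime in hours."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat(timespec='seconds')} uptime={uptime_s / 3600:.2f}h\n"


def run_heartbeat(log_path, interval_s=60.0, beats=None):
    """Append a heartbeat every interval_s seconds; beats=None runs forever."""
    count = 0
    with open(log_path, "a") as log:
        while beats is None or count < beats:
            log.write(heartbeat_line(read_uptime_seconds()))
            log.flush()
            os.fsync(log.fileno())  # make sure the record survives a hang
            count += 1
            if beats is None or count < beats:
                time.sleep(interval_s)
```

After a lockup and reboot, the last line in the log gives the latest time the system was known to be alive, to within one interval.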
here is another full output: https://pastebin.com/6b2BgmFd |
@warthog9 @lorforlinux @archanox @oaken-source Please try flashing it by selecting "update uboot" in the boot menu. |
@warthog9 @lorforlinux @archanox @oaken-source please try the new Fedora July 7th image.
New Fedora image (July 7th) from @tekkamanninja:
sha256sum:
Forum post: https://forum.beagleboard.org/t/new-fedora-image-july-7th/30217 |
@warthog9 @lorforlinux @archanox @oaken-source any problems once using the new Fedora July 7th image? |
@pdp7 trying it right now will let you know very soon :) |
I've got 22hrs of uptime right now, I'm going to tentatively say it's likely fixed. I'll let it sit here tonight again and see what it looks like in the morning. If it's fine then I'll do something crazy and throw something CPU intensive at it and let it chug away |
@warthog9 @lorforlinux @archanox @oaken-source any update on this? if it's ok, I will close this issue |
it's been ok for me. |
[root@fedora-starfive ~]# uptime
I believe it's safe to close this. I think we've got this issue sorted, and if it crops back up it'll likely be something new at this point.
Closing: seems resolved. |
Sadly, there is almost nothing to include with the report. I left the current April 19th Fedora image, kernel 5.10.6+, running overnight.
The system apparently wandered off and just stopped responding: it dropped off the network and was unresponsive on the serial port and the keyboard/HDMI console. There was no output on the serial console indicating that the kernel had hung or crashed; it was as I had left it overnight.
The Ethernet port was still blinking in response to external network traffic, but the board was not responding at its expected IPv4 address.
I had gpios 0, 2, 20 set high to drive an RGB LED, and gpio 46 set for input from the kernel (I've got a button set on it).