[dunfell] Sporadic freeze due to MMC issues #1088
Comments
I would first try to reproduce this with an I/O stress program. Once it is easy to trigger, attach a serial console, reproduce it over serial, and see if that freezes too. That might give you a bit more insight. Nevertheless, this is most probably a kernel bug, so I'd encourage you to raise it there as well. Have you tried to reproduce it on later kernel versions? If it works there, we could try to bisect too. |
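A minimal sketch of what such an I/O stress program could look like, assuming plain sequential writes with an fsync after every chunk are enough to provoke the problem (that is an assumption, not something confirmed in this thread). The function name `write_stress` and its parameters are made up for illustration; a dedicated stress tool is likely the better choice for real testing.

```python
import os
import tempfile
import time


def write_stress(directory, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to a scratch file, fsync-ing every chunk.

    Returns (bytes_written, worst_latency_seconds). The worst latency is the
    number to watch: a stalled card would show up as one very long fsync.
    """
    chunk = b"\0" * chunk_size
    worst = 0.0
    written = 0
    fd, path = tempfile.mkstemp(dir=directory, prefix="iostress-")
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            start = time.monotonic()
            done = os.write(fd, chunk[:n])
            os.fsync(fd)  # push the data to the device, not just the page cache
            worst = max(worst, time.monotonic() - start)
            written += done
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return written, worst
```

Calling something like `write_stress("/var", 512 * 1024 * 1024)` in a loop and logging the worst latency over a serial console would show whether stalls build up before a full freeze. The target directory has to live on the MMC under test.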
Hi @agherzan, thanks for the input. I tried to run What had more effect though was adding this dtoverlay (I guess you can do it also with
/dts-v1/;
/plugin/;
/ {
    compatible = "brcm,bcm2708";
    fragment@0 {
        target = <&sdhost>;
        __overlay__ {
            non-removable;
            brcm,force-pio;
        };
    };
};
This disables DMA in the sdhost driver and also disables the periodic mmc_rescan check that seemed to be part of the freeze. On a few test systems this reduced the freeze rate from ~3-4 freezes per day to only 1 in 2 days. That could also be due to the generally slower I/O with DMA disabled, though. I will check which half of the change actually brought the benefit.
Not yet, we had a bit of trouble updating to 5.x due to some u-boot issues last time which were hard to debug too. Maybe we need to go that route though. Others reported same issues in 5.15 though so I had low hopes. But likely one of the next steps... [1]: See overlay docs here - the |
@sahib Is there any way you can try to reproduce on Raspberry Pi OS? |
Hmm, would be kinda hard. We have quite a few changes in our OS that are required for the apps we run on top of it. Rebuilding this on top of Raspberry Pi OS would require considerable work and time. Or put differently: maybe as a desperate action if there's no progress, but not as the next step. But yes, I consider it likely that we (or some other layer) did something that differs from RPi OS. My plan was to build kernel 5.4 (+ fixing some stuff) and see if the situation improves. Edit: We also noticed that shutting down a few apps made the freeze less likely, although we could not identify a specific app causing problems. The data only showed a "the more the merrier" pattern with regard to freezes. |
The only things I'd suspect are firmware, firmware config and kernel. Are you using U-Boot? That might also have an impact. |
We tried different firmware versions already, but to no avail. The firmware config is rather basic, as you can see in the first post. No overclocking and no special settings really. Yes, we do use u-boot from meta-updater-raspberrypi. Where do you see a connection there? |
Hmm, it really seems that the device tree overlay I posted above fixed the issue for us. We have only one device doing strange stuff after some days, but we suspect some other (maybe hardware) issue on that one. The symptoms differ a bit (full lockup, not even reacting to sysrq keys, no activity LED) and 5 other devices run fine through the whole time. I'll post again if that changes - really hard to confidently claim the freeze does not happen again. From some testing we also know that the While I'm happy this seems fixed, I would still be grateful for any pointers on why forcing PIO seems to fix this. |
This is an interesting find and I remember seeing this set for RPi4 in the main dt. We would need to ask the rpi kernel team for some pointers here. I wouldn't know without debugging this a bit. |
Also, have you seen raspberrypi/linux#1536? Looks similar. |
Alright, I still need to create a ticket over there. Got sidetracked a little & hope to get time for that soon.
Thanks for the link. Not exactly sure if related, since their issue is rather binary: either it works or it doesn't. In the meantime I captured another log of blocked tasks with only freeze-with-non-removable-but-without-force-pio.log EDIT: In other news: We had some rare freezes still on other devices. Those did not react to SysRq keys and seem to be a different issue altogether. |
Posted the issue now on the kernel issue tracker: raspberrypi/linux#5190 |
Hello,
I'm currently debugging a nasty freeze on a CM3+ with a 4.19 kernel and the dunfell branch of this repo
and I hope to find someone pushing me in the right direction. In my case the freeze symptoms look like this:
but disabling quite some of our applications make the freeze less likely.
System information
System info:
Anything similar on the internet?
Yes. There are plenty of reports of this behavior on the internet (google: "mmc_rescan freeze").
Here's a small selection:
Summary: What all of those sadly have in common: there is no clear fix. Some had luck with up- or downgrading their kernel, some with playing around with sysctl vm.{swappiness,dirty_ratio,min_free_kbytes}. It also happens from Pi2 up to Pi4. In my case it's a CM3+, so it is likely a common (software?) issue, and none of the proposed fixes seem to work.
Debugging attempts:
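For anyone experimenting with the vm sysctls mentioned in the summary above, it helps to record the values in effect on each test device before and after a change. The following is a small read-only sketch, assuming the usual Linux layout where these knobs are exposed under /proc/sys/vm; the helper name `read_vm_knobs` is hypothetical.

```python
import os

# The three knobs named in the summary above.
VM_KNOBS = ("swappiness", "dirty_ratio", "min_free_kbytes")


def read_vm_knobs(base="/proc/sys/vm", names=VM_KNOBS):
    """Return {name: int value} for each knob, or None if it can't be read."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Missing file or unexpected content: report it as unknown.
            values[name] = None
    return values
```

Logging the result of `read_vm_knobs()` next to each freeze report makes it easier to tell whether a tweak was actually active when a device froze.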
When disabling swap, the system seems a little more "stable" during the freeze. SSH connections started before the freeze seem to still run if they do not use any I/O. For example this command in a while loop reported that the memory usage of the system is quite okay:
However, as soon as a program is executed that needs to be read from disk or needs to write, everything blocks. Listing files partly works, likely due to the page cache. This situation also happens for a short time when writing with high throughput (as with dd if=/dev/zero of=/var/file bs=1M count=100000), but the system normally recovers afterwards. Therefore it seems that the attached mmc is not just slow, but not working at all. Judging from dmesg, even journald seems to be stuck and refuses to be killed (which is, afaik, a clear sign that the process is stuck in kernel code):
dmesg output:
The exact stack traces vary from time to time, but the one with mmc_rescan is common to all of them. The other ones seem to depend on what kind of I/O syscalls were done during the freeze. Also note, this is one freeze with swap enabled (therefore systemd itself was stuck).
One could suspect undervoltage here, but it also happens on the Compute Module IO Board (https://www.raspberrypi.com/products/compute-module-io-board-v3) with the official power supply (as well as on our own custom hardware).
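The observation above that journald is stuck and cannot be killed matches processes in uninterruptible sleep (state "D"). Below is a rough sketch of how one could list such processes by reading /proc/<pid>/stat. The helper names `parse_stat` and `blocked_tasks` are made up for this example, and it only looks at processes, not individual threads.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """List (pid, comm) of processes currently in state 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'sys'
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found
```

Reading /proc does not touch the MMC, so a loop around `blocked_tasks()` that was started before the freeze (for example in an already open SSH session) should keep reporting. Starting it only after the freeze may block if the interpreter has to be loaded from disk.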
Questions
and upgrading will be hard for us for various reasons.
Also thank you all for the work on this layer. Hope this is the (a?) right place to ask.