Interrupt collision between smsc95xx and USB storage drivers under heavy load #9
Comments
I'm seeing this issue too. Possible regression seeing as I don't recall having this problem before, despite having downloaded the same file before. With the latest files, it happens every time I download the file in question. |
You might want to put a serial tty on there to capture the errors. Someone put a screenshot up in the forum of what sounds like the same issue: |
I have seen a kernel panic from the dwc_otg driver when copying files from the network. Strangely, the same experiment doesn't fail at work (or on the machine of the colleague who knows this driver best). I had serial connected, so I got a call stack. Not sure if this is the same issue. We need a test case that can be made to fail on my colleague's setup.
|
We may have two separate bugs then, as that doesn't look that familiar. I'll reconnect the UART and see if I can recreate the USB heavy load one. (Hexxeh pointed out on IRC that current draw could be a factor, I agree but I can only measure this if I power it via the GPIO pins - does this skip the polyfuse?) |
Pretty sure I recall reading somewhere that it /does/ indeed bypass the polyfuse. |
My Pi is in transit but I thought I would grab the Debian image and run ksymoops to investigate some of the stacktraces being posted only to discover there is no System.map on the image. It would be REALLY handy to have a System.map included with the default image. Otherwise we all have to compile our own kernels and trigger the crashes ourselves to debug these things. |
@shirro Good point. I'll include System.map with next github update. I think this is the map from latest github firmware. I think this is the map from latest debian firmware. (I believe they are the same code, but were built on different machines, so the offsets are slightly different) |
Okay the screenshot has: If you use kernel_debug.img (from github) instead of kernel.img you should get stacktrace with function names. |
ksymoops looks to have been deprecated since the 2.4 days, as the kernel usually prints out the symbols itself these days. I must be getting old. Perhaps we need that on by default? I just grepped the number out of a pastebin mozzwald put on IRC and it is DWC_MEMCPY as well. Perhaps having the HTML docs in there will not be such a bad thing after all :-) |
This thread http://www.raspberrypi.org/forum/troubleshooting/kernel-panic-on-concurrent-network-and-usb-storage has a screenshot of the kernel panic. I've uploaded two incidents of the panic where I was transferring data to or from a USB-attached hard drive on the Pi |
Can anyone rule out a 5V power supply issue? |
I'm using a 5V 1A HTC charger with high quality usb cable. Does that count? |
If you've measured the voltage between TP1 and TP2 then yes... |
4.75V at full load (two usb devices, ethernet, and hdmi), and error still occurs |
Here is boot log up to kernel panic while trying to download to USB hard drive: http://pastebin.com/u4C98Tfq My current setup is:
|
I use a 5V 1A supply, and measured voltage between TP1 and TP2 is around the 4.84V mark before, during and after. Fluctuates by 10mV or so during load. Unfortunately, I think some of the voltage drop is in the cable itself - 0.20V+ - direct voltage at the adapter is around 5.1V but I only measured that very early on. What would be the sort of voltage drop that would be worrying? 4V? 4.5V? 4.6V? |
swapped for another charger, got 4.8 across tp1/tp2 and error still occurred |
Well, I believe USB quotes 5%, so 4.75V is the limit. I would expect 4.84V to be fine, so I think this isn't (5V) power related. The guy who knows most about this driver (although this driver is written by Synopsys, so no one at Broadcom knows much about it) is going to try to reproduce this with an external USB drive. Hopefully he'll be able to see it fail. I've seen the failure at home (copying from an NFS-mounted drive over the network, no USB hard drive involved). But running exactly the same test on the network at work didn't fail (and the driver guy couldn't reproduce it). Perhaps the USB drive is a better way of provoking it. |
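The tolerance arithmetic in this comment can be checked with a short sketch. It assumes a nominal 5 V rail and takes the 5% figure from the comment as given; the function names are made up for illustration, and integer millivolts are used to avoid floating-point edge cases at exactly 4.75 V.

```python
NOMINAL_MV = 5000   # nominal USB supply in millivolts (assumption: 5 V rail)
TOLERANCE_PCT = 5   # the 5% figure quoted in the comment, not verified here


def lower_limit_mv(nominal_mv=NOMINAL_MV, tolerance_pct=TOLERANCE_PCT):
    """Lowest voltage still inside the tolerance band, in millivolts."""
    return nominal_mv - nominal_mv * tolerance_pct // 100


def within_tolerance(measured_mv, nominal_mv=NOMINAL_MV, tolerance_pct=TOLERANCE_PCT):
    """True if the reading is at or above the lower limit (upper limit not checked)."""
    return measured_mv >= lower_limit_mv(nominal_mv, tolerance_pct)


def headroom_mv(measured_mv):
    """How far the reading sits above the lower limit, in millivolts."""
    return measured_mv - lower_limit_mv()
```

With the TP1/TP2 readings reported in this thread, `within_tolerance(4840)` and `within_tolerance(4800)` are both true, and `within_tolerance(4750)` sits exactly on the limit.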
I added symbols to the oops from @mozzwald |
To completely rule out PSU issues, maybe add an extra capacitor so the voltage is more stable. 220 uF should be OK and should not trigger the fuse (I guess). 4.8 V is a voltage drop of 200 mV, which is -4%, and that could be close to the edge of a +/- 5% limit. Think: a relatively long, thin wire on the RPi PCB to the BCM2835, plus a large current when the oscillator creates a clock pulse, means the BCM2835 causes a relatively large voltage drop over the wire, so the 4.8 V at the power connector may become 4.6 V at the BCM2835, which is too low. |
Added 220 uF capacitor to input power as suggested by @larsth, problem persists. Also, tried changing power source of pi to be 5V 2.1A and USB hub to be 5V 1.5A, problem persists. |
Someone on the forums claims that constantly dropping caches works around the issue: while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done & |
It might work but it doesn't mean it is the solution. If I am reading the code correctly the usb driver does a memcpy to align some data to an 8 byte boundary if DMA is enabled and sometimes it is accessing memory it should not. It tests for allocation failure so perhaps the length is wrong. Needs some printk I think. I think you could load the usb driver as a module with a parameter to disable dma and that would stop this code ever being executed but that wouldn't really be an answer either. We have the source so there is no real need to guess. |
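To make the alignment point concrete, here is a minimal sketch of the arithmetic involved. It is not the dwc_otg code: the function names and example addresses are hypothetical, and the 8-byte figure is simply taken from the comment above.

```python
ALIGNMENT = 8  # bytes; figure taken from the comment above


def is_aligned(address, alignment=ALIGNMENT):
    """True if address is a multiple of alignment (alignment must be a power of two)."""
    return (address & (alignment - 1)) == 0


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


def needs_bounce_copy(address, alignment=ALIGNMENT):
    """A DMA path would only need to copy into an aligned buffer in this case."""
    return not is_aligned(address, alignment)
```

For example, a buffer at 0x1003 is not aligned and would be copied, while one at 0x1008 would be used directly. If the length passed to such a copy were wrong, the copy could touch memory outside the buffer, which would fit the hypothesis above but is not confirmed by anything in this thread.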
Two things.... |
Can anyone confirm whether: helps? Whilst not a solution, it is a very useful piece of data if it does work around the problem |
I'll have a go - just dd'ing the debian image fresh to my SD. |
13/04 debian image, freshly dd'd to a 2Gb SD
Same sort of kernel error (Also, the serial logging of kernel panics seems to be at 115200baud, regardless of cmdline.txt settings. Is this set somewhere else? I can capture the bootup, but I was using 9600 to do so.) |
From Gray (not directly in response to you, but this question has been asked before): An awful lot of what is printed during the boot sequence is output by the kernel during initialization, i.e. during the set-up of devices that are later used to support the operating system implemented on the root file system. One of the classes of devices that need to be set up are terminal (tty) devices, so it kind of follows that the thing being output to during this kernel initialization process isn't really a tty device. The kernel calls it (well, them actually) a 'console'. The kernel command line allows you to identify and set the baud rate for these consoles, and kernel output goes to them all (e.g. to the HDMI framebuffer console and to the UART console). Each console normally ends up being presented as a separate tty device in /dev. Once the operating system gets hold of the devices the kernel has left it, it configures them and uses them as it sees fit. In our case we do the standard thing of running a shell on just about any tty we can find. This is implemented in the file that controls what we do when control is first passed to the operating system: /etc/inittab. In /etc/inittab each tty is read by a program 'getty' in its own process. This explains why, once you get to a log-on prompt, [1] the baud rate might change; and [2] the output is no longer the same as it is on other consoles/ttys. (You may have noticed that you can log on separately to a shell over the UART and a different one over the HDMI/keyboard.) So, in short, edit /etc/inittab and change |
while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done & This actually makes the problem worse for me. Running it then trying to download file to USB device cause kernel panic instantly. Without it the file will download for a while before kernel panic. |
memcpy? If a device driver in kernel space uses plain C memory copying from user space instead of the copy_from_user(9) function, then you may have found the bug we are searching for. Here is a very long list of where "copy_from_user" appears in the kernel: http://lxr.free-electrons.com/ident?i=copy_from_user I know that a large part of the USB stuff is in user space (AFAIK), but some of it is of course in kernel space. |
No, unfortunately it updates from his own repository so it can lag behind. On 31 May 2012 17:24, guisacouto
|
Hexxeh's repo is up to date now. |
I already updated; however, it doesn't boot properly, I think. Here is an image of where it stops: This is all really odd. I guess a kernel panic could be OK if I were out of memory, since there is no swap, but in this case edit: I'm not connected with ethernet, only wireless, but I think that kernel_debug doesn't load the driver module |
guisacouto |
oh ok, will see how kernel_debug goes tomorrow. I hope it gives some clues |
If the kernel_debug doesn't have devtmpfs it may fail. |
@pepedog: thanks for the heads up regarding CONFIG_DEVTMPFS and udev. Even Debian sid is only using udev 175, so that requirement hadn't cropped up. It sounds like it would be worth enabling. |
I'm not sure if this is the same issue or not. Please tell me if it is different. I've been thinking a bit about this problem while downloading torrents (heavy network + USB storage), and I thought that giving transmission a higher nice value, so that it has a lower priority, could help. This way it wouldn't take all the CPU when it's needed by some system process or something. This did kind of help. Now I'm not getting a kernel panic, and the system keeps running, but "usb-storage" crashes! The dmesg is here: http://pastebin.com/Y4mnP709 |
rewolff, your PL2303 issue is probably different. There are reports that the Prolific PL-2303X has the same vendor ID and product ID as the older PL-2303. I've seen lsusb report a PL-2303 with a MaxPacketSize of 64 and suspect it's a 2303X. Running an x64 3.1.10 kernel, I had a problem where long transfers experienced dropped data. Replacing the device with an FTDI FT232 eliminated the error. |
Been doing some research.. smsc95xx is just an ethernet chip right? |
After some updates, I'm getting a different kernel panic (a lot shorter in output), when downloading+usb storage. |
In that case, please check the following:
|
@guisacouto i'm getting the exact same issue with OpenELEC. |
More people are running into this issue, so am I. It was reported with some links to threads here: raspberrypi/linux#56 |
I had the issue only when using wifi. I worked around it with an access point in client mode, which lets me connect the Raspberry Pi via ethernet but still use wifi. |
Adding smsc95xx.turbo_mode=N to /boot/cmdline.txt fixes this problem! |
It doesn't. It's still happening here with turbo mode disabled. |
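For anyone who wants to test the turbo_mode setting themselves, this is a minimal sketch of adding a parameter to a kernel command line string. It assumes /boot/cmdline.txt holds the whole command line on a single line; the helper name is made up for illustration, and it only builds the new string without writing anything. As the reply above notes, the setting did not help everyone.

```python
def add_kernel_param(cmdline, param):
    """Return cmdline with param appended, unless the same key is already set.

    cmdline is the text of the command line file; param looks like 'key=value'.
    """
    key = param.split("=", 1)[0]
    tokens = cmdline.split()
    for token in tokens:
        if token.split("=", 1)[0] == key:
            # The key is already present: leave the existing value alone.
            return " ".join(tokens)
    return " ".join(tokens + [param])
```

Reading /boot/cmdline.txt, passing its contents with `"smsc95xx.turbo_mode=N"`, and writing the result back as a single line (after taking a backup) would apply the change on the next boot.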
Just to report, I had exactly the same error and I solved with:
Before those edits, I experienced this issue two or more times every hour while using my Raspberry Pi for a lot of network data transfers (file sharing at around 250 KB/s) and very frequent SD file reads/writes. I never got a kernel panic, btw. After those edits, I have not experienced any problem at all for two days now. |
@benosteen |
I can't comment on this bug as I'm not in a position to fire up a RasPi and On Saturday, 20 July 2013, popcornmix wrote:
|
Steps to reproduce:
This will at some indeterminate point freeze the system with kernel panics from the USB storage driver - "... not syncing: Fatal exception in interrupt" and kernel errors from the ethernet driver : "kevent may have dropped the interrupt."
Suggested means to replicate step 3)
If rootfs is on USB, apt-get install'ing a group of packages, apt-cache search and so on are good ways to uncover this collision.
Otherwise, searching or grepping through a reasonable number of files on the USB is enough (find . | xargs grep -i "foo") for example.
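The find/grep approach can also be sketched in Python, which makes it easy to repeat in a loop. This is only an illustration of generating read load on the drive; the function name and the mount point mentioned below are assumptions, not part of the original report.

```python
import os


def grep_tree(root, needle):
    """Read every regular file under root; return sorted paths containing needle.

    The match is case-insensitive, roughly like: find . | xargs grep -i needle
    """
    needle = needle.lower().encode()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken symlinks
            try:
                with open(path, "rb") as f:
                    # Line-by-line reads keep memory use low on text files.
                    if any(needle in line.lower() for line in f):
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(matches)
```

Calling `grep_tree("/mnt/usb", "foo")` repeatedly while a network transfer is running approximates the storage side of step 3; `/mnt/usb` is a placeholder for wherever the USB drive is mounted.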
It is hard to capture this error, as the kern.log doesn't sync the errors to disc, and the errors flash by too fast on tty to see them with any clarity.
Recreated with the latest kernel + UAS built in and new modules, and with kernel modules from 13/04, both with rootfs on USB and with the stock rootfs on SD. Having the rootfs on SD, however, makes it more difficult to simulate the type of storage demand required to replicate the bug.