Error message "USB get_status request failed" after a few hours working fine #503
Comments
Which serial device type are you using? I suspect it strongly depends on the serial device type and/or Android vendor. In my home automation I replaced the pl2303 adapters with ftdi ones, because the pl2303 adapters stopped working after some time and only a power toggle made them work again. |
Thanks for the quick answer. |
Which driver is used in this library? |
Hi Kai, I am having a similar issue to Vegidio's: the device connects, but after a random amount of time (the time varies with how the device is connected, i.e. through a hub or an adapter) it drops a packet and then we get USB get_status request failed, or ControlTransfer in test connection fails. We are using an STM32L4 as a USB CDC ACM device connected to an Android device. Thanks, |
Hi everyone, @kai-morich. After months of digging, I think I might have an explanation for this issue and the cascading failures. I'm running into a similar issue to @vegidio and @Thaejaesh-S, with similar device setups. The root issue might be requesting a read bulk transfer (from device to host) when there's no data to be sent to the host.
Description
Using Bulk Write Requests
I have no issues/errors sending a bulk write request, regardless of payload size.
Bulk Read Requests
For some reason the bulk read requests are consistently broken up: the very first byte arrives in one transfer, then the rest of the payload arrives in a subsequent transfer. Then an additional request follows, which always times out, and almost always a control transfer request comes after that. Since the control transfer request generally succeeds, no IOException is thrown, but if the timeout isn't sufficiently long, say 2s, I wouldn't be surprised if the IOException were thrown. I believe I can reproduce the issue with the
Lastly, I don't know the reason behind the segmentation, but I can account for every byte in the bulk read request. An abatement would be to design around this by requesting bulk read transfers of a determinate length using fixed, length-prefixed, or newline-terminated messages, and by submitting bulk read requests only if there's still data to be received.
My questions are
Next StepsLines 189 to 191 in a9c835b
I'm going to test using a control transfer to get the status of the endpoint I'm using, then determine whether the transfer should be made or not. Failing that, I'll re-implement the read method within If anyone has any feedback, I'd greatly appreciate it. |
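The length-prefixed framing proposed in the comment above can be sketched as follows. This is a minimal illustration, not code from the library: the class name `LengthPrefixedDecoder`, the helper `encodeFrame`, and the 2-byte big-endian length header are all assumptions made for the example. The decoder buffers whatever each bulk read returns and only emits a message once every byte of it has arrived, so a payload that is split after its first byte is still reassembled correctly.

```kotlin
import java.io.ByteArrayOutputStream

/**
 * Reassembles length-prefixed messages from arbitrarily split chunks.
 * Frame layout (an assumption for this sketch): 2-byte big-endian length, then the payload.
 */
class LengthPrefixedDecoder {
    private val pending = ByteArrayOutputStream()

    /** True while bytes of an incomplete message are buffered, i.e. another read is worthwhile. */
    val hasPartialMessage: Boolean
        get() = pending.size() > 0

    /** Feeds the first [length] bytes of one chunk; returns every message completed by it. */
    fun feed(chunk: ByteArray, length: Int = chunk.size): List<ByteArray> {
        pending.write(chunk, 0, length)
        val data = pending.toByteArray()
        val messages = mutableListOf<ByteArray>()
        var offset = 0
        while (data.size - offset >= 2) {
            val payloadLength =
                ((data[offset].toInt() and 0xFF) shl 8) or (data[offset + 1].toInt() and 0xFF)
            if (data.size - offset - 2 < payloadLength) break // payload incomplete, wait for more
            messages += data.copyOfRange(offset + 2, offset + 2 + payloadLength)
            offset += 2 + payloadLength
        }
        // Keep only the unconsumed tail for the next call.
        pending.reset()
        pending.write(data, offset, data.size - offset)
        return messages
    }
}

/** Builds one frame: 2-byte big-endian length followed by the payload. */
fun encodeFrame(payload: ByteArray): ByteArray {
    require(payload.size <= 0xFFFF) { "payload too large for a 2-byte length header" }
    return byteArrayOf((payload.size shr 8).toByte(), payload.size.toByte()) + payload
}
```

With framing like this, the reader can decide from `hasPartialMessage` whether another read request is expected to return data, instead of issuing reads speculatively.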
When using the |
Using an infinite timeout for reads would require an architectural change that I cannot commit to, and asynchronous writes through Unless you're proposing to instantiate another I don't have evidence to support the claim that there isn't an issue with bulk transfer reads when no data is available. I expect bulk transfer requests (BTR) to terminate at the end of a complete transfer with a zero length packet or an ACK. If a BTR is made when there is no data to be read, I expect a NAK. Receiving an error code like "ETIMEDOUT" makes me think that the bulk transfer request resulted in a STALL or ERR response from the device. @kai-morich, do you know of any command line tools where I can view the protocol level USB traffic that this library is using? I'd like to verify what ioctl is sending/receiving from the USB device. Unfortunately, I'm unable to see packet traffic using How does the library access the underlying serial port? The FAQ states that the |
Try with a direct (blocking) write request. If data is smaller than USB packet size you will not notice any blocking.
Without rooting an Android device I am not aware of any possibility to see the low level USB traffic. There is no underlying serial port. It's all implemented on top of USB requests. |
This is how my application software is currently writing to the USB device. Unfortunately, this doesn't provide a root cause to the failure mode.
Now I understand the preference for infinite timeout reads or using requestWait/queue. Because of Android's lack of error propagation, we have no option but to poll on a separate thread and hope there aren't any other failures Android is hiding, and if any exceptions are thrown, somehow restart the UsbDeviceConnection to return to nominal operation. This would explain why some individuals implemented their own ioctl function to support bulk transfers. This is tough. :(
|
I was wrong about ETIMEDOUT. According to the ancient tomes of Android hosted docs, it's a legit error code due to an arbitrary assigned timeout. The Host controller would experience/see no other error. |
Just a pulse update. I've got the USB protocol analyzer. I've had to shift gears for a separate project, but I'll be back on this in the next few days or so. |
Hi @jhlink @kai-morich Any update on "USB get_status request failed" and "control transfer failed"? We still get these errors with the latest library, though only after a few hours, when the device has been kept idle and we then initiate a command. |
@Mohammedbinnazar , I should be back on this later this week. Hopefully, today once I dial in the SW changes for unrelated dev. |
@kai-morich I saw now that you posted a follow-up question, but I didn't respond. I apologize for that, but shortly after I opened this issue I had a serious family emergency that kept me away from work, and computers in general, for a couple of months. In any case, I'm back now and it seems that there are more people facing the same type of problem that I am. So, to answer your question and give more hints to anyone following this issue and hopefully we can find a solution, the driver that I'm using is Thanks again for your assistance. |
Ok, now that I'm back investigating this I found something yesterday that might give some hints of what's happening, at least in my case. First I would like to share a piece of my code:
val driver: UsbSerialDriver by lazy {
val allDrivers = UsbSerialProber.getDefaultProber().findAllDrivers(usbManager)
if (allDrivers.isEmpty()) error("No serial USB devices!")
// My USB device is from Ingenico
allDrivers.first { it.device.manufacturerName.equals("ingenico", ignoreCase = true) }
}
and then I get the
val device get() = driver.device
The code above works great when I start using my app, but when it stops responding (I've put some code to detect that), I try to reconnect. However, this time the same code above to find the driver associated with my USB device doesn't find any result for The list So, is it safe to assume that whatever is causing the loss of connectivity with the device is not a problem in the source code? I mean, when I lose connectivity, I cannot even try to reconnect to the device because it no longer appears in the list of available drivers, but the other USB device is still there. It gives me the impression that my Android phone and my code are working as expected; it's the USB device that has entered an unrecoverable state somehow. |
Hi @jhlink Hopefully you must have started working on this project last week. Any updates on it ? |
Folks, I want to find a solution for this problem like everyone else in this thread; however, let's not act as if this is a paid service and the people working on this owe us anything. I'm sure that whenever a solution is found an update will be posted here. |
@kai-morich Status update. I spent the day reproducing the failure with a USB protocol analyzer. I seem to have found a fix for the issue, but @vegidio, it would be great if you could test this to prove that it works. (Hope your family is doing okay.)
Overview
Reproduced the failure using the example project in the repository. It was modified to force the failure through heavy read/write throughput operations. An IO exception occurred within seconds, which indicated packet corruption due to concurrency issues. Using
After applying the modifier, this setup was run for 45+ minutes with no other connection loss or failures observed on the application, USB traffic, or on the device. @vegidio, could you apply the synchronized flag on your
Methodology
Attempt 1 - get_status request failure with legacy code
Initially, I used the USB Protocol Analyzer (USBPA for short) on the legacy software I'm maintaining. I was able to reproduce the USB
Despite the invalid PID error above, the root cause for the unexpected SET_LINE_CODING control transfer is likely due to a bug or logical failure in the application, not in the library or the hardware layer.
Summary: Do not call Forcing the developer to change port configs (if the default 9600 baud doesn't apply) after port opening seems to allow for a category of software bugs.
Attempt 2 - force failure with example code
In order to prove there is a software bug in the library, I've modified the example project in the library. I've removed everything UI related to test the core issue. A dedicated HandlerThread was created for writing data to the Without incorporating the After adding the modifier and locking inter-thread write access to the UsbSerialPort, I didn't experience this issue even after aggressively high, bidirectional throughput.
Summary: UsbSerialPort is not thread-safe. If read/write operations are placed in a separate thread, a mutex lock will be required on the port itself (not its buffers) to avoid undefined behavior caused by concurrency issues. Reviewing the example code for SimpleUsbTerminal, I've observed a similar usage of An argument can be made that a mutex lock may not be necessary if there is only one thread that handles serial writes. Given a bona fide RTOS, I'd agree, circumstantially. However, Android is not an RTOS, and the OS can kill/restart any thread at any time unless bound to a service or tied to the UI. Given that I, as the developer, cannot predict or guarantee thread termination/restarts, a mutex lock is required.
Conclusion
Preliminary tests indicate that the failure is due to concurrency issues in accessing UsbSerialPort, which is apparently not thread-safe. The issue isn't necessarily a software bug in Android's API, although greater detail in error propagation in the USB API would alleviate the need to buy a $3k+ tool to root cause USB failures.
If any developer is splitting read/write operations into dedicated threads, they should take care to incorporate mutex locks. Otherwise, if the developer is handling all read/write operations on the main thread, the application won't see failures due to concurrency issues, though there will likely be failures due to high load on the main thread.
Suggestions
Next Steps
I'll incorporate the synchronized change in legacy code, and after a ground-up redesign, I will test the overall solution to see if the I'll wait to see if the proposed change works for @vegidio. |
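The locking described in this comment could be sketched roughly as below. The interface `SimplePort` is a hypothetical stand-in defined only for this example; it is not the library's UsbSerialPort interface, and `LockedPort` and `OverlapDetectingPort` are likewise illustrative names. The sketch assumes the goal is that a write and a reconfiguration (for example a call that leads to SET_LINE_CODING) can never be in flight at the same time.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

/** Hypothetical minimal port abstraction; NOT the library's UsbSerialPort interface. */
interface SimplePort {
    fun write(data: ByteArray, timeoutMs: Int)
    fun setParameters(baudRate: Int)
}

/** Wraps a port so that writes and reconfiguration never overlap across threads. */
class LockedPort(private val delegate: SimplePort) : SimplePort {
    private val lock = ReentrantLock()

    override fun write(data: ByteArray, timeoutMs: Int) = lock.withLock {
        delegate.write(data, timeoutMs)
    }

    override fun setParameters(baudRate: Int) = lock.withLock {
        delegate.setParameters(baudRate)
    }
}

/** Test double that records whether two calls were ever active at the same time. */
class OverlapDetectingPort : SimplePort {
    private val active = AtomicInteger(0)

    @Volatile
    var overlapped = false
        private set

    private fun enter() {
        if (active.incrementAndGet() > 1) overlapped = true
        Thread.sleep(1) // widen the window in which an overlap could be observed
        active.decrementAndGet()
    }

    override fun write(data: ByteArray, timeoutMs: Int) = enter()
    override fun setParameters(baudRate: Int) = enter()
}
```

Reads are deliberately left out of the lock in this sketch: a blocking read that held the same lock would stall every write until it returned, which may or may not be acceptable for a given protocol.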
Thanks for this comprehensive analysis. 'write' is internally serialized with mWriteBufferLock so you found issues with concurrent Connection.controlTransfer? Do these only happen for setParameters that might reconfigure the serial device or also for other methods like set DTR? Concurrent read and write should still be possible. Did you face issues with a single write thread and another (blocking) read thread like SerialInputOutputManager? |
@jhlink Thank you very much for looking into this. I will test on Monday and come back with a feedback soon! |
Of course. @kai-morich
Yes, it seems that way, though I don't know what the desired behavior of Presently, I can see that if reads and writes were executed using SerialInputOutputManager, then access control is built into the flow of control; one can't run without the other completing as they alternate. I haven't used methods like set DTR, as it seems these are dedicated pins outside of D+/D- pins. I would imagine this would mitigate any issues since data flow direction is known. Yes. It seems that sending the control transfer during a bulk transfer causes cascading failures. I don't have the traffic logs with me, but IIRC the behavior is as follows. SET_LINE_CODING
Do you have an example project that demonstrates that both are possible without mutexes on the UsbSerialPort? I can run the example code on Monday and collect/post the USB traffic logs for you here. I can also share the example code that I've written on Monday, and it behaves like you've said, granted with timeouts for both read and write. 2.5sec is a very long time, though admittedly increasing that timeout to 3 or 4 seconds works. However, I'm hesitant about making that jump without assurance that the timeout will not need to increase again in the future. Depending on the implemented communication protocol, the packet timeout for a communication spec may not allow for 3 or 4 seconds. I'm unaware if the write thread was blocking the read thread. Thread access isn't clearly shown in the USB traffic logs. However, regarding the application code, there are problems with it that are unrelated to the library. It was implemented without an understanding of multithreaded environments. It could be that there are two write threads or poor logic that directed reinitializing the dependencies for SerialInputOutputManager. |
@jhlink Ok, I did the change that you suggested and used synchronize in all places where Before this change, my app could run for around 16 hours before losing connection with the USB device, but there was a case where it ran for almost a full day (23 hours to be precise) before it lost connection. However, this happened only once; most of the time the connection is lost around the 16 hour mark. Now, with the change that you suggested, my app has already been running for 24 hours without losing connection, which is 1 hour more than the max amount of time it ever remained connected 🎉 I'm feeling cautiously optimistic, but I think you found the source of the problem. I will let my app run for 24 hours more and I will report again tomorrow. 🙂 |
That's wonderful @vegidio !
Thank you, but we're not out of the woods yet. Would you be opposed to running it for 48 hours continuously instead of 24? If we assume mean time to failure is approximately 23 hours, I'd like to try for 48 hours for at least two opportunities that the failure could occur again. Ideally, if the test time were much longer, I'd be able to provide a high degree of confidence in reliability, say running it for a week straight, but I can settle for 48 hours if this isn't convenient. |
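The reasoning behind asking for a longer soak test can be made concrete with a small calculation. This is only a sketch under a strong assumption that the thread does not establish: that failures arrive at a constant rate (an exponential model) with the mean time to failure of about 23 hours mentioned above. The function name `survivalProbability` is made up for the example.

```kotlin
import kotlin.math.exp

/**
 * Probability of seeing no failure within [hours], assuming (for illustration only)
 * exponentially distributed failures with the given mean time to failure.
 */
fun survivalProbability(hours: Double, meanTimeToFailureHours: Double): Double {
    require(hours >= 0.0 && meanTimeToFailureHours > 0.0)
    return exp(-hours / meanTimeToFailureHours)
}
```

Under that model, `survivalProbability(48.0, 23.0)` comes out at roughly 12%, so an unfixed system would still pass a 48 hour test by luck about one time in eight, while a week-long run (`168.0` hours) would pass by luck well under 1% of the time. That is why the longer run gives much more confidence.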
I can definitely leave my device on for much longer. However, I hate to write this, but less than 2 hours after my last update I lost connection with the USB device 😢 There are only two places in my code where I call |
@vegidio , I'll share my example code that forced the issue tomorrow. Could you review it and see how your code differs from mine? My guess is that your code is proprietary and can't be shared, which leaves us to replicate it through contrived example code. |
@jhlink Yes, you're right, the project that I'm working on is proprietary. However, the main class that makes the connection with the USB device and also read/writes data, comes from an open source Android library that I created to help me during the development of my projects. You can find this class called You will notice that I have a variable But just to make sure that I was covering all possible edge cases, I also added Sorry, but this is the only part of my code that I can share. However, if you could share your example I will compare with the rest of my project and see if there's something else I'm missing. |
Terminal style applications like the example at the library or Serial USB Terminal do With this pattern, Your additional lock on What is the actual read timeout you are using? With too small timeouts I got strange errors. |
Sorry. I should state the context for the reads and writes. I've been basing my conclusions on the library source code. @vegidio , I don't have experience in Kotlin, but based on the general approach, do you know if you've type-aliased [Edit] Yeah. No worries. I understand getting outside help on proprietary code is difficult. We can work around it. :) |
@jhlink You're right, However, if you unselect the other Kotlin platforms and leave only JVM, you will see that it's still supported there: There are actually two ways to use synchronize in Kotlin: we can use the In my first attempt I tried to use Besides that, after your suggestion I started to read more about how |
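For readers less familiar with Kotlin, the two JVM-only ways of synchronizing mentioned above can be sketched like this. The class `GuardedCounter` and its members are invented for the illustration and do not come from the poster's project; the sketch only shows that both forms provide mutual exclusion on the JVM.

```kotlin
/** Counter guarded in two ways; both rely on JVM monitor locking. */
class GuardedCounter {
    private val lock = Any()
    private var viaAnnotation = 0
    private var viaBlock = 0

    // Option 1: the annotation makes the whole method hold the monitor of `this`.
    @Synchronized
    fun incrementAnnotated() {
        viaAnnotation++
    }

    // Option 2: an explicit block that locks on a private object.
    fun incrementInBlock() {
        synchronized(lock) { viaBlock++ }
    }

    @Synchronized
    fun annotatedCount(): Int = viaAnnotation

    fun blockCount(): Int = synchronized(lock) { viaBlock }
}
```

One practical difference: `@Synchronized` locks on the instance itself, so any outside code that synchronizes on the same object contends for the same monitor, whereas a private lock object keeps the locking fully internal to the class.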
@vegidio I see. Just wanted to make sure the compiler wasn't ignoring the Synchronized flag. @kai-morich , If you have any suggestions on forcing the issue, I can run the tests and gather USB traffic. |
I have a similar problem now, after calling the UsbDeviceConnection controlTransfer function, it keeps blocking despite setting timeout. This is the issueTracker I submitted https://issuetracker.google.com/issues/298700517 |
The Linux kernel on the Android device I am using now is 5.4.19. There is a fix that was submitted to the Linux USB driver code, and the problem seems to be caused by what that fix addresses. Everyone, please take a look. |
Well. That's a very illuminating and useful share, @notcaremath . Thank you. The android os and kernel I'm using is v11 on kernel 5.4.
Speculating here, but this may explain why the synchronized qualifier seemed to resolve the issues I've seen. The bulk transfer requests are synchronous, and I believe performing an additional synchronous transfer, be it a control or bulk during an existing transfer, would result in undefined behavior. I think this might be why developers that have opted to implement asynchronous requests using queue and requestWait don't see this issue. Speculating here, but guessing that queueing USBRequests would result in a call to I think there are 4 paths ahead.
Options
That patch was applied in kernel v5.15.x, which is used in Android 13 and higher. @vegidio @kai-morich @vegidio , if you're on Android 13 with kernel v5.10 or lower, could you update your kernel to v5.15.x and test with the same |
in the library
As I use I use a pool of Android devices from 4.3 to 13, but never checked the kernel version, nor observed different behavior. As various of the devices run CyanogenMod or LineageOS, they come with more plain Linux + Android and might behave differently than Android versions overoptimized by big vendors. |
I suspect that this problem is more likely caused by an unreasonable implementation of the USB protocol in the USB terminal, such as processing USB control commands inside a hardware interrupt without responding in time, which leaves the host blocked. |
Ahh... yes. Was there a reason the timeout for requestWait wasn't supplied? Since API lvl 26, requestWait has that option.
Okay. Feel free to use the example project I posted previously.
Were these on emulators or on actual hardware? I don't think this issue would appear on the former. @kai-morich @notcaremath I do not understand your comment. Are you referring to Kai's USB Terminal app or the USB/core implementation in the android kernel? |
Line 186 in 34e6d98
All real devices. |
Hey folks, sorry for the late response, but here is the info about my setup: I'm running Android 8.1.0, with kernel 4.4.95, so it seems to fall into the version range that can be problematic according to your theory? Unfortunately, upgrading the Android version/kernel is not an option in my case. But I will see if I can test this application in a device with a higher Android version/kernel and check if I get different results. Thanks! |
@vegidio No worries. Glad to hear you're still with us. Yes. I think that the pre-5.14.x kernel method of handling bulk/control transfers involves submitting requests that cannot be terminated by the kernel, meaning that a request must be resolved with a response before the next request can be processed, regardless of any provided timeout. One possible avenue is testing and expanding the queue/requestWait for reads and writes. Though, I'm puzzled how @kai-morich wasn't able to run into any of these issues when testing on real devices. |
@everyone |
@Mohammedbinnazar The current proposal is to use It seems the bug would not occur in Android 13 Kernel 5.14.x+ per the source code. I have no test data to support this, nor do I intend to gather it as it is out of scope for me. The only likely release/change from this PR would be an update to the README/FAQ about best practices of the library in multi-threaded (Java) contexts using Synchronized (or at the very least a "Do Not Do" list) and maybe cautionary warnings for corresponding use in Kotlin, depending on what @vegidio discovers in his test. Based on my investigations thus far, the library is sound and stable. Improvements could be made like anything else but not essential. As in C, I believe the library should be kept simple, small, backwards compatible, and fundamental. The onus is on the developer to understand then utilize the tool in any capacity, even if it means allowing the developer to self-inflict irreparable failure. |
@vegidio , can you tell me what USB speed your target device is operating? One way to check is connecting your device by USB to your Mac/PC and determining the speed from the Device Manager. |
An update. Sorry about the long hiatus. Got pulled in a couple of different directions.
Conclusion
Hardware Issue
The Android application is communicating with a USB device through a USB 2.1 hub. The USB specification is very clear about the cause of this. I believe this is a hardware issue with the USB hub chip itself. For myself, I'm using a Genesis logic controller.
Resolution
Use a different USB Device with a Different USB Chip
I suggest sharing a warning in the Readme that certain Genesis USB chips are defective or have faulty logic for handling the transaction translator, and it's best to use a different USB Chip manufacturer.
Abatement
Unfortunately, the Android app does not provide a means to detect android USB hub devices. If the USB chip cannot be changed, then the following would probably work.
Unfortunately, I'm not sure if this requires rooting the device. Beyond that, there's nothing else that comes to mind that could reset the hub/device to recover from the filled buffer beyond power cycling the USB device, if that's even possible in software. |
Okay. I'm finally rounding out this thread. @kai-morich, I motion closing this issue. :) I'd like to put this issue to rest, please.
Abatement - Abstract
I wrote a C module using libusb and compiled a JNI library using NDK-build. I'm nearly there, but I don't recommend this approach. The effort is herculean, requires a deep understanding of the low-level USB protocol and the USB spec (or at the very least a USB protocol analyzer to see USB traffic), and assumes root access to Android plus the ability to modify the boot.img partition. The following is a solution to "recover" from a busy buffer aka 'get_status madness.' It doesn't resolve the root cause, which is likely due to a non-standard USB-spec implementation on the USB IC, though it may also involve other factors such as electrical issues. If you experience these "usb get-status failed" issues, the easiest solution is to junk that USB device and find a different device with a battle-tested USB logic implementation. Ideally, something like a certain TE USB interface/controller, where USB logic is handled through discrete state machine logic instead of error-prone firmware. If you can't do this and possess root access + ability to modify boot.img of the Android OS, then feel free to proceed.
Actions
After building the JNI module, I ran into access permission issues like many other libusb developers encountered years ago. The JNI module/Android app failed to connect to the USB device directly despite being able to connect to the device via a standalone libusb C program installed and executed through ADB shell (as root). Elevating the android app to "System" and changing the entire From there, the path is relatively straightforward. Now that the Android app can communicate directly with the USB device, a TT RESET control transfer request, described in the USB spec, can be sent to the USB device to recover from the 'usb get_status' issue.
Conclusion
Thank you @kai-morich for your contributions and continued maintenance. I hope this investigation provided closure.
It certainly has for me, accompanied by a brand new appreciation and heightened skepticism for USB boards. |
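For anyone trying to follow the TT reset step, the SETUP stage of such a control transfer might be laid out as in the sketch below. The field values are my reading of the hub class requests in the USB 2.0 specification (request type 0x23 for class/other, request code 9 for RESET_TT, TT port in wIndex) and should be verified against the spec before use. As I read it, this is a hub class request, so the recipient would be the hub that contains the transaction translator. `SetupPacket` and `resetTtSetup` are hypothetical names for this illustration, and actually submitting the transfer (through libusb or any other API) is not shown.

```kotlin
/** The 8-byte SETUP stage of a USB control transfer; multi-byte fields are little-endian. */
data class SetupPacket(
    val bmRequestType: Int,
    val bRequest: Int,
    val wValue: Int,
    val wIndex: Int,
    val wLength: Int
) {
    fun toBytes(): ByteArray = byteArrayOf(
        bmRequestType.toByte(),
        bRequest.toByte(),
        (wValue and 0xFF).toByte(), ((wValue shr 8) and 0xFF).toByte(),
        (wIndex and 0xFF).toByte(), ((wIndex shr 8) and 0xFF).toByte(),
        (wLength and 0xFF).toByte(), ((wLength shr 8) and 0xFF).toByte()
    )
}

// Assumed values, taken from my reading of the USB 2.0 hub class request table.
const val REQUEST_TYPE_CLASS_OTHER_OUT = 0x23 // host-to-device, class request, recipient "other"
const val HUB_REQUEST_RESET_TT = 9

/** Builds the setup packet for a transaction translator reset on the given TT port. */
fun resetTtSetup(ttPort: Int): SetupPacket =
    SetupPacket(
        bmRequestType = REQUEST_TYPE_CLASS_OTHER_OUT,
        bRequest = HUB_REQUEST_RESET_TT,
        wValue = 0,
        wIndex = ttPort,
        wLength = 0 // no data stage
    )
```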
Hi, I'm developing an application that connects to a device through USB and then sends a command (the same command) over and over again in intervals of 30 seconds.
Everything seems to work as expected for some hours, sometimes almost 24 hours without any problem, but then we suddenly start to get this error: USB get_status request failed. I searched for this error message in your repo and found some people who encountered the same error message, but two cases in particular seem somewhat similar to mine: in the first issue the OP reported that everything was working initially but later stopped after 2 hours; in the second issue the OP mentions that everything stopped working after 5 hours.
I find it intriguing that in all these cases, including mine, everything works for a while, but then out of nowhere it stops working after some hours. I know this is a long shot, but do you have any idea what could be causing these problems?
Thanks!