Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

martinling · 2022-01-29T10:36:30Z

This has been very hard to pin down, but I think I've now proven it.

In both #883 and #916, we saw host-side problems that caused a rapid RX stop/start sequence to fail, due to concurrency bugs in our usage of libusb. I believe I have now fixed all these host-side issues in #1029.

However, there was still a failure mode being seen in which no libusb errors occured, yet the RX data flow failed to restart. The RX LED was on when this happened, indicating that the transceiver mode change had been processed on the device. The libusb event handling was proceeding correctly, but a bulk transfer was never filled.

I've now succeeded in using LUNA to catch this happening on the wire. A pcap file, trimmed to just the relevant section, is attached: clip.pcap.zip

After the vendor requests to change the transceiver mode to OFF, and then to RX, only a single 16KB transfer from the firmware is completed. The host continues to send IN tokens to request data from the endpoint, but the device only responds with NAK.

From the host side, it appears as if no data at all is received, because we submit libusb transfers for 256KB of data, with no timeout.

I have no idea what's going wrong on the firmware side yet, but I do know that the symptoms are reproducible both with and without the changes in #982.

The text was updated successfully, but these errors were encountered:

This fixes bug greatscottgadgets#1042, which occured when an RX->OFF->RX sequence happened quickly enough that the loop in rx_mode() did not see the change. As a result, the enable_baseband_streaming() call at the start of that function was not repeated for the new RX operation, so RX progress stalled. To solve this, the vendor request handler now increments a sequence number when it changes the transceiver mode. Instead of the RX loop checking whether the transceiver mode is still RX, it now checks whether the current sequence number is the same as when it was started. If not, there must have been at least one mode change, so the loop exits, and the main loop starts the necessary loop for the new mode. The same behaviour is implemented for the TX and sweep loops. For this approach to be reliable, we must ensure that when deciding which mode and sequence number to use, we take both values from the same set_transceiver_mode request. To achieve this, we briefly disable the USB0 interrupt to stop the vendor request handler from running whilst reading the mode and sequence number together. Then the loop dispatch proceeds using those pre-read values.

This is a defensive change to make the transceiver code easier to reason about, and to avoid the possibility of races such as that seen in greatscottgadgets#1042. Previously, set_transceiver_mode() was called in the vendor request handler for the SET_TRANSCEIVER_MODE request, as well in the callback for a USB configuration change. Both these calls are made from the USB0 ISR, so could interrupt the rx_mode(), tx_mode() and sweep_mode() functions at any point. It was hard to tell if this was safe. Instead, set_transceiver_mode() has been removed, and its work is split into three parts: - request_transceiver_mode(), which is safe to call from ISR context. All this function does is update the requested mode and increment a sequence number. This builds on work already done in PR greatscottgadgets#1029, but the interface has been simplified to use a shared volatile structure. - transceiver_startup(), which transitions the transceiver from an idle state to the configuration required for a specific mode, including setting up the RF path, configuring the M0, adjusting LEDs and UI etc. - transceiver_shutdown(), which transitions the transceiver back to an idle state. The *_loop functions that implement the transceiver modes now call transceiver_startup() before starting work, and transceiver_shutdown() before returning, and all this happens in the main thread of execution. As such, it is now guaranteed that all the steps involved happen in a consistent order, with the transceiver starting from an idle state, and being returned to an idle state before control returns to the main loop. For consistency of interface, an off_mode() function has been added to implement the behaviour of the OFF transceiver mode. Since the transceiver is already guaranteed to be in an idle state when this is called, the only work required is to set the UI mode and wait for a new mode request.

This is a defensive change to make the transceiver code easier to reason about, and to avoid the possibility of races such as that seen in greatscottgadgets#1042. Previously, set_transceiver_mode() was called in the vendor request handler for the SET_TRANSCEIVER_MODE request, as well in the callback for a USB configuration change. Both these calls are made from the USB0 ISR, so could interrupt the rx_mode(), tx_mode() and sweep_mode() functions at any point. It was hard to tell if this was safe. Instead, set_transceiver_mode() has been removed, and its work is split into three parts: - request_transceiver_mode(), which is safe to call from ISR context. All this function does is update the requested mode and increment a sequence number. This builds on work already done in PR greatscottgadgets#1029, but the interface has been simplified to use a shared volatile structure. - transceiver_startup(), which transitions the transceiver from an idle state to the configuration required for a specific mode, including setting up the RF path, configuring the M0, adjusting LEDs and UI etc. - transceiver_shutdown(), which transitions the transceiver back to an idle state. The *_mode() functions that implement the transceiver modes now call transceiver_startup() before starting work, and transceiver_shutdown() before returning, and all this happens in the main thread of execution. As such, it is now guaranteed that all the steps involved happen in a consistent order, with the transceiver starting from an idle state, and being returned to an idle state before control returns to the main loop. For consistency of interface, an off_mode() function has been added to implement the behaviour of the OFF transceiver mode. Since the transceiver is already guaranteed to be in an idle state when this is called, the only work required is to set the UI mode and wait for a new mode request.

martinling added bug software labels Jan 29, 2022

martinling self-assigned this Jan 29, 2022

This was referenced Jan 29, 2022

Repeated Start/Stop RX Produces Error #916

Closed

Freeze in Gqrx with hackRF 2021-03-1 driver version #883

Closed

martinling linked a pull request Feb 3, 2022 that will close this issue

Fixes for concurrency bugs in start/stop operations #1029

Merged

This was referenced Feb 3, 2022

Fixes for concurrency bugs in start/stop operations #1029

Merged

Firmware sample buffer management overhaul, including safe handling of TX underruns #982

Merged

mossmann closed this as completed in #1029 Feb 8, 2022

martinling mentioned this issue Feb 8, 2022

Move transceiver mode changes out of USB ISR. #1045

Merged

martinling mentioned this issue Nov 16, 2022

[Question]: Looping hackrf_sweep x times #1226

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

martinling commented Jan 29, 2022 •

edited by sync-by-unito bot

Loading

Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

Comments

martinling commented Jan 29, 2022 • edited by sync-by-unito bot Loading

martinling commented Jan 29, 2022 •

edited by sync-by-unito bot

Loading