Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firmware USB streaming can stall after the first 16K USB transfer following RX restart #1042

Closed
martinling opened this issue Jan 29, 2022 · 0 comments · Fixed by #1029
Closed
Assignees

Comments

@martinling
Copy link
Member

martinling commented Jan 29, 2022

This has been very hard to pin down, but I think I've now proven it.

In both #883 and #916, we saw host-side problems that caused a rapid RX stop/start sequence to fail, due to concurrency bugs in our usage of libusb. I believe I have now fixed all these host-side issues in #1029.

However, there was still a failure mode being seen in which no libusb errors occured, yet the RX data flow failed to restart. The RX LED was on when this happened, indicating that the transceiver mode change had been processed on the device. The libusb event handling was proceeding correctly, but a bulk transfer was never filled.

I've now succeeded in using LUNA to catch this happening on the wire. A pcap file, trimmed to just the relevant section, is attached: clip.pcap.zip

After the vendor requests to change the transceiver mode to OFF, and then to RX, only a single 16KB transfer from the firmware is completed. The host continues to send IN tokens to request data from the endpoint, but the device only responds with NAK.

From the host side, it appears as if no data at all is received, because we submit libusb transfers for 256KB of data, with no timeout.

I have no idea what's going wrong on the firmware side yet, but I do know that the symptoms are reproducible both with and without the changes in #982.

@martinling martinling self-assigned this Jan 29, 2022
@martinling martinling linked a pull request Feb 3, 2022 that will close this issue
martinling added a commit to martinling/hackrf that referenced this issue Feb 3, 2022
This fixes bug greatscottgadgets#1042, which occured when an RX->OFF->RX sequence
happened quickly enough that the loop in rx_mode() did not see the
change. As a result, the enable_baseband_streaming() call at the start
of that function was not repeated for the new RX operation, so RX
progress stalled.

To solve this, the vendor request handler now increments a sequence
number when it changes the transceiver mode. Instead of the RX loop
checking whether the transceiver mode is still RX, it now checks whether
the current sequence number is the same as when it was started. If not,
there must have been at least one mode change, so the loop exits, and
the main loop starts the necessary loop for the new mode. The same
behaviour is implemented for the TX and sweep loops.

For this approach to be reliable, we must ensure that when deciding
which mode and sequence number to use, we take both values from the same
set_transceiver_mode request.

To achieve this, we briefly disable the USB0 interrupt to stop the
vendor request handler from running whilst reading the mode and sequence
number together. Then the loop dispatch proceeds using those pre-read
values.
martinling added a commit to martinling/hackrf that referenced this issue Feb 8, 2022
This is a defensive change to make the transceiver code easier to reason
about, and to avoid the possibility of races such as that seen in greatscottgadgets#1042.

Previously, set_transceiver_mode() was called in the vendor request
handler for the SET_TRANSCEIVER_MODE request, as well in the callback
for a USB configuration change. Both these calls are made from the USB0
ISR, so could interrupt the rx_mode(), tx_mode() and sweep_mode()
functions at any point. It was hard to tell if this was safe.

Instead, set_transceiver_mode() has been removed, and its work is split
into three parts:

- request_transceiver_mode(), which is safe to call from ISR context.
  All this function does is update the requested mode and increment a
  sequence number. This builds on work already done in PR greatscottgadgets#1029, but
  the interface has been simplified to use a shared volatile structure.

- transceiver_startup(), which transitions the transceiver from an idle
  state to the configuration required for a specific mode, including
  setting up the RF path, configuring the M0, adjusting LEDs and UI etc.

- transceiver_shutdown(), which transitions the transceiver back to an
  idle state.

The *_loop functions that implement the transceiver modes now call
transceiver_startup() before starting work, and transceiver_shutdown()
before returning, and all this happens in the main thread of execution.

As such, it is now guaranteed that all the steps involved happen in a
consistent order, with the transceiver starting from an idle state, and
being returned to an idle state before control returns to the main loop.

For consistency of interface, an off_mode() function has been added to
implement the behaviour of the OFF transceiver mode. Since the
transceiver is already guaranteed to be in an idle state when this is
called, the only work required is to set the UI mode and wait for a new
mode request.
martinling added a commit to martinling/hackrf that referenced this issue Feb 8, 2022
This is a defensive change to make the transceiver code easier to reason
about, and to avoid the possibility of races such as that seen in greatscottgadgets#1042.

Previously, set_transceiver_mode() was called in the vendor request
handler for the SET_TRANSCEIVER_MODE request, as well in the callback
for a USB configuration change. Both these calls are made from the USB0
ISR, so could interrupt the rx_mode(), tx_mode() and sweep_mode()
functions at any point. It was hard to tell if this was safe.

Instead, set_transceiver_mode() has been removed, and its work is split
into three parts:

- request_transceiver_mode(), which is safe to call from ISR context.
  All this function does is update the requested mode and increment a
  sequence number. This builds on work already done in PR greatscottgadgets#1029, but
  the interface has been simplified to use a shared volatile structure.

- transceiver_startup(), which transitions the transceiver from an idle
  state to the configuration required for a specific mode, including
  setting up the RF path, configuring the M0, adjusting LEDs and UI etc.

- transceiver_shutdown(), which transitions the transceiver back to an
  idle state.

The *_mode() functions that implement the transceiver modes now call
transceiver_startup() before starting work, and transceiver_shutdown()
before returning, and all this happens in the main thread of execution.

As such, it is now guaranteed that all the steps involved happen in a
consistent order, with the transceiver starting from an idle state, and
being returned to an idle state before control returns to the main loop.

For consistency of interface, an off_mode() function has been added to
implement the behaviour of the OFF transceiver mode. Since the
transceiver is already guaranteed to be in an idle state when this is
called, the only work required is to set the UI mode and wait for a new
mode request.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant