Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB MSC has some sort of race condition #2322

Open
1 task done
byteit101 opened this issue Nov 18, 2023 · 2 comments · May be fixed by #2840
Open
1 task done

USB MSC has some sort of race condition #2322

byteit101 opened this issue Nov 18, 2023 · 2 comments · May be fixed by #2840
Labels

Comments

@byteit101
Copy link

byteit101 commented Nov 18, 2023

Operating System

Linux

Board

Raspberry Pi PICO (RP2040)

Firmware

https://github.com/byteit101/test-tinyusb-pico

What happened ?

panic/assert/exit calls sometimes, but not always, on writes (I think?).

I have confirmed it on Debian 10, Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

I haven't seen it on a "blank" flash (no filesystem)

I have confirmed it with G++ 8.3 and G++ 12.2 (the latter requires uncommenting target_link_options(${PROJECT_NAME} PRIVATE "LINKER:--no-warn-rwx-segments") in the CMake file)

I have reported this downstream as raspberrypi/pico-sdk#1523

How to reproduce ?

Ensure you have a 1.5.1 PICO SDK setup in your environment.

git clone https://github.com/byteit101/test-tinyusb-pico
cd test-tinyusb-pico
mkdir build
cd build
cmake ..
make
picotool load msc_pico_failure.elf -f
picotool reboot -f
lsblk # find the drive that just appeared and is 1MB
sudo mkfs.vfat -F12 -n "TinyUSB Tst" -S 4096 /dev/sdYOURDRIVE

In one terminal, tio /dev/ttyACM0 (or some other auto-connecting serial console)
Now, watch dmesg -w or such, and repeat picotool reboot -f. Sometimes it will be ok and you will see the Test Flash load, and remain loaded, and tio serial connection remain connected.

[18441889.657799] scsi 9:0:0:0: Direct-Access     TUSB     Test Flash       1.0  PQ: 0 ANSI: 2
[18441889.658432] sd 9:0:0:0: Attached scsi generic sg8 type 0
[18441889.658676] sd 9:0:0:0: [sdh] 256 4096-byte logical blocks: (1.05 MB/1.00 MiB)

Othertimes, it will first do that, then after a few seconds, it will disconnect serial, and reload into the RP2 bootloader drive. This is the failure condition:

[18441891.961605] scsi 10:0:0:0: Direct-Access     RPI      RP2              3    PQ: 0 ANSI: 2
[18441891.962053] sd 10:0:0:0: Attached scsi generic sg8 type 0
[18441891.962397] sd 10:0:0:0: [sdh] 262144 512-byte logical blocks: (134 MB/128 MiB)

Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)

Using tinyusb 0.12, I got this log sequence, culminating in an assertion failure:

...snip...
USBD Setup Received 80 06 00 01 00 00 40 00 
  Get Descriptor Device
  Queue EP 80 with 18 bytes ...
USBD Xfer Complete on EP 80 with 18 bytes
  Queue EP 00 with 0 bytes ...
  Short packet on buffer 0 with 0 bytes
Completed transfer of 0 bytes on ep 0 out
USBD Xfer Complete on EP 00 with 0 bytes
BUS RESET
USBD Bus Reset : Full Speed
  Short packet on buffer 0 with 18 bytes
Completed transfer of 18 bytes on ep 0 in

USBD Setup Received 00 05 1E 00 00 00 00 00 
  Set Address
USBD Xfer Complete on EP 80 with 18 bytes
usbd_control_xfer_cb 171: ASSERT FAILED

In contrast, successful Set Address looks like

...snip

USBD Setup Received 00 05 10 00 00 00 00 00 
  Set Address
  Short packet on buffer 0 with 0 bytes
Completed transfer of 0 bytes on ep 0 in
USBD Xfer Complete on EP 80 with 0 bytes

...snip

The reordering, combined with the fact that it's highly irregular, sometimes failing for 5 reboots in a row, then succeeding for 50%, then succeeding for 10 reboots in a row, makes me think this is some kind of race condition in the MSC TUD driver.

The stack trace was usbd_control.c:171, called from usbd.c:562

I updated TinyUSB to 0.15.0, and now get:

*** PANIC ***

ep 0 in was already available

Screenshots

No response

I have checked existing issues, dicussion and documentation

  • I confirm I have checked existing issues, dicussion and documentation.
@jncronin
Copy link
Contributor

I think I am also seeing this issue on a custom STM32H7 board.

For a variety of reasons the host resets the bus (in my case simply sending 0 bytes to a read10 request triggers this with a windows host after a while).

The DCD_EVENT_BUS_RESET message is immediately followed by a DCD_EVENT_XFER_COMPLETE on an IN endpoint, and the subsequent TU_ASSERT(driver,) fails.

@Olstyle
Copy link

Olstyle commented Oct 10, 2024

I can also reproduce what @jncronin is describing. If I have high load on my STM2U585 MCU, probably some response is handled too late, which results in a bus reset from the host. Since this DCD_EVENT_BUS_RESET resets the device info in _usbd_dev , but the event queue is not cleared, the already present DCD_EVENT_XFER_COMPLETE is read in the next round and tries to find drivers in an empty _usbd_dev , which fails into the Assert.

EDT: the following kind of works with FreeRTOS (since osal_semaphore_reset expands to xQueueReset)

--- a/dwc2/device/usbd.c
+++ b/dwc2/device/usbd.c
@@ -585,6 +585,7 @@ void tud_task_ext(uint32_t timeout_ms, bool in_isr) {
       case DCD_EVENT_BUS_RESET:
         TU_LOG_USBD(": %s Speed\r\n", tu_str_speed[event.bus_reset.speed]);
         usbd_reset(event.rhport);
+        osal_semaphore_reset(_usbd_q);
         _usbd_dev.speed = event.bus_reset.speed;
         break;

Of course there will still be a reenumeration etc. but the device keeps on living and Windows seems to find it again as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants