Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSD Boot (sometimes) stuck at Read start4.elf bytes #227

Closed
jurc192 opened this issue Sep 17, 2020 · 18 comments
Closed

MSD Boot (sometimes) stuck at Read start4.elf bytes #227

jurc192 opened this issue Sep 17, 2020 · 18 comments

Comments

@jurc192
Copy link

jurc192 commented Sep 17, 2020

Describe the bug
I am booting the RPi from the USB3/SSD and it sometimes gets stuck at the boot screen at
Read start4.elf bytes 2278336 hnd 0x00011128e hash '78e6729ebb0d2e84.

RPi: 4B
SSD: Samsung 970 EVO, 500 GB
USB-SSD enclosure: ICY box IB-1817M-C31 (JMicron USB chip)

To Reproduce
Steps to reproduce the behavior:

  • flash SD-card with latest RPi OS image (2020-08-20) from downloads page
  • flash SSD with same RPi image
  • boot RPi with SD-card and update EEPROM fw to latest version (September 3 2020, timestamp 1599135103)
  • shutdown, remove power, remove SD-card, connect SSD, power-up

Sometimes it works, sometimes it gets stuck at reading start4.elf. The LED on the SSD disk is blinking, but nothing happens.

Screenshots

IMG_20200917_124949

USB boot:
When booting from SD, my SSD always gets detected, using uas driver, 5000M speed

Additional context
I noticed I had similar problems to #183 , since it would sometimes also get stuck at LUN timeout... init port 3 speed 2 etc.
Using advice from the related issue, I increased the LUN timeout to 5 seconds and it seems to help. Now it only gets stuck on read start4.elf sometimes.

P.S. I want to ask the same subquestion which remained unaswered in #183 -> is there a forum/place to discuss SSD-USB enclosure issues, which ones work well, which are bugy, which can be upgraded etc.?

@timg236
Copy link
Collaborator

timg236 commented Sep 17, 2020

Please can you try connecting via a powered USB hub to see if this improves things.

UART or NETCONSOLE logs would be needed if this doesn't resolve the problem
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711_bootloader_config.md

Feel free to discuss USB interoperability issues on the Raspberry General discussion forum
https://www.raspberrypi.org/forums/viewforum.php?f=63&sid=a0769fe3a50d754af51a4f94a9d921c5

@jurc192
Copy link
Author

jurc192 commented Sep 17, 2020

Connecting the disk via powered USB does not boot RPi at all (red led on, green led off).
I guess I have the same problem as mentioned here -> cheap USB hubs.
Does anyone know a reliable, working USB3 powered hub?

Setting up UART and catching the moment where it acutally freezes took some time - but here it is :)

PM_RSTS: 0x00001000
RPi: BOOTLOADER release VERSION:c305221a DATE: Sep  3 2020 TIME: 13:11:46 BOOTMODE: 0x00000006 part: 0 BUILD_TIMESTAMP=1599135103 0x6099643c 0x00c03112
uSD voltage 3.3V
Initialising SDRAM 'Micron' 16Gb x2 total-size: 32 Gbit 3200
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
xHC ports 5 slots 32 intrs 4
Reset USB port-power 1000 ms
Boot mode: SD (01) order f4
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000f00 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
EMMC
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SDV1
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD CMD: 0x371a0010 (55) 0x0 0x1fff0001
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)
Boot mode: USB-MSD (04) order f
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
xHC ports 5 slots 32 intrs 4
USB3 rport 2 status 00000000 -> 000002a0
USB3 rport 3 status 00000000 -> 000002a0
USB3 rport 4 status 00000000 -> 000002a0
USB3 rport 5 status 00000000 -> 000002a0
USB2 rport 1 status 00000000 -> 400002e1
USB2 rport 1 status 40000331 -> 40200e03
root HUB port 1 init
USB3 rport 3 status 000002a0 -> 00021203
DEV [01:00] 2.16 000000:01 class 9 VID 2109 PID 3431
HUB init [01:00] 2.16 000000:01
root HUB port 3 init
DEV [02:00] 3.32 000000:03 class 0 VID 152d PID 0580
MSD device [02:00] 3.32 000000:03 conf 0 iface 0 ep 81#1024 02#1024
MSD [02:00] 3.32 000000:03 register MSD
PM_RSTS: 0x00001000
MSD [02:00] 3.32 000000:03 LUN 0
HUB [01:00] 2.16 000000:01 init port 3 speed 2
DEV [03:01] 2.00 000003:01 class 0 VID 046d PID c069
HUB [01:00] 2.16 000000:01 init port 4 speed 2
DEV [04:01] 1.16 000004:01 class 0 VID 046d PID c31c
MSD [02:00] 3.32 000000:03 lun 0 block-count 976773168 block-size 512
MBR: 0x00002000,  524288 type: 0x0c
MBR: 0x00082000,976240688 type: 0x83
MBR: 0x00000000,       0 type: 0x00
MBR: 0x00000000,       0 type: 0x00
lba: 8192 oem: 'mkfs.fat' volume: ' boot       '
rsc 32 fat-sectors 4033 c-count 516190 c-size 1 r-dir 2 r-sec 0
PM_RSTS: 0x00001000
Partition: 0
lba: 8192 oem: 'mkfs.fat' volume: ' boot       '
rsc 32 fat-sectors 4033 c-count 516190 c-size 1 r-dir 2 r-sec 0
Read config.txt bytes     1784 hnd 0x00000386 hash '6c0f63e7068a74e6'
recover4.elf not found (6)
recovery.elf not found (6)
Read start4.elf bytes  2278336 hnd 0x0001128e hash '78e6729ebb0d2e84'

@timg236
Copy link
Collaborator

timg236 commented Sep 17, 2020

Thanks, that's great. If you are comfortable adding updating the bootloader EEPROM config then temporarily adding XHCI_DEBUG_FLAGS=0xf would be useful (0x3f is also useful but extremely verbose.

I've found that USB hubs in monitors are often good, but we don't have an official list of good/bad USB hubs because it's not possible to keep it up to date.

@jurc192
Copy link
Author

jurc192 commented Sep 17, 2020

Here's log where it gets stuck, this time with XHCI_DEBUG_FLAGS=0x3f


PM_RSTS: 0x00000020
RPi: BOOTLOADER release VERSION:c305221a DATE: Sep  3 2020 TIME: 13:11:46 BOOTMODE: 0x00000006 part: 0 BUILD_TIMESTAMP=1599135103 0x6099643c 0x00c03112
uSD voltage 3.3V
Initialising SDRAM 'Micron' 16Gb x2 total-size: 32 Gbit 3200
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
xHC ports 5 slots 32 intrs 4
Reset USB port-power 1000 ms
Boot mode: SD (01) order f4
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000f00 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
EMMC
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SDV1
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD CMD: 0x371a0010 (55) 0x0 0x1fff0001
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)
Boot mode: USB-MSD (04) order f
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
xHC ports 5 slots 32 intrs 4
USB3 rport 2 status 00000000 -> 000002a0
USB3 rport 3 status 00000000 -> 000002b1
USB3 rport 4 status 00000000 -> 000002a0
USB3 rport 5 status 00000000 -> 000002a0
USB2 rport 1 status 00000000 -> 400002e1
USB2 rport 1 status 40000331 -> 40200e03
root HUB port 1 init
DEV [01:00] 2.16 000000:01 class 9 VID 2109 PID 3431
HUB init [01:00] 2.16 000000:01
USB3 rport 3 status 000002b1 -> 00281203
root HUB port 3 init
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
USB MSD timed out after 20 seconds
Boot mode: RESTART (0f) order 0
Restart 0 max -1
Boot mode: SD (01) order f4
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000f00 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
EMMC
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SDV1
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD CMD: 0x371a0010 (55) 0x0 0x1fff0001
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)
Boot mode: USB-MSD (04) order f
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
xHC ports 5 slots 32 intrs 4
USB3 rport 2 status 00000000 -> 000002a0
USB3 rport 3 status 00000000 -> 000002b1
USB3 rport 4 status 00000000 -> 000002a0
USB3 rport 5 status 00000000 -> 000002a0
USB2 rport 1 status 00000000 -> 400002e1
USB2 rport 1 status 40000331 -> 40200e03
root HUB port 1 init
DEV [01:00] 2.16 000000:01 class 9 VID 2109 PID 3431
HUB init [01:00] 2.16 000000:01
USB3 rport 3 status 000002b1 -> 00281203
root HUB port 3 init
XHCI-STOP
xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
USB MSD timed out after 20 seconds
Boot mode: RESTART (0f) order 0
Restart 1 max -1
Boot mode: SD (01) order f4
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000f00 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
EMMC
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SDV1
SD HOST: 250000000 CTL0: 0x00000000 BUS: 100000 Hz actual: 100000 HZ div: 2500 (1250) status: 0x1fff0000 delay: 1080
SD CMD: 0x371a0010 (55) 0x0 0x1fff0001
Failed to open device: 'sdcard' (cmd 371a0010 status 1fff0001)
Boot mode: USB-MSD (04) order f

....keeps looping like this...

Thanks for the tip with monitor USB hub, I will try that tomorrow!

@timg236
Copy link
Collaborator

timg236 commented Sep 17, 2020

Apologies, it should have been XHCI_DEBUG not XHCI_DEBUG_FLAGS
(see https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711_bootloader_config.md)

The behaviour did look different on that run, it didn't seem to find a USB device at all instead of finding one and stopping at start4.elf, presumably getting stuck looking for fixup4.dat

@jurc192
Copy link
Author

jurc192 commented Sep 18, 2020

Aah I didn't notice the wrong flag name - it was late in the night :)
I tried power-cycling 10 times using XHCI_DEBUG=0x3f but could not get it to freeze. Afterwards I tried power-cycling with XHCI_DEBUG=0xf and on 6th try it got stuck on this line (looping forever):

REQ [01:00] 2.16 000000:01 ep 0 163,0,0,1 4 @ 0x00000004
REQ [01:00] 2.16 000000:01 ep 0 163,0,0,2 4 @ 0x00000004
REQ [01:00] 2.16 000000:01 ep 0 163,0,0,3 4 @ 0x00000004
REQ [01:00] 2.16 000000:01 ep 0 163,0,0,4 4 @ 0x00000004

Here are the complete logs, where it gets stuck: uartLog_dbg0xf_failed.txt
And just in case it is useful, here is an example of an normal, successful boot + manual shutdown: uartLog_dbg0xf_success.txt

@timg236
Copy link
Collaborator

timg236 commented Sep 18, 2020

That's interesting, do you have any other USB devices connected?

@jurc192
Copy link
Author

jurc192 commented Sep 18, 2020

I have a keyboard and a mouse connected to both USB2 ports, but the SSD disk is the only USB3 device...

@timg236
Copy link
Collaborator

timg236 commented Sep 18, 2020

ok, it looks as though in the failure case a response is not received for the first read of fixup4.dat. The same request is sent as in the success case, but nothing comes back.

There might be something that can be done to improve the error handling/timeout in that area. I'll update the bug if we have a test binary available, might be a few days though as this code is quite complex.

MSD [02:00] 3.32 000000:03 READ lun 0 addr: 16290 count: 4
CBW 000000d1 00000800 fff8f074 00000000

@jurc192
Copy link
Author

jurc192 commented Sep 19, 2020

Ok, I am glad to hear that the logs are showing something. I didn't undestand what you meant by "...if we have a test binary available..." - should I send you my disk image? Let me know if there is anything I can help with and thanks for looking into it!

@lurch
Copy link
Contributor

lurch commented Sep 20, 2020

I think Tim meant "if/when I've produced a test binary (presumably a new EEPROM image), I'll update this bug to let you know".

@timg236
Copy link
Collaborator

timg236 commented Nov 6, 2020

Please could you try with the pieeprom-2020-10-28 beta image?

@jurc192
Copy link
Author

jurc192 commented Nov 8, 2020

Thanks for the update!

I tried running pieeprom-2020-10-28 beta using the same USB3 enclosure connected directly to the RPI4 - it did not boot.
Here is the log: uartLog2_failed.txt
Here is a more verbose log (xhci_debug=0xf): uartLog2_dbg0xf_failed.txt

However connecting the disk via powered usb hub (on the monitor) works everytime. Here's the log for a successful boot via powered usb hub: uartLog2_success.txt
Same thing, but verbose: uartLog2_dbg0xf_success.txt

Also, I tested another USB enclosure which is powered by the Realtek chipset (the best option I could find; see last paragraph) and it works flawlessly.
Here is a log of an successful boot using the RealTek chipset-based SSD enclosure: uartLog2_dbg0xf_realtekUsb.txt

Apparently the JMicron chipset has some problems (some versions at least): https://www.legitreviews.com/jmicron-jms583-controller-version-matters-for-portable-usb-drives_219422

So, Realtek RTL9210 works out of the box. The JMicron JMS583 not really. Problem is that the latter is very common, afaik.

@timg236
Copy link
Collaborator

timg236 commented Nov 9, 2020

Thanks for the report. The logs suggest a power-related issue if going via a powered hub resolves the problem. Although, that can also change delay times for USB enumeration.

The JMS583 issues happen later on when Linux select UASP because the JMicron Firmware appears to have some bugs with UAS. It's unlikely that anything will change there without an updated firmware for JMicron or possibly a JMS specific quirk in the Kernel driver (unlikely)

@lurch
Copy link
Contributor

lurch commented Nov 9, 2020

the JMicron Firmware appears to have some bugs with UAS.

@jurc192 Does https://www.raspberrypi.org/forums/viewtopic.php?t=245931 help with that?

@timg236
Copy link
Collaborator

timg236 commented Jan 13, 2021

The latest bootloader beta release (2021-12-01) will now timeout MSD/SCSI command and restart the USB boot modes. There's also some improvements which helped compatibility with JMicron chipsets.
Please could you try this release to see if it improves the behaviour?

@jurc192
Copy link
Author

jurc192 commented Jan 14, 2021

@timg236 unfortunately I am no longer working on that project and I don't have access to the equipment (lab) anymore, sorry. Thanks for your great work! :)

@timg236
Copy link
Collaborator

timg236 commented Jan 14, 2021

Ok, thanks for replying.

I'm going to close the bug for now because I was able to simulate this by unplugging the USB at random points, re-connecting it then re-inserting it to show that boot did not get stuck forever.

@timg236 timg236 closed this as completed Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants