STM32H7: Bus fault when reading corrupt flash sectors #33140
Comments
This looks like a good way to handle this.
I ran a test on nucleo_h723zg, but I cannot reproduce the issue (with the loop at "Check if 256 bits location is erased" commented out).
@weinholtendian I'm moving this issue to enhancement, as I don't see the reported behavior as a bug, though I agree that the behavior could (and should) be improved.
@FRASTM This was tested with an STM32H745 Nucleo-144 (NUCLEO-H745ZI-Q). I believe that the H723 will exhibit the same behavior as described in this issue. Both reference manuals (RM0399 and RM0468) contain this type of text: "When DBECCERR1/2 flag is raised, a bus error is generated. In case of successive double error detections, a bus error is generated each time a new data is sent back to the requester through the AXI interface." However, the method for deliberately creating a double ECC error is something I improvised and verified only on the H745, so it might not work on the H723 (or something may be off with my instructions). Double ECC errors can still occur in the field, either from the normal causes of flash failure or when an mcuboot upgrade is interrupted (as mentioned in mcu-tools/mcuboot#713). @erwango Ack, will do this next week.
@FRASTM I might have figured out why you couldn't reproduce the bus fault from my instructions. The Updated instructions to get a double ECC error, which can be used to verify that the suggested patch works as intended:
The flash now has a fault that you can use to verify that the patch works. After applying the patch and flashing again (
The STM32H7x flash has an integrated ECC that can correct single errors and detect double errors. When a double ECC error is detected, the DBECCERR1/2 flag is raised and there is a bus fault. We now mask this bus fault and check the error flags. ECC errors are logged with the offset of the data. Single ECC errors cause a warning to be logged and double ECC errors return -EIO. Fixes zephyrproject-rtos#33140. Signed-off-by: Göran Weinholt <[email protected]>
Describe the bug
The stm32h7 flash driver crashes the system with a bus fault if the data it reads has a double ECC error (DBECCERR).
The stm32h7 flash has an integrated ECC capable of correcting single errors and detecting double errors. The bus fault is expected behavior from the hardware, but very unfortunate behavior for the flash API. It would be much better to return an error code from flash_read().
To Reproduce
You will need a flash word with a double ECC error.
Expected behavior
The flash_read() API should return an error code when it finds a double ECC error, such as -EIO. It should probably also log the ECC error.
Impact
The bus fault stops mcuboot from working reliably when a firmware upgrade is interrupted in the middle, or when a flash word has developed an ECC error for some other reason.
Attempting to work around it on a different level than the flash driver is complicated, see: mcu-tools/mcuboot#713 (comment)
Logs and console output
Environment (please complete the following information):
Suggested fix
It is possible to stop the bus fault from happening by using BFHFNMIGN and FAULTMASK during the read from the flash. Without the bus fault it is then possible to read the status from the flash controller and translate that to a status code. I have a PoC available here: endiantechnologies@d541116
I'm a bit out of my depth when it comes to fiddling with such low-level ARM Cortex registers, so I think someone more knowledgeable should have a look at the code. Other places in the driver that read from the flash also need to be protected. But at least with the patch an error is returned instead of the system crashing:
And I believe that mcuboot is better able to handle this kind of error code than it can handle a bus fault.
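The suggested fix can be sketched roughly as follows. This is a host-runnable illustration, not the code from the PoC or the driver: every `sketch_`/`SKETCH_` name is hypothetical, the status register and the fault masking are simulated with plain variables, and the bit positions of the ECC flags are assumptions that should be checked against RM0399 / RM0468. On real hardware the two mask helpers would set and clear BFHFNMIGN and FAULTMASK, and the flags would be cleared through the flash controller rather than by writing to a variable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed positions of the bank 1 ECC flags in the flash status register.
 * Verify against the reference manual before reusing them. */
#define SKETCH_SR_SNECCERR (1u << 25) /* single error, corrected by hardware */
#define SKETCH_SR_DBECCERR (1u << 26) /* double error, detected only */

/* Hypothetical stand-ins for the hardware so the sketch runs on a host. */
static uint32_t sketch_flash_sr;  /* simulated flash status register */
static int sketch_faults_masked;  /* simulated "bus fault is ignored" state */

/* On a Cortex-M7 these would set/clear BFHFNMIGN and FAULTMASK. */
static void sketch_mask_bus_fault(void)   { sketch_faults_masked = 1; }
static void sketch_unmask_bus_fault(void) { sketch_faults_masked = 0; }

/* Translate the ECC flags into a flash API style return code. */
static int sketch_ecc_status_to_errno(uint32_t sr, size_t offset)
{
    if (sr & SKETCH_SR_DBECCERR) {
        fprintf(stderr, "double ECC error at offset 0x%zx\n", offset);
        return -EIO;
    }
    if (sr & SKETCH_SR_SNECCERR) {
        /* Data was corrected, so only warn and report success. */
        fprintf(stderr, "warning: single ECC error at offset 0x%zx\n", offset);
    }
    return 0;
}

/* Read with the bus fault masked, then inspect and clear the ECC flags. */
static int sketch_flash_read(const uint8_t *flash, size_t offset,
                             void *dst, size_t len)
{
    uint32_t sr;

    assert(flash != NULL && dst != NULL);

    sketch_mask_bus_fault();
    memcpy(dst, flash + offset, len);
    sr = sketch_flash_sr;
    /* Simulated flag clear; real hardware uses a separate clear register. */
    sketch_flash_sr &= ~(SKETCH_SR_SNECCERR | SKETCH_SR_DBECCERR);
    sketch_unmask_bus_fault();

    return sketch_ecc_status_to_errno(sr, offset);
}
```

A caller such as mcuboot would then see `-EIO` from the read and could treat the affected region as invalid instead of the system stopping in the fault handler.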