[flash_ctrl] Enable firmware dealing with multi-bit ECC and ICV errors
Before, these two error types led to a fatal alert, which is problematic
during firmware selection and verification. This commit changes the
design in the following way:
- The two relevant bits in the FAULT_STATUS CSR are made clearable by
  software. Other bits in this register remain sticky.
- The corresponding alert is no longer fatal.
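
The access semantics from the first bullet can be modelled roughly as below.
This is only an illustrative software model, not the RTL: the bit positions
come from the FAULT_STATUS field list in flash_ctrl.hjson, while the mask and
function names are hypothetical. It also ignores the case where hardware sets
a bit in the same cycle as the software write.

```c
#include <stdint.h>

/* Bit positions as listed for FAULT_STATUS in flash_ctrl.hjson. */
#define FAULT_STATUS_PHY_RELBL_ERR_BIT 7u
#define FAULT_STATUS_PHY_STORAGE_ERR_BIT 8u

/* Hypothetical helper mask: the two fields this commit changes to rw0c. */
#define FAULT_STATUS_CLEARABLE_MASK            \
  ((1u << FAULT_STATUS_PHY_RELBL_ERR_BIT) |    \
   (1u << FAULT_STATUS_PHY_STORAGE_ERR_BIT))

/*
 * Models the effect of a software write on FAULT_STATUS.
 *   rw0c fields: writing 0 clears the bit, writing 1 leaves it unchanged.
 *   ro fields:   the written value is ignored, the bit stays as it was.
 * Software can never set a bit through this path.
 */
uint32_t fault_status_sw_write(uint32_t current, uint32_t wdata) {
  uint32_t keep_ro = current & ~FAULT_STATUS_CLEARABLE_MASK;
  uint32_t keep_rw0c = current & wdata & FAULT_STATUS_CLEARABLE_MASK;
  return keep_ro | keep_rw0c;
}
```

For example, writing all zeros to a register holding only bits 7 and 8 yields
zero, whereas the same write leaves a set op_err bit (bit 0) in place.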

This means the alert is only sent out until the two bits are cleared by
software. To be on the safe side, firmware can still classify the alert
as fatal on the receiver side (in the alert handler). For the other
error sources, the alert keeps getting triggered as before, i.e., it
remains fatal.
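
On the firmware side, the distinction described above could be turned into a
small triage step during firmware selection and verification. The sketch below
is an assumption about how such a check might look, not code from this commit:
the enum, macro, and function names are hypothetical, and only the bit
positions are taken from flash_ctrl.hjson.

```c
#include <stdint.h>

#define PHY_RELBL_ERR (1u << 7)   /* multi-bit ECC error, clearable (rw0c) */
#define PHY_STORAGE_ERR (1u << 8) /* ICV error, clearable (rw0c) */
#define CLEARABLE_MASK (PHY_RELBL_ERR | PHY_STORAGE_ERR)

typedef enum {
  FAULT_NONE,        /* no fault bit is set */
  FAULT_RECOVERABLE, /* only the two clearable bits are set */
  FAULT_STICKY       /* at least one sticky bit is set: alert fires until reset */
} fault_class_t;

/* Classifies a value read from FAULT_STATUS. */
fault_class_t classify_fault_status(uint32_t fault_status) {
  if (fault_status == 0u) {
    return FAULT_NONE;
  }
  if ((fault_status & ~CLEARABLE_MASK) != 0u) {
    return FAULT_STICKY;
  }
  return FAULT_RECOVERABLE;
}
```

Only in the FAULT_RECOVERABLE case would firmware clear the bits (by writing 0
to them) and, for instance, try another firmware slot. In the FAULT_STICKY
case clearing has no effect on the sticky bits, so the alert keeps triggering.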

For more background information, refer to #21353.

This resolves #21637.

Signed-off-by: Pirmin Vogel <[email protected]>
vogelpi committed Apr 16, 2024
1 parent 266709a commit dec4bc1
Showing 20 changed files with 501 additions and 122 deletions.
35 changes: 31 additions & 4 deletions hw/ip/flash_ctrl/data/flash_ctrl.hjson
@@ -73,7 +73,15 @@
desc: "flash standard fatal errors"
},
{ name: "fatal_err",
desc: "flash fatal errors"
desc: '''
Flash fatal errors including uncorrectable ECC errors.

Note that this alert is not always fatal.
The underlying error bits in the !!FAULT_STATUS register remain set until reset, meaning the alert keeps firing.
This doesn't hold for !!FAULT_STATUS.PHY_RELBL_ERR and !!FAULT_STATUS.PHY_STORAGE_ERR.
To allow firmware to deal with multi-bit ECC and ICV errors during firmware selection and verification, these two error bits can be cleared by software.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ name: "fatal_prim_flash_alert",
desc: "Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface."
@@ -2043,27 +2051,31 @@
desc: '''
This register tabulates customized fault status of the flash.

These are errors that are impossible to have been caused by software or unrecoverable
in nature.
These are errors that are impossible to have been caused by software or unrecoverable in nature.

All errors except for multi-bit ECC errors (!!FAULT_STATUS.PHY_RELBL_ERR) and ICV errors (!!FAULT_STATUS.PHY_STORAGE_ERR) trigger a fatal alert.
Once set, the bits for these fatal errors remain set until reset; the two excepted bits can be cleared by software.
'''
swaccess: "ro",
hwaccess: "hrw",
fields: [
{ bits: "0",
name: "op_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface has supplied an undefined operation.
See !!CONTROL.OP for list of valid operations.
'''
},
{ bits: "1",
name: "mp_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a memory permission error.
'''
},
{ bits: "2",
name: "rd_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a read error.
This could be a reliability ECC error or an integrity ECC error
@@ -2072,56 +2084,71 @@
},
{ bits: "3",
name: "prog_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program error.
This could be a program integrity error, see !!STD_FAULT_STATUS for more details.
'''
},
{ bits: "4",
name: "prog_win_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program resolution error.
'''
},
{ bits: "5",
name: "prog_type_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program type error.
A program type not supported by the flash macro was issued.
'''
},
{ bits: "6",
name: "seed_err",
swaccess: "ro",
desc: '''
The seed reading process encountered an unexpected error.
'''
},
{ bits: "7",
name: "phy_relbl_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage reliability ECC error.

Note that this error bit can be cleared to allow firmware to deal with multi-bit ECC errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "8",
name: "phy_storage_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage integrity ECC error.

Note that this error bit can be cleared to allow firmware to deal with ICV errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "9",
name: "spurious_ack",
swaccess: "ro",
desc: '''
The flash emitted an unexpected acknowledgement.
'''
},
{ bits: "10",
name: "arb_err",
swaccess: "ro",
desc: '''
The phy arbiter encountered inconsistent results.
'''
},
{ bits: "11",
name: "host_gnt_err",
swaccess: "ro",
desc: '''
A host transaction was granted with illegal properties.
'''
35 changes: 31 additions & 4 deletions hw/ip/flash_ctrl/data/flash_ctrl.hjson.tpl
@@ -84,7 +84,15 @@
desc: "flash standard fatal errors"
},
{ name: "fatal_err",
desc: "flash fatal errors"
desc: '''
Flash fatal errors including uncorrectable ECC errors.
Note that this alert is not always fatal.
The underlying error bits in the !!FAULT_STATUS register remain set until reset, meaning the alert keeps firing.
This doesn't hold for !!FAULT_STATUS.PHY_RELBL_ERR and !!FAULT_STATUS.PHY_STORAGE_ERR.
To allow firmware to deal with multi-bit ECC and ICV errors during firmware selection and verification, these two error bits can be cleared by software.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ name: "fatal_prim_flash_alert",
desc: "Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface."
@@ -1518,27 +1526,31 @@
desc: '''
This register tabulates customized fault status of the flash.
These are errors that are impossible to have been caused by software or unrecoverable
in nature.
These are errors that are impossible to have been caused by software or unrecoverable in nature.
All errors except for multi-bit ECC errors (!!FAULT_STATUS.PHY_RELBL_ERR) and ICV errors (!!FAULT_STATUS.PHY_STORAGE_ERR) trigger a fatal alert.
Once set, the bits for these fatal errors remain set until reset; the two excepted bits can be cleared by software.
'''
swaccess: "ro",
hwaccess: "hrw",
fields: [
{ bits: "0",
name: "op_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface has supplied an undefined operation.
See !!CONTROL.OP for list of valid operations.
'''
},
{ bits: "1",
name: "mp_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a memory permission error.
'''
},
{ bits: "2",
name: "rd_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a read error.
This could be a reliability ECC error or an integrity ECC error
@@ -1547,56 +1559,71 @@
},
{ bits: "3",
name: "prog_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program error.
This could be a program integrity error, see !!STD_FAULT_STATUS for more details.
'''
},
{ bits: "4",
name: "prog_win_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program resolution error.
'''
},
{ bits: "5",
name: "prog_type_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program type error.
A program type not supported by the flash macro was issued.
'''
},
{ bits: "6",
name: "seed_err",
swaccess: "ro",
desc: '''
The seed reading process encountered an unexpected error.
'''
},
{ bits: "7",
name: "phy_relbl_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage reliability ECC error.
Note that this error bit can be cleared to allow firmware to deal with multi-bit ECC errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "8",
name: "phy_storage_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage integrity ECC error.
Note that this error bit can be cleared to allow firmware to deal with ICV errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "9",
name: "spurious_ack",
swaccess: "ro",
desc: '''
The flash emitted an unexpected acknowledgement.
'''
},
{ bits: "10",
name: "arb_err",
swaccess: "ro",
desc: '''
The phy arbiter encountered inconsistent results.
'''
},
{ bits: "11",
name: "host_gnt_err",
swaccess: "ro",
desc: '''
A host transaction was granted with illegal properties.
'''
15 changes: 14 additions & 1 deletion hw/ip/flash_ctrl/data/flash_ctrl.sv.tpl
@@ -952,7 +952,20 @@ module flash_ctrl
reg2hw.alert_test.recov_err.q & reg2hw.alert_test.recov_err.qe
};
localparam logic [NumAlerts-1:0] IsFatal = {1'b0, 1'b1, 1'b1, 1'b1, 1'b0};
// The alert generated for errors reported in the fault status CSR (fatal_err) is not fatal.
// This is to enable firmware dealing with multi-bit ECC errors (phy_relbl_err) as well as ICV
// (phy_storage_err) errors inside the PHY during firmware selection and verification.
// Once firmware has cleared the corresponding bits in the fault status CSR and the alert
// handler has acknowledged the alert, the prim_alert_sender will stop triggering the alert.
// After firmware has passed the firmware selection / verification stage, the alert handler
// config can be adjusted to still classify the alert as fatal on the receiver side.
//
// This doesn't hold for the other error conditions reported in the fault status CSR. The
// corresponding bits in the register cannot be unset. The alert thus keeps triggering until
// reset for these bits.
//
// For more details, refer to lowRISC/OpenTitan#21353.
localparam logic [NumAlerts-1:0] IsFatal = {1'b0, 1'b1, 1'b0, 1'b1, 1'b0};
for (genvar i = 0; i < NumAlerts; i++) begin : gen_alert_senders
prim_alert_sender #(
.AsyncOn(AlertAsyncOn[i]),
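To see which alert the `IsFatal` change above affects, the two concatenations can be decoded bit by bit. The sketch below is illustrative only: it assumes the alert indices follow the order of the Security Alerts table in interfaces.md (recov_err = 0, fatal_std_err = 1, fatal_err = 2, fatal_prim_flash_alert = 3, recov_prim_flash_alert = 4), and all names in it are hypothetical. In a SystemVerilog concatenation the leftmost element is the most significant bit, so index 0 is the rightmost element.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ALERTS 5u

/* {1'b0, 1'b1, 1'b1, 1'b1, 1'b0}: value before this commit. */
#define IS_FATAL_BEFORE 0x0Eu /* 0b01110 */
/* {1'b0, 1'b1, 1'b0, 1'b1, 1'b0}: value after this commit. */
#define IS_FATAL_AFTER 0x0Au /* 0b01010 */

/* Returns true if the alert sender at index idx is configured as fatal. */
bool alert_is_fatal(uint32_t is_fatal, unsigned idx) {
  return idx < NUM_ALERTS && ((is_fatal >> idx) & 1u) != 0u;
}

/* Returns the index of the single alert whose setting differs, or -1. */
int changed_alert_index(uint32_t before, uint32_t after) {
  uint32_t diff = before ^ after;
  for (unsigned i = 0; i < NUM_ALERTS; i++) {
    if (diff == (1u << i)) {
      return (int)i;
    }
  }
  return -1;
}
```

Under the assumed ordering, the only bit that differs is index 2, i.e. fatal_err, which matches the commit message; fatal_std_err and fatal_prim_flash_alert stay fatal.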
14 changes: 7 additions & 7 deletions hw/ip/flash_ctrl/doc/interfaces.md
@@ -57,13 +57,13 @@ Referring to the [Comportable guideline for peripheral device functionality](htt

## Security Alerts

| Alert Name | Description |
|:-----------------------|:--------------------------------------------------------------------------------------------------------------------|
| recov_err | flash recoverable errors |
| fatal_std_err | flash standard fatal errors |
| fatal_err | flash fatal errors |
| fatal_prim_flash_alert | Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface. |
| recov_prim_flash_alert | Recoverable alert triggered inside the flash primitive. |
| Alert Name | Description |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| recov_err | flash recoverable errors |
| fatal_std_err | flash standard fatal errors |
| fatal_err              | Flash fatal errors including uncorrectable ECC errors. Note that this alert is not always fatal. The underlying error bits in the [`FAULT_STATUS`](registers.md#fault_status) register remain set until reset, meaning the alert keeps firing. This doesn't hold for [`FAULT_STATUS.PHY_RELBL_ERR`](registers.md#fault_status) and [`FAULT_STATUS.PHY_STORAGE_ERR`](registers.md#fault_status). To enable firmware dealing with multi-bit ECC and ICV errors during firmware selection and verification, these error bits can be cleared. After passing this stage, it is recommended that firmware classifies the corresponding alert as fatal on the receiver end, i.e., inside the alert handler. |
| fatal_prim_flash_alert | Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface. |
| recov_prim_flash_alert | Recoverable alert triggered inside the flash primitive. |

## Security Countermeasures
