[flash_ctrl] Enable firmware dealing with multi-bit ECC and ICV errors #22431

Merged
merged 1 commit into from
Apr 16, 2024
35 changes: 31 additions & 4 deletions hw/ip/flash_ctrl/data/flash_ctrl.hjson
@@ -73,7 +73,15 @@
desc: "flash standard fatal errors"
},
{ name: "fatal_err",
desc: "flash fatal errors"
desc: '''
Flash fatal errors including uncorrectable ECC errors.

Note that this alert is not always fatal.
The underlying error bits in the !!FAULT_STATUS register remain set until reset, meaning the alert keeps firing.
This doesn't hold for !!FAULT_STATUS.PHY_RELBL_ERR and !!FAULT_STATUS.PHY_STORAGE_ERR.
To allow firmware to deal with multi-bit ECC and ICV errors during firmware selection and verification, these error bits can be cleared.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ name: "fatal_prim_flash_alert",
desc: "Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface."
@@ -2043,27 +2051,31 @@
desc: '''
This register tabulates customized fault status of the flash.

These are errors that are impossible to have been caused by software or unrecoverable
in nature.
These are errors that are impossible to have been caused by software or unrecoverable in nature.

All errors except for multi-bit ECC errors (!!FAULT_STATUS.PHY_RELBL_ERR) and ICV errors (!!FAULT_STATUS.PHY_STORAGE_ERR) trigger a fatal alert.
Once set, they remain set until reset.
'''
swaccess: "ro",
hwaccess: "hrw",
fields: [
{ bits: "0",
name: "op_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface has supplied an undefined operation.
See !!CONTROL.OP for list of valid operations.
'''
},
{ bits: "1",
name: "mp_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a memory permission error.
'''
},
{ bits: "2",
name: "rd_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a read error.
This could be a reliability ECC error or an integrity ECC error
@@ -2072,56 +2084,71 @@
},
{ bits: "3",
name: "prog_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program error.
This could be a program integrity error, see !!STD_FAULT_STATUS for more details.
'''
},
{ bits: "4",
name: "prog_win_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program resolution error.
'''
},
{ bits: "5",
name: "prog_type_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program type error.
A program type not supported by the flash macro was issued.
'''
},
{ bits: "6",
name: "seed_err",
swaccess: "ro",
desc: '''
The seed reading process encountered an unexpected error.
'''
},
{ bits: "7",
name: "phy_relbl_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage reliability ECC error.

Note that this error bit can be cleared to allow firmware to deal with multi-bit ECC errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "8",
name: "phy_storage_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage integrity ECC error.

Note that this error bit can be cleared to allow firmware to deal with ICV errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "9",
name: "spurious_ack",
swaccess: "ro",
desc: '''
The flash emitted an unexpected acknowledgement.
'''
},
{ bits: "10",
name: "arb_err",
swaccess: "ro",
desc: '''
The phy arbiter encountered inconsistent results.
'''
},
{ bits: "11",
name: "host_gnt_err",
swaccess: "ro",
desc: '''
A host transaction was granted with illegal properties.
'''
35 changes: 31 additions & 4 deletions hw/ip/flash_ctrl/data/flash_ctrl.hjson.tpl
@@ -84,7 +84,15 @@
desc: "flash standard fatal errors"
},
{ name: "fatal_err",
desc: "flash fatal errors"
desc: '''
Flash fatal errors including uncorrectable ECC errors.

Note that this alert is not always fatal.
The underlying error bits in the !!FAULT_STATUS register remain set until reset, meaning the alert keeps firing.
This doesn't hold for !!FAULT_STATUS.PHY_RELBL_ERR and !!FAULT_STATUS.PHY_STORAGE_ERR.
To allow firmware to deal with multi-bit ECC and ICV errors during firmware selection and verification, these error bits can be cleared.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ name: "fatal_prim_flash_alert",
desc: "Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface."
@@ -1518,27 +1526,31 @@
desc: '''
This register tabulates customized fault status of the flash.

These are errors that are impossible to have been caused by software or unrecoverable
in nature.
These are errors that are impossible to have been caused by software or unrecoverable in nature.

All errors except for multi-bit ECC errors (!!FAULT_STATUS.PHY_RELBL_ERR) and ICV errors (!!FAULT_STATUS.PHY_STORAGE_ERR) trigger a fatal alert.
Once set, they remain set until reset.
'''
swaccess: "ro",
hwaccess: "hrw",
fields: [
{ bits: "0",
name: "op_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface has supplied an undefined operation.
See !!CONTROL.OP for list of valid operations.
'''
},
{ bits: "1",
name: "mp_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a memory permission error.
'''
},
{ bits: "2",
name: "rd_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a read error.
This could be a reliability ECC error or an integrity ECC error
@@ -1547,56 +1559,71 @@
},
{ bits: "3",
name: "prog_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program error.
This could be a program integrity error, see !!STD_FAULT_STATUS for more details.
'''
},
{ bits: "4",
name: "prog_win_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program resolution error.
'''
},
{ bits: "5",
name: "prog_type_err",
swaccess: "ro",
desc: '''
The flash life cycle management interface encountered a program type error.
A program type not supported by the flash macro was issued.
'''
},
{ bits: "6",
name: "seed_err",
swaccess: "ro",
desc: '''
The seed reading process encountered an unexpected error.
'''
},
{ bits: "7",
name: "phy_relbl_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage reliability ECC error.

Note that this error bit can be cleared to allow firmware to deal with multi-bit ECC errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "8",
name: "phy_storage_err",
swaccess: "rw0c",
desc: '''
The flash macro encountered a storage integrity ECC error.

Note that this error bit can be cleared to allow firmware to deal with ICV errors during firmware selection and verification.
After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler.
'''
},
{ bits: "9",
name: "spurious_ack",
swaccess: "ro",
desc: '''
The flash emitted an unexpected acknowledgement.
'''
},
{ bits: "10",
name: "arb_err",
swaccess: "ro",
desc: '''
The phy arbiter encountered inconsistent results.
'''
},
{ bits: "11",
name: "host_gnt_err",
swaccess: "ro",
desc: '''
A host transaction was granted with illegal properties.
'''
15 changes: 14 additions & 1 deletion hw/ip/flash_ctrl/data/flash_ctrl.sv.tpl
@@ -952,7 +952,20 @@ module flash_ctrl
reg2hw.alert_test.recov_err.q & reg2hw.alert_test.recov_err.qe
};

localparam logic [NumAlerts-1:0] IsFatal = {1'b0, 1'b1, 1'b1, 1'b1, 1'b0};
// The alert generated for errors reported in the fault status CSR (fatal_err) is not fatal.
// This allows firmware to deal with multi-bit ECC errors (phy_relbl_err) as well as ICV
// (phy_storage_err) errors inside the PHY during firmware selection and verification.
// Once firmware has cleared the corresponding bits in the fault status CSR and the alert
// handler has acknowledged the alert, the prim_alert_sender will stop triggering the alert.
// After firmware has passed the firmware selection / verification stage, the alert handler
// config can be adjusted to still classify the alert as fatal on the receiver side.
//
// This doesn't hold for the other error conditions reported in the fault status CSR. The
// corresponding bits in the register cannot be unset. The alert thus keeps triggering until
// reset for these bits.
//
// For more details, refer to lowRISC/OpenTitan#21353.
localparam logic [NumAlerts-1:0] IsFatal = {1'b0, 1'b1, 1'b0, 1'b1, 1'b0};
for (genvar i = 0; i < NumAlerts; i++) begin : gen_alert_senders
prim_alert_sender #(
.AsyncOn(AlertAsyncOn[i]),
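
The `IsFatal` change in this hunk can be read as a bit vector in which the leftmost element of the SystemVerilog concatenation is the highest index. The following C sketch decodes the old and new values to show which alert changed. The alert ordering (index 0 = `recov_err`) is an assumption taken from the order of the Security Alerts table in `interfaces.md`; the constant and function names are hypothetical and not part of the OpenTitan code base.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ALERTS 5u

/* Assumed alert order, index 0 first (follows the Security Alerts table). */
static const char *const kAlertNames[NUM_ALERTS] = {
    "recov_err",
    "fatal_std_err",
    "fatal_err",
    "fatal_prim_flash_alert",
    "recov_prim_flash_alert",
};

/* {b4, b3, b2, b1, b0}: the leftmost concatenation element is bit 4. */
static const uint8_t kIsFatalOld = 0x0E; /* {1'b0, 1'b1, 1'b1, 1'b1, 1'b0} */
static const uint8_t kIsFatalNew = 0x0A; /* {1'b0, 1'b1, 1'b0, 1'b1, 1'b0} */

/* Returns true if the alert at index idx is marked fatal in vec. */
static bool is_fatal(uint8_t vec, unsigned idx) {
  return idx < NUM_ALERTS && ((vec >> idx) & 1u) != 0u;
}

/* Returns the alert name for idx, or NULL if idx is out of range. */
static const char *alert_name(unsigned idx) {
  return idx < NUM_ALERTS ? kAlertNames[idx] : NULL;
}

/* Bit mask of the alerts whose fatal classification differs. */
static uint8_t changed_alerts(void) {
  return (uint8_t)(kIsFatalOld ^ kIsFatalNew);
}
```

Under the assumed ordering, the only differing bit is index 2, i.e. `fatal_err`, which matches the intent stated in the code comment: the sender no longer latches this alert, and the alert handler configuration decides whether it is treated as fatal.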
14 changes: 7 additions & 7 deletions hw/ip/flash_ctrl/doc/interfaces.md
@@ -57,13 +57,13 @@ Referring to the [Comportable guideline for peripheral device functionality](htt

## Security Alerts

| Alert Name | Description |
|:-----------------------|:--------------------------------------------------------------------------------------------------------------------|
| recov_err | flash recoverable errors |
| fatal_std_err | flash standard fatal errors |
| fatal_err | flash fatal errors |
| fatal_prim_flash_alert | Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface. |
| recov_prim_flash_alert | Recoverable alert triggered inside the flash primitive. |
| Alert Name             | Description |
|:-----------------------|:------------|
| recov_err              | flash recoverable errors |
| fatal_std_err          | flash standard fatal errors |
| fatal_err              | Flash fatal errors including uncorrectable ECC errors. Note that this alert is not always fatal. The underlying error bits in the [`FAULT_STATUS`](registers.md#fault_status) register remain set until reset, meaning the alert keeps firing. This doesn't hold for [`FAULT_STATUS.PHY_RELBL_ERR`](registers.md#fault_status) and [`FAULT_STATUS.PHY_STORAGE_ERR`](registers.md#fault_status). To allow firmware to deal with multi-bit ECC and ICV errors during firmware selection and verification, these error bits can be cleared. After passing this stage, it is recommended that firmware classify the corresponding alert as fatal on the receiver end, i.e., inside the alert handler. |
| fatal_prim_flash_alert | Fatal alert triggered inside the flash primitive, including fatal TL-UL bus integrity faults of the test interface. |
| recov_prim_flash_alert | Recoverable alert triggered inside the flash primitive. |
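
As an illustration of the clearing step described for `fatal_err`, here is a minimal C sketch of write-zero-to-clear (`rw0c`) handling for the two clearable `FAULT_STATUS` bits. The bit positions (7 and 8) come from the register description in this change; everything else, including the function names and the software model of the register, is hypothetical and stands in for real MMIO accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the FAULT_STATUS field list in flash_ctrl.hjson. */
#define FAULT_STATUS_PHY_RELBL_ERR_BIT 7u
#define FAULT_STATUS_PHY_STORAGE_ERR_BIT 8u
#define FAULT_STATUS_CLEARABLE_MASK          \
  ((1u << FAULT_STATUS_PHY_RELBL_ERR_BIT) |  \
   (1u << FAULT_STATUS_PHY_STORAGE_ERR_BIT))

/*
 * Hypothetical model of a software write to FAULT_STATUS.
 * Only rw0c bits written as 0 are cleared; ro bits ignore the write.
 */
static uint32_t fault_status_model_write(uint32_t current, uint32_t wdata) {
  uint32_t cleared = ~wdata & FAULT_STATUS_CLEARABLE_MASK;
  return current & ~cleared;
}

/*
 * Builds the write value that clears only the requested rw0c bits:
 * 0 in the positions to clear, 1 everywhere else.
 */
static uint32_t fault_status_clear_value(uint32_t bits_to_clear) {
  /* Guard against attempts to clear read-only fault bits. */
  assert((bits_to_clear & ~FAULT_STATUS_CLEARABLE_MASK) == 0u);
  return ~bits_to_clear;
}
```

In real firmware, the value returned by `fault_status_clear_value` would be written to the register through the platform's MMIO helper. Writing 1 to the bits that should stay untouched avoids clearing a second error by accident, and the read-only bits keep their value until reset regardless of what is written.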

## Security Countermeasures
