Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kmac] Abort when sideload key is invalid during operation #22794

Merged
merged 11 commits into from
May 3, 2024

Conversation

andreaskurth
Copy link
Contributor

@andreaskurth andreaskurth commented Apr 24, 2024

The KMAC HW IP block features an option to load keys from Key Manager
via a HW key sideload interface. Prior to this commit, KMAC would:

This could lead to cases where KMAC would use an invalid sideload key. This PR fixes that by raising an error in these cases.

To enable that, this PR moves the err_processed bit from the CFG_SHADOWED CSR to the CMD CSR, and this PR fixes three problems with KMAC's error handling (some of which would only occur with the newly introduced error).

Please see the commit messages for details on the changes in this PR.

Block-level tests maintain their pass rate (tested with -i all -rx 0.1).

Follow-up tasks

@andreaskurth andreaskurth self-assigned this Apr 24, 2024
@andreaskurth andreaskurth requested a review from a team as a code owner April 24, 2024 11:28
@andreaskurth andreaskurth requested review from marnovandermaas and removed request for a team April 24, 2024 11:28
hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
@marnovandermaas marnovandermaas removed their request for review April 25, 2024 08:44
Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating and fixing this @andreaskurth ! The code looks mostly good but there seems to be a bug in one of the new states which may cause KMAC to send a done pulse on the wrong app interface.

As for the new issue of misbehaving HW app interfaces stalling KMAC: I think the current behavior may be acceptable but it's probably worth to have a separate discussion. Would you mind creating a new GitHub issue please?

hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
hw/ip/kmac/rtl/kmac_app.sv Outdated Show resolved Hide resolved
@andreaskurth andreaskurth force-pushed the kmac-abort-invalid-key branch 3 times, most recently from 85510b9 to db8e322 Compare April 26, 2024 08:34
@andreaskurth andreaskurth requested a review from a team as a code owner April 29, 2024 11:45
@andreaskurth andreaskurth requested review from moidx and removed request for a team April 29, 2024 11:45
@andreaskurth andreaskurth force-pushed the kmac-abort-invalid-key branch 2 times, most recently from 93bb7e1 to 8df6407 Compare April 29, 2024 11:56
@andreaskurth
Copy link
Contributor Author

Thanks for investigating and fixing this @andreaskurth ! The code looks mostly good but there seems to be a bug in one of the new states which may cause KMAC to send a done pulse on the wrong app interface.

Thanks for flagging this, @vogelpi. I have fixed this problem, alongside with two others I found. May I ask you to review the code again?

As for the new issue of misbehaving HW app interfaces stalling KMAC: I think the current behavior may be acceptable but it's probably worth to have a separate discussion. Would you mind creating a new GitHub issue please?

Yes, I'll create an issue for that when merging this PR. It's tracked in the follow-up tasks in the PR description.

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @andreaskurth for fixing this and explaining what's going on. Nice work!

hw/ip/kmac/rtl/kmac_app.sv Show resolved Hide resolved
hw/ip/kmac/rtl/kmac_app.sv Show resolved Hide resolved
@andreaskurth andreaskurth force-pushed the kmac-abort-invalid-key branch 2 times, most recently from 5f53b24 to eae0444 Compare April 29, 2024 15:18
@moidx moidx requested a review from ballifatih April 30, 2024 00:00
Copy link
Contributor

@ballifatih ballifatih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @andreaskurth. LGTM, but I must admit I could not fully review the full FSM transitions.

Restoring KMAC from an error has always been a bit tricky from SW side. I will have to keep that in mind that for the cryptolib. This PR really does not touch that clearing mechanism, so I think we should be fine (except for relocating err_processed bit, for which this PR also includes changes).

The following top level tests might provide some coverage (although they are not targeting this particular fix):

  • sw/device/tests/keymgr_sideload_kmac_test.c
  • sw/device/tests/kmac_entropy_test.c

@andreaskurth
Copy link
Contributor Author

andreaskurth commented Apr 30, 2024

Thx for the review @ballifatih. Actually the keymgr_sideload_kmac_test flags an issue that I'm now resolving (it gets also run in CI).

@andreaskurth andreaskurth force-pushed the kmac-abort-invalid-key branch 2 times, most recently from 8f27950 to 11fafff Compare April 30, 2024 20:23
@andreaskurth
Copy link
Contributor Author

Alright, I think I fixed the problems indicated by the failing keymgr_sideload_kmac_test TLT:

  • SW expected KMAC to complete (with a bogus output) when the sideload key from keymgr is invalid, whereas it now raises an error that needs to be handled;
  • key stability assertions in kmac_core no longer made sense when kmac_app is able to abort upon an invalid key;
  • the state of the kmac_errchk module did not get cleared when error handling was complete;
  • kmac_app waited for data from a HW app interface even when the error occurred during SW operation.

@andreaskurth andreaskurth force-pushed the kmac-abort-invalid-key branch 2 times, most recently from cd2f31e to 1cb6c45 Compare May 2, 2024 06:27
@andreaskurth
Copy link
Contributor Author

Waiting for #22935 (or another fix for the ROM_EXT problem that currently breaks the corresponding CI check) to be merged, then I'll rebase this. I'd rather not merge it without getting a green light from those checks, because this modifies the kmac-keymgr sideload interface, which is relevant for booting into ROM_EXT.

To get some confidence from simulation tests, I ran the relevant TLTs (util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_kmac_app_rom chip_sw_kmac_mode_kmac{,_jitter_en{,_reduced_freq}} chip_sw_kmac_idle chip_sw_keymgr_sideload_kmac chip_sw_kmac_entropy chip_sw_clkmgr_off_kmac_trans -rx 2), which all pass except 50% of chip_sw_keymgr_sideload_kmac due to a timeout that is also reported in current nightly results:

1.chip_sw_keymgr_sideload_kmac.6479773378425374498954881563215145166078031770691723324347790450221767634834\                                                                                                                                                        
          Line 399, in log /home/dev/src/scratch/kmac-abort-invalid-key/chip_earlgrey_asic-sim-vcs/1.chip_sw_keymgr_sideload_kmac/latest/run.log                                                                                                                              
                                                                                                                                                                                                                                                                              
                UVM_FATAL @ 10010.160001 us: (chip_sw_keymgr_key_derivation_vseq.sv:122) [uvm_test_top.env.virtual_sequencer.chip_sw_keymgr_sideload_kmac_vseq] wait timeout occurred!                                                                                        
                UVM_INFO @ 10010.160001 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]                                                                                                                                                                                 
                --- UVM Report catcher Summary ---

The `CFG_SHADOWED` CSR gets write-locked while the SHA3 module is
operating.  This makes sense for configuration values but it is
problematic for the `err_processed` bit, through which SW acknowledges
that it has handled an error occurring while KMAC/SHA3 is operating,
because that acknowledgment would then not propagate to KMAC and it
would get stuck.

Signed-off-by: Andreas Kurth <[email protected]>
This commit fixes four problems in KMAC's error handling:

1) App interface done after SW has processed error

   Before this commit, when the FSM in `kmac_app` was in its `StError`
   state and SW acknowledged the error by writing the `err_processed`
   bit in the `CFG_SHADOWED` CSR, the FSM would go back to the `StIdle`
   state.  If the active HW app interface had not provided all data
   before this SW-induced transition to the `StIdle` state, KMAC was no
   longer ready to absorb the remaining data, thereby stalling the HW
   app interface.

   This commit extends the FSM in `kmac_app` so that when an error
   happens it doesn't matter whether SW acknowledges the error before or
   after the HW app interface has provided all data.  In both cases, the
   FSM now waits for SW to acknowledge *and* the HW app interface to
   provide all data before it returns to `StIdle`.

   To implement this, two new states (`StErrorAwaitSw` and
   `StErrorAwaitApp`) get added.

   In block-level DV, `kmac_key_error_vseq` waited for the *first* data
   item instead of the *last* before stopping to send items on the
   active HW app interface and continuing with the vseq (acknowledging
   the error from SW and going to the next iteration).  This no longer
   works because KMAC now remains in `StError` until the HW app
   interface has send the last data item, so that vseq now correctly
   sends all data items including the last on the HW app interface.

   This may raise a new issue: If the HW app interface misbehaves, KMAC
   will remain in the error state until it gets reset.  To remediate
   this, we could add a mechanism for SW to clear KMAC's state
   regardless of the state of the HW app interface.

2) SHA3 core not told to process

   Before this commit, when the FSM in `kmac_app` went through the
   `StError` state, it would not tell the SHA3 core to process the data
   so that its internal state would get flushed.

   This commit extends the FSM in `kmac_app` so that `CmdProcess` is
   signaled on `cmd_o` to the SHA3 core when the error is handled, and
   lets the FSM wait for the completion of SHA3 (`absorbed_i` input)
   before returning to idle.  To implement this, a new state
   (`StErrorWaitAbsorbed`) is added.

3) Signaling completion to SW when an error has occurred in SW mode

   Before this commit, when an error would occur in SW mode (`mux_sel ==
   SelSw`), the `absorbed_o` output, which in turn triggers the
   `kmac_done` IRQ would not be raised.  This case could not happen
   because no errors were raised in SW mode (such errors will get raised
   with the next commit), but it's still wrong.

   This commit extends the FSM in `kmac_app` with a side-band state bit
   (`err_during_sw_q`) and uses that to set `absorbed_o` when an error
   that occurred during SW mode got handled.

4) Clearing of `kmac_errchk`

   Before this commit, the state of the `kmac_errchk` would not get
   cleared after the error has been handled.  Thus the next invocation
   of KMAC could start with a residual state in `kmac_errchk` and
   incorrectly fail.

   This commits adds a MuBi-encoded signal from `kmac_app` to
   `kmac_errchk` through which the former can clear the latter after an
   error.

Signed-off-by: Andreas Kurth <[email protected]>
When `kmac_app` aborts an operation as soon as the sideload key becomes
invalid (which it will do in a couple of commits) the current stability
assertions on `key_len_i` and `key_data_i` in `kmac_core` would be
violated.  This is not a problem: `kmac_core` *should* use an invalid,
useless key in that case.  To prevent these assertions from failing,
this commit introduces a signal with which `kmac_app` indicates to
`kmac_core` whether `key_len_i` and `key_data_i` are valid and rewrites
the assertions to take that valid signal into account.

Signed-off-by: Andreas Kurth <[email protected]>
The KMAC HW IP block features an option to load keys from Key Manager
via a HW key sideload interface.  Prior to this commit, KMAC would:
- when used via the SW application interface: *not check at all* if the
  sideload key is valid (issue lowRISC#10704, lowRISC#16855);
- when used via a HW application interface: check if the sideload key is
  valid *only for a single cycle* when the application interface gets
  configured (state `StAppCfg` in `kmac_app`).

This could lead to cases where KMAC would use an invalid sideload key.

This commit fixes the problem by checking whether the sideload key is
valid in *every* FSM state in which the sideload key is used.  If the
sideload key is invalid even for a single cycle (the FSM cannot know
whether the key is being used in this exact cycle or not), `kmac_app`'s
FSM goes into the `StKeyMgrErrKeyNotValid` state.  In that state, the
FSM signals the `keymgr_pkg::ErrKeyNotValid` error code in KMAC's
`err_code` CSR.  The FSM then transitions to the `StError` state, where
it drains data from the HW application interface by keeping
`app_o.ready` high.  The digest output remains all-zero (it can only
take a non-zero value in the `StAppWait` state).  The FSM exits the
`StError` state after SW has signalled that it has processed the error
by writing the `processed` bit in the `CFG_SHADOWED` CSR *and* the
active HW app interface has sent the last data item.

This commit resolves lowRISC#10704 and implements the RTL part of lowRISC#16855.
Covering this in DV remains open, although the existing tests (which
don't cover this) keep their previous pass rates.

Signed-off-by: Andreas Kurth <[email protected]>
@vogelpi
Copy link
Contributor

vogelpi commented May 2, 2024

Update: rebased on top of #22940.

@andreaskurth
Copy link
Contributor Author

Thx! With that, CI finally passes. Merging

Copy link

@cdgori cdgori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the one hjson bit that might have snuck in (?)

@@ -99,6 +99,7 @@
{
name: "{name}_sideload"
uvm_test_seq: kmac_sideload_vseq
run_opts: ["+test_timeout_ns=1_000_000_000"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to be included? It's very long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is necessary because kmac now drains all data beats on the HW app interface when there's an error.

Re this being very long -- time is relative, right? :-) kmac block level DV simulates at a relatively high rate (simulated time over wall clock time), and other tests have an even higher timeout (e.g., kmac_long_msg_and_output above has 10 s).

nasahlpa added a commit to nasahlpa/opentitan that referenced this pull request Jul 5, 2024
PR lowRISC#22794 moved the `err_processed` field from
the `CFG_SHADOWED` shadow register to the `CMD` register.

However, the `shadow_reg_wr` function was not adapted. This function
assumes that writing to the `CFG_SHADOWED` reg clears the update error
field. As this now can only be done with the `CMD` register, this is
not true anymore. Hence, this commit removes this assumption.

Closes lowRISC#23083.

Signed-off-by: Pascal Nasahl <[email protected]>
vogelpi pushed a commit that referenced this pull request Jul 5, 2024
PR #22794 moved the `err_processed` field from
the `CFG_SHADOWED` shadow register to the `CMD` register.

However, the `shadow_reg_wr` function was not adapted. This function
assumes that writing to the `CFG_SHADOWED` reg clears the update error
field. As this now can only be done with the `CMD` register, this is
not true anymore. Hence, this commit removes this assumption.

Closes #23083.

Signed-off-by: Pascal Nasahl <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants