[edn] Elevate CSRNG status errors to fatal alerts #15894

cindychip · 2022-11-01T22:28:22Z

I would like to confirm the behavior when EDN encounters csrng errors (csrng_rsp_sts):

1). On the EDN side, upon receiving a csrng_rsp_sts error, it will report to cmd_sts register. But will still distribute the received data to all EDN requesters.

2). On CSRNG side, (only from reading the spec, I do not actually know the simulation results). It seems like CSRNG will fire an interrupt.

I would like to confirm if these behaviors are enough to treat the csrng_rsp_sts error?

@tjaychen @moidx @martin-lueker @mwbranstad

Thanks,
Cindy

estimate 16

The text was updated successfully, but these errors were encountered:

tjaychen · 2022-11-01T22:30:09Z

it seems like to me that if we encounter this situation, the only way to handle right now is to completely disable / re-enable the entropy complex. (or reboot).

It does feel like however this should be linked into recoverable alerts, but it doesn't seem like that right now.

mwbranstad · 2022-11-02T12:58:36Z

the only case where the sts bit will be on is when a silicon error occurs, or some attacker is driving this bit on. In other words there is no functional reason that this bit will set. I would probably rate this bit being set as more severe, such as a fatal alert.

tjaychen · 2022-11-02T16:19:03Z

@mwbranstad if it's alright with you, i can go in and make that change.
To not disturb things further, we can leave the interrupt and also add the fatal indication.
What do you think?

mwbranstad · 2022-11-02T17:26:24Z

@tjaychen sounds good, just add function only at this point.

tjaychen · 2022-11-02T17:39:54Z

sounds good.
If it is okay with everyone, let us keep this at M3 as Cindy has done.
I don't think this is something we absolutely have to get in now.

Let me know if there are any objections.

vogelpi · 2022-11-25T11:21:53Z

The exact same question cam also up during the work for CSRNG V2/V2S (#16516). And I think there are two different questions to answer here:

What should EDN do in case it gets a csrng_rsp_sts error signaled from CSRNG?
When should CSRNG signal a csrng_rsp_sts error to SW/EDN?

The second question is probably better discussed in #16516 which gives some more context. I am thus removing the CSRNG label here.

As for what EDN should do I think the best would be to not distribute the data received upon getting an sts error from CSRNG. AS @mwbranstad right now the sts bit cannot be driven to 1 by CSRNG unless someone injects a fault. But from the spec there might be other reasons to (let's discuss in #16516). What I am getting at is that EDN doesn't need to know why sts got asserted. It's just an upstream error condition that got signaled. And to me it seems that just continuing regular operation upon getting such an error isn't right.

tjaychen · 2022-11-29T19:19:48Z

but for random bits already generated and distributed to the EDN fifo's, wouldn't it be okay for them to continue distribution? I do agree though that edn could halt further requests / acks with csrng though.

I guess if we go down this route though, we need to do a check of the edn consumers to make sure that if the edn interface doesn't ack (because something has stopped due to errors upstream), it should actually halt the end point. For example I don't think keymgr handles this very well right now.

cdgori · 2022-12-29T17:19:52Z

but for random bits already generated and distributed to the EDN fifo's, wouldn't it be okay for them to continue distribution? I do agree though that edn could halt further requests / acks with csrng though.

Probably good to elevate CSRNG status errors to fatal, but we should review what the causes of those errors are in case any of them should not be elevated (not sure what that would be off the top of my head though). EDN <-> CSRNG req/ack should halt once that error is flagged/reported.

As far as entropy already down the pipe:

One can certainly argue that entropy from the CSRNG already distributed to the EDN prior to the status error is safe to continue to distribute until the EDN runs dry - those entropy samples passed all the relevant checks at the time of whitening/processing and no errors were present at that time.

One could also be ultra-paranoid and take an aggressive flushing posture as soon as any error is detected, but that could prevent tidying-up actions that might want to use entropy (e.g. to clear), which is problematic. I don't think the ultra-paranoia is justified here but happy to hear other opinions on this posture. (@mwbranstad / @martin-lueker / @johannheyszl / @moidx)

100% on your non-ack comment, that needs to be halted as it indicates some more serious fault that EDN consumers need to be aware of and handle properly.

andreaskurth · 2023-03-01T13:05:03Z

Triaged for edn. Assigning to M2.5 with Priority:P1 Priority: high because we need to take a decision on this, as any change will require time for implementation and verification.

vogelpi · 2023-03-03T11:08:45Z

P1 sounds right to me @andreaskurth .

I agree with @tjaychen and @cdgori that entropy already inside EDN when the csrng_rsp_sts error is signaled should be okay to distribute. Not sending new requests afterwards , not accepting new entropy coming in from CSRNG after that point would probably be the right thing to do.

andreaskurth · 2023-03-07T11:10:22Z

SGTM. As we've discussed yesterday, when this case happens, SW must be able to detect and handle it. This may require adding some status bit or even interrupt to EDN.

marnovandermaas · 2023-03-23T13:22:11Z

M2.5 effort estimate, range 2-16 depending on whether we decide to make design changes:

estimate 16

vogelpi · 2023-03-30T19:00:36Z

We've discussed this in the Security WG meeting on Mar 30 2023 and concluded the following:

Without changing CSRNG to actually produce high csrng_rsp_sts bits (see [csrng] Define how CSRNG status response should operate #16516), it makes little sense to fix this for M2.5.
In case CSRNG detects an internal error condition (e.g. invalid command), its main FSM stops and signals a recoverable alert. It stops serving entropy to the EDN eventually. EDN thus doesn't receive entropy after an error condition has occurred upstream (which would have to be signaled using the status bit). This means from a security perspective things are okay.
Given the expected effort and our deadlines, we have agreed to leave this part of the design as is for M2.5.

However, it is also understood that this should be fixed for a future release. I am thus changing the labels accordingly.

vogelpi · 2024-01-15T13:47:58Z

FYI @h-filali , this is very much related to the status register work you're currently doing. We should also fix this for production. We should fix this as described in my earlier comment.

h-filali · 2024-01-16T16:40:31Z

@vogelpi thanks for linking me. I suggest the EDN should enter the error state in the main SM upon receiving an status error from CSRNG. It should then signal a recoverable alert and not accept any further data from CSRNG until the whole entropy complex is disabled and re-enabled. This would be needed since continuing normal operation after receiving an error status signal from CSRNG could be catastrophic. A really bad example would be that the generate in boot mode could fail and thus boot entropy could not be delivered while the system keeps on waiting for boot entropy.
Both the error state and the recoverable alerts already exist.

Regarding what happens with the entropy, inside the EDN, which was received before the failure:
currently when the error state is reached the remaining entropy won't be sent out to the endpoints. This is probably solved best by not entering the error state in the ack_sm when certain errors like this one occur. This way, the endpoints can still request entropy while the genbits FIFO is not empty yet. We should make sure however, that the genbits FIFO won't accept any further data when the main SM enters the error state.

The above covers the HW modes (boot and auto mode). For SW mode the sw_cmd_sts signal should be fed through to the sw_cmd_sts register which can be read by firmware. From there it could either be the responsibility of the firmware to restart the entropy complex. In this case the EDN continues operation. Or we could proceed as above for the HW modes. What do you think @vogelpi.

vogelpi · 2024-02-05T11:05:54Z

I've already discussed this offline with @h-filali but for others previously involved in this discussion here: This sounds all good to me.

We've discussed to add a new recoverable error state into the main FSM out of which the FSM can get by a disable/enable cycle. The current error state is fatal which is not suitable for the behavior needed here. The Ack FSM we can leave untouched. End points will get the remaining entropy already received but further commands won't be sent out to CSRNG.

Also, some more status/ack bits have to be added for the hardware modes. Similar to the recently added bits to the SW_CMD_STS register. Hakim has come up with a plan for doing this.

johngt · 2024-02-09T12:25:11Z

@vogelpi - I assume that based on the conversation there might be significant work here. When it is possible could you please update the total effort expected and propagate to the top down + tracker doc. TIA!

vogelpi · 2024-02-09T12:58:45Z

@johngt , this is the issue @h-filali is about to finish. No need to update any effort estimates or the tracker doc. The PR is expected to arrive later today.

cindychip added the Type:Question Questions label Nov 1, 2022

cindychip added this to the Project: M3 milestone Nov 1, 2022

tjaychen removed the Type:Question Questions label Nov 2, 2022

tjaychen changed the title ~~[edn/csrng] Response to CSRNG errors~~ [edn/csrng] Elevate csrng status errors to fatal alerts Nov 2, 2022

johngt added IP:csrng IP:edn labels Nov 16, 2022

vogelpi removed the IP:csrng label Nov 25, 2022

vogelpi changed the title ~~[edn/csrng] Elevate csrng status errors to fatal alerts~~ [csrng] Elevate CSRNG status errors to fatal alerts Nov 25, 2022

vogelpi added the Component:RTL label Nov 25, 2022

vogelpi mentioned this issue Nov 25, 2022

[csrng] Define how CSRNG status response should operate #16516

Closed

johannheyszl added the Hotlist:Security Security Opinion Needed label Jan 10, 2023

andreaskurth changed the title ~~[csrng] Elevate CSRNG status errors to fatal alerts~~ [edn] Elevate CSRNG status errors to fatal alerts Mar 1, 2023

andreaskurth added Priority:P1 Priority: high Triaged labels Mar 1, 2023

andreaskurth modified the milestones: Discrete: M3, Discrete: M2.5 Mar 1, 2023

vogelpi removed the Hotlist:Security Security Opinion Needed label Mar 30, 2023

vogelpi modified the milestones: Discrete: M2.5, Backlog Mar 30, 2023

vogelpi added Priority:P2 Priority: medium and removed Priority:P1 Priority: high labels Mar 30, 2023

moidx added Priority:P3 Priority: low and removed Priority:P2 Priority: medium Type:FutureRelease Not relevant to currently planned releases/milestones labels Apr 6, 2023

msfschaffner added the Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones label Oct 7, 2023

vogelpi mentioned this issue Oct 24, 2023

[chip-test] chip_sw_csrng_edn_error #16837

Closed

6 tasks

msfschaffner modified the milestones: Backlog, Earlgrey-PROD.M2 Nov 7, 2023

msfschaffner added Hotlist:Security Security Opinion Needed and removed Triaged labels Nov 7, 2023

msfschaffner added the Subsystem:Entropy entropy_src, csrng, or edn related issues label Dec 21, 2023

msfschaffner assigned h-filali Dec 21, 2023

msfschaffner added Priority:P2 Priority: medium and removed Priority:P3 Priority: low labels Dec 21, 2023

johannheyszl removed the Priority:P2 Priority: medium label Jan 12, 2024

h-filali mentioned this issue Feb 5, 2024

[edn/rtl] Signal all recoverable alerts #21187

Merged

h-filali mentioned this issue Feb 9, 2024

[edn/rtl] Process CSRNG status error responses #21280

Merged

vogelpi closed this as completed in #21280 Feb 12, 2024

andreaskurth mentioned this issue Mar 28, 2024

[keymgr] D2(S) Signoff #20981

Closed

This was referenced Mar 29, 2024

[edn] D2S Signoff #20982

Closed

[csrng] D2S Signoff #20976

Closed

vogelpi mentioned this issue Jun 25, 2024

[csrng] V2S Signoff #22467

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[edn] Elevate CSRNG status errors to fatal alerts #15894

[edn] Elevate CSRNG status errors to fatal alerts #15894

cindychip commented Nov 1, 2022 •

edited by andreaskurth

Loading

tjaychen commented Nov 1, 2022

mwbranstad commented Nov 2, 2022

tjaychen commented Nov 2, 2022

mwbranstad commented Nov 2, 2022

tjaychen commented Nov 2, 2022

vogelpi commented Nov 25, 2022

tjaychen commented Nov 29, 2022

cdgori commented Dec 29, 2022

andreaskurth commented Mar 1, 2023

vogelpi commented Mar 3, 2023

andreaskurth commented Mar 7, 2023

marnovandermaas commented Mar 23, 2023 •

edited

Loading

vogelpi commented Mar 30, 2023

vogelpi commented Jan 15, 2024

h-filali commented Jan 16, 2024

vogelpi commented Feb 5, 2024

johngt commented Feb 9, 2024

vogelpi commented Feb 9, 2024

[edn] Elevate CSRNG status errors to fatal alerts #15894

[edn] Elevate CSRNG status errors to fatal alerts #15894

Comments

cindychip commented Nov 1, 2022 • edited by andreaskurth Loading

tjaychen commented Nov 1, 2022

mwbranstad commented Nov 2, 2022

tjaychen commented Nov 2, 2022

mwbranstad commented Nov 2, 2022

tjaychen commented Nov 2, 2022

vogelpi commented Nov 25, 2022

tjaychen commented Nov 29, 2022

cdgori commented Dec 29, 2022

andreaskurth commented Mar 1, 2023

vogelpi commented Mar 3, 2023

andreaskurth commented Mar 7, 2023

marnovandermaas commented Mar 23, 2023 • edited Loading

vogelpi commented Mar 30, 2023

vogelpi commented Jan 15, 2024

h-filali commented Jan 16, 2024

vogelpi commented Feb 5, 2024

johngt commented Feb 9, 2024

vogelpi commented Feb 9, 2024

cindychip commented Nov 1, 2022 •

edited by andreaskurth

Loading

marnovandermaas commented Mar 23, 2023 •

edited

Loading