[edn] Elevate CSRNG status errors to fatal alerts #15894
Comments
It seems to me that if we encounter this situation, the only way to handle it right now is to completely disable and re-enable the entropy complex (or reboot). It does feel, however, like this should be linked to recoverable alerts, but that doesn't seem to be the case right now. |
The only case where the sts bit is set is when a silicon error occurs or an attacker drives the bit. In other words, there is no functional reason for this bit to be set. I would probably rate this bit being set as more severe, such as a fatal alert. |
@mwbranstad if it's alright with you, i can go in and make that change. |
@tjaychen sounds good, just add function only at this point. |
sounds good. Let me know if there are any objections. |
The exact same question also came up during the work for CSRNG V2/V2S (#16516). And I think there are two different questions to answer here:
The second question is probably better discussed in #16516, which gives some more context. I am thus removing the CSRNG label here. As for what EDN should do, I think the best would be to not distribute the data received upon getting an sts error from CSRNG. As @mwbranstad said, right now the sts bit cannot be driven to 1 by CSRNG unless someone injects a fault. But from the spec there might be other reasons to (let's discuss in #16516). What I am getting at is that EDN doesn't need to know why sts got asserted. It's just an upstream error condition that got signaled. And to me it seems that just continuing regular operation upon getting such an error isn't right. |
But for random bits already generated and distributed to the EDN FIFOs, wouldn't it be okay for them to continue distribution? I do agree, though, that if we go down this route we need to check the EDN consumers to make sure that if the EDN interface doesn't ack (because something has stopped due to errors upstream), the end point actually halts. For example I don't think |
Probably good to elevate CSRNG status errors to fatal, but we should review what the causes of those errors are in case any of them should not be elevated (not sure what that would be off the top of my head though). EDN <-> CSRNG req/ack should halt once that error is flagged/reported. As far as entropy already down the pipe: One can certainly argue that entropy from the CSRNG already distributed to the EDN prior to the status error is safe to continue to distribute until the EDN runs dry - those entropy samples passed all the relevant checks at the time of whitening/processing and no errors were present at that time. One could also be ultra-paranoid and take an aggressive flushing posture as soon as any error is detected, but that could prevent tidying-up actions that might want to use entropy (e.g. to clear), which is problematic. I don't think the ultra-paranoia is justified here but happy to hear other opinions on this posture. (@mwbranstad / @martin-lueker / @johannheyszl / @moidx) 100% on your non-ack comment, that needs to be halted as it indicates some more serious fault that EDN consumers need to be aware of and handle properly. |
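The "drain until dry, then stop acking" posture described above can be illustrated with a small behavioral sketch. This is a toy Python model, not the EDN RTL: the class name `EdnFifoModel`, its methods, and the `upstream_error` latch are all hypothetical and only serve to show the intended ordering of events.

```python
from collections import deque


class EdnFifoModel:
    """Toy behavioral model (hypothetical, not the RTL)."""

    def __init__(self):
        self.fifo = deque()
        self.upstream_error = False  # hypothetical latch for a CSRNG status error

    def receive_from_csrng(self, word, sts_error=False):
        """Accept one word from upstream unless an error is, or was, signaled."""
        if sts_error:
            self.upstream_error = True
        if self.upstream_error:
            # req/ack toward CSRNG is halted; the word is not buffered.
            return False
        self.fifo.append(word)
        return True

    def endpoint_request(self):
        """Return (ack, word); words buffered before the error are still served."""
        if self.fifo:
            return True, self.fifo.popleft()
        # No ack once the FIFO is dry: the consumer must detect and handle this.
        return False, None


edn = EdnFifoModel()
edn.receive_from_csrng(0x11)
edn.receive_from_csrng(0x22)
edn.receive_from_csrng(0x33, sts_error=True)  # error latched, word dropped
served = [edn.endpoint_request() for _ in range(3)]
```

In this sketch the two words received before the error are still delivered, and the third request gets no ack, which is the condition the end points would need to handle.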
Triaged for |
P1 sounds right to me @andreaskurth . I agree with @tjaychen and @cdgori that entropy already inside EDN when the |
SGTM. As we discussed yesterday, when this case happens, SW must be able to detect and handle it. This may require adding some status bit or even an interrupt to EDN. |
M2.5 effort estimate, range 2-16 depending on whether we decide to make design changes:
|
We've discussed this in the Security WG meeting on Mar 30 2023 and concluded the following:
However, it is also understood that this should be fixed for a future release. I am thus changing the labels accordingly. |
FYI @h-filali , this is very much related to the status register work you're currently doing. We should also fix this for production. We should fix this as described in my earlier comment. |
@vogelpi thanks for linking me. I suggest the EDN should enter the error state in the main SM upon receiving a status error from CSRNG. It should then signal a recoverable alert and not accept any further data from CSRNG until the whole entropy complex is disabled and re-enabled. This is needed because continuing normal operation after receiving an error status signal from CSRNG could be catastrophic. A really bad example would be that the generate command in boot mode fails, so boot entropy is never delivered while the system keeps waiting for it. Regarding what happens with the entropy, inside the EDN, which was received before the failure: The above covers the HW modes (boot and auto mode). For SW mode the sw_cmd_sts signal should be fed through to the sw_cmd_sts register, which can be read by firmware. From there, it could either be the responsibility of the firmware to restart the entropy complex, in which case the EDN continues operation, or we could proceed as above for the HW modes. What do you think, @vogelpi? |
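For the SW-mode option in which firmware is responsible for the restart, the check could look roughly like the sketch below. It is written in Python for readability only; the bit position `CMD_STS_ERR_BIT` and both callbacks are hypothetical stand-ins and are not taken from the EDN register map or any existing driver.

```python
CMD_STS_ERR_BIT = 1 << 1  # hypothetical bit position, NOT from the EDN register map


def handle_sw_cmd_status(read_sw_cmd_sts, restart_entropy_complex):
    """Return True if the status is clean; otherwise request a restart and return False.

    Both arguments are caller-supplied callables (hypothetical stand-ins for a
    register read and for a disable/re-enable sequence of the entropy complex).
    """
    status = read_sw_cmd_sts()
    if status & CMD_STS_ERR_BIT:
        restart_entropy_complex()
        return False
    return True


# Usage with stand-in callbacks instead of real register accesses.
restarts = []
ok = handle_sw_cmd_status(lambda: 0b10, lambda: restarts.append("restart"))
clean = handle_sw_cmd_status(lambda: 0b00, lambda: restarts.append("restart"))
```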
I've already discussed this offline with @h-filali, but for others previously involved in this discussion: this all sounds good to me. We've discussed adding a new recoverable error state to the main FSM, which the FSM can leave through a disable/enable cycle. The current error state is fatal, which is not suitable for the behavior needed here. The Ack FSM can be left untouched. End points will get the remaining entropy already received, but further commands won't be sent out to CSRNG. Also, some more status/ack bits have to be added for the hardware modes, similar to the bits recently added to the SW_CMD_STS register. Hakim has come up with a plan for doing this. |
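The proposed main FSM behavior can be sketched as a small state machine. This is a simplified Python model under stated assumptions, not the actual EDN design: the state names (in particular `REC_ERROR`), the `recov_alert` flag, and the method names are hypothetical, and the model ignores boot/auto mode details.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    REC_ERROR = auto()    # hypothetical name for the new recoverable error state
    FATAL_ERROR = auto()  # existing fatal error state, only left by reset


class EdnMainFsmModel:
    """Simplified behavioral model (hypothetical, not the RTL)."""

    def __init__(self):
        self.state = State.IDLE
        self.recov_alert = False  # kept sticky in this sketch

    def set_enable(self, enable):
        if self.state is State.FATAL_ERROR:
            return  # a disable/enable cycle does not leave the fatal state
        if not enable:
            self.state = State.IDLE
        elif self.state is State.IDLE:
            self.state = State.RUNNING
        # Enabling while in REC_ERROR has no effect: a disable must come first.

    def csrng_response(self, sts_error):
        if self.state is State.RUNNING and sts_error:
            self.state = State.REC_ERROR
            self.recov_alert = True

    def may_send_command(self):
        """Commands go out to CSRNG only while running normally."""
        return self.state is State.RUNNING


fsm = EdnMainFsmModel()
fsm.set_enable(True)
fsm.csrng_response(sts_error=True)
blocked = not fsm.may_send_command()
fsm.set_enable(False)
fsm.set_enable(True)
recovered = fsm.may_send_command()
```

The point of the separate state in this sketch is that a disable/enable cycle restores normal operation, whereas the fatal state stays put.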
@vogelpi - I assume that based on the conversation there might be significant work here. When it is possible could you please update the total effort expected and propagate to the top down + tracker doc. TIA! |
I would like to confirm the behavior when EDN encounters csrng errors (csrng_rsp_sts):
1). On the EDN side, upon receiving a csrng_rsp_sts error, it will report to cmd_sts register. But will still distribute the received data to all EDN requesters.
2). On CSRNG side, (only from reading the spec, I do not actually know the simulation results). It seems like CSRNG will fire an interrupt.
I would like to confirm if these behaviors are enough to treat the csrng_rsp_sts error?
@tjaychen @moidx @martin-lueker @mwbranstad
Thanks,
Cindy