Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sw] polling in ROM #6220

Open
tjaychen opened this issue Apr 20, 2021 · 10 comments
Open

[sw] polling in ROM #6220

tjaychen opened this issue Apr 20, 2021 · 10 comments
Assignees
Labels
Component:Software Issue related to Software Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones Hotlist:Software to be resolved in Software Committee Priority:P2 Priority: medium SW:ROM ROM related issues

Comments

@tjaychen
Copy link

Certain setup routines (for example sram / flash scramble key request) require the ROM to make a hardware request and poll for its completion. Polling in ROM however, can be bad if the hardware underneath failed and the ROM just never gets out of the loop.

As a result, we should think a bit about what kind of polling-timeout scheme we want in ROM and how that should terminate in the event of a failure.

@moidx moidx added the SW:ROM ROM related issues label Apr 20, 2021
@moidx moidx added this to the ROM-S1 milestone Sep 28, 2021
@alphan alphan modified the milestones: ROM-S1, ROM Hardened Nov 29, 2021
@tjaychen
Copy link
Author

tjaychen commented Feb 4, 2022

@alphan
@moidx
is this already handled? I figure you guys probably have something planned for it already.

@tjaychen tjaychen added the Hotlist:Software to be resolved in Software Committee label Feb 4, 2022
@arunthomas arunthomas added the Component:Software Issue related to Software label Mar 25, 2022
@alphan
Copy link
Contributor

alphan commented May 10, 2022

Thanks @tjaychen. Currently we are not using an explicit polling-timeout scheme. Instead, we are relying on the watchdog which is initialized early in the boot with a safe default (TBD) here and reconfigured based on the life cycle state here. Also, ROM does not pet the watchdog, we plan to set the timeout to a value that is long enough for proper ROM execution.

The main reason for this approach is simplicity, i.e. no iteration counts to tune based on the clock frequency or peripheral being polled e.g. sram_ctrl, flash_ctrl, otbn, keymgr, hmac, uart, etc.

@tjaychen
Copy link
Author

@alphan just to clarify, wouldn't a watchdog in this case actually reboot the device?
If that were to happen, will you guys get any information on what hung where? We may be able to make use of the rstmgr crash dump here as well, but i'm not sure if that information is exposed.

@alphan
Copy link
Contributor

alphan commented May 10, 2022

Good point, I think I was assuming debugging would provide that information.

@tjaychen
Copy link
Author

yeah we have a rstmgr crashdump that basically is intended to show where the processor was prior to reset.
However we will have to expose that info somehow though, because if we watchdog during production state, and it's a ROM timeout, we may never get to a debug interface or a modifiable piece of software to dump out where things to stuck.

The "timeout" failure I was referring to was a failure, and more a "fall-through" of a polling loop, that would then manifest some kind of user visible error so we can figure out where to look.

All of these are possible options though.

@arunthomas arunthomas added Hotlist:Software Opinion and removed Hotlist:Software to be resolved in Software Committee labels Jun 3, 2022
@alphan
Copy link
Contributor

alphan commented Sep 28, 2022

I think we can close this since we now can

  • Attach a debugger,
  • Build programs to be executed from SRAM (e.g. for dumps over UART) and inject them.

@tjaychen @dmcardle plmk if you have any concerns.

@alphan alphan closed this as completed Sep 28, 2022
@tjaychen
Copy link
Author

it's not entirely the same, because the assumption is that some kind of lock-up is happening after we enter life cycle PROD, which means the debugger would no longer be available. We have seen this from time to time just because during production there's more volume and it can expose corner cases not previously thought of.

We don't necessarily need a fall through like what i'm suggesting, but just have some kind of plan of how we should debug if things were to get stuck.

@tjaychen tjaychen reopened this Sep 28, 2022
@alphan
Copy link
Contributor

alphan commented Sep 28, 2022

Yeah, prod can be tricky. I think this deserves an offline discussion.

@arunthomas arunthomas added Hotlist:Software to be resolved in Software Committee and removed Hotlist:Software Opinion labels Mar 9, 2023
@msfschaffner
Copy link
Contributor

@timothytrippel @moidx @cfrantz @cdgori should we pick this discussion up again for PROD?

@msfschaffner msfschaffner added the Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones label Oct 6, 2023
@cfrantz
Copy link
Contributor

cfrantz commented Jul 1, 2024

As of 0539d58 (and to my best knowledge)

Known polling loops without timeouts:

  • uart functions: spin waiting if TX FIFO is full.
  • Some functions (none currently used in ROM) spin waiting for TX FIFO empty.

Known polling loops with timeouts:

  • pwrmgr_cdc_sync spins with a CPU mcycle timeout (the timeout is the amount of time a CDC sync should take, plus a small amount of error overhead).
  • shutdown's UART functions spin with a CPU mcycle timeout (the timeout is the amount of time it should take to empty the TX fifo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:Software Issue related to Software Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones Hotlist:Software to be resolved in Software Committee Priority:P2 Priority: medium SW:ROM ROM related issues
Projects
None yet
Development

No branches or pull requests

6 participants