-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: various tweaks to, and a pile of documentation for, the switcher and exception handler #320
base: main
Are you sure you want to change the base?
Conversation
37ccd9e
to
6616047
Compare
e8af222
to
ee9a1a6
Compare
OK, I believe this is ready for review. As so often happens with my PRs, this is somewhat separable, with many of the "bottom" commits suitable for landing even if the whole thing is not, if that'd be desirable. In the same spirit, this is almost surely best reviewed commit by commit, rather than all at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment out the labels that are not branch targets? I found that they made the code harder to read, whereas everything else was a big improvement.
sdk/core/switcher/entry.S
Outdated
* tp, t2: scratch | ||
*/ | ||
/* | ||
* The caller should back up all caller saved registers. Spill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took a couple of attempts to parse. The caller is asserting that they don’t care about the values in caller-save registers. There is no obligation to spill them, they may simply be unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that you're happy with the new comment here.
sdk/core/switcher/entry.S
Outdated
* callee-save registers carefully. If we find ourselves unable to do | ||
* so, we'll return an error to the caller (via the exception path; see | ||
* .Lhandle_error_in_switcher). The error handling path assumes that | ||
* the first spill is to the lowest address and guaranteed to trap if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think that’s a safe assumption. What happens if:
- I clear store global from CSP and don’t have local things in cs0?
- I put a heap cap in CSP and free it in another thread? I think this can’t happen yet but will be possible once the switcher runs with interrupts enabled (or we do multicore).
After more thinking, I think the code is right but the comment is misleading. If we've accidentally run out of stack space, we'll trap on the first instruction and so should gracefully return. If we've maliciously done bad things to csp before invoking the switcher, we should be force unwound.
sdk/core/switcher/entry.S
Outdated
*/ | ||
cgetperm t2, csp | ||
li tp, COMPARTMENT_STACK_PERMISSIONS | ||
bne tp, t2, .Lforce_unwind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that this check checks for store local which ensures that this is a stack and so later accesses cannot fault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that you are happy with the expanded comment.
* If we have already unwound so far that the TrustedStack::frameoffset is | ||
* pointing at TrustedStack::frames[0] -- that is, if the stack has no | ||
* active frames on it -- then just go back to the context we came from, | ||
* effectively parking this thread in a (slow) infinite loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to hit the thread exit path in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have hit the thread exit path in this case, and the scheduler is continuing to try to run us. But all the same, see #328 (not presently part of this PR stack, but if it lands, I'll rebase).
cspecialr ct1, mtdc | ||
// Atlas: t1: pointer to TrustedStack | ||
#ifdef CONFIG_MSHWM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not this PR, but I think we can assume MSHWM now. We haven't tested without it for ages and I expect everyone to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ifdef was useful for benchmarking, not sure if you still want that?
Just mentioning here: I encountered the bug where a compartment's error handler isn't called if the thread is at the root of the trusted stack while testing my future PR that uses the new error handlers in the NW stack (https://github.com/CHERIoT-Platform/network-stack/tree/hlefeuvre/stack-overflow-resilient-error-handler). I applied this patch to my local tree and it fixed it. |
#320 is taking a while since it's a sizable amount of prose. Pull the end-user visible changes off the bottom and make them their own PR. --------- Co-authored-by: Robert Norton <[email protected]> Co-authored-by: Murali Vijayaraghavan <[email protected]>
3d0f0df
to
76f09b0
Compare
e693d23
to
6a2e33b
Compare
* the register spill we created at .Lswitch_entry_first_spill) | ||
* - s1 for the length of the stack suffix to which the callee is entitled | ||
*/ | ||
//.Lswitch_stack_chop: | ||
cgetaddr s0, csp | ||
cgetbase s1, csp | ||
csetaddr csp, csp, s1 | ||
sub s1, s0, s1 | ||
csetboundsexact ct2, csp, s1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, what's the guarantee that this wouldn't fail?
Is there a better way to carve things out so that we dont fail here instead of trying to use csetboundsexact and failing? Perhaps get the cap with the same base but smaller length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, once we have it, we should use CSetBoundsRoundDown; I've left myself a note over on the emerging to-do list at #334 .
Yes. I’m not too worried about that because we jump to the value from ra only after restoring state. With the current code structure, it would be possible to use the switcher to disable interrupts in an arbitrary piece of code if we exposed an interrupt-inheriting sentry if you:
We could avoid this fairly trivially by just force unwinding if cra does not contain a return sentry. |
971d800
to
07f8316
Compare
b23dc58
to
a3a1b57
Compare
This is sitting atop #346 at the moment (which hopefully lands quickly). The I would very much appreciate more eyes / the same eyes again on this so that it can hopefully land soon. |
cf15d92
to
ebc04c8
Compare
Co-authored-by: David Chisnall <[email protected]> Co-authored-by: Robert Norton <[email protected]>
Co-authored-by: Robert Norton <[email protected]>
Inline check_compartment_stack_integrity since it's used exactly once and makes some assumptions (and had a bug, hard-coding `csp` when it meant `\reg`). Remove the `csp`-relative `lb`, as we have already checked the things that this would that the surviving subsequent permissions check would not. Leave the permission and alignedness checks.
Force unwind clobbers a0 and a1; there's no need to be paranoid about their values before possibly trapping or manually invoking force unwind behavior.
.Lout_of_trusted_stack is used only by __Z26compartment_switcher_entryz, so move it up closer to that, next to the other local error paths.
When loading the target thread's trusted stack, do more work in csp, just so we don't have to explain that ct0 gets clobbered by something else in the next instruction. This may be excessive.
When in .Lskip_compartment_call, t0 is spare, in that it will be overwritten by one of... - us, in a few instructions (the zeroAllRegistersExcept), or - zeroAllRegistersExcept in .Ltherad_exit, via .Lset_mcause_and_exit_thread, or - the exception path (.Lforce_unwind via .Lhandle_error_in_switcher) bringing us back here, one TrustedFrame further up. We can use it instead of tp for a temporary value, avoiding the need to reload mtdc.
Having spilled s0 already, we're free to use it rather than clobber t2. We'll zero t2 on the error path here, so there is no concern of it leaking.
These were only useful on simulators that did not report all effects of instructions, and we no longer use any such.
As @davidchisnall points out, just because sp's address is zero doesn't mean that the rest of the cap is too. So, if it looks like we're dead in the water, make sure we're actually dead by zeroing mtcc completely. While here, slightly tidy the code by flipping the condition and moving the wedging operation out of line. Add an instruction guaranteed to trap after nulling MTCC.
Co-authored-by: David Chisnall <[email protected]>
Include "Register Atlas"es and updates at key points, making it hopefully easier to understand what's going on at any given point in the code. Annotate arcs in the CFG at their target, not just their source, as well as IRQ disposition and live-in and live-out register sets. The intention is that much of this should be amenable to automatic verification, but no such tooling yet exists. To ease review, this large commit changes no bytes in the actual assembled output. Co-authored-by: David Chisnall <[email protected]> Co-authored-by: Murali Vijayaraghavan <[email protected]> Co-authored-by: Robert Norton <[email protected]>
As embarrassing as it might be, it's still better to share than not.
ebc04c8
to
4f9e9f6
Compare
The switcher is a very small, and yet also big, chunk of carefully hand-written, security-critical assembler. In absence of formal verification of correctness, add more commentary about what's going on.
At the time of writing, this just adds "Register Atlas"es to the
__Z26compartment_switcher_entryz
entry point (and very slightly moves some code around).I would also like to have "Control Flow Atlas"es with backwards arcs from labels with interesting control flow, but I haven't gotten there yet.
Mostly posting for bikeshedding and commentary; commit early, commit often.