Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: various tweaks to, and a pile of documentation for, the switcher and exception handler #320

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

nwf
Copy link
Member

@nwf nwf commented Oct 18, 2024

The switcher is a very small, and yet also big, chunk of carefully hand-written, security-critical assembler. In absence of formal verification of correctness, add more commentary about what's going on.

At the time of writing, this just adds "Register Atlas"es to the __Z26compartment_switcher_entryz entry point (and very slightly moves some code around).

I would also like to have "Control Flow Atlas"es with backwards arcs from labels with interesting control flow, but I haven't gotten there yet.

Mostly posting for bikeshedding and commentary; commit early, commit often.

@nwf nwf marked this pull request as draft October 18, 2024 23:12
sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
@nwf nwf force-pushed the 202410-switcher-commentary branch 5 times, most recently from 37ccd9e to 6616047 Compare October 24, 2024 22:19
@nwf nwf changed the title WIP: rework switcher commentary WIP: various tweaks to, and a pile of documentation for, the switcher and exception handler Oct 24, 2024
@nwf nwf force-pushed the 202410-switcher-commentary branch 4 times, most recently from e8af222 to ee9a1a6 Compare October 25, 2024 03:49
@nwf nwf marked this pull request as ready for review October 25, 2024 03:50
@nwf
Copy link
Member Author

nwf commented Oct 25, 2024

OK, I believe this is ready for review. As so often happens with my PRs, this is somewhat separable, with many of the "bottom" commits suitable for landing even if the whole thing is not, if that'd be desirable. In the same spirit, this is almost surely best reviewed commit by commit, rather than all at once.

Copy link
Collaborator

@davidchisnall davidchisnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment out the labels that are not branch targets? I found that they made the code harder to read, whereas everything else was a big improvement.

sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
* tp, t2: scratch
*/
/*
* The caller should back up all caller saved registers. Spill
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took a couple of attempts to parse. The caller is asserting that they don’t care about the values in caller-save registers. There is no obligation to spill them, they may simply be unused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that you're happy with the new comment here.

* callee-save registers carefully. If we find ourselves unable to do
* so, we'll return an error to the caller (via the exception path; see
* .Lhandle_error_in_switcher). The error handling path assumes that
* the first spill is to the lowest address and guaranteed to trap if
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think that’s a safe assumption. What happens if:

  • I clear store global from CSP and don’t have local things in cs0?
  • I put a heap cap in CSP and free it in another thread? I think this can’t happen yet but will be possible once the switcher runs with interrupts enabled (or we do multicore).

After more thinking, I think the code is right but the comment is misleading. If we've accidentally run out of stack space, we'll trap on the first instruction and so should gracefully return. If we've maliciously done bad things to csp before invoking the switcher, we should be force unwound.

*/
cgetperm t2, csp
li tp, COMPARTMENT_STACK_PERMISSIONS
bne tp, t2, .Lforce_unwind
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this check checks for store local which ensures that this is a stack and so later accesses cannot fault.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that you are happy with the expanded comment.

sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
* If we have already unwound so far that the TrustedStack::frameoffset is
* pointing at TrustedStack::frames[0] -- that is, if the stack has no
* active frames on it -- then just go back to the context we came from,
* effectively parking this thread in a (slow) infinite loop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to hit the thread exit path in that case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have hit the thread exit path in this case, and the scheduler is continuing to try to run us. But all the same, see #328 (not presently part of this PR stack, but if it lands, I'll rebase).

sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
cspecialr ct1, mtdc
// Atlas: t1: pointer to TrustedStack
#ifdef CONFIG_MSHWM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not this PR, but I think we can assume MSHWM now. We haven't tested without it for ages and I expect everyone to use it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ifdef was useful for benchmarking, not sure if you still want that?

@hlef
Copy link
Contributor

hlef commented Oct 29, 2024

Just mentioning here: I encountered the bug where a compartment's error handler isn't called if the thread is at the root of the trusted stack while testing my future PR that uses the new error handlers in the NW stack (https://github.com/CHERIoT-Platform/network-stack/tree/hlefeuvre/stack-overflow-resilient-error-handler). I applied this patch to my local tree and it fixed it.

@nwf nwf mentioned this pull request Oct 29, 2024
nwf added a commit that referenced this pull request Oct 29, 2024
#320 is taking a while since it's a sizable amount of prose. Pull the
end-user visible changes off the bottom and make them their own PR.

---------

Co-authored-by: Robert Norton <[email protected]>
Co-authored-by: Murali Vijayaraghavan <[email protected]>
@nwf nwf force-pushed the 202410-switcher-commentary branch 5 times, most recently from 3d0f0df to 76f09b0 Compare October 30, 2024 03:47
@nwf nwf force-pushed the 202410-switcher-commentary branch 3 times, most recently from e693d23 to 6a2e33b Compare November 5, 2024 15:38
* the register spill we created at .Lswitch_entry_first_spill)
* - s1 for the length of the stack suffix to which the callee is entitled
*/
//.Lswitch_stack_chop:
cgetaddr s0, csp
cgetbase s1, csp
csetaddr csp, csp, s1
sub s1, s0, s1
csetboundsexact ct2, csp, s1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what's the guarantee that this wouldn't fail?

Is there a better way to carve things out so that we dont fail here instead of trying to use csetboundsexact and failing? Perhaps get the cap with the same base but smaller length?

Copy link
Member Author

@nwf nwf Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, once we have it, we should use CSetBoundsRoundDown; I've left myself a note over on the emerging to-do list at #334 .

sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
@davidchisnall
Copy link
Collaborator

In the relaxed scenario, you would call the switcher with an interrupt inheriting sentry, and hence can set up any register as the returning register?

Yes. I’m not too worried about that because we jump to the value from ra only after restoring state. With the current code structure, it would be possible to use the switcher to disable interrupts in an arbitrary piece of code if we exposed an interrupt-inheriting sentry if you:

  1. Derive a capability from your PCC and set the address wherever you want.
  2. Put that value in cra.
  3. Invoke the switcher to jump to an interrupts-disabled function.
  4. The switcher resumes still interrupt-disabled.
  5. Switcher jumps to cra.

We could avoid this fairly trivially by just force unwinding if cra does not contain a return sentry.

sdk/core/switcher/entry.S Outdated Show resolved Hide resolved
@nwf nwf force-pushed the 202410-switcher-commentary branch from 971d800 to 07f8316 Compare November 7, 2024 23:10
@nwf nwf force-pushed the 202410-switcher-commentary branch 2 times, most recently from b23dc58 to a3a1b57 Compare November 13, 2024 03:44
@nwf
Copy link
Member Author

nwf commented Nov 13, 2024

This is sitting atop #346 at the moment (which hopefully lands quickly). The dot_from_switcher.lua script has grown the ability to cross-check not just FROM/IFROM/ITO annotations but also IRQ ASSUME and IRQ REQUIRE. Assuming I have gotten the script and the latter annotations right, there are no more IRQ-related surprises lurking in the switcher.

I would very much appreciate more eyes / the same eyes again on this so that it can hopefully land soon.

@nwf nwf force-pushed the 202410-switcher-commentary branch 2 times, most recently from cf15d92 to ebc04c8 Compare November 13, 2024 19:43
nwf and others added 15 commits November 14, 2024 16:36
Co-authored-by: David Chisnall <[email protected]>
Co-authored-by: Robert Norton <[email protected]>
Inline check_compartment_stack_integrity since it's used exactly once
and makes some assumptions (and had a bug, hard-coding `csp` when it
meant `\reg`).

Remove the `csp`-relative `lb`, as we have already checked the things
that this would that the surviving subsequent permissions check would
not.  Leave the permission and alignedness checks.
Force unwind clobbers a0 and a1; there's no need to be paranoid about
their values before possibly trapping or manually invoking force unwind
behavior.
.Lout_of_trusted_stack is used only by __Z26compartment_switcher_entryz,
so move it up closer to that, next to the other local error paths.
When loading the target thread's trusted stack, do more work in csp,
just so we don't have to explain that ct0 gets clobbered by something
else in the next instruction.  This may be excessive.
When in .Lskip_compartment_call, t0 is spare, in that it will be
overwritten by one of...
- us, in a few instructions (the zeroAllRegistersExcept), or
- zeroAllRegistersExcept in .Ltherad_exit, via
  .Lset_mcause_and_exit_thread, or
- the exception path (.Lforce_unwind via .Lhandle_error_in_switcher)
  bringing us back here, one TrustedFrame further up.

We can use it instead of tp for a temporary value, avoiding the need to
reload mtdc.
Having spilled s0 already, we're free to use it rather than clobber t2.
We'll zero t2 on the error path here, so there is no concern of it
leaking.
These were only useful on simulators that did not report all effects of
instructions, and we no longer use any such.
As @davidchisnall points out, just because sp's address is zero doesn't
mean that the rest of the cap is too.  So, if it looks like we're dead
in the water, make sure we're actually dead by zeroing mtcc completely.

While here, slightly tidy the code by flipping the condition and moving
the wedging operation out of line.  Add an instruction guaranteed to
trap after nulling MTCC.
Include "Register Atlas"es and updates at key points, making it
hopefully easier to understand what's going on at any given point in the
code.

Annotate arcs in the CFG at their target, not just their source, as well
as IRQ disposition and live-in and live-out register sets.

The intention is that much of this should be amenable to automatic
verification, but no such tooling yet exists.

To ease review, this large commit changes no bytes in the actual
assembled output.

Co-authored-by: David Chisnall <[email protected]>
Co-authored-by: Murali Vijayaraghavan <[email protected]>
Co-authored-by: Robert Norton <[email protected]>
As embarrassing as it might be, it's still better to share than not.
@nwf nwf force-pushed the 202410-switcher-commentary branch from ebc04c8 to 4f9e9f6 Compare November 14, 2024 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants