Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMO exceptions don't look quite right #213

Closed
Timmmm opened this issue Apr 22, 2024 · 29 comments · Fixed by #216
Closed

CMO exceptions don't look quite right #213

Timmmm opened this issue Apr 22, 2024 · 29 comments · Fixed by #216
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@Timmmm
Copy link
Contributor

Timmmm commented Apr 22, 2024

The CMO/prefetch exceptions are:

ifdef::cbo_clean_flush[]
| Permission violation  | It does not grant <<w_perm>> and <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::cbo_clean_flush[]

ifdef::cbo_inval[]
| Permission violation  | It does not grant <<w_perm>>, <<r_perm>> or <<asr_perm>>
| Length violation      | At least one byte accessed is outside the bounds
endif::[]

ifdef::prefetch_i[]
| Permission violation  | It does not grant <<x_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_r[]
| Permission violation  | It does not grant <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_w[]
| Permission violation  | It does not grant <<w_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]
  1. The prefetch instructions shouldn't ever raise exceptions because they're just hints. The (non-cheri) spec says:

A cache-block prefetch instruction is permitted to access the specified cache block whenever a load instruction,
store instruction, or instruction fetch is permitted to access the corresponding physical addresses. If access to the
cache block is not permitted, a cache-block prefetch instruction does not raise any exceptions
and shall not
access any caches or memory

This makes sense because they're just or.i instructions semantically. In fact do they even need to be mentioned in this spec at all?

  1. Shouldn't cbo.clean and cbo.inval have the same Length violation? Pretty sure they should both be At least one byte accessed is outside the bounds.

  2. The CBO spec actually says you only need read OR write permission, not both:

A cache-block management instruction is permitted to access the specified cache block whenever a load
instruction or store instruction is permitted to access the corresponding physical addresses. If neither a load
instruction nor store instruction is permitted to access the physical addresses, but an instruction fetch is
permitted to access the physical addresses, whether a cache-block management instruction is permitted to access
the cache block is UNSPECIFIED. (yeay)

@jrtc27
Copy link
Collaborator

jrtc27 commented Apr 22, 2024

  1. Yes because it matters for microarchitectural side channels
  2. No. Clean is safe, inval is not, because the latter throws away data in the cache line, acting sort of like a write to the whole line with whatever possibly stale data is in DRAM.
  3. Yes, and this is a bad idea. More sensible architectures like AArch64 require write permission, because if it’s not writable the data shouldn’t be changing.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Yes because it matters for microarchitectural side channels

Yes it should be mentioned, or yes it should raise an exception? Can you expand a bit? I get that this is a spectre danger zone, but surely it's wrong for ori x0, x0, 0 to raise an exception?

Clean is safe

Yeah but it's saying raise a length exception if at least one byte is within bounds. At the very least it should be None of the bytes accessed are in bounds shouldn't it?

Also I agree clean is safe but the CBO extension requires read or write access to the entire block. That does seem wrong but it also seems weird for CHERI to do something different.

Yes, and this is a bad idea.

Yeah though as you say only clean (and hence flush) are really "writes". So should cbo.inval require write permission? I can definitely think of situations where you are reading from read-only memory and you want to use cbo.inval no?

@PeterRugg
Copy link
Contributor

@Timmmm I think this was already fixed? The current spec pages for the prefetch instructions say "the instruction does not perform a prefetch if...." rather than listing exception conditions. Is there a separate list of exceptions that's stale somewhere?

The decision to add the capability info to the instruction is to allow microarchitecture to protect itself from side channels. You're right though, that they mustn't trap, especially as a result of bounds.

You said that "clean, and hence flush" are writes: I see what you mean, but that's from the memory subsystem's point of view. From the program's point of view, an invalidate is absolutely a write: it writes the address to some value that address previously held. Clean and flush are not writes from that point of view: they don't change what the program is allowed to observe, only non-coherent things.

When reading from read-only memory, surely invalidate and flush are indistinguishable? The only difference is how dirty lines in the cache are treated, but if it's read only, then there can't be any?

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

I think this was already fixed?

Ah you're right. I only noticed them in cbo_exceptions.adoc - it looks like they were accidentally left there when the link to that file from the prefetch instructions was removed. I'll make a PR to fix it.

When reading from read-only memory

Sorry I wasn't clear - I mean suppose you have memory that is read-only from the current program's point of view (because it hasn't been given write access). But that memory is also being written to by a different non-coherent agent. You might still want to do cbo.invalidate on it right?

@PeterRugg
Copy link
Contributor

If you haven't made a write to it, then you shouldn't care whether you invalidate or flush. It seems like software that requires an invalidate rather than a flush is always broken, because the cache can write out the line whenever it likes anyway.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Isn't the point of invalidating that some other agent may have written to it, so then their writes become visible to us? Or am I totally misunderstanding this?

An invalidate operation makes data from store operations performed by a set of non-coherent agents visible to the set of coherent agents at a point common to both sets by deallocating all copies of a cache block from the set of coherent caches up to that point

@jrtc27
Copy link
Collaborator

jrtc27 commented Apr 23, 2024

Yes, but if you’re not writing to it, flush and invalidate are the same

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

I agree... Sorry maybe we're talking about different things or I'm just being really stupid. Here's what I mean:

image

  1. Hart 0 does a normal load of some block.
  2. Some other agent does an incoherent write to that block. It's incoherent so it doesn't automatically clear caches etc.
  3. The hart's cache is now stale but it doesn't know it.
  4. The hart does a cmo.inval on the block, clearing its cache.
  5. So when it does a read next it gets the value from the other agent instead of the stale 7.

The CMO spec isn't the clearest but I think that's the sort of thing it's supposed to enable right?

In this case Hart 0 only does reads. It might only have read permission from its CHERI capabilities. But it's still useful to be able to execute cbo.inval.

@PeterRugg
Copy link
Contributor

For that example though, flush would do exactly the same as invalidate.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Yes I agree.... My point was that CHERI requires read and write permission for cbo.inval, but the CBO spec only requires read or write. And I think this is an example of where it would make sense to be able to execute cbo.inval with only read permission.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Unless I am reading this differently:

It does not grant W-permission, R-permission or ASR-permission

Is that saying if not (grants(read) or grants(write) or grants(asr)) or if (not grants(read)) or (not grants(write)) or (not grants(asr))?

Probably should be reworded to be not ambiguous anyway!

@PeterRugg
Copy link
Contributor

So cheri needs read and write for inval, because of something like the following:

k = secret key;

// do some work

k = 0; // clear the key

untrused_func((const int *) &k); //pass read only pointer

If untrusted_func is allowed to do an invalidate, not only can it read the old value of secret key, which is bad enough (and why I think we said this needed ASR), but it can also violate the read-only because it's changed the value of k.

Your example points to where the code needs to run either cbo.flush or cbo.inval, but can't tell the difference. Since cbo.inval is so dangerous, why wouldn't we insist such code just has to do cbo.flush instead of cbo.inval?

@jrtc27
Copy link
Collaborator

jrtc27 commented Apr 23, 2024

To make a broader statement:

It is always correct to use flush rather than invalidate. If your code relies on using invalidate, it is broken, because the line could be written back by the hardware at any point anyway. Moreover, flush is safe, because it’s an operation that the hardware can do at any point. Invalidate, however, is not. It exists solely as an optimisation for when you know you don’t need to write back. It is therefore completely reasonable to heavily restrict when you can use it, and to require potentially-dangerous scenarios to use flush instead.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Ah great example Peter. That makes sense, thank you both! Could be a great thing to add to an non-normative aside in the spec (I really like those in the main spec).

I think the final thing is that this is definitely incorrect:

| Length violation      | At least one byte accessed is within the bounds

Presumably it is meant to be the opposite?

| Length violation      | None of the bytes accessed are within the bounds

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

(for cbo.flush and cbo.clean)

@PeterRugg
Copy link
Contributor

Yeah, it's one thing I noticed when I first ran through the spec: the conditions are a random mix of saying "exception when..." and "exception unless...". Could be good to do a pass making it consistent to avoid issues like this.

@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

Cool I'll make a PR for that. I think most of them are "exception when" so I would view any that are "unless" as a bug.

Timmmm added a commit to Timmmm/riscv-cheri that referenced this issue Apr 23, 2024
The sense of this column elsewhere is "raise a length violation if <condition>", but here it was backwards.

Fixes riscv#213
@arichardson
Copy link
Collaborator

See also CTSRD-CHERI/cheri-specification#65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

@arichardson
Copy link
Collaborator

However, I do agree that prefetch should not raise exceptions and instead just ignore the operation if the preconditions are not met. That was my understanding of CTSRD-CHERI/cheri-specification#65 and I thought also specified here, but I must have missed that.

@arichardson arichardson reopened this Apr 23, 2024
@Timmmm
Copy link
Contributor Author

Timmmm commented Apr 23, 2024

I thought also specified here, but I must have missed that.

Yeah it was but there was some dead text in a file that said otherwise that confused me. Removed in #214

@andresag01
Copy link
Collaborator

@arichardson : What still needs to be resolved in this ticket? From your comment, I am guessing you re-opened it due to this?

See also CTSRD-CHERI/cheri-specification#65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

I agree that invalidations are dangerous, but they have legitimate uses and are part of RISC-V. Therefore, I do not think removing them entirely from the ISA will be well-received. Requiring almighty is a bit strange too, for example, why would I need a capability with bounds over the entire address space if I would like to perform an operation over 1 cache line only?

My recollection of previous discussions is that we require a capability that has "sufficient" permissions to do the side-effect operations that an invalidate would incur. For example, an invalidate could "bring back" stale data that was not written back from the cache. If you squint, this could be reproduced using regular stores given that your authorising cap has W permissions and appropriate bounds.

We ended up requiring W, R and ASR permissions along with bounds covering the entire cache line -- ASR is because invalidate needs to read a CSR to work out how it behaves. IMHO, these need to be updated because invalidates could (e.g.) "bring back" (i.e. write) a capability, so my proposal is that invalidates require the following:

  • Read permission (R)
  • Write permission (W)
  • Capability permission (C) -- since caps could be "brought back"
  • ASR permission
  • Bounds containing the entire cache line

@andresag01 andresag01 added bug Something isn't working documentation Improvements or additions to documentation labels Apr 24, 2024
@tariqkurd-repo
Copy link
Collaborator

riscv/riscv-CMOs#56

We've already asked for FLUSH to be a valid implementation of INVAL which is almost is in the CMO spec already but it's better to clarify it. Clearly the result of FLUSH is always correct, so doing this is a valid implementation.

We can add a recommendation that FLUSH is always used instead of INVAL for CHERI-RISC-V, or even strengthen it and say must be used instead and we won't violate the CMO spec as a result.

@jrtc27
Copy link
Collaborator

jrtc27 commented Apr 24, 2024

C is potentially awkward because you may well not have that for device driver mappings.

@PeterRugg
Copy link
Contributor

@andresag01

ASR is because invalidate needs to read a CSR to work out how it behaves.

Hmm, that wasn't quite my understanding: it was more because it's so dangerous. I was viewing this as a theoretical problem, but it probably is actually very practical, and could be a good experiment to do. If we didn't need ASR to do an invalidate, we could have some "compartment" that we model as being compromised just run an invalidate on any capability it ever gets given and see what capabilities it can find in the recovered memory: I wonder how often there will be an almighty cap lying around.

It's not hard to imagine a post-CHERI world where the goto thing for exploits to do when they compromise a compartment is to just run invalidates on every address they can find. At the very least, invalidates aren't giving code with ASR any more privilege than they already have since they can already access any address they like by setting up an evil translation mapping. However, even with ASR, invalidates may actually be an easier escalation route than setting up translation windows to the memory you want to read from.

@tariqkurd-repo
Copy link
Collaborator

tariqkurd-repo commented Apr 26, 2024

presumably this can happen because a revocation sweep hasn't happened? If the revocation has happened then stale caps won't be around. Or... when closing down a compartment it should be cleaned. So it all sounds very avoidable if the stale caps are cleaned up correctly.

@PeterRugg
Copy link
Contributor

It's true that revocation might get rid of them as a side effect, just from churning through the caches. It's not guaranteed though: the revocation sweep won't deliberately flush the caches.

@jrtc27
Copy link
Collaborator

jrtc27 commented Apr 26, 2024

(and you wouldn't want revocation to flush the caches either, that sounds expensive...)

@tariqkurd-repo
Copy link
Collaborator

@Timmmm can we close this one now?

@Timmmm
Copy link
Contributor Author

Timmmm commented May 9, 2024

Yep. I think it might be nice to mention menvcfg[CBIE] but people can probably figure that out.

@Timmmm Timmmm closed this as completed May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants