CMO exceptions don't look quite right #213

Timmmm · 2024-04-22T21:32:15Z

The CMO/prefetch exceptions are:

ifdef::cbo_clean_flush[]
| Permission violation  | It does not grant <<w_perm>> and <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::cbo_clean_flush[]

ifdef::cbo_inval[]
| Permission violation  | It does not grant <<w_perm>>, <<r_perm>> or <<asr_perm>>
| Length violation      | At least one byte accessed is outside the bounds
endif::[]

ifdef::prefetch_i[]
| Permission violation  | It does not grant <<x_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_r[]
| Permission violation  | It does not grant <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_w[]
| Permission violation  | It does not grant <<w_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

The prefetch instructions shouldn't ever raise exceptions because they're just hints. The (non-cheri) spec says:

A cache-block prefetch instruction is permitted to access the specified cache block whenever a load instruction,
store instruction, or instruction fetch is permitted to access the corresponding physical addresses. If access to the
cache block is not permitted, a cache-block prefetch instruction does not raise any exceptions and shall not
access any caches or memory

This makes sense because they're just or.i instructions semantically. In fact do they even need to be mentioned in this spec at all?

Shouldn't cbo.clean and cbo.inval have the same Length violation? Pretty sure they should both be At least one byte accessed is outside the bounds.
The CBO spec actually says you only need read OR write permission, not both:

A cache-block management instruction is permitted to access the specified cache block whenever a load
instruction or store instruction is permitted to access the corresponding physical addresses. If neither a load
instruction nor store instruction is permitted to access the physical addresses, but an instruction fetch is
permitted to access the physical addresses, whether a cache-block management instruction is permitted to access
the cache block is UNSPECIFIED. (yeay)

The text was updated successfully, but these errors were encountered:

jrtc27 · 2024-04-22T21:55:50Z

Yes because it matters for microarchitectural side channels
No. Clean is safe, inval is not, because the latter throws away data in the cache line, acting sort of like a write to the whole line with whatever possibly stale data is in DRAM.
Yes, and this is a bad idea. More sensible architectures like AArch64 require write permission, because if it’s not writable the data shouldn’t be changing.

Timmmm · 2024-04-23T08:13:17Z

Yes because it matters for microarchitectural side channels

Yes it should be mentioned, or yes it should raise an exception? Can you expand a bit? I get that this is a spectre danger zone, but surely it's wrong for ori x0, x0, 0 to raise an exception?

Clean is safe

Yeah but it's saying raise a length exception if at least one byte is within bounds. At the very least it should be None of the bytes accessed are in bounds shouldn't it?

Also I agree clean is safe but the CBO extension requires read or write access to the entire block. That does seem wrong but it also seems weird for CHERI to do something different.

Yes, and this is a bad idea.

Yeah though as you say only clean (and hence flush) are really "writes". So should cbo.inval require write permission? I can definitely think of situations where you are reading from read-only memory and you want to use cbo.inval no?

PeterRugg · 2024-04-23T09:43:23Z

@Timmmm I think this was already fixed? The current spec pages for the prefetch instructions say "the instruction does not perform a prefetch if...." rather than listing exception conditions. Is there a separate list of exceptions that's stale somewhere?

The decision to add the capability info to the instruction is to allow microarchitecture to protect itself from side channels. You're right though, that they mustn't trap, especially as a result of bounds.

You said that "clean, and hence flush" are writes: I see what you mean, but that's from the memory subsystem's point of view. From the program's point of view, an invalidate is absolutely a write: it writes the address to some value that address previously held. Clean and flush are not writes from that point of view: they don't change what the program is allowed to observe, only non-coherent things.

When reading from read-only memory, surely invalidate and flush are indistinguishable? The only difference is how dirty lines in the cache are treated, but if it's read only, then there can't be any?

Timmmm · 2024-04-23T09:58:27Z

I think this was already fixed?

Ah you're right. I only noticed them in cbo_exceptions.adoc - it looks like they were accidentally left there when the link to that file from the prefetch instructions was removed. I'll make a PR to fix it.

When reading from read-only memory

Sorry I wasn't clear - I mean suppose you have memory that is read-only from the current program's point of view (because it hasn't been given write access). But that memory is also being written to by a different non-coherent agent. You might still want to do cbo.invalidate on it right?

PeterRugg · 2024-04-23T10:01:37Z

If you haven't made a write to it, then you shouldn't care whether you invalidate or flush. It seems like software that requires an invalidate rather than a flush is always broken, because the cache can write out the line whenever it likes anyway.

Timmmm · 2024-04-23T10:48:37Z

Isn't the point of invalidating that some other agent may have written to it, so then their writes become visible to us? Or am I totally misunderstanding this?

An invalidate operation makes data from store operations performed by a set of non-coherent agents visible to the set of coherent agents at a point common to both sets by deallocating all copies of a cache block from the set of coherent caches up to that point

jrtc27 · 2024-04-23T11:20:47Z

Yes, but if you’re not writing to it, flush and invalidate are the same

Timmmm · 2024-04-23T13:18:54Z

I agree... Sorry maybe we're talking about different things or I'm just being really stupid. Here's what I mean:

Hart 0 does a normal load of some block.
Some other agent does an incoherent write to that block. It's incoherent so it doesn't automatically clear caches etc.
The hart's cache is now stale but it doesn't know it.
The hart does a cmo.inval on the block, clearing its cache.
So when it does a read next it gets the value from the other agent instead of the stale 7.

The CMO spec isn't the clearest but I think that's the sort of thing it's supposed to enable right?

In this case Hart 0 only does reads. It might only have read permission from its CHERI capabilities. But it's still useful to be able to execute cbo.inval.

PeterRugg · 2024-04-23T13:22:54Z

For that example though, flush would do exactly the same as invalidate.

Timmmm · 2024-04-23T13:25:12Z

Yes I agree.... My point was that CHERI requires read and write permission for cbo.inval, but the CBO spec only requires read or write. And I think this is an example of where it would make sense to be able to execute cbo.inval with only read permission.

Timmmm · 2024-04-23T13:28:46Z

Unless I am reading this differently:

It does not grant W-permission, R-permission or ASR-permission

Is that saying if not (grants(read) or grants(write) or grants(asr)) or if (not grants(read)) or (not grants(write)) or (not grants(asr))?

Probably should be reworded to be not ambiguous anyway!

PeterRugg · 2024-04-23T13:33:35Z

So cheri needs read and write for inval, because of something like the following:

k = secret key;

// do some work

k = 0; // clear the key

untrused_func((const int *) &k); //pass read only pointer

If untrusted_func is allowed to do an invalidate, not only can it read the old value of secret key, which is bad enough (and why I think we said this needed ASR), but it can also violate the read-only because it's changed the value of k.

Your example points to where the code needs to run either cbo.flush or cbo.inval, but can't tell the difference. Since cbo.inval is so dangerous, why wouldn't we insist such code just has to do cbo.flush instead of cbo.inval?

jrtc27 · 2024-04-23T13:48:03Z

To make a broader statement:

It is always correct to use flush rather than invalidate. If your code relies on using invalidate, it is broken, because the line could be written back by the hardware at any point anyway. Moreover, flush is safe, because it’s an operation that the hardware can do at any point. Invalidate, however, is not. It exists solely as an optimisation for when you know you don’t need to write back. It is therefore completely reasonable to heavily restrict when you can use it, and to require potentially-dangerous scenarios to use flush instead.

Timmmm · 2024-04-23T13:59:02Z

Ah great example Peter. That makes sense, thank you both! Could be a great thing to add to an non-normative aside in the spec (I really like those in the main spec).

I think the final thing is that this is definitely incorrect:

| Length violation      | At least one byte accessed is within the bounds

Presumably it is meant to be the opposite?

| Length violation      | None of the bytes accessed are within the bounds

Timmmm · 2024-04-23T13:59:27Z

(for cbo.flush and cbo.clean)

PeterRugg · 2024-04-23T14:02:41Z

Yeah, it's one thing I noticed when I first ran through the spec: the conditions are a random mix of saying "exception when..." and "exception unless...". Could be good to do a pass making it consistent to avoid issues like this.

Timmmm · 2024-04-23T14:25:48Z

Cool I'll make a PR for that. I think most of them are "exception when" so I would view any that are "unless" as a bug.

The sense of this column elsewhere is "raise a length violation if <condition>", but here it was backwards. Fixes riscv#213

arichardson · 2024-04-23T18:08:25Z

See also CTSRD-CHERI/cheri-specification#65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

arichardson · 2024-04-23T18:10:41Z

However, I do agree that prefetch should not raise exceptions and instead just ignore the operation if the preconditions are not met. That was my understanding of CTSRD-CHERI/cheri-specification#65 and I thought also specified here, but I must have missed that.

Timmmm · 2024-04-23T19:11:57Z

I thought also specified here, but I must have missed that.

Yeah it was but there was some dead text in a file that said otherwise that confused me. Removed in #214

andresag01 · 2024-04-24T08:48:55Z

@arichardson : What still needs to be resolved in this ticket? From your comment, I am guessing you re-opened it due to this?

See also CTSRD-CHERI/cheri-specification#65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

I agree that invalidations are dangerous, but they have legitimate uses and are part of RISC-V. Therefore, I do not think removing them entirely from the ISA will be well-received. Requiring almighty is a bit strange too, for example, why would I need a capability with bounds over the entire address space if I would like to perform an operation over 1 cache line only?

My recollection of previous discussions is that we require a capability that has "sufficient" permissions to do the side-effect operations that an invalidate would incur. For example, an invalidate could "bring back" stale data that was not written back from the cache. If you squint, this could be reproduced using regular stores given that your authorising cap has W permissions and appropriate bounds.

We ended up requiring W, R and ASR permissions along with bounds covering the entire cache line -- ASR is because invalidate needs to read a CSR to work out how it behaves. IMHO, these need to be updated because invalidates could (e.g.) "bring back" (i.e. write) a capability, so my proposal is that invalidates require the following:

Read permission (R)
Write permission (W)
Capability permission (C) -- since caps could be "brought back"
ASR permission
Bounds containing the entire cache line

tariqkurd-repo · 2024-04-24T09:28:15Z

riscv/riscv-CMOs#56

We've already asked for FLUSH to be a valid implementation of INVAL which is almost is in the CMO spec already but it's better to clarify it. Clearly the result of FLUSH is always correct, so doing this is a valid implementation.

We can add a recommendation that FLUSH is always used instead of INVAL for CHERI-RISC-V, or even strengthen it and say must be used instead and we won't violate the CMO spec as a result.

jrtc27 · 2024-04-24T10:43:34Z

C is potentially awkward because you may well not have that for device driver mappings.

PeterRugg · 2024-04-24T17:49:05Z

@andresag01

ASR is because invalidate needs to read a CSR to work out how it behaves.

Hmm, that wasn't quite my understanding: it was more because it's so dangerous. I was viewing this as a theoretical problem, but it probably is actually very practical, and could be a good experiment to do. If we didn't need ASR to do an invalidate, we could have some "compartment" that we model as being compromised just run an invalidate on any capability it ever gets given and see what capabilities it can find in the recovered memory: I wonder how often there will be an almighty cap lying around.

It's not hard to imagine a post-CHERI world where the goto thing for exploits to do when they compromise a compartment is to just run invalidates on every address they can find. At the very least, invalidates aren't giving code with ASR any more privilege than they already have since they can already access any address they like by setting up an evil translation mapping. However, even with ASR, invalidates may actually be an easier escalation route than setting up translation windows to the memory you want to read from.

tariqkurd-repo · 2024-04-26T16:40:54Z

presumably this can happen because a revocation sweep hasn't happened? If the revocation has happened then stale caps won't be around. Or... when closing down a compartment it should be cleaned. So it all sounds very avoidable if the stale caps are cleaned up correctly.

PeterRugg · 2024-04-26T16:44:16Z

It's true that revocation might get rid of them as a side effect, just from churning through the caches. It's not guaranteed though: the revocation sweep won't deliberately flush the caches.

jrtc27 · 2024-04-26T16:44:35Z

(and you wouldn't want revocation to flush the caches either, that sounds expensive...)

tariqkurd-repo · 2024-05-08T15:39:09Z

@Timmmm can we close this one now?

Timmmm · 2024-05-09T08:06:26Z

Yep. I think it might be nice to mention menvcfg[CBIE] but people can probably figure that out.

Timmmm mentioned this issue Apr 23, 2024

Implement Zicbom, Zicboz (cbo.flush, cbo.inval, cbo.zero) riscv/sail-riscv#455

Merged

Timmmm added a commit to Timmmm/riscv-cheri that referenced this issue Apr 23, 2024

Correct the length violation condition for cbo.clean & cbo.flush

f7e0341

The sense of this column elsewhere is "raise a length violation if <condition>", but here it was backwards. Fixes riscv#213

Timmmm mentioned this issue Apr 23, 2024

Correct the length violation condition for cbo.clean & cbo.flush #216

Merged

tariqkurd-repo closed this as completed in #216 Apr 23, 2024

tariqkurd-repo closed this as completed in bcf61d8 Apr 23, 2024

arichardson reopened this Apr 23, 2024

andresag01 added bug Something isn't working documentation Improvements or additions to documentation labels Apr 24, 2024

tariqkurd-repo mentioned this issue Apr 26, 2024

Add note that INVAL is dangerous and should be FLUSH #221

Merged

Timmmm closed this as completed May 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CMO exceptions don't look quite right #213

CMO exceptions don't look quite right #213

Timmmm commented Apr 22, 2024

jrtc27 commented Apr 22, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

jrtc27 commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

jrtc27 commented Apr 23, 2024 •

edited

Loading

Timmmm commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

arichardson commented Apr 23, 2024

arichardson commented Apr 23, 2024

Timmmm commented Apr 23, 2024

andresag01 commented Apr 24, 2024

tariqkurd-repo commented Apr 24, 2024

jrtc27 commented Apr 24, 2024

PeterRugg commented Apr 24, 2024

tariqkurd-repo commented Apr 26, 2024 •

edited

Loading

PeterRugg commented Apr 26, 2024

jrtc27 commented Apr 26, 2024

tariqkurd-repo commented May 8, 2024

Timmmm commented May 9, 2024

CMO exceptions don't look quite right #213

CMO exceptions don't look quite right #213

Comments

Timmmm commented Apr 22, 2024

jrtc27 commented Apr 22, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

jrtc27 commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

jrtc27 commented Apr 23, 2024 • edited Loading

Timmmm commented Apr 23, 2024

Timmmm commented Apr 23, 2024

PeterRugg commented Apr 23, 2024

Timmmm commented Apr 23, 2024

arichardson commented Apr 23, 2024

arichardson commented Apr 23, 2024

Timmmm commented Apr 23, 2024

andresag01 commented Apr 24, 2024

tariqkurd-repo commented Apr 24, 2024

jrtc27 commented Apr 24, 2024

PeterRugg commented Apr 24, 2024

tariqkurd-repo commented Apr 26, 2024 • edited Loading

PeterRugg commented Apr 26, 2024

jrtc27 commented Apr 26, 2024

tariqkurd-repo commented May 8, 2024

Timmmm commented May 9, 2024

jrtc27 commented Apr 23, 2024 •

edited

Loading

tariqkurd-repo commented Apr 26, 2024 •

edited

Loading