Store Coherence and Cache Flush Coherence
Some people place great value on the distinction between caches that are coherent on every store (modulo memory-ordering relaxations) and caches that are coherent only on CMOs (cache maintenance operations) such as cache flushes.
E.g. they might say that the instruction and data caches are not coherent on every store: a store by processor P1 will not necessarily affect P1's own instruction cache, let alone other processors' instruction caches.
But they might assume that there is an instruction, let's call it INVAL.I, that invalidates all such processors' instruction caches when any one of them executes INVAL.I.
Versus: if processor P1's INVAL.I does not invalidate other processors' instruction caches, then you must do something to make those other processors execute their own INVAL.I. E.g. an IPI (Interprocessor Interrupt). E.g. protocols such as initializing the buffers to which code is written with trapping instructions, so that a processor still fetching stale buffer contents traps, and its trap handler can execute INVAL.I locally.
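A minimal Python sketch of the IPI approach, under loudly stated assumptions: caches are modeled as dicts, the shared `memory` dict stands in for a coherent data side, INVAL.I is assumed to be local-only and whole-cache, and `send_ipi_inval_i` is a hypothetical stand-in for an IPI whose handler runs INVAL.I. It shows that the writer's own INVAL.I leaves the other processors running stale code until they execute their own.

```python
# Toy model (hypothetical names, not a real ISA): private I-caches that are
# NOT updated by stores, and a local-only, whole-cache INVAL.I.

class CPU:
    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> instruction (coherent data side)
        self.icache = {}      # private copy; stores never update it

    def fetch(self, addr):
        # Fill on a miss; a hit may return a stale instruction.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def store(self, addr, value):
        # The store reaches memory but no I-cache, not even this CPU's.
        self.memory[addr] = value

    def inval_i(self):
        # Assumed local-only INVAL.I: clears only this CPU's I-cache.
        self.icache.clear()


def send_ipi_inval_i(targets):
    # Hypothetical stand-in for an IPI whose handler runs INVAL.I on each target.
    for cpu in targets:
        cpu.inval_i()


memory = {0x100: "old_insn"}
cpus = [CPU(memory) for _ in range(3)]
for cpu in cpus:
    cpu.fetch(0x100)                 # every CPU caches the old code

writer = cpus[0]
writer.store(0x100, "new_insn")
writer.inval_i()                     # fixes only the writer's own I-cache
stale = [cpu.fetch(0x100) for cpu in cpus]
print(stale)                         # ['new_insn', 'old_insn', 'old_insn']

send_ipi_inval_i(cpus[1:])           # software makes the others invalidate
fresh = [cpu.fetch(0x100) for cpu in cpus]
print(fresh)                         # ['new_insn', 'new_insn', 'new_insn']
```

A real protocol would also have to wait for each target to acknowledge the IPI before relying on the new code; the toy model hides this because its "IPI" completes synchronously.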
IMHO these are both examples of multiprocessor cache coherence mechanisms. They might be called
- store coherence
- versus CMO or cache flush coherence
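To make the two terms concrete, here is a toy Python model (not any real ISA; `System`, `store`, and `inval_i` are hypothetical names) of a 2-processor system whose I-caches are either store coherent, where every store snoops every I-cache, or CMO coherent, where stores are not snooped but one processor's INVAL.I(addr) is broadcast to all I-caches.

```python
# Toy model of two coherence policies for I-caches (hypothetical names).

class System:
    def __init__(self, n_cpus, policy):
        assert policy in ("store", "cmo")
        self.policy = policy
        self.memory = {}                            # coherent data side
        self.icaches = [{} for _ in range(n_cpus)]  # per-CPU I-caches

    def fetch(self, cpu, addr):
        icache = self.icaches[cpu]
        if addr not in icache:
            icache[addr] = self.memory.get(addr)
        return icache[addr]

    def _inval_everywhere(self, addr):
        for icache in self.icaches:
            icache.pop(addr, None)

    def store(self, cpu, addr, value):
        self.memory[addr] = value
        if self.policy == "store":
            # Store coherence: every store snoops every I-cache.
            self._inval_everywhere(addr)

    def inval_i(self, cpu, addr):
        if self.policy == "cmo":
            # CMO coherence: one CPU's INVAL.I(addr) reaches all I-caches.
            self._inval_everywhere(addr)
        else:
            # Under store coherence this is redundant; keep it local.
            self.icaches[cpu].pop(addr, None)


def run(policy):
    s = System(2, policy)
    s.memory[0x40] = "old"
    s.fetch(1, 0x40)                 # CPU 1 caches the old instruction
    s.store(0, 0x40, "new")          # CPU 0 writes new code
    after_store = s.fetch(1, 0x40)   # what CPU 1 sees before any CMO
    s.inval_i(0, 0x40)               # CPU 0 executes INVAL.I(addr)
    after_cmo = s.fetch(1, 0x40)
    return after_store, after_cmo


print(run("store"))  # ('new', 'new')
print(run("cmo"))    # ('old', 'new')
```

Both policies need hardware that can reach another processor's I-cache; they differ only in which event (every store, or the CMO) triggers it.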
Some processors allow one processor to invalidate another processor's instruction cache, but only as a whole, not by address. I might be willing to say that this is different enough that such systems should not be called coherent multiprocessor I/D caches.
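A toy Python sketch, assuming a dict-based I-cache with a miss counter and hypothetical `remote_inval_all`/`remote_inval_addr` operations, of one practical difference between whole-cache and by-address remote invalidation: both make the patched line visible, but the whole-cache form also discards every unrelated line, which the target must then refetch.

```python
# Toy I-cache with a miss counter (hypothetical names, illustration only).

class ICache:
    def __init__(self):
        self.lines = {}   # addr -> instruction
        self.misses = 0

    def fetch(self, memory, addr):
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def remote_inval_all(self):
        # Whole-cache remote invalidation: the only granularity offered.
        self.lines.clear()

    def remote_inval_addr(self, addr):
        # Hypothetical by-address remote invalidation.
        self.lines.pop(addr, None)


def misses_after(invalidate):
    memory = {a: f"insn@{a:#x}" for a in range(0, 0x100, 0x40)}  # 4 lines
    target = ICache()
    for a in memory:              # warm the target's I-cache
        target.fetch(memory, a)
    memory[0x40] = "patched"      # another processor modifies one line
    invalidate(target)
    before = target.misses
    for a in memory:              # target runs the same code again
        target.fetch(memory, a)
    assert target.lines[0x40] == "patched"  # both forms are correct
    return target.misses - before


by_addr = misses_after(lambda c: c.remote_inval_addr(0x40))
whole = misses_after(lambda c: c.remote_inval_all())
print(by_addr, whole)  # 1 4
```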
Note: It is reasonable for a processor executing its own INVAL.I, whether by address or for the whole cache, to invalidate its own I-cache lines. It is similarly reasonable for a processor not to invalidate its own I-cache lines on its own stores. IMHO it is less reasonable to expect a processor executing INVAL.I(addr) to invalidate other processors' I-cache lines, because that requires a mechanism for incoming bus operations to be directed not just to the data cache coherence mechanism but also to the I-cache. And that, as I say above, is basically the mechanism you need to implement multiprocessor I/D coherence on stores. Such an INVAL.I operation is quite likely to be more expensive than the corresponding data-cache operation, especially since the I-cache is often accessed every cycle, whereas the D-cache may have idle cycles in which single-ported tags can be interrogated for bus traffic.
Similar considerations can apply to data cache coherence: coherence on all stores, versus coherence only on certain stores.
Indeed, if you have fence operations that take an address, FENCE.???(addr), these can often be implemented by remote data cache invalidations or flushes.
(TBD: IIRC this is not necessarily true for transitive synchronization such as store.release/load.acquire, or even for fences that provide such transitivity.)
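A minimal Python sketch of an address fence implemented by remote flush, assuming write-back data caches that are not coherent on ordinary stores and a hypothetical `fence_addr` that makes every cache write back (if dirty) and drop the line. It covers only one writer, one reader, and one address, so it says nothing about the transitivity question in the TBD note above.

```python
# Toy model: write-back data caches that are NOT coherent on ordinary stores,
# plus a hypothetical address fence implemented by remote write-back/invalidate.

class DCache:
    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # addr -> (value, dirty)

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory.get(addr), False)
        return self.lines[addr][0]

    def store(self, addr, value):
        # Write-back, no snooping: no other cache is told about this store.
        self.lines[addr] = (value, True)

    def writeback_and_drop(self, addr):
        # Flush one line: write it back if dirty, then invalidate it.
        if addr in self.lines:
            value, dirty = self.lines.pop(addr)
            if dirty:
                self.memory[addr] = value


def fence_addr(caches, addr):
    # Hypothetical FENCE(addr): every cache flushes and invalidates addr,
    # so later loads of addr anywhere must refill from memory.
    for cache in caches:
        cache.writeback_and_drop(addr)


memory = {0x80: 0}
p0, p1 = DCache(memory), DCache(memory)
p1.load(0x80)                 # P1 caches the old value
p0.store(0x80, 42)            # P0's store stays in P0's cache
before = p1.load(0x80)        # P1 still sees its stale copy
fence_addr([p0, p1], 0x80)
after = p1.load(0x80)         # refilled from memory after the remote flush
print(before, after)          # 0 42
```

If two caches held dirty copies of the same line, the order of write-backs would matter; the sketch sidesteps that by assuming a single writer.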