
Store Coherence and Cache Flush Coherence

AndyGlew edited this page Feb 9, 2021 · 5 revisions


Some people place great value on the distinction between being cache coherent on every store (modulo memory ordering relaxation) and being cache coherent only on CMOs (cache management operations) such as a cache flush.

invalidate I cache line in remote processors

E.g. they might say that instruction and data caches are not coherent on every store: a store by processor P1 will not necessarily affect P1's instruction cache, let alone other processors' instruction caches.

But they might assume that there is an instruction, let's call it INVAL.I, that invalidates all such processors' instruction caches when any of them executes INVAL.I.

Versus, if processor P1's INVAL.I does not invalidate other processors' instruction caches, then you must do something to make those other processors execute their own INVAL.I. E.g. an IPI (Interprocessor Interrupt). E.g. protocols such as initializing the buffers to which code is written with trapping instructions.
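To make the two cases concrete, here is a minimal toy model. It is only a sketch: the `Processor` class, the helper names (`inval_i_local`, `inval_i_global`, `inval_i_via_ipi`), and the address `0x100` are all hypothetical, and the "IPI" is just a direct call standing in for an interrupt handler. It shows a remote processor fetching stale code when only the writer's own I cache is invalidated, and fetching the new code under either a globally visible INVAL.I or a local INVAL.I plus IPI.

```python
class Processor:
    """Toy processor with a private instruction cache (addr -> cached word)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory      # shared dict: addr -> instruction word
        self.icache = {}          # private I cache, not snooped on stores

    def fetch(self, addr):
        # A hit returns whatever was cached, even if memory has changed since.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # No store coherence: the store updates memory only; no I cache is touched.
        self.memory[addr] = value

    def inval_i_local(self, addr):
        # Hypothetical INVAL.I(addr) that affects only the executing processor.
        self.icache.pop(addr, None)


def inval_i_global(processors, addr):
    # Case 1: a single INVAL.I(addr) invalidates the line in every I cache.
    for p in processors:
        p.inval_i_local(addr)


def inval_i_via_ipi(sender, processors, addr):
    # Case 2: INVAL.I is local only, so the sender invalidates itself and
    # "interrupts" every other processor, which then runs its own INVAL.I.
    sender.inval_i_local(addr)
    for p in processors:
        if p is not sender:
            p.inval_i_local(addr)   # stands in for the IPI handler on p


def run(publish):
    memory = {0x100: "old"}
    p1, p2 = Processor("P1", memory), Processor("P2", memory)
    procs = [p1, p2]
    p2.fetch(0x100)                 # P2 caches the old instruction
    p1.store(0x100, "new")          # P1 writes new code
    publish(p1, procs, 0x100)       # P1 tries to make the new code visible
    return p2.fetch(0x100)          # what does P2 execute now?


stale = run(lambda sender, procs, addr: sender.inval_i_local(addr))
global_ = run(lambda sender, procs, addr: inval_i_global(procs, addr))
ipi = run(inval_i_via_ipi)
print(stale, global_, ipi)          # old new new
```

In this model the two working variants look identical from software's point of view; the difference is whether hardware or software is responsible for reaching the remote I caches.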

IMHO these are both examples of multiprocessor cache coherence mechanisms. They might be called

  • store coherence
  • versus CMO or cache flush coherence

If the INVAL.I instruction takes an address, INVAL.I(addr), then you have already implemented most of the mechanism required to do store coherence. The main difference is how often you perform such multiprocessor coherence operations: on every store, or much less often, only when you explicitly do cache flushes. This can affect performance and power.
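As a rough illustration of the "how often" point, the following back-of-the-envelope sketch counts remote invalidation messages under both schemes. Everything in it is an assumption made for illustration: a 64-byte line, 4-byte stores, three remote processors, and a naive broadcast model in which every coherence operation probes every remote I cache. Real designs with snoop filters or directories would send far fewer messages under store coherence.

```python
LINE_BYTES = 64   # assumed cache line size
WORD_BYTES = 4    # assumed store width


def line_of(addr):
    # Index of the cache line containing this byte address.
    return addr // LINE_BYTES


def store_coherence_messages(store_addrs, remote_cpus):
    # Naive store coherence: every store probes every remote I cache.
    return len(store_addrs) * remote_cpus


def cmo_coherence_messages(store_addrs, remote_cpus):
    # CMO coherence: one INVAL.I(addr) per distinct line written,
    # issued after all the stores to the buffer are done.
    lines = {line_of(a) for a in store_addrs}
    return len(lines) * remote_cpus


# Writing a 4 KiB code buffer one word at a time, with 3 remote processors.
stores = list(range(0x1000, 0x1000 + 4096, WORD_BYTES))
per_store = store_coherence_messages(stores, remote_cpus=3)
per_flush = cmo_coherence_messages(stores, remote_cpus=3)
print(per_store, per_flush)   # 3072 192
```

Under these assumptions the ratio is simply the number of stores per line (16 here), which is why the same invalidation mechanism can cost very different amounts depending on when it is triggered.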

Some processors provide the ability for one processor to invalidate somebody else's instruction cache, but only on a whole cache basis. I might be willing to say that this is different enough that you should not call these coherent multiprocessor I/D caches.

Note: It is reasonable for a processor executing its own INVAL.I, whether by address or whole cache, to invalidate its own I cache lines. It is similarly reasonable for a processor to not invalidate its own I cache lines on its own stores. IMHO it is less reasonable for a processor executing INVAL.I(addr) to invalidate other processors' I cache lines, because that requires a mechanism for incoming bus operations to be directed not just to the data cache coherency mechanism, but also to the I cache coherency mechanism. That, as I say above, is basically the mechanism that you need to implement multiprocessor I/D coherency on stores. Such an INVAL.I operation is quite likely to be more expensive than the corresponding data cache side operation, especially since the I cache is frequently referenced every cycle, whereas the D cache may have idle cycles that allow single-ported tags to be interrogated for bus traffic.

invalidate D cache lines in remote processors

Similar considerations can apply to data cache coherence, on all stores, or only on certain stores.

Indeed, if you have fence operations that take an address, FENCE.???(addr), these can often be implemented by remote data cache invalidations or flushes.

(TBD: IIRC not necessarily true for transitive synchronization like store.release/load.acquire, or even fences that provide such transitivity.)
