SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" #10

AndyGlew · 2020-09-02T19:29:14Z

Just like an earlier issue discusses address range CMOs vs per-cache-line CMOs... but this time for operations that are typically used for things like "flush the entire I$ or D$".

Such "cache microarchitecture dependent CMOs" have been done in some earlier processors a cache line at a time, but this is less well established than per-cache-line-address-at-a-time operations. Quite a few RISC processors have "full cache flushes", etc.

First, if operating a cache line at a time, there must be a way of indicating which cache line is involved. Typically this is (set,way), but not all caches have sets and ways. Indeed, it is not really clear what the sets and ways are for something like a skewed associative cache.

But that's okay, we can abstract that as a "cache entry index number", which might be Set*Nways+Way for a traditional set associative cache, or whatever is appropriate.
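To make the abstraction concrete, here is a minimal Python sketch of the Set*Nways+Way mapping for a traditional set-associative cache. The function names and the 64-set, 4-way geometry are assumptions chosen for illustration; nothing here comes from the draft proposal.

```python
def entry_index(set_idx: int, way: int, nways: int) -> int:
    """Flatten (set, way) into one cache entry index: set * nways + way."""
    if not 0 <= way < nways:
        raise ValueError("way out of range")
    return set_idx * nways + way


def set_way(index: int, nways: int) -> tuple:
    """Inverse mapping: recover (set, way) from a flat cache entry index."""
    return divmod(index, nways)


NSETS, NWAYS = 64, 4  # hypothetical cache geometry

# Walking all (set, way) pairs visits every index 0 .. NSETS*NWAYS-1 exactly once.
indices = [entry_index(s, w, NWAYS) for s in range(NSETS) for w in range(NWAYS)]
assert indices == list(range(NSETS * NWAYS))
assert set_way(entry_index(5, 3, NWAYS), NWAYS) == (5, 3)
```

A cache without sets and ways (skewed associative, say) would need a different mapping, but software that only iterates over the flat index would not have to change.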

Then, a per-cache-index loop typically looks like

FOR i from 0 to  #cache_entries-1 DO
     CMO.cache_index  i

or

FOR s from 0 to  Nsets-1 DO
    FOR w from 0 to Nways-1 DO
        CMO.by_set_way  s,w

That's the traditional approach.

The draft proposal (by me, Andy Glew, TBD link here3) defines "microarchitecture range CMOs" that look like

        x1 := 0
loop:
        x1 := CMO.UR x1
        BNEZ x1, loop

which looks remarkably like the per-cache-index loop

except that, like in the CMO.AR proposal, the next cache index is returned by the CMO.UR instruction.

This allows several implementations

(1) per (set,way) cache line at a time - traditional

(2) trap to M-mode efficiently, less overhead

(3) state machines that iterate over the entire cache, e.g. for EVICT, to write out dirty data

also (3.1) non-state machine implementations, as in bulk invalidations that set all valid bits to 0 as a single operation.
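As a rough illustration of why one software loop can serve all of these implementations, here is a small Python model. It is only a sketch: `CacheModel`, `cmo_ur`, and `entries_per_call` are hypothetical names, and the convention that the instruction returns 0 once the whole cache has been covered is inferred from the `BNEZ` loop above rather than taken from the draft text.

```python
class CacheModel:
    """Toy cache with one valid bit per entry (hypothetical model)."""

    def __init__(self, n_entries: int, entries_per_call: int):
        self.valid = [True] * n_entries
        # 1 models a line-at-a-time design; n_entries models a bulk invalidate.
        self.entries_per_call = entries_per_call

    def cmo_ur(self, index: int) -> int:
        """Invalidate entries starting at index; return the next index, or 0 when done."""
        end = min(index + self.entries_per_call, len(self.valid))
        for i in range(index, end):
            self.valid[i] = False
        return 0 if end == len(self.valid) else end


def flush_all(cache: CacheModel) -> int:
    """Mirror of: x1 := 0; loop: x1 := CMO.UR x1; BNEZ x1, loop. Returns the call count."""
    x1 = 0
    calls = 0
    while True:
        x1 = cache.cmo_ur(x1)
        calls += 1
        if x1 == 0:
            break
    return calls


line_at_a_time = CacheModel(n_entries=256, entries_per_call=1)
bulk = CacheModel(n_entries=256, entries_per_call=256)
print(flush_all(line_at_a_time))  # 256
print(flush_all(bulk))            # 1
```

The loop is identical in both cases; only the amount of work done per instruction differs, which is the flexibility the returned index is meant to provide.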

I mark this as a SECONDARY QUESTION:

in the title, because I want it to be blaringly obvious

also because I am in a hurry, and will apply this issue tracker's priority scheme later

but mainly because I think there will be less discussion about this CMO.UR cache index range than there will be for the CMO.AR address range instruction.

since there are already quite a few implementations that are "full cache invalidations", and we want RISC-V to support such hardware when it is available.

--

again, this issue is not for the details of the CMO.UR. It is mostly for the idea of a microarchitecture or cache index range.


brucehoult · 2020-09-02T22:21:17Z

Agreed. Iterating over the cache can sometimes be better than iterating over an address range. And this form provides flexibility in implementation.

Manufacturers of cores could if they wish document the encoding scheme from sets and ways or whatever they have into abstract indexes, thus allowing non-portable code to operate on a single way (or whatever).
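For example, if a vendor documented its abstract index as set * Nways + way (an assumption; a real core could use any encoding), non-portable code could compute the indexes belonging to a single way. A minimal Python sketch, with hypothetical names and geometry:

```python
def indices_for_way(way: int, nsets: int, nways: int) -> list:
    """Abstract indexes of every entry in one way, assuming index = set * nways + way."""
    if not 0 <= way < nways:
        raise ValueError("way out of range")
    return [s * nways + way for s in range(nsets)]


# Hypothetical 4-set, 2-way cache: way 1 occupies the odd indexes.
print(indices_for_way(1, nsets=4, nways=2))  # [1, 3, 5, 7]
```

Code written this way is tied to one documented encoding and would not carry over to a core that numbers its entries differently.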

ingallsj · 2020-09-15T06:34:07Z

I'm not a fan of including micro-architecture specific encodings or manufacturer-specific abstractions in the general-purpose ISA.

What is the use case, and what value would make it worthwhile for a manufacturer to make their micro-architecture-specific cache ops (set+way, if that's what they built) fit into an architecture-level abstraction?

ingallsj · 2020-09-15T06:36:29Z

Twist: I would be a fan of an "ALL" variant, instead of set/way/uarch-range.

billhuffman · 2020-09-22T20:14:07Z

If we're going to approach this, I see two issues that are at a conceptual level above instruction definition.

First is how to represent the micro-architectural structure. Does the implementation have to make some set of numbers that, when complete, will have covered the cache?
Second is protection. Instructions could be restricted to M-Mode with delegation capability to S-Mode. Another possibility is to use stores to MMIO space and have MMU/PMP control access, which gives more flexibility over the long run.
Bill

AndyGlew changed the title ~~SECONDARY QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range"~~ SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" Sep 2, 2020
