SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" #10

Open
AndyGlew opened this issue Sep 2, 2020 · 4 comments

Comments

@AndyGlew
Contributor

AndyGlew commented Sep 2, 2020

Just like an earlier issue discusses address range CMOs vs per-cache-line CMOs... but this time for operations that are typically used for things like "flush the entire I$ or D$".

Such "cache microarchitecture dependent CMOs" have been done in some earlier processors a cache line at a time, but this is less well established than the per-cache-line-address-at-a-time approach. Quite a few RISC processors have "full cache flushes", etc.

First, if operating a cache line at a time, there must be a way of indicating which cache line is involved. Typically this is (set,way), but not all caches have sets and ways - indeed, it is not really clear what the set and ways are for something like a skewed associative cache.

But that's okay, we can abstract that as a "cache entry index number", which might be Set*Nways+Way for a traditional set associative cache, or whatever is appropriate.
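As a minimal sketch of that abstraction for the traditional set-associative case only: the helper names (`entry_index`, `set_and_way`) and the 4-set, 2-way geometry are made up for illustration and are not part of any proposal.

```python
def entry_index(set_idx: int, way: int, nways: int) -> int:
    """Flatten (set, way) into one cache entry index: Set*Nways + Way."""
    if not 0 <= way < nways:
        raise ValueError("way out of range")
    return set_idx * nways + way


def set_and_way(index: int, nways: int) -> tuple:
    """Inverse mapping: recover (set, way) from a flat entry index."""
    return divmod(index, nways)


# Hypothetical geometry: 4 sets x 2 ways = 8 entries.
NSETS, NWAYS = 4, 2
all_indices = [entry_index(s, w, NWAYS) for s in range(NSETS) for w in range(NWAYS)]
```

With this encoding the nested (set, way) loop and the flat index loop visit the same entries in the same order. A skewed associative or otherwise unconventional cache would need a different mapping.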

Then, a per-cache-index loop typically looks like

FOR i from 0 to  #cache_entries-1 DO
     CMO.cache_index  i

or

FOR s from 0 to Nsets-1 DO
     FOR w from 0 to Nways-1 DO
          CMO.by_set_way  s,w

That's the traditional approach.

The draft proposal (by me, Andy Glew, TBD link here3) defines "microarchitecture range CMOs" that look like

        x1 := 0
loop:
        x1 := CMO.UR x1
        BNEZ x1, loop

which looks remarkably like the per-cache-index loop, except that, like in the CMO.AR proposal, the next cache index is returned by the CMO.UR instruction.
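To make the loop contract concrete, here is a toy software model. It assumes, for illustration only, that the operation invalidates one entry per invocation and returns 0 once the last entry has been handled; `ToyCache` and `flush_all` are hypothetical names, and the real CMO.UR semantics are whatever the draft proposal defines.

```python
class ToyCache:
    """Toy model: one valid bit per cache entry."""

    def __init__(self, n_entries: int):
        self.valid = [True] * n_entries

    def cmo_ur(self, index: int) -> int:
        """Invalidate the entry at `index` and return the next index to process.

        Returns 0 after the last entry, which is what terminates the loop.
        """
        self.valid[index] = False
        return (index + 1) % len(self.valid)


def flush_all(cache: ToyCache) -> int:
    """Mirror of: x1 := 0; loop: x1 := CMO.UR x1; BNEZ x1, loop."""
    x1 = 0
    invocations = 0
    while True:
        x1 = cache.cmo_ur(x1)
        invocations += 1
        if x1 == 0:  # BNEZ falls through when the returned index is zero
            return invocations
```

The software never computes the next index itself, so it does not need to know how many entries there are or how they are numbered.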

This allows several implementations

(1) per (set,way) cache line at a time - traditional

(2) trap to M-mode and emulate there efficiently, with less overhead than one trap per cache line

(3) state machines that iterate over the entire cache, e.g. for EVICT, to write out dirty data

also (3.1) non-state machine implementations, as in bulk invalidations that set all valid bits to 0 as a single operation.
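The point that one software loop can sit on top of all of these can be sketched with a toy model. Everything here is an assumption made for illustration: the function names, the batch size of 3, and the modelling of each implementation as "handle some entries, return the next index or 0 when done".

```python
def make_per_entry(valid):
    """(1) One (set,way) entry per invocation."""
    def step(i):
        valid[i] = False
        return (i + 1) % len(valid)
    return step


def make_batched(valid, batch):
    """(2)/(3) Several entries per invocation, e.g. an M-mode handler or a state machine."""
    def step(i):
        end = min(i + batch, len(valid))
        for j in range(i, end):
            valid[j] = False
        return end % len(valid)  # 0 once the end of the cache is reached
    return step


def make_bulk(valid):
    """(3.1) Clear every valid bit in a single operation."""
    def step(i):
        valid[:] = [False] * len(valid)
        return 0
    return step


def run_loop(step):
    """Same loop for every implementation: x1 := 0; repeat x1 := step(x1) until x1 == 0."""
    x1 = 0
    invocations = 0
    while True:
        x1 = step(x1)
        invocations += 1
        if x1 == 0:
            return invocations


def demo(n_entries=8, batch=3):
    """Return {name: (invocations, everything_invalidated)} for each toy implementation."""
    factories = {
        "per_entry": lambda v: make_per_entry(v),
        "batched": lambda v: make_batched(v, batch),
        "bulk": lambda v: make_bulk(v),
    }
    results = {}
    for name, factory in factories.items():
        valid = [True] * n_entries
        results[name] = (run_loop(factory(valid)), not any(valid))
    return results
```

In this model the loop body is identical in all three cases; only the number of invocations needed to cover the cache differs.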


I mark this as a SECONDARY QUESTION:

in the title, because I want it to be blaringly obvious

also because I am in a hurry, and will apply this issue tracker's priority scheme later

but mainly because I think there will be less discussion about this CMO.UR cache index range than there will be for the CMO.AR address range instruction.

since there are already quite a few implementations that are "full cache invalidations", and we want RISC-V to support such hardware when it is available.

--

again, this issue is not for the details of the CMO.UR. It is mostly for the idea of a microarchitecture or cache index range.

@AndyGlew AndyGlew changed the title SECONDARY QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" Sep 2, 2020
@brucehoult

Agreed. Iterating over the cache can sometimes be better than iterating over an address range. And this form provides flexibility in implementation.

Manufacturers of cores could if they wish document the encoding scheme from sets and ways or whatever they have into abstract indexes, thus allowing non-portable code to operate on a single way (or whatever).
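For example, if a vendor documented its abstract index as Set*Nways+Way (an assumption here; the actual encoding would be implementation-defined), non-portable code could enumerate the indexes that belong to a single way. `indices_for_way` is a hypothetical helper, not part of any proposal.

```python
def indices_for_way(way: int, nsets: int, nways: int) -> list:
    """Abstract entry indexes that fall in one way, assuming index = set * nways + way."""
    if not 0 <= way < nways:
        raise ValueError("way out of range")
    return [s * nways + way for s in range(nsets)]
```

Such code would depend on the documented encoding and would not be portable to a core that numbers its entries differently.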

@ingallsj

ingallsj commented Sep 15, 2020

I'm not a fan of including micro-architecture specific encodings or manufacturer-specific abstractions in the general-purpose ISA.

What is the use case, and what value would make it worthwhile for a manufacturer to make their micro-architecture-specific cache ops (set+way, if that's what they built) fit into an architecture-level abstraction?

@ingallsj

Twist: I would be a fan of an "ALL" variant, instead of set/way/uarch-range.

@billhuffman

If we're going to approach this, I see two issues that are at a conceptual level above instruction definition.

  • First is how to represent the micro-architectural structure. Does the implementation have to define some set of numbers that, once all of them have been iterated over, will have covered the whole cache?

  • Second is protection. Instructions could be restricted to M-Mode with delegation capability to S-Mode. Another possibility is to use stores to MMIO space and have MMU/PMP control access, which gives more flexibility over the long run.

    Bill
    


4 participants