Ri5 CMOs proposal
2025-01-08 05:39:38 -0800 - When this draft was generated. Not necessarily modified.
See https://github.com/AndyGlew/Ri5-stuff/wiki/generated-HTML-and-PDF-for-CMOs-proposal for pointers to the AsciiDoc source, and to the generated HTML and PDF, for a version of this document.
See [_techpubs_information] section below for more details. Or, in this wiki/AsciiDoc page: techpubs-info, since the AsciiDoc cross-reference [_techpubs_information] does not render when viewed on the web.
This document is a proposal for cache management operations for RISC-V.
Note
|
Rationale and other background is distinguished by NOTE sections such as this. See [Rationale using AsciiDoctor NOTE admonition]. |
-
v0.5, July 16, 2020:
-
WIP: edits after reviews received, after sending links to several Ri5 working groups (security, virtual memory)
-
WIP: moving out of Glew personal GitHub to official RISC-V GitHub
-
links may break as move is accomplished.
-
-
Spreadsheet of actual operations is at https://github.com/AndyGlew/Ri5-stuff/blob/master/CMOs-proposal-spreadsheet.xlsx
-
TBD: move along with others
-
TBD: reduce operation count
-
-
ISSUE: CMO flushes when cacheability is changing (via PMAs, possibly PTEs)
-
-
v0.4 June 16, 2020:
-
WIP edits after June 11 review
-
TIMING_FLUSH removed, merged into loopful CMO.UR
-
CMO.UR index starts/ends with 0
-
Fixed many spelling/typo errors (but by no means all).
-
Started distinguishing Rationale and discussion from normative specification using AsciiDoc NOTE admonition blocks
-
NOT FINISHED
-
COMPLETION_FENCE merged with existing RISC-V FENCE
-
.<cmo_specifier> definition
-
-
-
v0.3, June 11, 2020:
-
Fixed Block Size CMOs removed
-
.<cmo_specifier> ⇒ .<cmo_operation>.<which_cache>
-
TIMING_FLUSH
-
COMPLETION_FENCE
-
There are 3 formats of CMO instructions:
-
[Fixed Block Size Prefetches (PREFETCH.*)] operating on 64B naturally aligned regions of memory
-
[Variable Address Range CMOs (CMO.VAR)] operating on arbitrary address ranges
-
[Microarchitecture Structure Range CMOs (CMO.UR)] supporting whole cache operations operating on "cache entry numbers" or "indexes" which generalize and abstract cache set + way
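As a rough illustration of how a "cache entry number" can generalize cache set + way, the sketch below flattens a (set, way) pair into a single index and walks every entry of a cache, as a whole-cache operation would. The row-major mapping and the names `entry_to_set_way`, `set_way_to_entry`, and `whole_cache_entries` are assumptions made for this example only; the proposal does not mandate any particular numbering.

```python
from typing import Iterator, Tuple


def set_way_to_entry(set_index: int, way: int, num_ways: int) -> int:
    """Flatten (set, way) into one entry number (hypothetical row-major mapping)."""
    return set_index * num_ways + way


def entry_to_set_way(entry: int, num_sets: int, num_ways: int) -> Tuple[int, int]:
    """Recover (set, way) from an entry number under the same assumed mapping."""
    if not 0 <= entry < num_sets * num_ways:
        raise ValueError("entry number out of range")
    return divmod(entry, num_ways)  # (entry // num_ways, entry % num_ways)


def whole_cache_entries(num_sets: int, num_ways: int) -> Iterator[int]:
    """Yield every entry number once; software never needs to see set or way."""
    return iter(range(num_sets * num_ways))
```

With such an abstraction, software iterating over a cache only needs the total number of entries, not the cache geometry.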
There are many types of CMO operations, which are formed by the combination of
-
which caches the operation applies to (and/or other parts of the memory system)
-
what operation is actually performed (e.g. invalidate, flush dirty data)
-
other aspects, such as invalidating related prefetchers and predictors
The CMO types are represented in the assembly syntax as the .<cmo_specifier> field. They are encoded in the instruction encoding as described below.
TBD: include spreadsheet of encodings?
In addition to the different CMO instruction formats discussed above, such as CMO.VAR and CMO.UR, there are many types of CMO operations. The CMO types are represented in the assembly syntax as the .<cmo_specifier> field. They are encoded in the Funct7 field of the instruction, in conjunction with the lowest-numbered bit of Funct3, bit 11 of the instruction encoding.
These instruction types are formed by the combination of
-
which caches the operation applies to (and/or other parts of the memory system) - .<cmo_specifier>.<which_cache>
-
what operation is actually performed (e.g. invalidate, flush dirty data) - .<cmo_specifier>.<cmo_operation>
-
other aspects, such as invalidating related prefetchers and predictors - .<cmo_specifier>.<cmo_other>
The subcomponents .<which_cache>, .<cmo_operation> and .<cmo_other> are NOT orthogonal bitfields of the .<cmo_specifier> bitset formed by Funct7 and Funct3.0/11. Nevertheless, it is convenient to use the .<cmo_specifier>.<property> notation, to describe these subcomponent properties that are computed from irregular encodings.
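The following is a minimal sketch of what "properties computed from irregular encodings" could look like in a decoder model. It assumes the standard RISC-V placement of Funct7 in bits 31:25, takes the extra bit position (11, as stated in the text) as a parameter, and assumes the extra bit is appended as the least significant bit of the specifier. The table contents are entirely hypothetical placeholders, not encodings from this proposal.

```python
from typing import Dict, NamedTuple

FUNCT7_SHIFT = 25   # Funct7 occupies bits 31:25 in the standard R-type layout
FUNCT7_MASK = 0x7F


class CmoProperties(NamedTuple):
    which_cache: str
    cmo_operation: str
    cmo_other: str


def cmo_specifier(insn: int, low_bit_pos: int = 11) -> int:
    """Concatenate Funct7 with one extra bit into an 8-bit specifier.

    low_bit_pos defaults to 11 as given in the text; the concatenation
    order (Funct7 high, extra bit low) is an assumption of this sketch.
    """
    funct7 = (insn >> FUNCT7_SHIFT) & FUNCT7_MASK
    low_bit = (insn >> low_bit_pos) & 1
    return (funct7 << 1) | low_bit


# Hypothetical, deliberately irregular table: the properties are looked up,
# not sliced out of the specifier as independent bitfields.
HYPOTHETICAL_CMO_TABLE: Dict[int, CmoProperties] = {
    0x00: CmoProperties("L1D", "invalidate", "none"),
    0x01: CmoProperties("L1D", "flush_dirty", "none"),
    0x03: CmoProperties("all", "flush_dirty", "invalidate_predictors"),
}


def cmo_properties(insn: int) -> CmoProperties:
    """Map an instruction word to its derived .<cmo_specifier>.<property> values."""
    return HYPOTHETICAL_CMO_TABLE[cmo_specifier(insn)]
```

A lookup table is used here precisely because the subcomponents are not orthogonal bitfields: `cmo_properties(insn).which_cache` plays the role of the `.<cmo_specifier>.<which_cache>` notation.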
Note
|
CSR bitfields would be less tightly encoded than instruction bitfields
<cmo_specifier> might be specified quite simply in a CSR with 64 bits as follows;
This encoding occupies 18 bits, far more than the 128-256 distinct encodings that are reasonable to place in an instruction encoding. Such a specification has encodings reserved for future instruction extensions. The biggest consumers of bits, however, are the from-domains and to-domains, e.g. for third-party remote cache operations: hart1 performing a CMO that prefetches data from hart2’s L4 cache and moves it to hart’s L2 cache. Even 5 bits is conservative, allowing only 32 distinct caches. Another example is prefetch instructions that fetch into level N, but do not prefetch past level M, since the interconnect past that level is saturated. However, since this proposal places the .<cmo_specifier> in the instruction encoding, the CMO types must be restricted and more tightly encoded. |
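To make the size comparison in the note concrete, the arithmetic can be checked with a few lines. The constant names are invented for this sketch, and treating the from-domain and to-domain as two separate 5-bit fields is an assumption drawn from the "even 5 bits" remark, not a stated layout.

```python
def num_encodings(bits: int) -> int:
    """Number of distinct values representable in the given number of bits."""
    return 1 << bits


CSR_SPECIFIER_BITS = 18   # size quoted in the note for the CSR-style encoding
DOMAIN_FIELD_BITS = 5     # assumed width of each from-domain / to-domain field

csr_encodings = num_encodings(CSR_SPECIFIER_BITS)     # 262144
caches_per_domain = num_encodings(DOMAIN_FIELD_BITS)  # 32
domain_bits_total = 2 * DOMAIN_FIELD_BITS             # 10 of the 18 bits

# 128-256 instruction-level encodings correspond to 7-8 bits.
insn_bits_low = (128).bit_length() - 1    # 7
insn_bits_high = (256).bit_length() - 1   # 8
```

Under these assumptions the two domain fields alone would already exceed the 7-8 bits available in the instruction encoding.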