diff --git a/introduction.adoc b/introduction.adoc
index 6263da45..d5f27667 100644
--- a/introduction.adoc
+++ b/introduction.adoc
@@ -33,6 +33,7 @@ This specification uses the following terms and abbreviations:
 | XLEN | The width of an integer register in bits
 | FLEN | The width of a floating-point register in bits
 | Linker relaxation | A mechanism for optimizing programs at link-time, see <> for more detail.
+| RVWMO | RISC-V Weak Memory Order, as defined in the RISC-V specification.
 |===
 
 = Status of ABI
diff --git a/riscv-abi.adoc b/riscv-abi.adoc
index a07a8d2b..f4070c91 100644
--- a/riscv-abi.adoc
+++ b/riscv-abi.adoc
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[]
 include::riscv-dwarf.adoc[]
 
 include::riscv-rtabi.adoc[]
+
+include::riscv-atomic.adoc[]
diff --git a/riscv-atomic.adoc b/riscv-atomic.adoc
new file mode 100644
index 00000000..d3ae907c
--- /dev/null
+++ b/riscv-atomic.adoc
@@ -0,0 +1,185 @@
+[[riscv-atomics]]
+= RISC-V Atomics ABI Specification
+ifeval::["{docname}" == "riscv-atomics"]
+include::prelude.adoc[]
+endif::[]
+
+== RISC-V atomics mappings
+
+This specifies mappings of C and C\++ atomic operations to RISC-V
+machine instructions. Other languages, for example Java, provide similar
+facilities that should be implemented in a consistent manner, usually
+by applying the mapping for the corresponding C++ primitive.
+
+NOTE: Because different programming languages may be used within the same
+process, these mappings must be compatible across programming languages. For
+example, Java programmers expect memory ordering guarantees to be enforced even
+if some of the actual memory accesses are performed by a library written in
+C.
+
+NOTE: Though many mappings are possible, not all of them will interoperate
+correctly. In particular, many mapping combinations will not
+correctly enforce ordering between a C++ `memory_order_seq_cst`
+store and a subsequent `memory_order_seq_cst` load.
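
The store-then-load ordering mentioned in the note above is the classic "store buffering" pattern. The following is a minimal C++ sketch of that pattern, not part of the ABI text; the names `x`, `y`, `store_buffering_round`, and `count_forbidden_outcomes` are hypothetical. Under any correct mapping, the two `memory_order_seq_cst` loads must never both observe 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus variables; the names are illustrative only.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round of the store-buffering pattern. Returns true if the
// outcome that sequential consistency forbids (both loads see 0) occurred.
bool store_buffering_round() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // seq_cst store ...
        r1 = y.load(std::memory_order_seq_cst);  // ... then seq_cst load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the round and counts forbidden outcomes; a conforming
// implementation should always report zero.
int count_forbidden_outcomes(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        if (store_buffering_round()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

A test like this cannot prove a mapping correct, since the forbidden outcome may simply fail to show up, but a non-zero count would indicate that the store and the following load were reordered.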
+
+NOTE: Our choice of mappings anticipates
+the future addition of load-acquire and store-release
+instructions, allowing those to be incorporated without introducing an
+ABI incompatibility. The primary design goal is to maximize performance
+of the mappings _with those instructions_. See Table A.7 in the
+"unprivileged" architecture specification (as stated, without the
+footnote) for a preview of those mappings.
+The mapping for `memory_order_seq_cst` stores uses an otherwise unnecessary
+trailing fence to avoid such an "ABI break".
+
+These mappings currently assume only the A extension.
+
+We first present the basic mapping, and then suggest some possible
+optimizations that are compatible with both the basic mapping and
+our future target mapping. However, these optimizations may not be
+universally appropriate, for reasons discussed below.
+
+We present the basic mappings as a table in 3 sections. The first
+deals with translations for loads, stores, and fences. The next two sections
+address mappings for read-modify-write operations like `fetch_add` and
+`exchange`. The second section deals with operations that have direct
+`amo` instruction equivalents in the RISC-V A extension. The final
+section deals with other read-modify-write operations that require
+the `lr` and `sc` instructions.
+
+NOTE: These mappings are very similar to those that originally appeared in the
+appendix of the RISC-V "unprivileged" architecture specification as
+"Mappings from C/C++ primitives to RISC-V Primitives", which we will
+refer to by their 2019 historical label of "Table A.6". Our basic specification
+differs *only* in that `atomic_store(memory_order_seq_cst)`
+has an extra trailing fence for compatibility with the "Hypothetical mappings ..."
+table in the same section, which we similarly refer to as "Table A.7".
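
As a rough source-level illustration of the three table sections described above, the sketch below shows one C++ operation from each class. The function and variable names (`publish`, `observe`, `bump`, `claim`, `flag`, `counter`) are hypothetical, and the comments give the basic mapping for a 32-bit operand as an expectation drawn from the tables, not a guarantee about any particular compiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical variables used only for illustration.
std::atomic<std::uint32_t> flag{0};
std::atomic<std::uint32_t> counter{0};

// Section 1: loads, stores, and fences.
void publish(std::uint32_t v) {
    flag.store(v, std::memory_order_release);      // basic mapping: fence rw,w; sw
}

std::uint32_t observe() {
    return flag.load(std::memory_order_acquire);   // basic mapping: lw; fence r,rw
}

// Section 2: read-modify-write with a direct AMO equivalent.
std::uint32_t bump() {
    // basic mapping: amoadd.w
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// Section 3: read-modify-write that needs an lr/sc loop.
bool claim(std::uint32_t expected, std::uint32_t desired) {
    // basic mapping: loop: lr.w.aq; compare; sc.w.rl; bnez loop
    return counter.compare_exchange_strong(expected, desired,
                                           std::memory_order_acq_rel);
}
```

For example, after `publish(7)`, a call to `observe()` on the same thread returns 7, and `claim(2, 10)` succeeds only when `counter` currently holds 2.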
+
+[[tab:c11mappings]]
+.Mappings from C/C++ primitives to RISC-V primitives
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|Non-atomic load |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw`
+
+|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw`
+
+|Non-atomic store |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw`
+
+|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw`
+
+|`atomic_thread_fence(memory_order_release)` |`fence rw,w`
+
+|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso`
+
+|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw`
+|===
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO AMO Mapping
+|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}`
+
+|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq`
+
+|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl`
+
+|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl`
+
+|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl`
+
+|===
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_acquire)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_release)`
+|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_acq_rel)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop`
+|===
+
+It is expected that the RVWMO AMO Mappings will be used for atomic read-modify-write
+operations that are directly supported by corresponding AMO instructions,
+and that LR/SC mappings will be used for the remainder, currently
+including compare-exchange operations. Compare-exchange LR/SC sequences
+on the containing 32-bit word should be used for shorter operands.
+
+It is acceptable, but usually undesirable for performance reasons, to use LR/SC
+mappings where an AMO mapping would suffice.
+
+Atomics do not imply any ordering for I/O operations. I/O operations
+should include sufficient fences to prevent them from being visibly
+reordered with atomic operations.
+
+Float and double atomic loads and stores should be implemented using
+the integer sequences.
+
+Float and double read-modify-write operations should consist of a loop performing
+an initial plain load of the value, followed by the floating point
+computation, followed by an integer compare-and-swap sequence to try to
+store back the updated value. This avoids floating point
+instructions between LR and SC instructions.
+
+NOTE: The "Eventual Success of Store-Conditional Instructions" section
+in the ISA specification provides that essential progress guarantee only
+if there are no floating point instructions between the LR and matching SC
+instruction. By compiling such sequences with an "extra" ordinary load,
+and performing the floating point computation before the LR, we preserve
+the guarantee.
+
+== Possible `memory_order_seq_cst` store mapping optimization
+
+The `memory_order_seq_cst` store mapping may be replaced by the following for
+32- and 64-bit operands:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl`
+|===
+
+NOTE: We expect this to be profitable in most cases. The same mapping may be
+used for the `memory_order_release` case, where it is less likely to be
+profitable.
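
The floating-point read-modify-write scheme described earlier (plain load, floating-point computation, then an integer compare-and-swap) can be sketched at the C++ source level as follows. This is a minimal sketch, not the code a compiler is required to emit: the helper names `float_to_bits`, `bits_to_float`, and `atomic_float_fetch_add` are hypothetical, and it assumes a 32-bit `float` stored as raw bits in a `std::atomic<std::uint32_t>`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helpers that reinterpret a float as its bit pattern and back.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Atomically adds `delta` to the float held in `cell` and returns the
// previous value. The floating-point addition happens before the integer
// compare-and-swap, so it never sits between an LR and its matching SC.
float atomic_float_fetch_add(std::atomic<std::uint32_t>& cell, float delta) {
    // Initial plain (relaxed) load of the current bit pattern.
    std::uint32_t old_bits = cell.load(std::memory_order_relaxed);
    for (;;) {
        float old_val = bits_to_float(old_bits);
        std::uint32_t new_bits = float_to_bits(old_val + delta);
        // Integer compare-and-swap; on failure old_bits is refreshed
        // with the current contents and the computation is retried.
        if (cell.compare_exchange_weak(old_bits, new_bits,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
            return old_val;
        }
    }
}
```

Comparing bit patterns instead of floating-point values also sidesteps the cases where value equality and bit equality disagree, such as NaN or signed zero.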
+
+== Weakening of the LR/SC `memory_order_seq_cst` mapping
+
+The final LR/SC mapping may be weakened by replacing the `aqrl` ordering above
+with `aq`:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+|===
+
+NOTE: This has clear performance advantages. However, the resulting mapping is
+no longer compatible with the "Table A.6" mapping, which was used by some
+implementations as a preliminary ABI for atomics. Thus this optimization should
+be postponed until code compiled according to that earlier specification is no
+longer in circulation. For platforms that did not implement this earlier
+specification, there is no reason to delay.
+