Initial psABI atomics specification

riscv-non-isa · May 12, 2023 · commit 5a40656 · 1 parent cfed71f

Showing 3 changed files with 203 additions and 0 deletions.
diff --git a/introduction.adoc b/introduction.adoc
@@ -33,6 +33,7 @@ This specification uses the following terms and abbreviations:
 | XLEN              | The width of an integer register in bits
 | FLEN              | The width of a floating-point register in bits
 | Linker relaxation | A mechanism for optimizing programs at link-time, see <<Linker Relaxation>> for more detail.
+| RVWMO             | RISC-V Weak Memory Order, as defined in the RISC-V specification.
 |===
 
 = Status of ABI

diff --git a/riscv-abi.adoc b/riscv-abi.adoc
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[]
 include::riscv-dwarf.adoc[]
 
 include::riscv-rtabi.adoc[]
+
+include::riscv-atomic.adoc[]
diff --git a/riscv-atomic.adoc b/riscv-atomic.adoc
@@ -0,0 +1,200 @@
+[[riscv-atomics]]
+= RISC-V Atomics ABI Specification
+ifeval::["{docname}" == "riscv-atomics"]
+include::prelude.adoc[]
+endif::[]
+
+== RISC-V atomics mappings
+
+This specifies mappings of C and C\++ atomic operations to RISC-V
+machine instructions. Other languages, for example Java, provide similar
+facilities that should be implemented in a consistent manner, usually
+by applying the mapping for the corresponding C++ primitive.
+
+NOTE: Because different programming languages may be used within the same
+process, these mappings must be compatible across programming languages. For
+example, Java programmers expect memory ordering guarantees to be enforced even
+if some of the actual memory accesses are performed by a library written in
+C.
+
+NOTE: Though many mappings are possible, not all of them will interoperate
+correctly. In particular, many mapping combinations will not
+correctly enforce ordering between a C++ `memory_order_seq_cst`
+store and a subsequent `memory_order_seq_cst` load.
+
+NOTE: Our choice of mappings anticipates
+the future addition of load-acquire and store-release
+instructions, allowing those to be incorporated without introducing an
+ABI incompatibility. The primary design goal is to maximize performance
+of the mappings _with those instructions_. See Table A.7 in the
+"unprivileged" architecture specification (as stated, without the
+footnote) for a preview of those mappings.
+The mapping for `memory_order_seq_cst` stores uses an otherwise unnecessary
+trailing fence to avoid such an "ABI break".
+
+These mappings currently assume only the A extension.
+
+We first present the basic mapping, and then suggest some possible
+optimizations that are compatible with both the basic mapping and
+our future target mapping. However, these optimizations may not be
+universally appropriate, for reasons discussed below.
+
+We present the basic mappings as a table in three sections. The first
+covers loads, stores, and fences. The second and third cover
+read-modify-write operations such as `fetch_add` and `exchange`. The
+second covers operations that have direct `amo` instruction equivalents
+in the RISC-V A extension. The third covers the remaining
+read-modify-write operations, which require the `lr` and `sc`
+instructions.
+
+NOTE: These mappings are very similar to those that originally appeared in the
+appendix of the RISC-V "unprivileged" architecture specification as
+"Mappings from C/C++ primitives to RISC-V Primitives", which we will
+refer to by their 2019 historical label of "Table A.6". Our basic specification
+differs *only* in that `atomic_store(memory_order_seq_cst)`
+has an extra trailing fence for compatibility with the "Hypothetical mappings ..."
+table in the same section, which we similarly refer to as "Table A.7".
+
+[[tab:c11mappings]]
+.Mappings from C/C++ primitives to RISC-V primitives
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|Non-atomic load |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw`
+
+|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw`
+
+|Non-atomic store |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw`
+
+|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw`
+
+|`atomic_thread_fence(memory_order_release)` |`fence rw,w`
+
+|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso`
+
+|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw`
+|===
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO AMO Mapping
+|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}`
+
+|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq`
+
+|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl`
+
+|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl`
+
+|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl`
+
+|===
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_acquire)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_release)`
+|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_acq_rel)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop`
+|===
+
+It is expected that the RVWMO AMO Mappings will be used for atomic read-modify-write
+operations that are directly supported by corresponding AMO instructions,
+and that LR/SC mappings will be used for the remainder, currently
+including compare-exchange operations. Compare-exchange LR/SC sequences
+on the containing 32-bit word should be used for shorter operands.
+
+It is acceptable, but usually undesirable for performance reasons, to use LR/SC
+mappings where an AMO mapping would suffice.
+
+Atomics do not imply any ordering for IO operations. IO operations
+should include sufficient fences to prevent them from being visibly
+reordered with atomic operations.
+
+Float and double atomic loads and stores should be implemented using
+the integer sequences.
+
+Float and double read-modify-write operations should consist of a loop performing
+an initial plain load of the value, followed by the floating point
+computation, followed by an integer compare-and-swap sequence to try to
+store back the updated value. This avoids floating point
+instructions between LR and SC instructions.
+
+NOTE: The "Eventual Success of Store-Conditional Instructions" section
+in the ISA specification provides that essential progress guarantee only
+if there are no floating point instructions between the LR and matching SC
+instruction. By compiling such sequences with an "extra" ordinary load,
+and performing the floating point computation before the LR, we preserve
+the guarantee.
+
+== Possible `memory_order_seq_cst` store mapping optimization
+
+The `memory_order_seq_cst` store mapping may be replaced by the following for
+32- and 64-bit operands:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl`
+|===
+
+NOTE: We expect this to be profitable in most cases. The same mapping may be
+used for the `memory_order_release` case, where it is less likely to be
+profitable.
+
+== Weakening of the LR/SC `memory_order_seq_cst` mapping
+
+The final LR/SC mapping may be weakened by replacing the `aqrl` ordering above
+with `aq`:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+|===
+
+NOTE: This has clear performance advantages. However, the resulting mapping is
+no longer compatible with the "Table A.6" mapping, which was used by some
+implementations as a preliminary ABI for atomics. Thus this optimization should
+be postponed until code compiled according to that earlier specification is no
+longer in circulation. For platforms that did not implement this earlier
+specification, there is no reason to delay.
+
+////
+== Other missing specifications
+
+In addition something should specify size and alignment of atomic types
+such as `atomic<struct { char a; char b; }>`. It is unclear to what
+extent this should be processor-specific. It may not belong in the psABI.
+
+Something should specify functions to be called for "large" atomic
+operations. This should ideally not be processor-specific.
+
+Should there be an ELF note or similar annotation associated with conformance
+to this ABI, probably with a different one for the future A.7 ABI? This would
+allow linker warnings if the A.7 mapping is used in connection with a convention
+predating this ABI.
+////
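
As a rough illustration of how the tables in `riscv-atomic.adoc` relate to source code, the following C++ sketch annotates a few `std::atomic` operations with the sequence each would map to under the basic mapping. All names here (`Mailbox`, `publish`, `try_consume`, `bump`, `atomic_add_float`, `self_check`) are hypothetical and do not come from the commit, and the instruction comments restate the table rows rather than the output of any particular compiler. The last function follows the shape described for float read-modify-write operations: a plain load, the floating point computation, then an integer compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical names throughout. Trailing comments give the RISC-V sequence
// each operation would map to under the basic tables; real compiler output
// may differ.

struct Mailbox {
    std::atomic<int> payload{0};
    std::atomic<int> ready{0};
};

void publish(Mailbox &box, int value) {
    box.payload.store(value, std::memory_order_relaxed);  // sw
    box.ready.store(1, std::memory_order_release);        // fence rw,w; sw
}

bool try_consume(Mailbox &box, int &out) {
    if (box.ready.load(std::memory_order_acquire) == 0)   // lw; fence r,rw
        return false;
    out = box.payload.load(std::memory_order_relaxed);    // lw
    return true;
}

int bump(std::atomic<int> &counter) {
    // Has a direct AMO equivalent: amoadd.w.aqrl
    return counter.fetch_add(1, std::memory_order_seq_cst);
}

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Float fetch-add built from an integer compare-and-swap on the bit pattern,
// so that no floating point instruction sits between an LR and its SC.
// Returns the value held before the addition.
float atomic_add_float(std::atomic<std::uint32_t> &cell, float delta) {
    std::uint32_t old_bits = cell.load(std::memory_order_relaxed);  // plain lw
    for (;;) {
        float old_value;
        std::memcpy(&old_value, &old_bits, sizeof old_value);
        const float new_value = old_value + delta;  // FP work before the CAS
        std::uint32_t new_bits;
        std::memcpy(&new_bits, &new_value, sizeof new_bits);
        // On failure, old_bits is refreshed with the current contents and
        // the loop recomputes from that value.
        if (cell.compare_exchange_weak(old_bits, new_bits,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed))
            return old_value;
    }
}

// Single-threaded sanity check of the helpers above.
void self_check() {
    Mailbox box;
    int out = 0;
    assert(!try_consume(box, out));
    publish(box, 7);
    assert(try_consume(box, out) && out == 7);
}
```

Comparing bit patterns rather than float values keeps the compare-and-swap an integer operation, which is what lets the floating point addition stay outside the LR/SC sequence. It also means a stored NaN still compares equal to itself in the retry loop.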