From 9e93dd637a1af5d54e3a951337ef08262e11984c Mon Sep 17 00:00:00 2001
From: Hans Boehm <hboehm@google.com>
Date: Fri, 28 Apr 2023 13:20:13 -0700
Subject: [PATCH] Initial psABI atomics specification

---
 introduction.adoc |   1 +
 riscv-abi.adoc    |   2 +
 riscv-atomic.adoc | 185 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 188 insertions(+)
 create mode 100644 riscv-atomic.adoc
diff --git a/introduction.adoc b/introduction.adoc
index 6263da45..d5f27667 100644
--- a/introduction.adoc
+++ b/introduction.adoc
@@ -33,6 +33,7 @@ This specification uses the following terms and abbreviations:
 | XLEN              | The width of an integer register in bits
 | FLEN              | The width of a floating-point register in bits
 | Linker relaxation | A mechanism for optimizing programs at link-time, see <<Linker Relaxation>> for more detail.
+| RVWMO             | RISC-V Weak Memory Ordering, the memory consistency model defined in the RISC-V unprivileged ISA specification.
 |===
 
 = Status of ABI
diff --git a/riscv-abi.adoc b/riscv-abi.adoc
index a07a8d2b..f4070c91 100644
--- a/riscv-abi.adoc
+++ b/riscv-abi.adoc
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[]
 include::riscv-dwarf.adoc[]
 
 include::riscv-rtabi.adoc[]
+
+include::riscv-atomic.adoc[]
diff --git a/riscv-atomic.adoc b/riscv-atomic.adoc
new file mode 100644
index 00000000..d3ae907c
--- /dev/null
+++ b/riscv-atomic.adoc
@@ -0,0 +1,185 @@
+[[riscv-atomics]]
+= RISC-V Atomics ABI Specification
+ifeval::["{docname}" == "riscv-atomic"]
+include::prelude.adoc[]
+endif::[]
+
+== RISC-V atomics mappings
+
+This section specifies mappings of C and C\++ atomic operations to RISC-V
+machine instructions. Other languages, for example Java, provide similar
+facilities that should be implemented in a consistent manner, usually
+by applying the mapping for the corresponding C++ primitive.
+
+NOTE: Because different programming languages may be used within the same
+process, these mappings must be compatible across programming languages. For
+example, Java programmers expect memory ordering guarantees to be enforced even
+if some of the actual memory accesses are performed by a library written in
+C.
+
+NOTE: Though many mappings are possible, not all of them will interoperate
+correctly. In particular, many mapping combinations will not
+correctly enforce ordering between a C++ `memory_order_seq_cst`
+store and a subsequent `memory_order_seq_cst` load.
+
+NOTE: Our choice of mappings anticipates
+the future addition of load-acquire and store-release
+instructions, allowing those to be incorporated without introducing an
+ABI incompatibility. The primary design goal is to maximize performance
+of the mappings _with those instructions_. See Table A.7 in the
+"unprivileged" architecture specification (as stated, without the
+footnote) for a preview of those mappings.
+To avoid an "ABI break" when those instructions are adopted, the mapping for
+`memory_order_seq_cst` stores includes an otherwise unnecessary trailing fence.
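+
+The store-buffering litmus test below is a hypothetical C11 sketch of the
+store-then-load case discussed above. The shared variables `x` and `y` and the
+functions `thread0` and `thread1` are illustrative names only. The comments
+explain why the trailing fence in the `memory_order_seq_cst` store mapping
+matters when the subsequent load may have been compiled with a mapping that
+has no leading fence.
+
+```c
+#include <stdatomic.h>
+
+/* Hypothetical shared variables, zero-initialized at file scope. */
+_Atomic int x, y;
+
+/* Store-buffering (SB) litmus test: each thread performs a seq_cst store
+ * followed by a seq_cst load of the other variable. C and C++ forbid the
+ * outcome where both loads return 0, so each store-load pair must be
+ * ordered by the generated code. */
+int thread0(void) {
+    /* Basic mapping: fence rw,w; sw; fence rw,rw */
+    atomic_store_explicit(&x, 1, memory_order_seq_cst);
+    /* If this load were compiled with a future mapping that omits the
+     * leading fence rw,rw, only the store's trailing fence rw,rw would
+     * keep the store to x ordered before the load of y. */
+    return atomic_load_explicit(&y, memory_order_seq_cst);
+}
+
+int thread1(void) {
+    /* Mirror image of thread0 with the roles of x and y swapped. */
+    atomic_store_explicit(&y, 1, memory_order_seq_cst);
+    return atomic_load_explicit(&x, memory_order_seq_cst);
+}
+```
+
+Called one after the other, as in a quick smoke test, the functions can only
+return `0` and then `1`; the outcome in which both return `0` is the one that
+must stay impossible when they run concurrently on two harts.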
+
+These mappings currently assume only the A extension.
+
+We first present the basic mapping, and then suggest some possible
+optimizations that are compatible with both the basic mapping and
+our future target mapping. However, these optimizations may not be
+universally appropriate, for reasons discussed below.
+
+We present the basic mappings as three tables. The first covers loads,
+stores, and fences. The remaining two cover read-modify-write operations
+such as `fetch_add` and `exchange`: the second covers operations that have
+direct `amo` instruction equivalents in the RISC-V A extension, and the third
+covers the remaining read-modify-write operations, which require the `lr` and
+`sc` instructions.
+
+NOTE: These mappings are very similar to those that originally appeared in the
+appendix of the RISC-V "unprivileged" architecture specification as
+"Mappings from C/C++ primitives to RISC-V Primitives", which we refer to
+by its historical (2019) label, "Table A.6". Our basic specification
+differs *only* in that `atomic_store(memory_order_seq_cst)`
+has an extra trailing fence for compatibility with the "Hypothetical mappings ..."
+table in the same section, which we similarly refer to as "Table A.7".
+
+[[tab:c11mappings]]
+.Mappings from C/C++ primitives to RISC-V primitives
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|Non-atomic load |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}`
+
+|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw`
+
+|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw`
+
+|Non-atomic store |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}`
+
+|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw`
+
+|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw`
+
+|`atomic_thread_fence(memory_order_release)` |`fence rw,w`
+
+|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso`
+
+|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw`
+|===
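+
+For illustration, the hypothetical C11 helpers below exercise each row of the
+table for a 32-bit operand, so the load and store rows become `lw` and `sw`.
+Each comment shows the sequence a compiler following this mapping would be
+expected to emit; actual output also depends on register allocation and the
+surrounding code.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+
+/* Hypothetical helpers. Comments give the expected RVWMO sequence for a
+ * 32-bit operand under the basic mapping. */
+int32_t load_relaxed(_Atomic int32_t *p) {
+    return atomic_load_explicit(p, memory_order_relaxed);  /* lw */
+}
+
+int32_t load_acquire(_Atomic int32_t *p) {
+    return atomic_load_explicit(p, memory_order_acquire);  /* lw; fence r,rw */
+}
+
+int32_t load_seq_cst(_Atomic int32_t *p) {
+    /* fence rw,rw; lw; fence r,rw */
+    return atomic_load_explicit(p, memory_order_seq_cst);
+}
+
+void store_relaxed(_Atomic int32_t *p, int32_t v) {
+    atomic_store_explicit(p, v, memory_order_relaxed);     /* sw */
+}
+
+void store_release(_Atomic int32_t *p, int32_t v) {
+    atomic_store_explicit(p, v, memory_order_release);     /* fence rw,w; sw */
+}
+
+void store_seq_cst(_Atomic int32_t *p, int32_t v) {
+    /* fence rw,w; sw; fence rw,rw (trailing fence for Table A.7 compatibility) */
+    atomic_store_explicit(p, v, memory_order_seq_cst);
+}
+
+void fences(void) {
+    atomic_thread_fence(memory_order_acquire);  /* fence r,rw */
+    atomic_thread_fence(memory_order_release);  /* fence rw,w */
+    atomic_thread_fence(memory_order_acq_rel);  /* fence.tso */
+    atomic_thread_fence(memory_order_seq_cst);  /* fence rw,rw */
+}
+```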
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO AMO Mapping
+|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}`
+
+|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq`
+
+|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl`
+
+|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl`
+
+|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl`
+
+|===
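+
+The hypothetical helpers below pair C11 read-modify-write calls with the AMO
+table. Each comment assumes the `.w` form for 32-bit operands and the `.d`
+form for 64-bit operands, which requires RV64.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+
+/* Hypothetical helpers; each returns the value held before the update. */
+int32_t add_relaxed(_Atomic int32_t *p, int32_t v) {
+    return atomic_fetch_add_explicit(p, v, memory_order_relaxed);  /* amoadd.w */
+}
+
+int32_t or_acquire(_Atomic int32_t *p, int32_t v) {
+    return atomic_fetch_or_explicit(p, v, memory_order_acquire);   /* amoor.w.aq */
+}
+
+int32_t and_release(_Atomic int32_t *p, int32_t v) {
+    return atomic_fetch_and_explicit(p, v, memory_order_release);  /* amoand.w.rl */
+}
+
+int64_t xor_acq_rel(_Atomic int64_t *p, int64_t v) {
+    return atomic_fetch_xor_explicit(p, v, memory_order_acq_rel);  /* amoxor.d.aqrl */
+}
+
+int64_t swap_seq_cst(_Atomic int64_t *p, int64_t v) {
+    return atomic_exchange_explicit(p, v, memory_order_seq_cst);   /* amoswap.d.aqrl */
+}
+```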
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_acquire)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop`
+
+|`atomic_<op>(memory_order_release)`
+|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_acq_rel)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop`
+|===
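+
+To make the `<op>` placeholder concrete, the hypothetical helper below performs
+a `memory_order_seq_cst` compare-exchange, which has no AMO equivalent. The
+comment sketches the loop shape that the last row of the LR/SC table would be
+expected to produce; register names are illustrative, and the code that loads
+`*expected` and writes back the observed value on failure is omitted.
+
+```c
+#include <stdatomic.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Expected shape under the seq_cst LR/SC mapping (illustrative registers):
+ *
+ *   loop: lr.w.aqrl t0, (a0)     # load-reserved current value
+ *         bne  t0, a1, done      # <op>: give up if value != expected
+ *         sc.w.rl t1, a2, (a0)   # store-conditional the desired value
+ *         bnez t1, loop          # retry if the reservation was lost
+ *   done:
+ */
+bool cas_seq_cst(_Atomic int32_t *p, int32_t *expected, int32_t desired) {
+    return atomic_compare_exchange_strong_explicit(
+        p, expected, desired, memory_order_seq_cst, memory_order_seq_cst);
+}
+```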
+
+We expect the AMO mappings to be used for atomic read-modify-write
+operations that are directly supported by corresponding AMO instructions,
+and the LR/SC mappings to be used for the remainder, currently including
+compare-exchange operations. For 8- and 16-bit operands, for which the A
+extension provides no `lr`, `sc`, or `amo` instructions, compare-exchange
+LR/SC sequences on the containing aligned 32-bit word should be used.
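+
+The following is a C-level sketch, under stated assumptions, of the
+containing-word technique for 8-bit operands: the target byte is selected by a
+shift and mask within its aligned 32-bit word, and only the full word is ever
+accessed atomically. The function name `cas_u8_in_word` and its interface are
+hypothetical, and byte 0 is assumed to be the least significant byte (RISC-V
+is little-endian). A compiler would be expected to place the masking directly
+between `lr.w` and `sc.w` rather than calling a 32-bit compare-exchange as
+this sketch does.
+
+```c
+#include <stdatomic.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Hypothetical: compare-and-swap byte `idx` (0-3) of an aligned 32-bit word
+ * using only 32-bit atomic operations. */
+bool cas_u8_in_word(_Atomic uint32_t *word, unsigned idx,
+                    uint8_t *expected, uint8_t desired) {
+    unsigned shift = idx * 8u;                 /* idx must be 0..3 */
+    uint32_t mask = (uint32_t)0xFFu << shift;  /* selects the target byte */
+    uint32_t old = atomic_load_explicit(word, memory_order_seq_cst);
+    for (;;) {
+        uint8_t cur = (uint8_t)((old & mask) >> shift);
+        if (cur != *expected) {
+            *expected = cur;  /* report the byte actually observed */
+            return false;
+        }
+        /* Splice the desired byte into the word, leaving the others intact. */
+        uint32_t new_word = (old & ~mask) | ((uint32_t)desired << shift);
+        /* On failure (a concurrent change to any byte of the word, or a
+         * spurious failure of the weak form), `old` receives the current
+         * word and the loop re-checks the target byte. */
+        if (atomic_compare_exchange_weak_explicit(word, &old, new_word,
+                                                  memory_order_seq_cst,
+                                                  memory_order_seq_cst))
+            return true;
+    }
+}
+```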
+
+It is acceptable, but usually undesirable for performance reasons, to use LR/SC
+mappings where an AMO mapping would suffice.
+
+Atomics do not imply any ordering for IO operations. IO operations
+should include sufficient fences to prevent them from being visibly
+reordered with atomic operations.
+
+Float and double atomic loads and stores should be implemented using the
+integer load and store sequences for operands of the same size.
+
+Float and double read-modify-write operations should be implemented as a loop
+that performs an initial plain load of the value, then the floating-point
+computation, then an integer compare-and-swap sequence that tries to store
+back the updated value. This keeps floating-point instructions out of the
+code between the LR and SC instructions.
+
+NOTE: The "Eventual Success of Store-Conditional Instructions" section
+in the ISA specification provides its essential forward-progress guarantee
+only if there are no floating-point instructions between the LR and the
+matching SC instruction. By compiling such sequences with an "extra" ordinary
+load, and performing the floating-point computation before the LR, we
+preserve the guarantee.
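+
+A minimal C11 sketch of these rules follows. It assumes `float` and `uint32_t`
+have the same size and keeps the value as its bit pattern in a hypothetical
+`_Atomic uint32_t` cell, so that loads, stores, and the compare-and-swap all
+use the integer mappings. The relaxed load stands in for the "plain load": at
+the instruction level it is an ordinary `lw`, but unlike a non-atomic C load
+it is not a data race. All function names are illustrative.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Bit-pattern conversions; memcpy avoids strict-aliasing problems. */
+static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
+static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
+
+/* Float load and store via the integer sequences. */
+float atomic_float_load(_Atomic uint32_t *p) {
+    return bits_float(atomic_load_explicit(p, memory_order_seq_cst));
+}
+
+void atomic_float_store(_Atomic uint32_t *p, float v) {
+    atomic_store_explicit(p, float_bits(v), memory_order_seq_cst);
+}
+
+/* Float fetch_add: the FP addition happens before the compare-and-swap, so
+ * no FP instruction sits between the lr.w and sc.w it expands to. */
+float atomic_float_fetch_add(_Atomic uint32_t *p, float delta) {
+    /* Initial plain load: relaxed maps to an ordinary lw. */
+    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
+    for (;;) {
+        uint32_t desired = float_bits(bits_float(old) + delta);  /* FP add */
+        /* Integer CAS; on failure `old` is refreshed with the current bits. */
+        if (atomic_compare_exchange_weak_explicit(p, &old, desired,
+                                                  memory_order_seq_cst,
+                                                  memory_order_relaxed))
+            return bits_float(old);
+    }
+}
+```
+
+Because the compare-and-swap compares bit patterns rather than float values,
+the loop cannot spin forever on a NaN, which never compares equal to itself.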
+
+== Possible `memory_order_seq_cst` store mapping optimization
+
+The `memory_order_seq_cst` store mapping may be replaced by the following for
+32- and 64-bit operands:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO Mapping
+|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl`
+|===
+
+NOTE: We expect this to be profitable in most cases. The same mapping may be
+used for the `memory_order_release` case, where it is less likely to be
+profitable.
+
+== Weakening of the LR/SC `memory_order_seq_cst` mapping
+
+The LR/SC mapping for `memory_order_seq_cst` (the final row of the LR/SC
+table above) may be weakened by replacing the `aqrl` annotation on the `lr`
+instruction with `aq`:
+
+[cols="<,<",options="header",]
+|===
+|C/C++ Construct |RVWMO LR/SC Mapping
+
+|`atomic_<op>(memory_order_seq_cst)`
+|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
+|===
+
+NOTE: This has clear performance advantages. However, the resulting mapping is
+no longer compatible with the "Table A.6" mapping, which was used by some
+implementations as a preliminary ABI for atomics. Thus this optimization should
+be postponed until code compiled according to that earlier specification is no
+longer in circulation. For platforms that did not implement this earlier
+specification, there is no reason to delay.
+