Initial psABI atomics specification
hboehm committed May 12, 2023
1 parent cfed71f commit 5a40656
Showing 3 changed files with 203 additions and 0 deletions.
1 change: 1 addition & 0 deletions introduction.adoc
@@ -33,6 +33,7 @@ This specification uses the following terms and abbreviations:
| XLEN | The width of an integer register in bits
| FLEN | The width of a floating-point register in bits
| Linker relaxation | A mechanism for optimizing programs at link-time, see <<Linker Relaxation>> for more detail.
| RVWMO | RISC-V Weak Memory Order, as defined in the RISC-V specification.
|===

= Status of ABI
2 changes: 2 additions & 0 deletions riscv-abi.adoc
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[]
include::riscv-dwarf.adoc[]

include::riscv-rtabi.adoc[]

include::riscv-atomic.adoc[]
200 changes: 200 additions & 0 deletions riscv-atomic.adoc
@@ -0,0 +1,200 @@
[[riscv-atomics]]
= RISC-V Atomics ABI Specification
ifeval::["{docname}" == "riscv-atomics"]
include::prelude.adoc[]
endif::[]

== RISC-V atomics mappings

This section specifies mappings of C and C\++ atomic operations to RISC-V
machine instructions. Other languages, for example Java, provide similar
facilities that should be implemented in a consistent manner, usually
by applying the mapping for the corresponding C++ primitive.

NOTE: Because different programming languages may be used within the same
process, these mappings must be compatible across programming languages. For
example, Java programmers expect memory ordering guarantees to be enforced even
if some of the actual memory accesses are performed by a library written in
C.

NOTE: Though many mappings are possible, not all of them will interoperate
correctly. In particular, many mapping combinations will not
correctly enforce ordering between a C++ `memory_order_seq_cst`
store and a subsequent `memory_order_seq_cst` load.
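
The store-then-load case mentioned above is the classic store-buffering
pattern. The following is an illustrative sketch only, and the names
`SbResult`, `run_store_buffering` and `sb_outcome_allowed` are hypothetical.
Two threads each perform a `memory_order_seq_cst` store followed by a
`memory_order_seq_cst` load of the other thread's variable. The language rules
forbid both loads from returning 0, so the mapping used for each store must
combine correctly with the mapping used for the load that follows it, even if
the two were produced by different toolchains.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two loads of one run of the litmus test.
struct SbResult {
    int r0;
    int r1;
};

// Runs the store-buffering pattern once: each thread stores to its own
// variable and then loads the other thread's variable, all seq_cst.
SbResult run_store_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r0 = -1;
    int r1 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return {r0, r1};
}

// With seq_cst on all four accesses, at least one load must observe 1.
bool sb_outcome_allowed(const SbResult& r) {
    return !(r.r0 == 0 && r.r1 == 0);
}
```

A test like this can only ever show a violation; a passing run does not prove
that a mapping combination is correct.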

NOTE: Our choice of mappings anticipates
the future addition of load-acquire and store-release
instructions, allowing those to be incorporated without introducing an
ABI incompatibility. The primary design goal is to maximize performance
of the mappings _with those instructions_. See Table A.7 in the
"unprivileged" architecture specification (as stated, without the
footnote) for a preview of those mappings.
The mapping for `memory_order_seq_cst` stores uses an otherwise unnecessary
trailing fence to avoid such an "ABI break".

These mappings currently assume only the A extension.

We first present the basic mapping, and then suggest some possible
optimizations that are compatible with both the basic mapping and
our future target mapping. However, these optimizations may not be
universally appropriate, for reasons discussed below.

We present the basic mappings as three tables. The first
covers translations for loads, stores, and fences. The other two
cover read-modify-write operations such as `fetch_add` and
`exchange`: the second covers operations that have direct
`amo` instruction equivalents in the RISC-V A extension, and the third
covers the remaining read-modify-write operations, which require
the `lr` and `sc` instructions.

NOTE: These mappings are very similar to those that originally appeared in the
appendix of the RISC-V "unprivileged" architecture specification as
"Mappings from C/C++ primitives to RISC-V Primitives", which we will
refer to by their 2019 historical label of "Table A.6". Our basic specification
differs *only* in that `atomic_store(memory_order_seq_cst)`
has an extra trailing fence for compatibility with the "Hypothetical mappings ..."
table in the same section, which we similarly refer to as "Table A.7".

[[tab:c11mappings]]
.Mappings from C/C++ primitives to RISC-V primitives
[cols="<,<",options="header",]
|===
|C/C++ Construct |RVWMO Mapping
|Non-atomic load |`l{b\|h\|w\|d}`

|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}`

|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw`

|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw`

|Non-atomic store |`s{b\|h\|w\|d}`

|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}`

|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}`

|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw`

|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw`

|`atomic_thread_fence(memory_order_release)` |`fence rw,w`

|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso`

|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw`
|===
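
As a reading aid for the table above, the sketch below pairs a few source-level
operations on a 32-bit object with the sequence the table specifies for them.
The function names are hypothetical, and the comments show the sequence this
mapping calls for, not the verified output of any particular compiler.

```cpp
#include <atomic>
#include <cstdint>

// Each comment gives the sequence from the load/store/fence table for a
// 32-bit operand. Real compiler output may differ in register use and
// scheduling.

std::int32_t load_acquire(const std::atomic<std::int32_t>& a) {
    return a.load(std::memory_order_acquire);   // lw; fence r,rw
}

std::int32_t load_seq_cst(const std::atomic<std::int32_t>& a) {
    return a.load(std::memory_order_seq_cst);   // fence rw,rw; lw; fence r,rw
}

void store_release(std::atomic<std::int32_t>& a, std::int32_t v) {
    a.store(v, std::memory_order_release);      // fence rw,w; sw
}

void store_seq_cst(std::atomic<std::int32_t>& a, std::int32_t v) {
    a.store(v, std::memory_order_seq_cst);      // fence rw,w; sw; fence rw,rw
}

void fence_acq_rel() {
    std::atomic_thread_fence(std::memory_order_acq_rel);  // fence.tso
}
```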

[cols="<,<",options="header",]
|===
|C/C++ Construct |RVWMO AMO Mapping
|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}`

|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq`

|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl`

|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl`

|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl`

|===
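
The sketch below shows read-modify-write operations that have direct AMO
equivalents, annotated with the instruction the table above specifies. The
function names are hypothetical, and the 64-bit case assumes an RV64 target.

```cpp
#include <atomic>
#include <cstdint>

// Each function returns the value held before the update, as the
// corresponding AMO instruction does in its destination register.

std::uint32_t add_relaxed(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    return a.fetch_add(v, std::memory_order_relaxed);  // amoadd.w
}

std::uint32_t or_acquire(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    return a.fetch_or(v, std::memory_order_acquire);   // amoor.w.aq
}

std::uint32_t and_release(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    return a.fetch_and(v, std::memory_order_release);  // amoand.w.rl
}

std::uint64_t swap_seq_cst(std::atomic<std::uint64_t>& a, std::uint64_t v) {
    return a.exchange(v, std::memory_order_seq_cst);   // amoswap.d.aqrl
}
```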

[cols="<,<",options="header",]
|===
|C/C++ Construct |RVWMO LR/SC Mapping

|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop`

|`atomic_<op>(memory_order_acquire)`
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop`

|`atomic_<op>(memory_order_release)`
|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop`

|`atomic_<op>(memory_order_acq_rel)`
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`

|`atomic_<op>(memory_order_seq_cst)`
|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop`
|===
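
The sketch below gives two source-level operations that would fall under the
LR/SC table. The function names are hypothetical. For compare-exchange, the
`<op>` placeholder in the table stands roughly for the comparison and the
branch out of the loop on mismatch. The second function is an assumed example
of an update with no AMO instruction, expressed as a compare-exchange loop.

```cpp
#include <atomic>
#include <cstdint>

// Tries to change `a` from `expected` to `desired`; returns true on success.
// Table sequence for seq_cst:
//   loop: lr.w.aqrl; <compare, leave loop on mismatch>; sc.w.rl; bnez loop
bool cas_seq_cst(std::atomic<std::int32_t>& a, std::int32_t expected,
                 std::int32_t desired) {
    return a.compare_exchange_strong(expected, desired,
                                     std::memory_order_seq_cst,
                                     std::memory_order_seq_cst);
}

// Increments `a` unless it has already reached `limit`; returns the value
// observed before the update. There is no AMO for this operation, so it is
// built from a compare-exchange loop.
std::int32_t fetch_saturating_inc(std::atomic<std::int32_t>& a,
                                  std::int32_t limit) {
    std::int32_t old = a.load(std::memory_order_relaxed);
    while (old < limit &&
           !a.compare_exchange_weak(old, old + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // A failed compare-exchange reloads `old`; retry with the new value.
    }
    return old;
}
```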

It is expected that the RVWMO AMO mappings will be used for atomic read-modify-write
operations that are directly supported by corresponding AMO instructions,
and that the LR/SC mappings will be used for the remainder, currently
including compare-exchange operations. For operands shorter than 32 bits,
compare-exchange LR/SC sequences that operate on the containing 32-bit word
should be used.
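
The idea of operating on the containing 32-bit word can be sketched at the
source level as follows. This is an illustration under stated assumptions, not
the instruction sequence a compiler would emit: the name `cas_byte_in_word` is
hypothetical, the caller is assumed to have already located the aligned word
and the byte's position in it, and byte positions are numbered from the least
significant byte as on a little-endian target.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Compare-and-swap a single byte by updating the whole 32-bit word that
// contains it. `byte_index` (0..3) counts from the least significant byte.
bool cas_byte_in_word(std::atomic<std::uint32_t>& word, unsigned byte_index,
                      std::uint8_t expected, std::uint8_t desired) {
    assert(byte_index < 4);
    const unsigned shift = byte_index * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;

    std::uint32_t old_word = word.load(std::memory_order_seq_cst);
    for (;;) {
        if (static_cast<std::uint8_t>(old_word >> shift) != expected) {
            return false;  // target byte does not hold the expected value
        }
        // Replace only the target byte; the neighbouring bytes are kept.
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{desired} << shift);
        // On failure, old_word is refreshed with the current contents, which
        // also covers concurrent changes to the neighbouring bytes.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_seq_cst,
                                       std::memory_order_seq_cst)) {
            return true;
        }
    }
}
```

The retry on failure matters: a change to a neighbouring byte makes the word
comparison fail even though the target byte still matches.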

It is acceptable, but usually undesirable for performance reasons, to use LR/SC
mappings where an AMO mapping would suffice.

Atomics do not imply any ordering for IO operations. IO operations
should include sufficient fences to prevent them from being visibly
reordered with atomic operations.

Float and double atomic loads and stores should be implemented using
the integer sequences.

Float and double read-modify-write operations should consist of a loop performing
an initial plain load of the value, followed by the floating-point
computation, followed by an integer compare-and-swap sequence to try to
store back the updated value. This avoids floating-point
instructions between the LR and SC instructions.

NOTE: The "Eventual Success of Store-Conditional Instructions" section
in the ISA specification provides that essential progress guarantee only
if there are no floating point instructions between the LR and matching SC
instruction. By compiling such sequences with an "extra" ordinary load,
and performing the floating point computation before the LR, we preserve
the guarantee.
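
A source-level sketch of this scheme for a 32-bit float follows. The helper
names `to_bits`, `from_bits` and `atomic_fetch_add_float` are hypothetical, and
the float is assumed to be stored as its 32-bit integer representation. The
floating-point addition happens between the load and the integer
compare-exchange, so it never sits inside the LR/SC loop that the
compare-exchange expands to.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret a float as its 32-bit pattern and back.
std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomically adds `operand` to the float stored in `bits`; returns the
// previous value.
float atomic_fetch_add_float(std::atomic<std::uint32_t>& bits, float operand) {
    // Initial plain load of the current value.
    std::uint32_t old_bits = bits.load(std::memory_order_relaxed);
    for (;;) {
        // Floating-point computation, outside any LR/SC sequence.
        const float old_value = from_bits(old_bits);
        const std::uint32_t new_bits = to_bits(old_value + operand);
        // Integer compare-and-swap; a failure refreshes old_bits.
        if (bits.compare_exchange_weak(old_bits, new_bits,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
            return old_value;
        }
    }
}
```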

== Possible `memory_order_seq_cst` store mapping optimization

The `memory_order_seq_cst` store mapping may be replaced by the following for
32- and 64-bit operands:

[cols="<,<",options="header",]
|===
|C/C++ Construct |RVWMO Mapping
|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl`
|===

NOTE: We expect this to be profitable in most cases. The same mapping may be
used for the `memory_order_release` case, where it is less likely to be
profitable.

== Weakening of the LR/SC `memory_order_seq_cst` mapping

The `memory_order_seq_cst` LR/SC mapping may be weakened by replacing the `aqrl`
ordering on the `lr` instruction with `aq`:

[cols="<,<",options="header",]
|===
|C/C++ Construct |RVWMO LR/SC Mapping

|`atomic_<op>(memory_order_seq_cst)`
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop`
|===

NOTE: This has clear performance advantages. However, the resulting mapping is
no longer compatible with the "Table A.6" mapping, which was used by some
implementations as a preliminary ABI for atomics. This optimization should
therefore be postponed until code compiled according to that earlier
specification is no longer in circulation. For platforms that did not implement
the earlier specification, there is no reason to delay.

////
== Other missing specifications
In addition something should specify size and alignment of atomic types
such as `atomic<struct { char a; char b; }>`. It is unclear to what
extent this should be processor-specific. It may not belong in the psABI.
Something should specify functions to be called for "large" atomic
operations. This should ideally not be processor-specific.
Should there be an ELF note or similar annotation associated with conformance
to this ABI, probably with a different one for the future A.7 ABI? This would
allow linker warnings if the A.7 mapping is used in connection with a convention
predating this ABI.
////
