Showing 3 changed files with 203 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[] | |
include::riscv-dwarf.adoc[] | ||
|
||
include::riscv-rtabi.adoc[] | ||
|
||
include::riscv-atomic.adoc[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,200 @@ | ||
[[riscv-atomics]] | ||
= RISC-V Atomics ABI Specification | ||
ifeval::["{docname}" == "riscv-atomics"] | ||
include::prelude.adoc[] | ||
endif::[] | ||
|
||
== RISC-V atomics mappings | ||
|
||
This section specifies mappings of C and C\++ atomic operations to RISC-V | ||
machine instructions. Other languages, for example Java, provide similar | ||
facilities that should be implemented in a consistent manner, usually | ||
by applying the mapping for the corresponding C++ primitive. | ||
|
||
NOTE: Because different programming languages may be used within the same | ||
process, these mappings must be compatible across programming languages. For | ||
example, Java programmers expect memory ordering guarantees to be enforced even | ||
if some of the actual memory accesses are performed by a library written in | ||
C. | ||
|
||
NOTE: Though many mappings are possible, not all of them will interoperate | ||
correctly. In particular, many mapping combinations will not | ||
correctly enforce ordering between a C++ `memory_order_seq_cst` | ||
store and a subsequent `memory_order_seq_cst` load. | ||
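
The pattern at issue is the store-buffering (Dekker-style) litmus test. The sketch below shows it in C11; the variable and function names are hypothetical and do not appear in this specification. If `sb_thread0` and `sb_thread1` run concurrently, C and C++ forbid both functions returning 0. That outcome could become observable if one thread's store sequence ended without a fence and the other thread's load sequence began without one, which is why the two mappings have to be chosen together.

```c
#include <stdatomic.h>

/* Hypothetical shared flags for a store-buffering litmus test. */
static _Atomic int x;
static _Atomic int y;

/* Thread 0: seq_cst store to x, then seq_cst load of y. */
int sb_thread0(void) {
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_seq_cst);
}

/* Thread 1: the mirror image, store to y and then load x. */
int sb_thread1(void) {
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_seq_cst);
}
```

Under sequential consistency at least one of the two loads must observe the other thread's store, whichever compiler or language produced each half.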
|
||
NOTE: Our choice of mappings anticipates | ||
the future addition of load-acquire and store-release | ||
instructions, allowing those to be incorporated without introducing an | ||
ABI incompatibility. The primary design goal is to maximize performance | ||
of the mappings _with those instructions_. See Table A.7 in the | ||
"unprivileged" architecture specification (as stated, without the | ||
footnote) for a preview of those mappings. | ||
The mapping for `memory_order_seq_cst` stores uses an otherwise unnecessary | ||
trailing fence to avoid such an "ABI break". | ||
|
||
These mappings currently assume only the A extension. | ||
|
||
We first present the basic mapping, and then suggest some possible | ||
optimizations that are compatible with both the basic mapping and | ||
our future target mapping. However, these optimizations may not be | ||
universally appropriate, for reasons discussed below. | ||
|
||
We present the basic mappings as a table in 3 sections. The first | ||
deals with translations for loads, stores, and fences. The next two sections | ||
address mappings for read-modify-write operations like `fetch_add` and | ||
`exchange`. The second section deals with operations that have direct | ||
`amo` instruction equivalents in the RISC-V A extension. The final | ||
section deals with other read-modify-write operations that require | ||
the `lr` and `sc` instructions. | ||
|
||
NOTE: These mappings are very similar to those that originally appeared in the | ||
appendix of the RISC-V "unprivileged" architecture specification as | ||
"Mappings from C/C++ primitives to RISC-V Primitives", which we will | ||
refer to by their 2019 historical label of "Table A.6". Our basic specification | ||
differs *only* in that `atomic_store(memory_order_seq_cst)` | ||
has an extra trailing fence for compatibility with the "Hypothetical mappings ..." | ||
table in the same section, which we similarly refer to as "Table A.7". | ||
|
||
[[tab:c11mappings]] | ||
.Mappings from C/C++ primitives to RISC-V primitives | ||
[cols="<,<",options="header",] | ||
|=== | ||
|C/C++ Construct |RVWMO Mapping | ||
|Non-atomic load |`l{b\|h\|w\|d}` | ||
|
||
|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}` | ||
|
||
|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw` | ||
|
||
|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw` | ||
|
||
|Non-atomic store |`s{b\|h\|w\|d}` | ||
|
||
|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}` | ||
|
||
|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}` | ||
|
||
|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw` | ||
|
||
|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw` | ||
|
||
|`atomic_thread_fence(memory_order_release)` |`fence rw,w` | ||
|
||
|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso` | ||
|
||
|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw` | ||
|=== | ||
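
As an illustration of the load and store rows, here is a minimal message-passing sketch in C11. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical, and the comments show the table row that would apply to a 32-bit `int`; a particular compiler may emit a different but compatible sequence.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plain, non-atomic data */
static _Atomic int ready;    /* publication flag, initially 0 */

void publish(int value) {
    payload = value;                                         /* sw */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* fence rw,w; sw */
}

bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) /* lw; fence r,rw */
        return false;
    *out = payload;                                          /* lw */
    return true;
}
```

The release fence keeps the payload write ahead of the flag store, and the acquire fence keeps the payload read behind the flag load.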
|
||
[cols="<,<",options="header",] | ||
|=== | ||
|C/C++ Construct |RVWMO AMO Mapping | ||
|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}` | ||
|
||
|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq` | ||
|
||
|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl` | ||
|
||
|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl` | ||
|
||
|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl` | ||
|
||
|=== | ||
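
The following C11 sketch shows three operations that have direct AMO equivalents, annotated with the row of the table above that would apply to 32-bit operands. The variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic int32_t events;  /* hypothetical shared counter */
static _Atomic int32_t flags;   /* hypothetical shared bit set */

/* Relaxed increment; returns the previous count. */
int32_t count_event(void) {
    return atomic_fetch_add_explicit(&events, 1, memory_order_relaxed); /* amoadd.w */
}

/* Release-ordered bit set; returns the previous flags. */
int32_t set_flag_release(int32_t bit) {
    return atomic_fetch_or_explicit(&flags, bit, memory_order_release); /* amoor.w.rl */
}

/* Sequentially consistent read-and-clear; returns the previous flags. */
int32_t take_flags(void) {
    return atomic_exchange_explicit(&flags, 0, memory_order_seq_cst);   /* amoswap.w.aqrl */
}
```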
|
||
[cols="<,<",options="header",] | ||
|=== | ||
|C/C++ Construct |RVWMO LR/SC Mapping | ||
|
||
|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop` | ||
|
||
|`atomic_<op>(memory_order_acquire)` | ||
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop` | ||
|
||
|`atomic_<op>(memory_order_release)` | ||
|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop` | ||
|
||
|`atomic_<op>(memory_order_acq_rel)` | ||
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` | ||
|
||
|`atomic_<op>(memory_order_seq_cst)` | ||
|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop` | ||
|=== | ||
|
||
It is expected that the RVWMO AMO Mappings will be used for atomic read-modify-write | ||
operations that are directly supported by corresponding AMO instructions, | ||
and that LR/SC mappings will be used for the remainder, currently | ||
including compare-exchange operations. Compare-exchange LR/SC sequences | ||
on the containing 32-bit word should be used for shorter operands. | ||
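
The idea of operating on the containing 32-bit word can be sketched portably in C11 as follows. This is an analogy, not the mapping itself: a real lowering would place the masking between `lr.w` and `sc.w` in a single loop, whereas this sketch uses a word-sized compare-exchange. The function name `cas_byte_in_word` and the convention of counting byte lanes from the least significant byte are assumptions of the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare-exchange on one byte lane (0..3, counted from the least
 * significant byte) of a 32-bit word. On failure, *expected is
 * updated with the byte value actually observed. */
bool cas_byte_in_word(_Atomic uint32_t *word, unsigned lane,
                      uint8_t *expected, uint8_t desired) {
    const unsigned shift = 8u * (lane & 3u);
    const uint32_t mask = (uint32_t)0xFFu << shift;
    uint32_t old = atomic_load(word);
    for (;;) {
        uint8_t current = (uint8_t)((old & mask) >> shift);
        if (current != *expected) {
            *expected = current;
            return false;
        }
        /* Replace only the selected lane; keep the neighbouring bytes. */
        uint32_t updated = (old & ~mask) | ((uint32_t)desired << shift);
        /* On failure, old is refreshed with the word's current contents. */
        if (atomic_compare_exchange_weak(word, &old, updated))
            return true;
    }
}
```

The retry loop only repeats when a neighbouring byte changed or the weak compare-exchange failed spuriously; a mismatch in the selected lane returns immediately.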
|
||
It is acceptable, but usually undesirable for performance reasons, to use LR/SC | ||
mappings where an AMO mapping would suffice. | ||
|
||
Atomics do not imply any ordering for IO operations. IO operations | ||
should include sufficient fences to prevent them from being visibly | ||
reordered with atomic operations. | ||
|
||
Float and double atomic loads and stores should be implemented using | ||
the integer sequences. | ||
|
||
Float and double read-modify-write operations should consist of a loop performing | ||
an initial plain load of the value, followed by the floating point | ||
computation, followed by an integer compare-and-swap sequence to try to | ||
store back the updated value. This avoids floating point | ||
instructions between LR and SC instructions. | ||
|
||
NOTE: The "Eventual Success of Store-Conditional Instructions" section | ||
in the ISA specification provides that essential progress guarantee only | ||
if there are no floating point instructions between the LR and matching SC | ||
instruction. By compiling such sequences with an "extra" ordinary load, | ||
and performing the floating point computation before the LR, we preserve | ||
the guarantee. | ||
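
A minimal C11 sketch of that loop for a `float` fetch-and-add follows. It assumes the value is stored as its 32-bit pattern in an `_Atomic uint32_t`; the helper names are hypothetical. The floating point addition happens before the integer compare-exchange, so no floating point instruction needs to sit between an LR and its SC.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Atomically adds delta to the float stored in *cell and returns the
 * value held before the addition. */
float atomic_float_fetch_add(_Atomic uint32_t *cell, float delta) {
    /* Initial plain (relaxed) load of the current bit pattern. */
    uint32_t old_bits = atomic_load_explicit(cell, memory_order_relaxed);
    for (;;) {
        float old_val = bits_to_float(old_bits);
        /* Floating point work is done outside the compare-and-swap. */
        uint32_t new_bits = float_to_bits(old_val + delta);
        /* Integer compare-and-swap; on failure old_bits is refreshed. */
        if (atomic_compare_exchange_weak_explicit(cell, &old_bits, new_bits,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
            return old_val;
    }
}
```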
|
||
== Possible `memory_order_seq_cst` store mapping optimization | ||
|
||
The `memory_order_seq_cst` store mapping may be replaced by the following for | ||
32- and 64-bit operands: | ||
|
||
[cols="<,<",options="header",] | ||
|=== | ||
|C/C++ Construct |RVWMO Mapping | ||
|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl` | ||
|=== | ||
|
||
NOTE: We expect this to be profitable in most cases. The same mapping may be | ||
used for the `memory_order_release` case, where it is less likely to be | ||
profitable. | ||
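
For illustration only, the sketch below writes the 32-bit form of this store by hand. Compilers following this ABI would emit the instruction themselves; the wrapper name `store_seq_cst_w` and the use of GCC/Clang extended inline assembly are assumptions of the example. On targets other than RISC-V with the A extension it falls back to an ordinary C11 store.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sequentially consistent 32-bit store. */
void store_seq_cst_w(_Atomic int32_t *p, int32_t v) {
#if defined(__riscv) && defined(__riscv_atomic)
    /* amoswap.w.rl with rd = zero: swap v into *p, discard the old value. */
    __asm__ volatile("amoswap.w.rl zero, %1, (%0)"
                     :
                     : "r"(p), "r"(v)
                     : "memory");
#else
    /* Portable fallback for other targets. */
    atomic_store_explicit(p, v, memory_order_seq_cst);
#endif
}
```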
|
||
== Weakening of the LR/SC `memory_order_seq_cst` mapping | ||
|
||
The final LR/SC mapping may be weakened by replacing the `aqrl` ordering above | ||
with `aq`: | ||
|
||
[cols="<,<",options="header",] | ||
|=== | ||
|C/C++ Construct |RVWMO LR/SC Mapping | ||
|
||
|`atomic_<op>(memory_order_seq_cst)` | ||
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` | ||
|=== | ||
|
||
NOTE: This has clear performance advantages. However, the resulting mapping is | ||
no longer compatible with the "Table A.6" mapping, which was used by some | ||
implementations as a preliminary ABI for atomics. Thus this optimization should | ||
be postponed until code compiled according to that earlier specification is no | ||
longer in circulation. For platforms that did not implement this earlier | ||
specification, there is no reason to delay. | ||
|
||
//// | ||
== Other missing specifications | ||
In addition something should specify size and alignment of atomic types | ||
such as `atomic<struct { char a; char b; }>`. It is unclear to what | ||
extent this should be processor-specific. It may not belong in the psABI. | ||
Something should specify functions to be called for "large" atomic | ||
operations. This should ideally not be processor-specific. | ||
Should there be an ELF note or similar annotation associated with conformance | ||
to this ABI, probably with a different one for the future A.7 ABI? This would | ||
allow linker warnings if the A.7 mapping is used in connection with a convention | ||
predating this ABI. | ||
//// |