-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial psABI atomics specification #378
Conversation
We have prototypes for LLVM out for review. https://reviews.llvm.org/D149486 |
an initial plain load of the value, followed by the floating point | ||
computation, followed by an integer compare-and-swap sequence to try to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the same floating-point exception flags be asserted each time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ standard says "The floating-point environment (28.3) for atomic arithmetic operations on
floating-point-type may be different than the calling thread’s floating-point environment." That hopefully covers us here sufficiently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it justifies using a different rounding mode for the computation, as well as not reporting all exception flags raised back to the calling thread's fenv
. I need more convincing that it allows the atomic implementation to update the calling thread's fenv
with exception flags that don't correspond to the particular floating point operation that was ultimately performed. (This feels like it violates sequential consistency.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it looks like libstdc++ already does what you suggest:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/atomic_base.h#L1163-L1173
... and I now see that you coauthored https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0020r6.html , which includes the sentence you quoted earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. AFAICT, this is an issue with the C++ standard's formulation. I'll ask if Carter remembers some reason this is OK as is.
AFAICT, this is all kind of a mess at the language standards level, so it's unclear how much we can really do here. C++ has fetch_add(), and says a little about exceptions, but probably not enough. On the other hand, the FENV_ACCESS pragma is not generally supported, so this isn't really guaranteed to work. C does not provide floating-point fetch-add, but it does provide atomic +=. And it seems to require that floating point flags are saved and restored. AFAICT, gcc actually does that, but clang doesn't. I think the gcc behavior is much more correct, but I would guess the clang behavior is what's usually desired.
I'll add some minimal weasel-wording.
Gcc patches to implement this version of the atomics ABI were just committed by Patrick O'Neill, See https://inbox.sourceware.org/gcc-patches/[email protected]/ for the final discussion of strengthened seq_cst stores. |
LGTM in the sense that the text does what it claims (copying across table A.6 with the change of the additional fence for a sequentially consistent store). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! really appropriate this PR :)
Just few minor comment for this PR
riscv-atomic.adoc
Outdated
the future addition of load-acquire and store-release | ||
instructions, allowing those to be incorporated without introducing an | ||
ABI incompatibility. The primary design goal is to maximize performance | ||
of the mappings _with those instructions_ . See Table A.7 in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am little hesitate about the cross document table reference here, my main concern is I am not sure the A.7
number is stable for long time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the label unfortunately already changed, I agree that it's important to fix. I attempted to so so. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like everybody use the A.6
and A.7
for communication (and also that's kind of well know). I'm wondering does it possible to label those two tables as A.6 and A.7 in the ISA spec?
cc @aswaterman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They already are labeled as A.6 and A.7, right? That's where the names came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the most recent ISA manual draft they've been relabeled to "Table 54" and "Table 55" respectively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if they get renumbered to Chapter.Counter, there's no guarantee tables won't be added or deleted, or chapters added, deleted or otherwise renumbered (if that can include appendices' letters), thereby still renaming the tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Perhaps it is better to include them in this document directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent here was to first describe these references via the text starting at line 50. Unfortunately, I missed the earlier reference here. If you think th text around line 50 suffices, I will move that up.
I'm OK with moving A.7 here, so long as we can explain where it came from. It seems a bit strange to include a mapping here that is not yet implementable.
I'm less enthusiastic about moving A.6. It seems confusing to have a mapping table here that we don't want people to use. I guess we could "define" "Table A.6" to be the mapping here with the fence removed, with a comment about the origin of the label?
IMO, the alternative would be to say "Formerly Table A.6" in the ISA spec caption, and try to maintain that, no matter what the actual table number turns out to be. That seems a bit cleaner to me, unless it's likely the tables will be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just cite a ratified version of the ISA spec? For example, just point at the PDF available at
https://riscv.org/technical/specifications/
or, if we're worried about RVI swapping this out with the AsciiDoc version,
https://github.com/riscv/riscv-isa-manual/releases/tag/Ratified-IMAFDQC
A separate question is whether some of this material should be moved out of the ISA spec, to here or elsewhere. I don't think we should address that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this again after our meeting today, I'd like to restructure this into basically a single table that has both the "A.6 + trailing store fence" mappings here, and essentially the "A.7" mappings, as well as the optional mappings currently listed separately below. For the A.7 mappings, I would describe the required semantics of the "hypothetical" instructions, analogously to the "table formerly known as A.7" in the ISA spec.
The rules would be that you could pick any of the alternate mappings in the table, and they would play correctly, so you can make the choice based on performance. A few of the mappings would come with one of two disclaimers: (1) Relies on not-yet-approved ISA extension, or (2) Incompatible with "A.6".
The intent would be to make this table extensible in the future. If we eventually get RCpc operations, we could add those, without changing the structure or meaning of the table.
As discussed in one of the LLVM patches, I am extremely concerned about the prospect of breaking the atomics ABI that has been employed by LLVM for years and many releases, and do not believe the model should be slowed down by additional fences that aren’t necessary for the current ABI unless we are absolutely sure we want to take on the pain of introducing concurrency bugs that are hard to reproduce, hard to debug, go away when recompiling, etc.; having broken atomics is a world of hurt. |
Re: Jessica's comment: I understand the concern. This has certainly been discussed before. The goal here is precisely to avoid future breakage. Here's where I'm coming from:
Of course, if anyone has technical disagreements with any of this, please discuss. |
5a40656
to
9e93dd6
Compare
This atomic change is definitely an ABI incompatible change - since it is inconsistent with existing LLVM and GCC implementation; so one intuition solution is adding a flag to indicate the atomic implementation, e.g., adding a new flag RISC-V GCC got several feedbacks in past years that indicate there might have some potential issues with atomic sequence (hard to reproduce and confirm, so I use That might be a little bit like a word game, but I am treating this as a bug fix for atomic stuff rather than an incompatible change from the RISC-V GCC aspect. Unfortunately, we didn't standardize the atomic sequence before, and now it's time to standardize and fix the incompatible issue between different compiler implementations.
|
My understanding is that the preferred atomic sequences were effectively standardised in table A.6 of the ISA spec, though it's since been decided the psABI spec would be a better home. LLVM implemented that table, but GCC's atomics implementation predated it. I think it was hoped that GCC's atomics implementation lowering was just stronger than required, but it sounds like there's actually compatibility issues. I defer to the experts on the merits of compatibility with the A7 table vs the cost of slightly longer instruction sequences). But this proposal seems to have involved multiple RISC-V implementers who haven't voiced concerns. |
The GCC patch notes suggest that previous implementation also had some subtle correctness issues, which is why the patch was accepted as a bug fix rather than just a change to the ABI. Since they already had to break ABI w/ the previous implementation to fix the bug, the rationale seems to be that they would rather just limit future breaks. GCC's implementation was (and still is) ABI incompatible with the one in LLVM, and that is going to become a source of subtle bugs. |
It's worth emphasizing that this is not an ABI break for implementations that used A.6 (since renumbered, but let's stick with that name), i.e. LLVM. This is A.6 with an additional fence to avoid a future ABI break. There appears to be a consensus that we're moving to A.7 eventually. (Since there is/was and A.7 in the ISA spec, that seems to have been anticipated.) A.6 to A.7 is unavoidably an ABI break. The goal of this proposal is to minimize its impact. This version minimizes the impact of that unavoidable ABI break in two ways:
|
I'll just point out that:
|
Thanks for the correction. For some reason I was under the impression that the stronger ordering w/ the additional fence may have introduced a potential inconsistency/ source of bugs in the atomics ABI. Given that the existing implementation isn't incompatible with this proposal, my previous concerns seem to be unfounded. |
I'm not sure whether that was the intended meaning, but I would disagree that the eventual switch to the "A.7" atomics ABI is a minor performance improvement. From my perspective, probably the most common and convincing reason for using atomics, rather than lock-based synchronization, in production code is to avoid cache contention among readers for read-mostly data structures. Most accesses require acquire-like semantics which, in standard-conforming code, rely on acquire or seq_cst code, depending on the language, etc. (Dependency ordering sometimes suffices, but can't really be used in portable code.) Based on the ARM Cortex microbenchmark results posted in https://lists.riscv.org/g/tech-unprivileged/message/382, the current mappings are significantly off the mark compared to what could be expected from the A.7 mappings, at least based on current out-of-order ARM processors. In the seq_cst case, we're adding on the order of a dozen cycles to what's probably the most common atomics operation, and one that (in the uncontended cache hit case measured here) normally takes much less than that. Granted, this is muddied up a bit by large hardware and application differences. But I would expect this is easily the most significant atomics performance issue we're likely to encounter, possibly outside large system LR/SC scalability issues. The fact that we're adding a penalty someplace where ARM and x86 commonly don't have much of one, is particularly painful. |
Though you agree that the change in this PR isn't actually an ABI break? It just (from my perspective) makes a potential future ABI break less disruptive my providing forwards compatibility. Even if FreeBSD opted not to change to something based on the A.7 mapping, the improved compatibility with that table could still be an advantage to users compiling code for the system (if the benefit outweighs the cost of slightly longer lowerings in some cases). |
The proposed LLVM patches are not an ABI break, but they add pointless overhead if the ABI is not later broken to move to A.7, therefore they should only be applied for FreeBSD if FreeBSD is willing to break ABI for this. |
I think there probably are reasons to go ahead with it even if there's no interest in a future ABI break:
What do you think is the path forwards for this proposal? |
To be clear about the concern here: we're worried about FreeBSD packages that will not get recompiled between the LLVM change, and when enough load-acquire store-release capable hardware appears to make the A.7 mapping interesting? My presumption is that interval (unfortunately, in other respects) will be measured in years. Are you worried about having to recompile those packages during those years, or not detecting the incompatibility if you miss one? I think the latter could be addressed fairly easily with some kind of ELF annotation to record ABI conformance, something that was suggested in a comment in earlier versions of this pull request. I'd welcome such an addition. I'm not enough of an ELF expert to suggest the right way to do this. I think there are strong arguments for keeping these conventions consistent not only across compilers and languages, but also across mainstream operating systems. Otherwise not only would default compiler configurations have to vary by OS, but something like OpenJDK's JIT compiler would have to generate different code depending on which OS it's running on, so that it interacts fully correctly with platform JNI code. This does mean that e.g. gcc's mapping conventions effectively matter, even on a platform that only uses clang/llvm. |
My concern is that adopting A.7 is a userspace ABI break, which must be suitably justified for a Tier 2 architecture in FreeBSD. |
I updated the PR, as planned. The new version gives a single set of mapping tables, with multiple compatible mappings for some of the C++ constructs. Some of those mappings are marked as not implementable without new instructions, and some are marked as incompatible with "A.6 classic". Aside from the marked "A.6 classic" issues, unrestricted mix-and-match is allowed. This changes the presentation into a format that I like better. And it sounded like it might be better received here. It is not intended to change the actual described mappings, except perhaps by making some options and relationships more explicit. |
Created a PR for adding atomic abi flag, #385 |
I am thinking maybe we should explicitly specify the mapping in the table and also the mapping name, because 1) easier for reference and implement for toolchain developer, 2) #385 will need a name to reference, and would be great to define within this doc. Something like this: [[tab:c11mappings]]
.Mappings from C/C++ primitives to RISC-V primitives
[cols="<28,<4,<12,<4",options="header",]
|===
|C/C++ Construct | Mapping | Instruction |Notes
|Non-atomic store | All | `s{b\|h\|w\|d}` |
.2+|`atomic_store(memory_order_release)` | A6C | `fence rw,w; s{b\|h\|w\|d}` .2+| 1, 2 |
A6F | `fence rw,w; s{b\|h\|w\|d}; fence rw,w`
|=== And also does it possible to including the TSO mapping into the table? |
We currently don't have an agreed upon mapping for Ztso so I'm not sure what we would include as the TSO mappings. There was some discussion on the GCC mailing list about A.7 compatibility of Andrea Parri's proposed mappings. The mapping proposal does not include the discussed changes for A.7 compatibility. |
I think we should try hard to minimize the number of "ABIs" we define here to keep it as clear as possible what new code should do, and to maximize the degree to which code interoperates. I'm happy to define additional properties of generated code that can be mentioned in the ELF flag specification. This is kind of similar to defining additional ABIs, but it makes it clearer that the alternative conventions are not ABIs that should be exposed to users by new systems. There is (for atomics ) one ABI, and there are also legacy compatibility issues. Ignoring TSO for now, I agree we need 2 ABI compatibility flag bits, which I would define something like the following: Define "legacy compatible" as "conforms to this ABI and does not use Note 3 mappings". I can add that definition to this PR. Then the 4 possible values are:
Each one is compatible with itself. ( (0) really isn't, but that's water under the bridge, and best ignored.) (1) is compatible with (2), and (2) is compatible with (3). All other pairs, particularly (1) and (3) are incompatible. I expect gcc to (continue to) migrate (0)->(2)->(3) and clang to migrate (1)->(2)->(3). I would hope that we do not need to extend this in the future. And, in the absence of some terrible bug, I wouldn't expect us to be able to agree on such an addition once we have a much larger number of ABI-sensitive systems deployed. I expect that we will add relevant instructions, which will add mappings to the table. Some of these will need Note 3 because they are also incompatible with A.6. Some of them may not fully follow RVWMO rules, and will need fences to hide that fact, basically following the current model for IO. But I don't expect them to introduce new kinds of atomics compatibility issues that would prevent linking to older code. Thus such new code should continue to be labelled as (3), or possibly even (2). Future extensions may need some way to ensure that they are only used on hardware with the right kind of support, but I think that's orthogonal to what we're trying to do here? It seems to me that we can also make TSO fit into the above framework, though I haven't followed the recent discussion there: The current table has some rows that are only valid with additional hardware extensions (in this case the load-acquire/store-release one). Similarly we can add rows to the table that are only valid with Ztso, and will thus typically involve shorter instruction sequences. (These would probably be presented as another table, but they're not logically distinct.) But whenever they are supported by hardware, they are compatible with the rest. Thus TSO does not introduce another ABI in my current sense. (Though I still think there may be some ecosystem fragmentation issue.) This would argue that the "TSO bit" is really something different from the two "atomics ABI" bits discussed above. It sounded like the TSO mappings are still in flux? But if someone wants to point me at the consensus set, and we agree that they should go here, I can attempt that integration. |
Thanks for correction, I guess I have such wrong impression is because it's ratified and LLVM has landed some code gen. |
ztso remains behind an experimental flag, so the partial codegen changes that were merged in LLVM are very explicitly not considered stable, and are open to change as the mappings are agreed. |
@hboehm For TSO: For flags: @asb @jrtc27 want to make sure again it looks good to you from the view of LLVM / FreeBSD. |
The main thing I'm checking from an LLVM perspective is that the text concerning the compatibility with the old a.6 table (and hence the compatibility with the current LLVM implementation) is sufficiently clear. Looks good to me. |
do you have a specific suggestion for the text here, so we can move forward with the llvm change? |
I am OK with current text, just waiting ack from LLVM land, and seems asb is OK, so let we moving forward :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM maintainer is acked, and also GCC community also no objection, so gonna merge this :)
awesome, thanks! (timed well to match our preliminary ABI announcement at the risc-v summit in barcelona today :-) ) |
This is a similar change to one proposed for GCC: https://inbox.sourceware.org/gcc-patches/[email protected]/ The changes in this patch are based on the proposal by Hans Boehm to more closely match the intended semantics for sequentially consistent stores and to allow some platforms to avoid an ABI break when switching to more performant atomic instructions. Platforms that have already compiled code using the existing mappings will also have more time to gradually replace that code in preparation of the switch. Further details can be found in the psABI proposal: riscv-non-isa/riscv-elf-psabi-doc#378. This patch implements a mapping that is stronger than the one outlined in table A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of the same document. The related discussion can be found at https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241 The major change to RISC-V code generation is that we will now emit a trailing fence for sequentially consistent stores. The new code sequence should have the following form: ``` fence rw,w; s{b|h|w|d}; fence rw,rw; ``` Other changes and optimizations like using amoswap will be handled separately. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D149486
This is a similar change to one proposed for GCC: https://inbox.sourceware.org/gcc-patches/[email protected]/ The changes in this patch are based on the proposal by Hans Boehm to more closely match the intended semantics for sequentially consistent stores and to allow some platforms to avoid an ABI break when switching to more performant atomic instructions. Platforms that have already compiled code using the existing mappings will also have more time to gradually replace that code in preparation of the switch. Further details can be found in the psABI proposal: riscv-non-isa/riscv-elf-psabi-doc#378. This patch implements a mapping that is stronger than the one outlined in table A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of the same document. The related discussion can be found at https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241 The major change to RISC-V code generation is that we will now emit a trailing fence for sequentially consistent stores. The new code sequence should have the following form: ``` fence rw,w; s{b|h|w|d}; fence rw,rw; ``` Other changes and optimizations like using amoswap will be handled separately. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D149486
This is a similar change to one proposed for GCC: https://inbox.sourceware.org/gcc-patches/[email protected]/ The changes in this patch are based on the proposal by Hans Boehm to more closely match the intended semantics for sequentially consistent stores and to allow some platforms to avoid an ABI break when switching to more performant atomic instructions. Platforms that have already compiled code using the existing mappings will also have more time to gradually replace that code in preparation of the switch. Further details can be found in the psABI proposal: riscv-non-isa/riscv-elf-psabi-doc#378. This patch implements a mapping that is stronger than the one outlined in table A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of the same document. The related discussion can be found at https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241 The major change to RISC-V code generation is that we will now emit a trailing fence for sequentially consistent stores. The new code sequence should have the following form: ``` fence rw,w; s{b|h|w|d}; fence rw,rw; ``` Other changes and optimizations like using amoswap will be handled separately. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D149486
Define an ABI for RISC-V atomics that allows for eventual migration to the Table A.7 mapping in the unprivileged architecture spec. Our main spec differs from A.6 only in that it requires an additional fence in the store(..., memory_order_seq_cst) instruction sequence.
We believe that the Table A.7 mapping, together with the new instructions it requires, will be necessary to be performance competitive with other architectures for seq_cst operations, especially for processor designs similar to current out-of-order mobile cores. Starting with the ABI described here will allow some platforms to completely avoid an ABI break when switching to A.7. Platforms that already have code compiled to (unmodified) A.6 will get more time to gradually replace that code in preparation for such a switch.
More discussion can be found around https://lists.riscv.org/g/tech-unprivileged/message/382 .