
Initial psABI atomics specification #378

Merged
merged 1 commit into from
Jun 7, 2023


hboehm
Contributor

@hboehm hboehm commented Apr 28, 2023

Define an ABI for RISC-V atomics that allows for eventual migration to the Table A.7 mapping in the unprivileged architecture spec. Our main spec differs from A.6 only in that it requires an additional fence in the store(..., memory_order_seq_cst) instruction sequence.

We believe that the Table A.7 mapping, together with the new instructions it requires, will be necessary to be performance competitive with other architectures for seq_cst operations, especially for processor designs similar to current out-of-order mobile cores. Starting with the ABI described here will allow some platforms to completely avoid an ABI break when switching to A.7. Platforms that already have code compiled to (unmodified) A.6 will get more time to gradually replace that code in preparation for such a switch.

More discussion can be found around https://lists.riscv.org/g/tech-unprivileged/message/382 .
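The additional fence in the seq_cst store sequence matters for the classic store-buffering (Dekker) pattern. With seq_cst accesses, C++ forbids the outcome `r1 == 0 && r2 == 0`. On RISC-V that outcome is excluded only if each thread has a `fence rw,rw` (or some other store-to-load ordering) between its seq_cst store and its subsequent seq_cst load. The A.6 mapping supplies that fence at the start of the seq_cst load; the strengthened mapping also places one after the seq_cst store, so the guarantee still holds if the load is compiled without a leading fence, as an A.7-style mapping would allow. The minimal sketch below, with hypothetical function names, only shows the pattern; a loop like this can detect a violation but cannot prove its absence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one run of the store-buffering litmus test.
// Returns true if the outcome forbidden for seq_cst (r1 == 0 && r2 == 0) was seen.
bool store_buffering_forbidden_outcome_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // seq_cst store...
        r1 = y.load(std::memory_order_seq_cst);  // ...followed by a seq_cst load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();  // join() makes r1 and r2 safely visible here
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test; on a conforming implementation this never returns true.
bool store_buffering_violation_seen(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        if (store_buffering_forbidden_outcome_once()) {
            return true;
        }
    }
    return false;
}
```

Compiling this for RISC-V and inspecting the code emitted for the `store` calls is one way to check which seq_cst store mapping a given toolchain uses.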

@ilovepi
Contributor

ilovepi commented Apr 28, 2023

We have prototypes for LLVM out for review.

https://reviews.llvm.org/D149486
https://reviews.llvm.org/D149488

Comment on lines +135 to +172
an initial plain load of the value, followed by the floating point
computation, followed by an integer compare-and-swap sequence to try to
Contributor


Will the same floating-point exception flags be asserted each time?

Contributor Author


The C++ standard says "The floating-point environment (28.3) for atomic arithmetic operations on floating-point-type may be different than the calling thread’s floating-point environment." That hopefully covers us here sufficiently.

Contributor


I think it justifies using a different rounding mode for the computation, as well as not reporting all exception flags raised back to the calling thread's fenv. I need more convincing that it allows the atomic implementation to update the calling thread's fenv with exception flags that don't correspond to the particular floating point operation that was ultimately performed. (This feels like it violates sequential consistency.)

Contributor


FWIW, it looks like libstdc++ already does what you suggest:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/atomic_base.h#L1163-L1173

... and I now see that you coauthored https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0020r6.html , which includes the sentence you quoted earlier.

Contributor Author


Good point. AFAICT, this is an issue with the C++ standard's formulation. I'll ask if Carter remembers some reason this is OK as is.
AFAICT, this is all kind of a mess at the language standards level, so it's unclear how much we can really do here. C++ has fetch_add(), and says a little about exceptions, but probably not enough. On the other hand, the FENV_ACCESS pragma is not generally supported, so this isn't really guaranteed to work. C does not provide floating-point fetch-add, but it does provide atomic +=. And it seems to require that floating point flags are saved and restored. AFAICT, gcc actually does that, but clang doesn't. I think the gcc behavior is much more correct, but I would guess the clang behavior is what's usually desired.
I'll add some minimal weasel-wording.
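The CAS-based lowering discussed in this thread (a plain load, the floating-point computation, then an integer-width compare-and-swap retry loop) can be sketched as follows. This is a minimal illustration, not what libstdc++ or any compiler actually emits: `fetch_add_double` is a hypothetical name, and the flag handling approximates the save/restore behavior described above for C's atomic `+=` by discarding the flags of failed attempts and merging only those of the committed attempt into the caller's environment. Without working `FENV_ACCESS` support the compiler is not obliged to keep the floating-point add between the environment calls, so this is best-effort.

```cpp
#include <atomic>
#include <cassert>
#include <cfenv>

// Hypothetical floating-point fetch_add built from a CAS retry loop.
// Only the exception flags raised by the attempt that is finally
// committed reach the caller's floating-point environment.
double fetch_add_double(std::atomic<double>& target, double delta) {
    double expected = target.load(std::memory_order_relaxed);  // plain initial load
    for (;;) {
        std::fenv_t saved;
        std::feholdexcept(&saved);          // save env, clear flags, non-stop mode
        double desired = expected + delta;  // may raise FE_INEXACT, FE_OVERFLOW, ...
        // compare_exchange_weak compares the bit patterns of the doubles,
        // i.e. it behaves like an integer-width CAS.
        if (target.compare_exchange_weak(expected, desired,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
            std::feupdateenv(&saved);       // restore env, then re-raise this attempt's flags
            return expected;                // on success `expected` still holds the old value
        }
        std::fesetenv(&saved);              // failed attempt: drop its flags; `expected` was refreshed
    }
}
```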

@hboehm
Contributor Author

hboehm commented May 2, 2023

Gcc patches to implement this version of the atomics ABI were just committed by Patrick O'Neill. See https://inbox.sourceware.org/gcc-patches/[email protected]/ for the final discussion of strengthened seq_cst stores.

@asb
Collaborator

asb commented May 3, 2023

LGTM in the sense that the text does what it claims (copying across table A.6 with the change of the additional fence for a sequentially consistent store).

Collaborator

@kito-cheng kito-cheng left a comment


Thanks! I really appreciate this PR :)

Just a few minor comments on this PR.

riscv-atomic.adoc (outdated)
the future addition of load-acquire and store-release
instructions, allowing those to be incorporated without introducing an
ABI incompatibility. The primary design goal is to maximize performance
of the mappings _with those instructions_ . See Table A.7 in the
Collaborator


I am a little hesitant about the cross-document table reference here; my main concern is that I am not sure the A.7 number will remain stable over time.

Contributor Author


Since the label unfortunately already changed, I agree that it's important to fix. I attempted to do so. Let me know what you think.

Collaborator

@kito-cheng kito-cheng May 19, 2023


It seems like everybody uses A.6 and A.7 when communicating (and those names are fairly well known). I'm wondering whether it would be possible to label those two tables as A.6 and A.7 in the ISA spec?

cc @aswaterman

Contributor


They already are labeled as A.6 and A.7, right? That's where the names came from.

Contributor


In the most recent ISA manual draft they've been relabeled to "Table 54" and "Table 55" respectively

Collaborator


Even if they get renumbered to Chapter.Counter, there's no guarantee that tables won't be added or deleted, or that chapters won't be added, deleted, or otherwise renumbered (if that can include appendix letters), which would still rename the tables.

Contributor


True. Perhaps it is better to include them in this document directly.

Contributor Author


The intent here was to first describe these references via the text starting at line 50. Unfortunately, I missed the earlier reference here. If you think the text around line 50 suffices, I will move that up.

I'm OK with moving A.7 here, so long as we can explain where it came from. It seems a bit strange to include a mapping here that is not yet implementable.

I'm less enthusiastic about moving A.6. It seems confusing to have a mapping table here that we don't want people to use. I guess we could "define" "Table A.6" to be the mapping here with the fence removed, with a comment about the origin of the label?

IMO, the alternative would be to say "Formerly Table A.6" in the ISA spec caption, and try to maintain that, no matter what the actual table number turns out to be. That seems a bit cleaner to me, unless it's likely the tables will be removed.

Contributor


Why can't we just cite a ratified version of the ISA spec? For example, just point at the PDF available at
https://riscv.org/technical/specifications/
or, if we're worried about RVI swapping this out with the AsciiDoc version,
https://github.com/riscv/riscv-isa-manual/releases/tag/Ratified-IMAFDQC

A separate question is whether some of this material should be moved out of the ISA spec, to here or elsewhere. I don't think we should address that in this PR.

Contributor Author


Thinking about this again after our meeting today, I'd like to restructure this into basically a single table that has both the "A.6 + trailing store fence" mappings here, and essentially the "A.7" mappings, as well as the optional mappings currently listed separately below. For the A.7 mappings, I would describe the required semantics of the "hypothetical" instructions, analogously to the "table formerly known as A.7" in the ISA spec.

The rules would be that you could pick any of the alternate mappings in the table, and they would play correctly, so you can make the choice based on performance. A few of the mappings would come with one of two disclaimers: (1) Relies on not-yet-approved ISA extension, or (2) Incompatible with "A.6".

The intent would be to make this table extensible in the future. If we eventually get RCpc operations, we could add those, without changing the structure or meaning of the table.

riscv-atomic.adoc (outdated)
@kito-cheng kito-cheng requested a review from jrtc27 May 5, 2023 06:19
@jrtc27
Collaborator

jrtc27 commented May 5, 2023

As discussed in one of the LLVM patches, I am extremely concerned about the prospect of breaking the atomics ABI that has been employed by LLVM for years and many releases, and do not believe the model should be slowed down by additional fences that aren’t necessary for the current ABI unless we are absolutely sure we want to take on the pain of introducing concurrency bugs that are hard to reproduce, hard to debug, go away when recompiling, etc.; having broken atomics is a world of hurt.

@hboehm
Contributor Author

hboehm commented May 5, 2023

Re: Jessica's comment:

I understand the concern. This has certainly been discussed before. The goal here is precisely to avoid future breakage. Here's where I'm coming from:

  • AFAICT, there are currently relatively few production RISC-V systems that care about ABI stability, at least relative to what I would expect in the future. (That's certainly true for Android, which I work on, where the first number is currently zero. I'm not counting very large numbers of embedded systems without major software ecosystems.) Corollary: Any breaking ABI changes are much easier now than later. Informed conjecture: We're talking about slightly difficult vs. nearly or completely impossible.
  • The psABI document did not contain an atomics ABI. The atomics implementations in gcc and llvm were inconsistent and incompatible until now, so there was no real stable ABI. The Table A.6 mapping in the architecture spec was the closest to one, and I assume the one that's under discussion here.
  • There appears to be a consensus that we need essentially the table A.7 mapping to get good portable performance in the future. I particularly want to see as little overhead as possible for sequentially consistent and acquire loads. Commonly used idioms like double-checked locking rely on that. And it is already quite common to code around performance issues in this area with ugly hacks. Thus I don't want to regress from the current situation on x86 and ARM where these operations are commonly cheap, and rapidly getting more so. (There exist implementations of other architectures on which "fence rw,rw" is close to free, and there is no need to move to A.7. But that is nowhere near true on most existing mainstream systems that care about ABIs.)
  • The RVWMO work in the architecture spec also considered A.6 to A.7 compatibility, and partially addressed it, but seems to have overlooked the issue addressed here (?) (I wasn't part of that effort, so I'm guessing.) Thus Table A.6 and A.7 in the architecture spec are incompatible. This proposes a minimal fix and otherwise adheres to A.6. Without this fix, an eventual change to A.7 would be far more difficult. And it's hard to see how it would avoid significantly more breakage.
  • This "strengthened A.6" mapping here is compatible with "A.6 classic". (Except for the aqrl -> aq optimization, which I suggested delaying where it matters.) If an existing implementation moves from "A.6 classic" to "strengthened A.6" to A.7, the only problem is if "A.6 classic" code still coexists with A.7. There is some danger of that. It increases as we delay.

Of course, if anyone has technical disagreements with any of this, please discuss.
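For reference, the double-checked locking idiom mentioned in the third point above looks roughly like this in C++. It is a minimal sketch with hypothetical names (`Config`, `LazyConfig`); the point is that every call after initialization costs a single acquire load, which under the A.6-derived mapping becomes a load followed by `fence r,rw`, so the cost of that sequence is paid on every access.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical lazily initialized configuration object.
struct Config {
    int value;
};

class LazyConfig {
public:
    LazyConfig() = default;
    LazyConfig(const LazyConfig&) = delete;
    LazyConfig& operator=(const LazyConfig&) = delete;
    ~LazyConfig() { delete ptr_.load(std::memory_order_relaxed); }

    const Config& get() {
        // Fast path: a single acquire load, taken on every call after init.
        Config* p = ptr_.load(std::memory_order_acquire);
        if (p == nullptr) {
            // Slow path: only the first callers contend on the mutex.
            std::lock_guard<std::mutex> lock(mu_);
            p = ptr_.load(std::memory_order_relaxed);  // re-check under the lock
            if (p == nullptr) {
                p = new Config{42};
                ptr_.store(p, std::memory_order_release);  // publish the fully built object
            }
        }
        assert(p != nullptr);
        return *p;
    }

private:
    std::atomic<Config*> ptr_{nullptr};
    std::mutex mu_;
};
```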

@hboehm hboehm force-pushed the atomics branch 2 times, most recently from 5a40656 to 9e93dd6 May 12, 2023 01:13
@kito-cheng
Collaborator

This atomic change is definitely an ABI-incompatible change, since it is inconsistent with the existing LLVM and GCC implementations, so one intuitive solution would be to add a flag indicating the atomic implementation, e.g., a new flag EF_RISCV_ATOMIC_ABI_V2 in e_flags. However, the unfortunate fact is that current LLVM and GCC*1 are implemented differently, so adding a new flag doesn't help the situation IMO.

RISC-V GCC has received several reports over the past years indicating that there might be some potential issues with its atomic sequences (hard to reproduce and confirm, which is why I say "might"); however, we never fixed them, since none of the RISC-V GCC maintainers is an atomics expert.

That might sound a little like a word game, but from the RISC-V GCC perspective I am treating this as a bug fix for the atomics rather than as an incompatible change.

Unfortunately, we didn't standardize the atomic sequence before, and now it's time to standardize and fix the incompatible issue between different compiler implementations.

*1: GCC has implemented this table on top of trunk, and the RISC-V GCC community is considering backporting it to GCC 13.

@asb
Collaborator

asb commented May 16, 2023


My understanding is that the preferred atomic sequences were effectively standardised in table A.6 of the ISA spec, though it's since been decided the psABI spec would be a better home. LLVM implemented that table, but GCC's atomics implementation predated it. I think it was hoped that GCC's atomics implementation lowering was just stronger than required, but it sounds like there's actually compatibility issues.

I defer to the experts on the merits of compatibility with the A.7 table vs. the cost of slightly longer instruction sequences. But this proposal seems to have involved multiple RISC-V implementers who haven't voiced concerns.

@ilovepi
Contributor

ilovepi commented May 16, 2023


The GCC patch notes suggest that previous implementation also had some subtle correctness issues, which is why the patch was accepted as a bug fix rather than just a change to the ABI. Since they already had to break ABI w/ the previous implementation to fix the bug, the rationale seems to be that they would rather just limit future breaks. GCC's implementation was (and still is) ABI incompatible with the one in LLVM, and that is going to become a source of subtle bugs.

@hboehm
Contributor Author

hboehm commented May 16, 2023

It's worth emphasizing that this is not an ABI break for implementations that used A.6 (since renumbered, but let's stick with that name), i.e. LLVM. This is A.6 with an additional fence to avoid a future ABI break. There appears to be a consensus that we're moving to A.7 eventually. (Since there is/was an A.7 in the ISA spec, that seems to have been anticipated.) A.6 to A.7 is unavoidably an ABI break. The goal of this proposal is to minimize its impact.

This version minimizes the impact of that unavoidable ABI break in two ways:

  1. It avoids it for environments that haven't yet shipped to production, and minimizes it for those ramping up.
  2. It gives us a much longer transition period, making the transition feasible for systems currently using A.6. A.6 is compatible with this proposal, and this proposal is compatible with A.7. We just need to make sure that A.6 code doesn't mix with A.7 code.

@jrtc27
Collaborator

jrtc27 commented May 16, 2023

I'll just point out that:

@ilovepi
Contributor

ilovepi commented May 16, 2023


Thanks for the correction. For some reason I was under the impression that the stronger ordering w/ the additional fence may have introduced a potential inconsistency/ source of bugs in the atomics ABI. Given that the existing implementation isn't incompatible with this proposal, my previous concerns seem to be unfounded.

@hboehm
Contributor Author

hboehm commented May 16, 2023

I'm not sure whether that was the intended meaning, but I would disagree that the eventual switch to the "A.7" atomics ABI is a minor performance improvement.

From my perspective, probably the most common and convincing reason for using atomics, rather than lock-based synchronization, in production code is to avoid cache contention among readers for read-mostly data structures. Most accesses require acquire-like semantics which, in standard-conforming code, rely on acquire or seq_cst code, depending on the language, etc. (Dependency ordering sometimes suffices, but can't really be used in portable code.)

Based on the ARM Cortex microbenchmark results posted in https://lists.riscv.org/g/tech-unprivileged/message/382, the current mappings are significantly off the mark compared to what could be expected from the A.7 mappings, at least based on current out-of-order ARM processors. In the seq_cst case, we're adding on the order of a dozen cycles to what's probably the most common atomics operation, and one that (in the uncontended cache hit case measured here) normally takes much less than that.

Granted, this is muddied up a bit by large hardware and application differences. But I would expect this is easily the most significant atomics performance issue we're likely to encounter, possibly outside large system LR/SC scalability issues. The fact that we're adding a penalty someplace where ARM and x86 commonly don't have much of one, is particularly painful.

@asb
Collaborator

asb commented May 17, 2023

I'll just point out that:

* FreeBSD uses LLVM as its system toolchain

* FreeBSD has supported RISC-V since its 12.x release, but as of 13.x it is a Tier 2 architecture (only amd64 and arm64 are Tier 1) (https://www.freebsd.org/platforms/)

* For Tier 2 architectures, "the ABI should not be broken gratuitously" (https://docs.freebsd.org/en/articles/committers-guide/#_tier_2_developmental_and_niche_architectures)

* Minor performance improvements for atomics may or may not be gratuitous

Though you agree that the change in this PR isn't actually an ABI break? It just (from my perspective) makes a potential future ABI break less disruptive by providing forwards compatibility. Even if FreeBSD opted not to change to something based on the A.7 mapping, the improved compatibility with that table could still be an advantage to users compiling code for the system (if the benefit outweighs the cost of slightly longer lowerings in some cases).

@jrtc27
Collaborator

jrtc27 commented May 17, 2023


The proposed LLVM patches are not an ABI break, but they add pointless overhead if the ABI is not later broken to move to A.7, therefore they should only be applied for FreeBSD if FreeBSD is willing to break ABI for this.

@asb
Collaborator

asb commented May 17, 2023


I think there probably are reasons to go ahead with it even if there's no interest in a future ABI break:

  • Avoids alternate codegen paths for different architectures, which might not be a cost worth paying if the performance difference is minimal.
  • Allows more flexibility for users of the target platform, as they can compile their own userspace code with A.7 and have it be compatible with other libraries compiled against the strengthened A.6. (I suppose you could describe this as users choosing to "break" the ABI, but it's still an option they wouldn't have with the old A.6 lowerings.)

What do you think is the path forwards for this proposal?

@hboehm
Contributor Author

hboehm commented May 17, 2023

To be clear about the concern here: we're worried about FreeBSD packages that will not get recompiled between the LLVM change, and when enough load-acquire store-release capable hardware appears to make the A.7 mapping interesting? My presumption is that interval (unfortunately, in other respects) will be measured in years.

Are you worried about having to recompile those packages during those years, or not detecting the incompatibility if you miss one? I think the latter could be addressed fairly easily with some kind of ELF annotation to record ABI conformance, something that was suggested in a comment in earlier versions of this pull request. I'd welcome such an addition. I'm not enough of an ELF expert to suggest the right way to do this.

I think there are strong arguments for keeping these conventions consistent not only across compilers and languages, but also across mainstream operating systems. Otherwise not only would default compiler configurations have to vary by OS, but something like OpenJDK's JIT compiler would have to generate different code depending on which OS it's running on, so that it interacts fully correctly with platform JNI code. This does mean that e.g. gcc's mapping conventions effectively matter, even on a platform that only uses clang/llvm.

@jrtc27
Collaborator

jrtc27 commented May 17, 2023

My concern is that adopting A.7 is a userspace ABI break, which must be suitably justified for a Tier 2 architecture in FreeBSD.

@hboehm
Contributor Author

hboehm commented May 23, 2023

I updated the PR, as planned. The new version gives a single set of mapping tables, with multiple compatible mappings for some of the C++ constructs. Some of those mappings are marked as not implementable without new instructions, and some are marked as incompatible with "A.6 classic". Aside from the marked "A.6 classic" issues, unrestricted mix-and-match is allowed. This changes the presentation into a format that I like better. And it sounded like it might be better received here. It is not intended to change the actual described mappings, except perhaps by making some options and relationships more explicit.

@kito-cheng
Collaborator

kito-cheng commented May 24, 2023

Created a PR for adding an atomic ABI flag: #385.
It's an alternative to Palmer's proposal; the key difference is that we merge the TSO bit, so that we can have more atomic ABI values in the future.

@kito-cheng
Collaborator

I am thinking maybe we should explicitly specify the mapping in the table, along with a name for each mapping, because 1) that makes the mappings easier for toolchain developers to reference and implement, and 2) #385 will need a name to reference, which would be great to define within this doc.

Something like this:

[[tab:c11mappings]]
.Mappings from C/C++ primitives to RISC-V primitives
[cols="<28,<4,<12,<4",options="header",]
|===
|C/C++ Construct |Mapping |Instruction |Notes

|Non-atomic store |All |`s{b\|h\|w\|d}` |

.2+|`atomic_store(memory_order_release)` |A6C |`fence rw,w; s{b\|h\|w\|d}` .2+|1, 2
|A6F |`fence rw,w; s{b\|h\|w\|d}; fence rw,w`
|===

Also, would it be possible to include the TSO mappings in the table?

@patrick-rivos
Contributor

We currently don't have an agreed upon mapping for Ztso so I'm not sure what we would include as the TSO mappings. There was some discussion on the GCC mailing list about A.7 compatibility of Andrea Parri's proposed mappings. The mapping proposal does not include the discussed changes for A.7 compatibility.

@hboehm
Contributor Author

hboehm commented May 24, 2023

I think we should try hard to minimize the number of "ABIs" we define here to keep it as clear as possible what new code should do, and to maximize the degree to which code interoperates.

I'm happy to define additional properties of generated code that can be mentioned in the ELF flag specification. This is kind of similar to defining additional ABIs, but it makes it clearer that the alternative conventions are not ABIs that should be exposed to users by new systems. There is (for atomics) one ABI, and there are also legacy compatibility issues.

Ignoring TSO for now, I agree we need 2 ABI compatibility flag bits, which I would define something like the following:

Define "legacy compatible" as "conforms to this ABI and does not use Note 3 mappings". I can add that definition to this PR. Then the 4 possible values are:

  0. unknown, no compatibility guarantees at all. (E.g. old gcc-generated code.) By implication, this does not use Note 3 mappings.

  1. uses a combination of ABI compatible mappings, as defined here, and the old seq_cst store mapping without trailing fence. (Effectively A.6; current clang code.) By implication, this does not use Note 3 mappings. It's free to use any other mappings here, even if they weren't originally in A.6. (I think that's currently only the store->amoswap mapping, which was mentioned before, but not actually in A.6.)

  2. "legacy compatible"

  3. May use any mapping listed here. (Or compatible mappings added in the future.)

Each one is compatible with itself. ( (0) really isn't, but that's water under the bridge, and best ignored.) (1) is compatible with (2), and (2) is compatible with (3). All other pairs, particularly (1) and (3) are incompatible. I expect gcc to (continue to) migrate (0)->(2)->(3) and clang to migrate (1)->(2)->(3).
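The pairwise rules above can be written down as a small lookup, which may help when thinking about what a linker or loader check would need to reject. The enum and function names below are hypothetical and are not the encoding proposed in #385; levels are numbered (0) to (3) as in the compatibility paragraph above, and (0) is treated as self-compatible, following the suggestion to ignore that wrinkle.

```cpp
// Hypothetical names for the four atomics-compatibility levels.
enum class AtomicCompat {
    Unknown = 0,           // (0) no guarantees, e.g. old gcc output
    A6Classic = 1,         // (1) A.6 seq_cst store without trailing fence
    LegacyCompatible = 2,  // (2) this ABI, no Note 3 mappings
    AnyMapping = 3,        // (3) any mapping in the table, including Note 3 ones
};

// True if objects tagged a and b may be mixed: every level with itself,
// (1) with (2), and (2) with (3). All other pairs are incompatible.
constexpr bool compatible(AtomicCompat a, AtomicCompat b) {
    const int x = static_cast<int>(a);
    const int y = static_cast<int>(b);
    const int lo = x < y ? x : y;
    const int hi = x < y ? y : x;
    return lo == hi || (lo == 1 && hi == 2) || (lo == 2 && hi == 3);
}

static_assert(compatible(AtomicCompat::A6Classic, AtomicCompat::LegacyCompatible),
              "(1) links with (2)");
static_assert(!compatible(AtomicCompat::A6Classic, AtomicCompat::AnyMapping),
              "(1) must not be mixed with (3)");
```

Because the relation is not transitive, a check like this has to be applied to every pair of objects being combined, not just to each object against a single platform-wide value.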

I would hope that we do not need to extend this in the future. And, in the absence of some terrible bug, I wouldn't expect us to be able to agree on such an addition once we have a much larger number of ABI-sensitive systems deployed. I expect that we will add relevant instructions, which will add mappings to the table. Some of these will need Note 3 because they are also incompatible with A.6. Some of them may not fully follow RVWMO rules, and will need fences to hide that fact, basically following the current model for IO. But I don't expect them to introduce new kinds of atomics compatibility issues that would prevent linking to older code. Thus such new code should continue to be labelled as (3), or possibly even (2).

Future extensions may need some way to ensure that they are only used on hardware with the right kind of support, but I think that's orthogonal to what we're trying to do here?

It seems to me that we can also make TSO fit into the above framework, though I haven't followed the recent discussion there: The current table has some rows that are only valid with additional hardware extensions (in this case the load-acquire/store-release one). Similarly we can add rows to the table that are only valid with Ztso, and will thus typically involve shorter instruction sequences. (These would probably be presented as another table, but they're not logically distinct.) But whenever they are supported by hardware, they are compatible with the rest. Thus TSO does not introduce another ABI in my current sense. (Though I still think there may be some ecosystem fragmentation issue.) This would argue that the "TSO bit" is really something different from the two "atomics ABI" bits discussed above.

It sounded like the TSO mappings are still in flux? But if someone wants to point me at the consensus set, and we agree that they should go here, I can attempt that integration.

@kito-cheng
Collaborator

@patrick-rivos

> We currently don't have an agreed-upon mapping for Ztso, so I'm not sure what we would include as the TSO mappings. There was some discussion on the GCC mailing list about A.7 compatibility of Andrea Parri's proposed mappings. The mapping proposal does not include the discussed changes for A.7 compatibility.

Thanks for the correction. I guess I had that wrong impression because Ztso is ratified and LLVM has landed some codegen for it.

@asb
Collaborator

asb commented May 25, 2023

> @patrick-rivos
>
> > We currently don't have an agreed-upon mapping for Ztso, so I'm not sure what we would include as the TSO mappings. There was some discussion on the GCC mailing list about A.7 compatibility of Andrea Parri's proposed mappings. The mapping proposal does not include the discussed changes for A.7 compatibility.
>
> Thanks for the correction. I guess I had that wrong impression because Ztso is ratified and LLVM has landed some codegen for it.

Ztso remains behind an experimental flag, so the partial codegen changes that were merged in LLVM are very explicitly not considered stable and are open to change as the mappings are agreed.

@kito-cheng
Collaborator

@hboehm
For the A6C mapping in the table:
My intention is to avoid making implementations of the A.6 (Classical) mapping (e.g. existing LLVM stable releases) look ABI-noncompliant, but I agree with minimizing the number of "ABIs", so I think we could mention A.6 (Classical) when we describe those flags.

For TSO:
Yeah, let's skip that for now and integrate it into this table once it settles down.

For flags:
I will update #385 and move the flag-related discussion there.

@asb @jrtc27 I want to make sure again that it looks good to you from the LLVM / FreeBSD point of view.

@asb
Collaborator

asb commented Jun 2, 2023

> @asb @jrtc27 I want to make sure again that it looks good to you from the LLVM / FreeBSD point of view.

The main thing I'm checking from an LLVM perspective is that the text concerning compatibility with the old A.6 table (and hence compatibility with the current LLVM implementation) is sufficiently clear. Looks good to me.

@enh-google

> @hboehm For the A6C mapping in the table: My intention is to avoid making implementations of the A.6 (Classical) mapping (e.g. existing LLVM stable releases) look ABI-noncompliant, but I agree with minimizing the number of "ABIs", so I think we could mention A.6 (Classical) when we describe those flags.

Do you have a specific suggestion for the text here, so we can move forward with the LLVM change?

@kito-cheng
Collaborator

> Do you have a specific suggestion for the text here, so we can move forward with the LLVM change?

I am OK with the current text; I was just waiting for an ack from the LLVM side. It seems asb is OK, so let's move forward :)

Collaborator

@kito-cheng kito-cheng left a comment

The LLVM maintainer has acked, and the GCC community has no objection, so I'm going to merge this :)

@kito-cheng kito-cheng merged commit 9401f64 into riscv-non-isa:master Jun 7, 2023
@enh-google

Awesome, thanks! (Well timed to match our preliminary ABI announcement at the RISC-V Summit in Barcelona today :-) )

ilovepi added a commit to llvm/llvm-project that referenced this pull request Jun 22, 2023
This is a similar change to one proposed for GCC:
https://inbox.sourceware.org/gcc-patches/[email protected]/

The changes in this patch are based on the proposal by Hans Boehm to more
closely match the intended semantics for sequentially consistent stores
and to allow some platforms to avoid an ABI break when switching to more
performant atomic instructions. Platforms that have already compiled
code using the existing mappings will also have more time to gradually
replace that code in preparation for the switch.

Further details can be found in the psABI proposal:
riscv-non-isa/riscv-elf-psabi-doc#378.

This patch implements a mapping that is stronger than the one outlined in table
A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of
the same document. The related discussion can be found at
https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241

The major change to RISC-V code generation is that we will now emit a trailing
fence for sequentially consistent stores.

The new code sequence should have the following form:
```
fence rw,w; s{b|h|w|d}; fence rw,rw;
```

Other changes and optimizations like using amoswap will be handled separately.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D149486
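
As a hedged illustration of the sequence in the commit message above, the hypothetical helper below spells out the seq_cst store mapping for each access width, with and without the trailing `fence rw,rw`. The function names are made up for this sketch; it only generates text and is not LLVM's code generator. The form without the trailing fence corresponds to the existing Table A.6 mapping that this patch strengthens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h> /* strcmp, for checking generated sequences */

/* Store mnemonic for an access of `size` bytes, or NULL if unsupported. */
static const char *store_mnemonic(size_t size) {
    switch (size) {
    case 1: return "sb";
    case 2: return "sh";
    case 4: return "sw";
    case 8: return "sd";
    default: return NULL;
    }
}

/* Hypothetical helper: writes the seq_cst store sequence into buf.
 *   trailing_fence == false: "fence rw,w; sX;"               (Table A.6 form)
 *   trailing_fence == true:  "fence rw,w; sX; fence rw,rw;"  (form emitted after this patch)
 * Returns 0 on success, -1 for an unsupported size or a too-small buffer. */
static int seq_cst_store_sequence(size_t size, bool trailing_fence,
                                  char *buf, size_t buflen) {
    const char *op = store_mnemonic(size);
    if (op == NULL)
        return -1;
    int n = snprintf(buf, buflen, "fence rw,w; %s;%s", op,
                     trailing_fence ? " fence rw,rw;" : "");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

For example, `seq_cst_store_sequence(4, true, buf, sizeof buf)` leaves `fence rw,w; sw; fence rw,rw;` in `buf`.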
Chenyang-L pushed a commit to intel/llvm that referenced this pull request Jul 11, 2023
veselypeta pushed a commit to veselypeta/cherillvm that referenced this pull request Aug 30, 2024