static linkers (lld and GNU ld) out of sync with aaelf64 for GOT relocations with addends. #217
Comments
On FreeBSD and NetBSD we don't use .weak due to differing semantics. Currently we end up using no directive, which gives a local symbol, whereas the closer thing to a weak symbol would be a global one. In particular, both GNU and LLVM toolchains cannot handle a GOT-indirect reference to a local symbol at a non-zero offset within a section on AArch64 (see ARM-software/abi-aa#217), and so interceptors do not work on FreeBSD/arm64, failing to link with LLD. Switching to .globl both works around this bug and more closely aligns such non-weak platforms with weak ones. Fixes #63418 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D158552
On FreeBSD and NetBSD we don't use .weak due to differing semantics. Currently we end up using no directive, which gives a local symbol, whereas the closer thing to a weak symbol would be a global one. In particular, both GNU and LLVM toolchains cannot handle a GOT-indirect reference to a local symbol at a non-zero offset within a section on AArch64 (see ARM-software/abi-aa#217), and so interceptors do not work on FreeBSD/arm64, failing to link with LLD. Switching to .globl both works around this bug and more closely aligns such non-weak platforms with weak ones. Fixes llvm/llvm-project#63418 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D158552 (cherry picked from commit 7e1afab1b1821550c5f8d0d6a50636236fa02e2c)
…relocations

Assemblers change certain relocations referencing a local symbol to reference the section symbol instead. This conversion is disabled under many conditions (`shouldRelocateWithSymbol`), e.g. for TLS symbols and, for most targets (including AArch32, x86, PowerPC, and RISC-V), for GOT-generating relocations. However, AArch64 encodes the GOT-generating intent in MCValue::RefKind instead of MCSymbolRef::Kind (see commit 0999cbd (2014)) and is therefore not affected by the code `case MCSymbolRefExpr::VK_GOT:`. As a result, GOT relocations referencing two local symbols may share the same GOT entry after linking (GNU ld, ld.lld), which is not expected:

```
ldr x1, [x1, :got_lo12:x] // converted to .data+0
ldr x1, [x1, :got_lo12:y] // converted to .data+4
.data
// .globl x, y would suppress STT_SECTION conversion
x:
.zero 4
y:
.long 42
```

This patch changes AArch64 to suppress local symbol to STT_SECTION conversion for GOT relocations, matching most other targets. x and y will use different GOT entries, which IMO is the most sensible behavior.

With this change, the ABI decision on ARM-software/abi-aa#217 will only affect relocations explicitly referencing STT_SECTION symbols, e.g.

```
ldr x1, [x1, :got_lo12:(.data+0)]
ldr x1, [x1, :got_lo12:(.data+4)]
// I consider these unreasonable uses
```

IMO all reasonable use cases are unaffected.

Link: llvm#63418
GNU assembler PR: https://sourceware.org/bugzilla/show_bug.cgi?id=30788
Differential Revision: https://reviews.llvm.org/D158577
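The conversion described in this commit message can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `Reloc` class, the `assemble` function, the `suppress_for_got` flag, and the `LOCAL_SYMBOLS` layout are hypothetical names invented for illustration and do not correspond to LLVM or GNU assembler internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reloc:
    target: str   # symbol the relocation refers to
    addend: int   # constant added to the symbol value
    is_got: bool  # True for GOT-generating operators such as :got_lo12:


# Assumed layout of .data from the example above: x at offset 0, y at offset 4.
LOCAL_SYMBOLS = {"x": (".data", 0), "y": (".data", 4)}


def assemble(reloc, suppress_for_got):
    """Return the relocation as a (hypothetical) assembler would emit it."""
    if reloc.target not in LOCAL_SYMBOLS:
        return reloc  # non-local symbols are not converted
    if reloc.is_got and suppress_for_got:
        return reloc  # patched behaviour: keep the original symbol
    # Unpatched behaviour: rewrite to section symbol + offset.
    section, offset = LOCAL_SYMBOLS[reloc.target]
    return Reloc(section, offset + reloc.addend, reloc.is_got)


refs = [Reloc("x", 0, True), Reloc("y", 0, True)]
before = [assemble(r, suppress_for_got=False) for r in refs]
after = [assemble(r, suppress_for_got=True) for r in refs]
```

In this model, `before` holds two relocations against `.data` that differ only in addend, which is the form the linkers mishandle, while `after` keeps `x` and `y` as distinct symbols with a zero addend.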
I consider this an assembler issue specific to AArch64. The GNU assembler and the LLVM integrated assembler suppress local symbol to STT_SECTION conversion for GOT relocations on most other targets. This was likely neglected on AArch64 because compilers don't emit such constructs. llvm/llvm-project#63418 was found through inline assembly uses.
Agreed that GNU ld and lld don't comply with the ABI when generating GOT entries. If we fix the GNU assembler and the LLVM integrated assembler to suppress STT_SECTION conversion (https://reviews.llvm.org/D158577 and https://sourceware.org/bugzilla/show_bug.cgi?id=30788), I believe this ABI change (
ldr x1, [x1, :got_lo12:x]
ldr x1, [x1, :got_lo12:y]
// affected by this change but I do not recommend this use case.
// x86-64 maintainer considers a similar case reasonable: https://sourceware.org/bugzilla/show_bug.cgi?id=26939
ldr x1, [x1, :got_lo12:(x+8)]
// affected by this change but I do not recommend this use case.
ldr x1, [x1, :got_lo12:(.data+0)]
ldr x1, [x1, :got_lo12:(.data+8)]
.data
x:
.zero 4
y:
.zero 8
Switching from If we change assemblers as mentioned, whether or not the ABI requires
…relocations

Assemblers change certain relocations referencing a local symbol to reference the section symbol instead. This conversion is disabled under many conditions (`shouldRelocateWithSymbol`), e.g. for TLS symbols and, for most targets (including AArch32, x86, PowerPC, and RISC-V), for GOT-generating relocations. However, AArch64 encodes the GOT-generating intent in MCValue::RefKind instead of MCSymbolRef::Kind (see commit 0999cbd (2014)) and is therefore not affected by the code `case MCSymbolRefExpr::VK_GOT:`. As GNU ld and ld.lld create GOT entries based on the symbol, ignoring the addend, the two ldr instructions will share the same GOT entry, which is not expected:

```
ldr x1, [x1, :got_lo12:x] // converted to .data+0
ldr x1, [x1, :got_lo12:y] // converted to .data+4
.data
// .globl x, y would suppress STT_SECTION conversion
x:
.zero 4
y:
.long 42
```

This patch changes AArch64 to suppress local symbol to STT_SECTION conversion for GOT relocations, matching most other targets. x and y will use different GOT entries, which IMO is the most sensible behavior.

With this change, the ABI decision on ARM-software/abi-aa#217 will only affect relocations explicitly referencing STT_SECTION symbols, e.g.

```
ldr x1, [x1, :got_lo12:(.data+0)]
ldr x1, [x1, :got_lo12:(.data+4)]
// I consider these unreasonable uses
```

IMO all reasonable use cases are unaffected.

Link: #63418
GNU assembler PR: https://sourceware.org/bugzilla/show_bug.cgi?id=30788
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D158577
The GDAT(S + A) relocation operation requires a static linker to create a GOT entry for (S + A), which means at least one GOT entry for each unique tuple (S, A). Unfortunately, no known static linker has implemented this correctly; one of two forms is implemented instead:

* GDAT(S) with the addend ignored.
* GDAT(S) + A with a single GOT entry per S, and A added to the value of GDAT(S).

These implementations are correct and consistent only for an addend (A) of zero. No known compiler uses non-zero addends in relocations that use the GDAT(S + A) operation, although it is possible to generate them using assembly language.

This change synchronizes the ABI with the behavior of existing static linker implementations. The benefit of permitting code generators [*] to use a non-zero addend in GDAT(S + A) is judged to be lower than the cost of implementing GDAT(S + A) correctly in existing static linkers, many of which assume that there is a single GOT entry per unique symbol S.

It is a quality-of-implementation (QoI) matter whether a static linker gives an error if a non-zero addend is used for a relocation that uses the GDAT(S) operation.

Fixes ARM-software#217
Also resolves ARM-software#247

[*] The most common use case for a non-zero addend is in constructing a C++ object with a vtable. The first two entries in the vtable are the offset to top and a pointer to RTTI, so the vtable pointer stored in the object points at offset 0x10 into the vtable. This offset can be encoded in the relocation addend. We would save an add instruction for each construction of a C++ object with a vtable if addends were permitted.
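The difference between the specified GDAT(S + A) behaviour and the two implemented forms can be compared with a small Python model. This is a sketch, not a description of any linker's code: `GOT_BASE`, `SLOT`, the `link` function, the policy names, and the address chosen for `.data` are all assumptions made for illustration.

```python
GOT_BASE = 0x20000  # assumed address of the GOT
SLOT = 8            # size in bytes of one AArch64 GOT entry


def link(relocs, sym_addr, policy):
    """Model GOT creation for a list of (symbol, addend) relocations.

    Returns (got, loads): got maps slot address -> stored value, and
    loads lists the address each relocated instruction would read from.
    """
    slots = {}  # allocation key -> slot address
    got = {}
    loads = []
    for sym, addend in relocs:
        if policy == "spec":
            # GDAT(S + A): one entry per unique (S, A), holding S + A.
            key, value = (sym, addend), sym_addr[sym] + addend
        else:
            # Both implemented forms allocate one entry per S, holding S.
            key, value = sym, sym_addr[sym]
        if key not in slots:
            slots[key] = GOT_BASE + SLOT * len(slots)
            got[slots[key]] = value
        load = slots[key]
        if policy == "entry_plus_addend":
            # GDAT(S) + A: the addend is applied to the entry's address.
            load += addend
        loads.append(load)
    return got, loads


relocs = [(".data", 0), (".data", 4)]
sym_addr = {".data": 0x30000}
spec = link(relocs, sym_addr, "spec")
ignored = link(relocs, sym_addr, "ignore_addend")
shifted = link(relocs, sym_addr, "entry_plus_addend")
```

With a zero addend all three policies agree. With the non-zero addend in this model, only `spec` produces a second entry holding `.data + 4`; `ignored` makes both references read the same entry, and `shifted` reads from an address 4 bytes into the single entry.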
I've submitted #272 which removes A from the GDAT(S + A) relocation operation.
Raised due to LLVM issue llvm/llvm-project#63418
Raising this as an ABI issue rather than as two separate toolchain issues, as it may be simpler to amend the ABI than to try to fix the tools.
When using a `:got:` and `:got_lo12:` expression to refer to a local symbol, the GNU and LLVM assemblers convert this to a GOT-generating relocation against the section symbol + addend. For example:
has the following relocations.
The ABI (https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#576static-aarch64-relocations, search for "GOT-relative instruction relocations") defines these to be:
With
GNU ld and lld are mishandling this, with LLD using something like:
With GNU ld.bfd appearing to ignore the addend A completely.
We could say that this is a pair of toolchain bugs; however, for LLD in particular some work would need to be done to generate separate GOT slots for S+A rather than just S. This might prove difficult to get accepted for a corner case that affects only AArch64. While I don't know for certain, I expect GNU ld will have a similar problem.
It may be worth altering the ABI to not permit addends, or have them implemented in a way that GNU ld and LLD can both implement. We could then alter the assembler to not use symbol + addend forms for a local symbol.
If we determine that the ABI is correct we can raise toolchain issues and close this as won't fix.
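If the ABI were changed to disallow addends, a tool could flag the affected relocations. The following Python sketch shows one way such a check might look. The relocation type names come from AAELF64, but the `nonzero_got_addends` function, the tuple layout, and the message format are hypothetical and are not taken from GNU ld or LLD.

```python
# GOT-generating relocation types produced by :got: and :got_lo12:.
GOT_GENERATING = {
    "R_AARCH64_ADR_GOT_PAGE",
    "R_AARCH64_LD64_GOT_LO12_NC",
}


def nonzero_got_addends(relocs):
    """Return a diagnostic for each GOT-generating relocation with a non-zero addend.

    Each relocation is a (type, symbol, addend) tuple; this layout is an
    assumption of the sketch, not an ELF data structure.
    """
    problems = []
    for rtype, symbol, addend in relocs:
        if rtype in GOT_GENERATING and addend != 0:
            problems.append(
                f"{rtype} against {symbol} has non-zero addend {addend}"
            )
    return problems


relocs = [
    ("R_AARCH64_ADR_GOT_PAGE", ".data", 0),
    ("R_AARCH64_LD64_GOT_LO12_NC", ".data", 4),
    ("R_AARCH64_ABS64", ".data", 4),  # not GOT-generating, so not flagged
]
diagnostics = nonzero_got_addends(relocs)
```

Only the `:got_lo12:` relocation against `.data + 4` is reported here; whether a real linker should warn, error, or stay silent is left open by the discussion above.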