-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add R_RISCV_SET_ULEB128 and R_RISCV_SUB_ULEB128. #124
Conversation
In order to implement uleb128 subtraction, we need two relocations for the linker to re-calculate the value after relaxing.
Can you give some examples where |
This seems good to me :) I think this should be similar to the For GNU linker, we can use the existed APIs to get the length of uleb128, also can read and write the uleb128 value. For linker relocation, the key point may be that we can not reduce the code size when relocating, so even if the value For now we can fix the correct value for some data directives (.byte, .word, ...) after relaxations. So I think it would be nice if we can also handle the .uleb128 data directives correctly after relaxations :) Best and Regards |
How does the assembler represent When the linker sees |
riscv-elf.md
Outdated
@@ -904,4 +906,4 @@ wint_t | 4 | 4 | |||
The following definitions apply for all ABIs defined in this document. Here | |||
there is no differentiation between ILP32 and LP64 abis. | |||
|
|||
`wchar_t` is signed. `wint_t` is unsigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't actually figure out what the diff is here, but presumably it's accidental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "newline at the end of file" change, I think. Either adding or removing it.
Fangrui Song asked for an example. Consider this testcase In the object file there are pairs of set/sub relocations in the patched output. The use of uleb here then results in a smaller executable. Without the binutils patch I get On the other hand, object files get larger, because there are more relocations, and linker relaxation is going to be slower, as there is more stuff to relax. |
Now that I've reviewed Kuan-Lin's binutils patch, I see that there is no actual relaxation of the uleb128 values. The uleb128 size is set at assembler time based on assembly time values, and there is an assumption that values can only decrease at link time due to relaxation. If a value does decrease enough to reduce the uleb128 size, then we add leading zeros to the value to ensure that the uleb128 size doesn't change. This trick only works for uleb128, not sleb128. But gcc only uses leb128 subtraction for pointer values which are unsigned, so all we need is uleb128 support. Kuan-Lin's patch does require that there are matching SET_ULEB128 and SUB_ULEB128 relocs, in either order, and the linker gives an error if it sees one but not the other. The assumption that the uleb128 address size can only decrease with relaxation is true only if both addresses are in the same section. If we are subtracting a text section address from a data section address, then that value could increase due to alignment padding between sections/segments. So I think these relocs should only be allowed when the symbols are in the same section. I think the binutils patch should be fixed to enforce this. If Kuan-Lin's binutils patch goes in, gcc will automatically start emitting code using the new relocs due to a configure test. I don't want gcc to emit code by default that won't work with LLVM, so I think we should have buy in from LLVM folks before the binutils patch goes in. |
I have a post or two on the binutils mailing list about this, but the current binutils approach isn't sufficient. There's at least two issues I know of:
IIRC there's examples on the mailing list. While I don't think there's anything fundamentally wrong with the ABI proposed here, we need at least one valid implementation (and ideally an implementation in both GCC and LLVM) for me to be happy taking it. |
I don't see an alignment problem, because the size is fixed at assembly time. Maybe you are confusing this with the separate .p2align patch for binutils? Or maybe the issue here is that if someone puts a .uleb128 in the text section between instructions then we have a potential problem? I'd argue that people shouldn't do that, and we can warn/error for that, or we could disable the use of the set/sub relocs in that unusual case, or we could require an explicit align after the uleb128 directives.. Or maybe you are saying if we later add relaxation we have a potential problem? I didn't see a problem with uleb128 symbol when running the gcc testsuite, but that is no guarantee that there is no problem. I will have to recheck your email again. |
No, the .byte stuff is fine because the last bit of the instruction alignment is always statically known. The problem is trying to relax LEB128s, at which point we don't statically know that last alignment bit. The issue here is that we're trying to maintain two constraints: that instructions are two-byte aligned before linking (so they can be disassembled), and that instructions are at least two-byte aligned after linking. Imagine we had the following code, in rv64:
We'd need to emit a 10-byte sequence (presumably full invalid instructions) for the ULEB, as the address may be that big (it's rv64). We could probably cobble something together in the disassembler to make this work, but IIRC there were some other issues with the "round up to the next power of two" behavior of R_RISCV_ALIGN related to mixing RVC and no-RVC code. IIRC there were ways to fix those issues, but I don't think we ever got around to it. Instead, I think we should introduce some sort of "R_RISCV_ALIGN_DOWN" relocation that aligns to the next lower power of two. This would allow us to relax ULEBs, but also allow us to avoid the correctness requirement around handling R_RISCV_ALIGN -- essentially the idea would be to keep the alignment correct before linking, while still allowing enough bytes to align after linking. That would allow simple linkers (ie, Linux's module loader) to ignore R_RISCV_ALIGN_DOWN, which would avoid baking "this loader doesn't support relaxation" into the ABI. |
According to Kuan-Lin's binutils patch, @@ -1512,6 +1532,25 @@ perform_relocation (const reloc_howto_type *howto, case R_RISCV_SET_ULEB128: _bfd_read_unsigned_leb128 (input_bfd, contents + rel->r_offset, &len); /* Clean the contents value to zero. Do not reduce the length. */ At first, I'm wondering why we can ensure the size of uleb128 by adding only one leading zeros (0x80). It is reasonable, so I just record the reason here, in case someone has the same problem as me. Consider the following example, .text The value of |
Hi @kuanlinchentw:
|
I think the format |
I agree with this. I'll fix it in the patch. |
Yes, the original motivation of ULEB128 is for code size reduction.
I think .gcc_execption_table and some dwarf sections use ULEB128 most ofter.
It only need to implement in assembler and ld.bfd+lld, because the subtraction value is fixed after linking. |
@asb FYI https://sourceware.org/ml/binutils/2019-12/msg00024.html, the another patch about the alignment between code and date in same section. |
Apologies for making a comment without reading all previous discussions. I am a bit concerned about a relocation type without a clear width. See http://sourceware.org/PR4029 for an example that oscillating .uleb128 caused a bug. I don't really read through Alan Modra's workaround but I think it is about overaligning the label immediately preceding Type Tables. It works before type tables are accessed by adding a negative offset to TTBase. |
In order to implement uleb128 subtraction, we need two relocations for the
linker to re-calculate the value after relaxing.
Please reference the commit: https://sourceware.org/ml/binutils/2019-11/msg00393.html