-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64: Better instruction sequence for constant move #1278
Conversation
Hi Pengfei. This patch looks fine to me (it looks just like the code in the OpenJDK AArch64 assembler :-) |
Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image? |
Hi Bob, "Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image?" Tht's an interesting question but I think it is missing something. If that was a problem after this patch then it ought to have been a problem before. The old code was not guaranteed always to generate 4 instruction move sequences. It might be that the old code got away with it because anything that needed a relocation always ended up with the same number of instructions but that would just be luck. In the OpenJDK code a routine that always emits 3 moves (well used to be 3 but I think ZGC now requires it to be 4) is used when moving a code or metadata address constant into a register. That ensures the value can be patched without a problem if the target gets moved. So, if the ELF reloc code needs a fixed size encoding then I think it will be necessary to do something similar i.e. provide two back end constant encoding options (fixed and variable insn count) and ensure that the correct one is invoked at the point where the constant is encoded according as to whether the value may potentially be relocated. |
The annotations included a “numOfInstrs” to accommodate 1 to 4 mov* sequences. Just wanted to make sure all of these still result in appropriate annotations. |
Okay, so the concern exists: There, the patching annotation assumes Likewise, where the new The annotating method is: Where the This annotation is later used to lay down the approprate ELF relocations for 1-4 Without adjusting at least the 4-instruction assumption of |
Of course, this may all be irrelevant if the other variants are never used for relocatable addresses. The 4-instruction relocation tends to be applied to a true 64-bit placeholder, but sometimes also |
The previous and |
Hi Bob, As you say, movNativeAddress definitely always generates 3 instructions and annotates accordingly. However, mov64 avoids generation of a movz/k for 16-bit chunks that are zero (the relevant code is the if (..) { continue } block inside the loop). So, in theory it might only encode a single movz instruction, say if it were passed the constant 0xabcd_0000_0000_0000, or a movz and movk pair if passed constant 0xabcd_0000_1234. I suspect you have never seen anything other than 4 insn sequences because in most cases constants are employed as arguments to arithmetic or logic operations and these can be encoded as immediates. There is a complex scheme for encoding such immediates which covers most of the commonly found cases, not just small integers. However, I am certain that it is possible to construct cases where the constant cannto be encoded as an immediate and hence where you might see move64 encode a 3 insn sequence and perhaps even a 2 insn sequence. So, it looks like it will be necessary to track the number of insns generated by mov64 and move the annotation generation into it as per movNativeAddress. |
Ah, I see the I'll make the annotation adjusted after this PR gets merged in. fwiw, I did have to muck about with the larger-than-inlinable constants that Graal puts significantly further away than hotspot and the nearby constant table for floats/doubles. |
Thanks for your review. I took a look at all usages of
OK. I will remember to check it after this PR merged. |
No, that is not the reason. The different handling of addresses and constants represents an historic split. Firstly, numeric constants may be up to 64 bits. At present addresses are only up to 48 bits (although ZGC may require them to grow to 64 bits). Secondly, On OpenJDK only addresses needed to be 'relocatable' i.e. able to be patched at runtime. So, they were always generated to leave space for the maximum possible number of needed instructions. Constant loads never got patched. When generating native image ELF files (shared lbs or executables) loads for both constants and addresses need to be ELF-relocatable, which means the number of instructions used for the load needs to be known. Address loads still need to be runtime-patchable. It is just luck that all constant relocs so far generated happen to have required 4 instructions. n.b. this is not an issue on x86_64 where the address/constant data is embedded into a single (variable length) load instruction. |
Since this is lingering, and I was cleaning up some the assembler anyhow, I've gone ahead and adjusted the annotations to hopefully be more correct. That being said, my adjustments are also on a PR, so one of us will probably have to rebase at some point. |
Code generation of AArch64 constant move is optimized in this patch. Original implementation uses MOVZ with MOVK(s) to build all non-zero 16-bit chunks in 32- or 64-bit constants. E.g. Below code is generated for long integer constant -200000L. mov x1, #0xf2c0 movk x1, #0xfffc, lsl #16 movk x1, #0xffff, lsl #32 movk x1, #0xffff, lsl #48 After this patch, we generate a variable length of instruction sequence containing MOVZ, MOVN with MOVK(s), based on the value of the constant. Generated assembly in above example is optimized to below. movn x1, #0xd3f movk x1, #0xfffc, lsl #16 We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest and annotateImm to needsImmAnnotation in this patch.
Hi, I have rebased my PR with the immediate annotation addressed. May I have a review again? Thanks. |
Hi Pengfei, That patch looks ok to me now. |
Thanks Andrew. Anyone from Oracle could help look at it? |
@sanzinger can you please review and integrate this if it looks good. |
@sanzinger Could you help look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It looks like this patch has still yet to be merged. Is this due to an oversight, or is there some underlying reason why this patch hasn't been added? |
Looks good to me. I am working to get this merged. |
Code generation of AArch64 constant move is optimized in this patch.
Original implementation uses one MOVZ with one or three MOVK(s) to build
all kinds of 32- or 64-bit constants, respectively. E.g. Below code is
generated for long integer constant -200000L.
After this patch, we generate a variable length of instruction sequence
containing MOVZ, MOVN with MOVK(s), based on the value of the constant.
Generated assembly in above example is optimized to below.
We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest in
this patch.