Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: Better instruction sequence for constant move #1278

Merged
merged 1 commit into from
Dec 20, 2019
Merged

AArch64: Better instruction sequence for constant move #1278

merged 1 commit into from
Dec 20, 2019

Conversation

pfustc
Copy link
Contributor

@pfustc pfustc commented May 14, 2019

Code generation of AArch64 constant move is optimized in this patch.
Original implementation uses one MOVZ with one or three MOVK(s) to build
all kinds of 32- or 64-bit constants, respectively. E.g. Below code is
generated for long integer constant -200000L.

  mov     x1, #0xf2c0
  movk    x1, #0xfffc, lsl #16
  movk    x1, #0xffff, lsl #32
  movk    x1, #0xffff, lsl #48

After this patch, we generate a variable length of instruction sequence
containing MOVZ, MOVN with MOVK(s), based on the value of the constant.
Generated assembly in above example is optimized to below.

  movn    x1, #0xd3f
  movk    x1, #0xfffc, lsl #16

We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest in
this patch.

@adinn
Copy link
Collaborator

adinn commented May 14, 2019

Hi Pengfei. This patch looks fine to me

(it looks just like the code in the OpenJDK AArch64 assembler :-)

@bobmcwhirter
Copy link

Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image?

@adinn
Copy link
Collaborator

adinn commented May 14, 2019

Hi Bob,

"Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image?"

Tht's an interesting question but I think it is missing something. If that was a problem after this patch then it ought to have been a problem before. The old code was not guaranteed always to generate 4 instruction move sequences. It might be that the old code got away with it because anything that needed a relocation always ended up with the same number of instructions but that would just be luck.

In the OpenJDK code a routine that always emits 3 moves (well used to be 3 but I think ZGC now requires it to be 4) is used when moving a code or metadata address constant into a register. That ensures the value can be patched without a problem if the target gets moved.

So, if the ELF reloc code needs a fixed size encoding then I think it will be necessary to do something similar i.e. provide two back end constant encoding options (fixed and variable insn count) and ensure that the correct one is invoked at the point where the constant is encoded according as to whether the value may potentially be relocated.

@bobmcwhirter
Copy link

bobmcwhirter commented May 14, 2019

The annotations included a “numOfInstrs” to accommodate 1 to 4 mov* sequences. Just wanted to make sure all of these still result in appropriate annotations.

@bobmcwhirter
Copy link

bobmcwhirter commented May 14, 2019

Okay, so the concern exists:

https://github.com/oracle/graal/blob/d8961a21e4d4127e61f740dae80820c1264d7067/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64MacroAssembler.java#L509-L513

There, the patching annotation assumes mov64 is a 4-instruction sequence, which no longer holds. The annotation can be created for 1-4 instructions, so mov64 maybe should return an int denoting how many it used. Or the annotating should be relocated to inside of mov64 where that knowledge is held.

Likewise, where the new mov32 is used, it should probably also annotate.

The annotating method is:

https://github.com/oracle/graal/blob/d8961a21e4d4127e61f740dae80820c1264d7067/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64Assembler.java#L2970-L2974

Where the numInstrs parameter denotes how many mov* instructions are used.

This annotation is later used to lay down the approprate ELF relocations for 1-4 movz/movk or whatnot.

Without adjusting at least the 4-instruction assumption of mov64 there is a high probability of a new bug trampling non-mov instructions with irrelevant/wrong relocations, resulting in malformed binaries.

@bobmcwhirter
Copy link

Of course, this may all be irrelevant if the other variants are never used for relocatable addresses. The 4-instruction relocation tends to be applied to a true 64-bit placeholder, but sometimes also 0 if I recall (possibly wrongly).

@adinn
Copy link
Collaborator

adinn commented May 14, 2019

Hi Bob,

As you say, movNativeAddress definitely always generates 3 instructions and annotates accordingly.

However, mov64 avoids generation of a movz/k for 16-bit chunks that are zero (the relevant code is the if (..) { continue } block inside the loop).

So, in theory it might only encode a single movz instruction, say if it were passed the constant 0xabcd_0000_0000_0000, or a movz and movk pair if passed constant 0xabcd_0000_1234.

I suspect you have never seen anything other than 4 insn sequences because in most cases constants are employed as arguments to arithmetic or logic operations and these can be encoded as immediates. There is a complex scheme for encoding such immediates which covers most of the commonly found cases, not just small integers. However, I am certain that it is possible to construct cases where the constant cannto be encoded as an immediate and hence where you might see move64 encode a 3 insn sequence and perhaps even a 2 insn sequence.

So, it looks like it will be necessary to track the number of insns generated by mov64 and move the annotation generation into it as per movNativeAddress.

@bobmcwhirter
Copy link

Ah, I see the continue now. Good thing that hasn't bitten us yet. :)

I'll make the annotation adjusted after this PR gets merged in.

fwiw, I did have to muck about with the larger-than-inlinable constants that Graal puts significantly further away than hotspot and the nearby constant table for floats/doubles.

@pfustc
Copy link
Contributor Author

pfustc commented May 17, 2019

Thanks for your review. I took a look at all usages of mov64 and I didn't find it used for generating addresses. Maybe that's the reason why the incorrect annotation issue hasn't been exposed so far.

I'll make the annotation adjusted after this PR gets merged in.

OK. I will remember to check it after this PR merged.

@adinn
Copy link
Collaborator

adinn commented May 17, 2019

Thanks for your review. I took a look at all usages of mov64 and I didn't find it used for generating addresses. Maybe that's the reason why the incorrect annotation issue hasn't been exposed so far.

No, that is not the reason. The different handling of addresses and constants represents an historic split. Firstly, numeric constants may be up to 64 bits. At present addresses are only up to 48 bits (although ZGC may require them to grow to 64 bits). Secondly, On OpenJDK only addresses needed to be 'relocatable' i.e. able to be patched at runtime. So, they were always generated to leave space for the maximum possible number of needed instructions. Constant loads never got patched.

When generating native image ELF files (shared lbs or executables) loads for both constants and addresses need to be ELF-relocatable, which means the number of instructions used for the load needs to be known. Address loads still need to be runtime-patchable. It is just luck that all constant relocs so far generated happen to have required 4 instructions.

n.b. this is not an issue on x86_64 where the address/constant data is embedded into a single (variable length) load instruction.

@bobmcwhirter
Copy link

Since this is lingering, and I was cleaning up some the assembler anyhow, I've gone ahead and adjusted the annotations to hopefully be more correct.

That being said, my adjustments are also on a PR, so one of us will probably have to rebase at some point.

#1311

Code generation of AArch64 constant move is optimized in this patch.
Original implementation uses MOVZ with MOVK(s) to build all non-zero
16-bit chunks in 32- or 64-bit constants. E.g. Below code is generated
for long integer constant -200000L.

  mov     x1, #0xf2c0
  movk    x1, #0xfffc, lsl #16
  movk    x1, #0xffff, lsl #32
  movk    x1, #0xffff, lsl #48

After this patch, we generate a variable length of instruction sequence
containing MOVZ, MOVN with MOVK(s), based on the value of the constant.
Generated assembly in above example is optimized to below.

  movn    x1, #0xd3f
  movk    x1, #0xfffc, lsl #16

We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest and
annotateImm to needsImmAnnotation in this patch.
@pfustc
Copy link
Contributor Author

pfustc commented May 29, 2019

Hi, I have rebased my PR with the immediate annotation addressed. May I have a review again? Thanks.

@adinn
Copy link
Collaborator

adinn commented May 29, 2019

Hi Pengfei,

That patch looks ok to me now.

@pfustc
Copy link
Contributor Author

pfustc commented May 31, 2019

Thanks Andrew. Anyone from Oracle could help look at it?

@dougxc
Copy link
Member

dougxc commented Jun 3, 2019

@sanzinger can you please review and integrate this if it looks good.

@pfustc
Copy link
Contributor Author

pfustc commented Jul 3, 2019

@sanzinger Could you help look at this?

Copy link

@bobmcwhirter bobmcwhirter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@teshull
Copy link
Member

teshull commented Dec 10, 2019

It looks like this patch has still yet to be merged. Is this due to an oversight, or is there some underlying reason why this patch hasn't been added?

@sanzinger
Copy link
Contributor

Looks good to me. I am working to get this merged.

@graalvmbot graalvmbot merged commit 58438f9 into oracle:master Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants