AArch64: Better instruction sequence for constant move #1278

pfustc · 2019-05-14T08:37:17Z

Code generation of AArch64 constant move is optimized in this patch.
Original implementation uses one MOVZ with one or three MOVK(s) to build
all kinds of 32- or 64-bit constants, respectively. E.g. Below code is
generated for long integer constant -200000L.

  mov     x1, #0xf2c0
  movk    x1, #0xfffc, lsl #16
  movk    x1, #0xffff, lsl #32
  movk    x1, #0xffff, lsl #48

After this patch, we generate a variable length of instruction sequence
containing MOVZ, MOVN with MOVK(s), based on the value of the constant.
Generated assembly in above example is optimized to below.

  movn    x1, #0xd3f
  movk    x1, #0xfffc, lsl #16

We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest in
this patch.

adinn · 2019-05-14T09:03:41Z

Hi Pengfei. This patch looks fine to me

(it looks just like the code in the OpenJDK AArch64 assembler :-)

bobmcwhirter · 2019-05-14T10:36:06Z

Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image?

adinn · 2019-05-14T10:57:03Z

Hi Bob,

"Can you verify that the correct annotations are still applied for subsequent ELF relocations in the finally native-image?"

Tht's an interesting question but I think it is missing something. If that was a problem after this patch then it ought to have been a problem before. The old code was not guaranteed always to generate 4 instruction move sequences. It might be that the old code got away with it because anything that needed a relocation always ended up with the same number of instructions but that would just be luck.

In the OpenJDK code a routine that always emits 3 moves (well used to be 3 but I think ZGC now requires it to be 4) is used when moving a code or metadata address constant into a register. That ensures the value can be patched without a problem if the target gets moved.

So, if the ELF reloc code needs a fixed size encoding then I think it will be necessary to do something similar i.e. provide two back end constant encoding options (fixed and variable insn count) and ensure that the correct one is invoked at the point where the constant is encoded according as to whether the value may potentially be relocated.

bobmcwhirter · 2019-05-14T11:28:46Z

The annotations included a “numOfInstrs” to accommodate 1 to 4 mov* sequences. Just wanted to make sure all of these still result in appropriate annotations.

bobmcwhirter · 2019-05-14T14:15:22Z

Okay, so the concern exists:

https://github.com/oracle/graal/blob/d8961a21e4d4127e61f740dae80820c1264d7067/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64MacroAssembler.java#L509-L513

There, the patching annotation assumes mov64 is a 4-instruction sequence, which no longer holds. The annotation can be created for 1-4 instructions, so mov64 maybe should return an int denoting how many it used. Or the annotating should be relocated to inside of mov64 where that knowledge is held.

Likewise, where the new mov32 is used, it should probably also annotate.

The annotating method is:

https://github.com/oracle/graal/blob/d8961a21e4d4127e61f740dae80820c1264d7067/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64Assembler.java#L2970-L2974

Where the numInstrs parameter denotes how many mov* instructions are used.

This annotation is later used to lay down the approprate ELF relocations for 1-4 movz/movk or whatnot.

Without adjusting at least the 4-instruction assumption of mov64 there is a high probability of a new bug trampling non-mov instructions with irrelevant/wrong relocations, resulting in malformed binaries.

bobmcwhirter · 2019-05-14T14:17:47Z

Of course, this may all be irrelevant if the other variants are never used for relocatable addresses. The 4-instruction relocation tends to be applied to a true 64-bit placeholder, but sometimes also 0 if I recall (possibly wrongly).

bobmcwhirter · 2019-05-14T14:21:03Z

The previous mov64 did seem to guarantee a 4-instruction mov:

https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64MacroAssembler.java#L344-L360

and movNativeAddress seemed to also issue 3:

https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.asm.aarch64/src/org/graalvm/compiler/asm/aarch64/AArch64MacroAssembler.java#L436-L454

adinn · 2019-05-14T15:45:56Z

Hi Bob,

As you say, movNativeAddress definitely always generates 3 instructions and annotates accordingly.

However, mov64 avoids generation of a movz/k for 16-bit chunks that are zero (the relevant code is the if (..) { continue } block inside the loop).

So, in theory it might only encode a single movz instruction, say if it were passed the constant 0xabcd_0000_0000_0000, or a movz and movk pair if passed constant 0xabcd_0000_1234.

I suspect you have never seen anything other than 4 insn sequences because in most cases constants are employed as arguments to arithmetic or logic operations and these can be encoded as immediates. There is a complex scheme for encoding such immediates which covers most of the commonly found cases, not just small integers. However, I am certain that it is possible to construct cases where the constant cannto be encoded as an immediate and hence where you might see move64 encode a 3 insn sequence and perhaps even a 2 insn sequence.

So, it looks like it will be necessary to track the number of insns generated by mov64 and move the annotation generation into it as per movNativeAddress.

bobmcwhirter · 2019-05-14T15:49:16Z

Ah, I see the continue now. Good thing that hasn't bitten us yet. :)

I'll make the annotation adjusted after this PR gets merged in.

fwiw, I did have to muck about with the larger-than-inlinable constants that Graal puts significantly further away than hotspot and the nearby constant table for floats/doubles.

pfustc · 2019-05-17T06:52:12Z

Thanks for your review. I took a look at all usages of mov64 and I didn't find it used for generating addresses. Maybe that's the reason why the incorrect annotation issue hasn't been exposed so far.

I'll make the annotation adjusted after this PR gets merged in.

OK. I will remember to check it after this PR merged.

adinn · 2019-05-17T09:24:59Z

Thanks for your review. I took a look at all usages of mov64 and I didn't find it used for generating addresses. Maybe that's the reason why the incorrect annotation issue hasn't been exposed so far.

No, that is not the reason. The different handling of addresses and constants represents an historic split. Firstly, numeric constants may be up to 64 bits. At present addresses are only up to 48 bits (although ZGC may require them to grow to 64 bits). Secondly, On OpenJDK only addresses needed to be 'relocatable' i.e. able to be patched at runtime. So, they were always generated to leave space for the maximum possible number of needed instructions. Constant loads never got patched.

When generating native image ELF files (shared lbs or executables) loads for both constants and addresses need to be ELF-relocatable, which means the number of instructions used for the load needs to be known. Address loads still need to be runtime-patchable. It is just luck that all constant relocs so far generated happen to have required 4 instructions.

n.b. this is not an issue on x86_64 where the address/constant data is embedded into a single (variable length) load instruction.

bobmcwhirter · 2019-05-21T14:59:45Z

Since this is lingering, and I was cleaning up some the assembler anyhow, I've gone ahead and adjusted the annotations to hopefully be more correct.

That being said, my adjustments are also on a PR, so one of us will probably have to rebase at some point.

#1311

Code generation of AArch64 constant move is optimized in this patch. Original implementation uses MOVZ with MOVK(s) to build all non-zero 16-bit chunks in 32- or 64-bit constants. E.g. Below code is generated for long integer constant -200000L. mov x1, #0xf2c0 movk x1, #0xfffc, lsl #16 movk x1, #0xffff, lsl #32 movk x1, #0xffff, lsl #48 After this patch, we generate a variable length of instruction sequence containing MOVZ, MOVN with MOVK(s), based on the value of the constant. Generated assembly in above example is optimized to below. movn x1, #0xd3f movk x1, #0xfffc, lsl #16 We also renamed AArch64MacroAssemblerTest to AArch64AddressModeTest and annotateImm to needsImmAnnotation in this patch.

pfustc · 2019-05-29T08:26:33Z

Hi, I have rebased my PR with the immediate annotation addressed. May I have a review again? Thanks.

adinn · 2019-05-29T08:45:26Z

Hi Pengfei,

That patch looks ok to me now.

pfustc · 2019-05-31T03:42:27Z

Thanks Andrew. Anyone from Oracle could help look at it?

dougxc · 2019-06-03T09:07:40Z

@sanzinger can you please review and integrate this if it looks good.

pfustc · 2019-07-03T10:02:14Z

@sanzinger Could you help look at this?

bobmcwhirter

LGTM

teshull · 2019-12-10T18:47:23Z

It looks like this patch has still yet to be merged. Is this due to an oversight, or is there some underlying reason why this patch hasn't been added?

sanzinger · 2019-12-20T09:43:10Z

Looks good to me. I am working to get this merged.

graalvmbot added the oca-signed label May 14, 2019

dougxc assigned sanzinger May 14, 2019

sanzinger added the compiler label May 20, 2019

bobmcwhirter approved these changes Jul 8, 2019

View reviewed changes

teshull mentioned this pull request Dec 10, 2019

Clear AArch64 PR Backlog #1964

Closed

graalvmbot merged commit 58438f9 into oracle:master Dec 20, 2019

brettwooldridge mentioned this pull request Feb 13, 2020

Support for ZGC (garbage collector) #2149

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AArch64: Better instruction sequence for constant move #1278

AArch64: Better instruction sequence for constant move #1278

pfustc commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019 •

edited

Loading

bobmcwhirter commented May 14, 2019 •

edited

Loading

bobmcwhirter commented May 14, 2019

bobmcwhirter commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019

pfustc commented May 17, 2019

adinn commented May 17, 2019

bobmcwhirter commented May 21, 2019

pfustc commented May 29, 2019

adinn commented May 29, 2019

pfustc commented May 31, 2019

dougxc commented Jun 3, 2019

pfustc commented Jul 3, 2019

bobmcwhirter left a comment

teshull commented Dec 10, 2019

sanzinger commented Dec 20, 2019

AArch64: Better instruction sequence for constant move #1278

AArch64: Better instruction sequence for constant move #1278

Conversation

pfustc commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019 • edited Loading

bobmcwhirter commented May 14, 2019 • edited Loading

bobmcwhirter commented May 14, 2019

bobmcwhirter commented May 14, 2019

adinn commented May 14, 2019

bobmcwhirter commented May 14, 2019

pfustc commented May 17, 2019

adinn commented May 17, 2019

bobmcwhirter commented May 21, 2019

pfustc commented May 29, 2019

adinn commented May 29, 2019

pfustc commented May 31, 2019

dougxc commented Jun 3, 2019

pfustc commented Jul 3, 2019

bobmcwhirter left a comment

Choose a reason for hiding this comment

teshull commented Dec 10, 2019

sanzinger commented Dec 20, 2019

bobmcwhirter commented May 14, 2019 •

edited

Loading

bobmcwhirter commented May 14, 2019 •

edited

Loading