Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][PAC] Eliminate excessive MOVs when computing blend #115185

Merged
merged 2 commits into from
Nov 27, 2024

Conversation

atrosinenko
Copy link
Contributor

@atrosinenko atrosinenko commented Nov 6, 2024

As function calls do not generally preserve X16 and X17, it is beneficial
to allow AddrDisc operand of BLRA instruction to reside in these
registers and make use of this condition when computing the discriminator.

This can save up to two MOVs in cases such as loading a (signed) virtual
function pointer via a (signed) pointer to vtable, for example

ldr   x9, [x16]
mov   x8, x16
mov   x17, x8
movk  x17, #34646, lsl #48
blraa x9, x17

can be simplified to

ldr   x8, [x16]
movk  x16, #34646, lsl #48
blraa x8, x16

@atrosinenko atrosinenko marked this pull request as ready for review November 6, 2024 17:27
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)

Changes

As function calls do not generally preserve X16 and X17, it is beneficial
to allow AddrDisc operand of B(L)RA instruction to reside in these
registers and make use of this condition when computing the discriminator.

This can save up to two MOVs in cases such as loading a (signed) virtual
function pointer via a (signed) pointer to vtable, for example

ldr   x9, [x16]
mov   x8, x16
mov   x17, x8
movk  x17, #<!-- -->34646, lsl #<!-- -->48
blraa x9, x17

can be simplified to

ldr   x8, [x16]
movk  x16, #<!-- -->34646, lsl #<!-- -->48
blraa x8, x16

Full diff: https://github.com/llvm/llvm-project/pull/115185.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+53-41)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+9-9)
  • (modified) llvm/test/CodeGen/AArch64/ptrauth-call.ll (+27)
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 436bb332053f75..3263bb38ef1fcc 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -163,8 +163,15 @@ class AArch64AsmPrinter : public AsmPrinter {
   // Emit the sequence for AUT or AUTPAC.
   void emitPtrauthAuthResign(const MachineInstr *MI);
 
-  // Emit the sequence to compute a discriminator into x17, or reuse AddrDisc.
-  unsigned emitPtrauthDiscriminator(uint16_t Disc, unsigned AddrDisc);
+  // Emit the sequence to compute the discriminator.
+  // ScratchReg should be x16/x17.
+  // The returned register is either unmodified AddrDisc or x16/x17.
+  // If the expanded pseudo is allowed to clobber AddrDisc register, setting
+  // MayUseAddrAsScratch may save one MOV instruction, provided the address
+  // is already in x16/x17.
+  Register emitPtrauthDiscriminator(uint16_t Disc, Register AddrDisc,
+                                    Register ScratchReg,
+                                    bool MayUseAddrAsScratch = false);
 
   // Emit the sequence for LOADauthptrstatic
   void LowerLOADauthptrstatic(const MachineInstr &MI);
@@ -1727,8 +1734,10 @@ void AArch64AsmPrinter::emitFMov0(const MachineInstr &MI) {
   }
 }
 
-unsigned AArch64AsmPrinter::emitPtrauthDiscriminator(uint16_t Disc,
-                                                     unsigned AddrDisc) {
+Register AArch64AsmPrinter::emitPtrauthDiscriminator(uint16_t Disc,
+                                                     Register AddrDisc,
+                                                     Register ScratchReg,
+                                                     bool MayUseAddrAsScratch) {
   // So far we've used NoRegister in pseudos.  Now we need real encodings.
   if (AddrDisc == AArch64::NoRegister)
     AddrDisc = AArch64::XZR;
@@ -1738,16 +1747,24 @@ unsigned AArch64AsmPrinter::emitPtrauthDiscriminator(uint16_t Disc,
   if (!Disc)
     return AddrDisc;
 
-  // If there's only a constant discriminator, MOV it into x17.
+  // If there's only a constant discriminator, MOV it into the scratch register.
   if (AddrDisc == AArch64::XZR) {
-    emitMOVZ(AArch64::X17, Disc, 0);
-    return AArch64::X17;
+    emitMOVZ(ScratchReg, Disc, 0);
+    return ScratchReg;
   }
 
-  // If there are both, emit a blend into x17.
-  emitMovXReg(AArch64::X17, AddrDisc);
-  emitMOVK(AArch64::X17, Disc, 48);
-  return AArch64::X17;
+  // If there are both, emit a blend into the scratch register.
+
+  // Check if we can save one MOV instruction.
+  assert(MayUseAddrAsScratch || ScratchReg != AddrDisc);
+  bool AddrDiscIsSafe = AddrDisc == AArch64::X16 || AddrDisc == AArch64::X17;
+  if (MayUseAddrAsScratch && AddrDiscIsSafe)
+    ScratchReg = AddrDisc;
+  else
+    emitMovXReg(ScratchReg, AddrDisc);
+
+  emitMOVK(ScratchReg, Disc, 48);
+  return ScratchReg;
 }
 
 /// Emits a code sequence to check an authenticated pointer value.
@@ -1964,7 +1981,8 @@ void AArch64AsmPrinter::emitPtrauthAuthResign(const MachineInstr *MI) {
 
   // Compute aut discriminator into x17
   assert(isUInt<16>(AUTDisc));
-  unsigned AUTDiscReg = emitPtrauthDiscriminator(AUTDisc, AUTAddrDisc);
+  Register AUTDiscReg =
+      emitPtrauthDiscriminator(AUTDisc, AUTAddrDisc, AArch64::X17);
   bool AUTZero = AUTDiscReg == AArch64::XZR;
   unsigned AUTOpc = getAUTOpcodeForKey(AUTKey, AUTZero);
 
@@ -2005,7 +2023,8 @@ void AArch64AsmPrinter::emitPtrauthAuthResign(const MachineInstr *MI) {
 
   // Compute pac discriminator into x17
   assert(isUInt<16>(PACDisc));
-  unsigned PACDiscReg = emitPtrauthDiscriminator(PACDisc, PACAddrDisc);
+  Register PACDiscReg =
+      emitPtrauthDiscriminator(PACDisc, PACAddrDisc, AArch64::X17);
   bool PACZero = PACDiscReg == AArch64::XZR;
   unsigned PACOpc = getPACOpcodeForKey(PACKey, PACZero);
 
@@ -2037,8 +2056,17 @@ void AArch64AsmPrinter::emitPtrauthBranch(const MachineInstr *MI) {
 
   unsigned AddrDisc = MI->getOperand(3).getReg();
 
-  // Compute discriminator into x17
-  unsigned DiscReg = emitPtrauthDiscriminator(Disc, AddrDisc);
+  // Make sure AddrDisc is solely used to compute the discriminator.
+  // While hardly meaningful, it is still possible to describe an authentication
+  // of a pointer against its own value (instead of storage address) with
+  // intrinsics, so use report_fatal_error instead of assert.
+  if (BrTarget == AddrDisc)
+    report_fatal_error("Branch target is signed with its own value");
+
+  // x16 and x17 are implicit-def'ed by MI, and AddrDisc is not used as any
+  // other input, so try to save one MOV by setting MayUseAddrAsScratch.
+  Register DiscReg = emitPtrauthDiscriminator(Disc, AddrDisc, AArch64::X17,
+                                              /*MayUseAddrAsScratch=*/true);
   bool IsZeroDisc = DiscReg == AArch64::XZR;
 
   unsigned Opc;
@@ -2332,16 +2360,7 @@ void AArch64AsmPrinter::LowerMOVaddrPAC(const MachineInstr &MI) {
     }
   }
 
-  unsigned DiscReg = AddrDisc;
-  if (Disc != 0) {
-    if (AddrDisc != AArch64::XZR) {
-      emitMovXReg(AArch64::X17, AddrDisc);
-      emitMOVK(AArch64::X17, Disc, 48);
-    } else {
-      emitMOVZ(AArch64::X17, Disc, 0);
-    }
-    DiscReg = AArch64::X17;
-  }
+  Register DiscReg = emitPtrauthDiscriminator(Disc, AddrDisc, AArch64::X17);
 
   auto MIB = MCInstBuilder(getPACOpcodeForKey(Key, DiscReg == AArch64::XZR))
                  .addReg(AArch64::X16)
@@ -2609,6 +2628,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   // instruction here.
   case AArch64::AUTH_TCRETURN:
   case AArch64::AUTH_TCRETURN_BTI: {
+    Register Callee = MI->getOperand(0).getReg();
     const uint64_t Key = MI->getOperand(2).getImm();
     assert((Key == AArch64PACKey::IA || Key == AArch64PACKey::IB) &&
            "Invalid auth key for tail-call return");
@@ -2618,31 +2638,23 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
 
     Register AddrDisc = MI->getOperand(4).getReg();
 
-    Register ScratchReg = MI->getOperand(0).getReg() == AArch64::X16
-                              ? AArch64::X17
-                              : AArch64::X16;
+    Register ScratchReg = Callee == AArch64::X16 ? AArch64::X17 : AArch64::X16;
 
     emitPtrauthTailCallHardening(MI);
 
-    unsigned DiscReg = AddrDisc;
-    if (Disc) {
-      if (AddrDisc != AArch64::NoRegister) {
-        if (ScratchReg != AddrDisc)
-          emitMovXReg(ScratchReg, AddrDisc);
-        emitMOVK(ScratchReg, Disc, 48);
-      } else {
-        emitMOVZ(ScratchReg, Disc, 0);
-      }
-      DiscReg = ScratchReg;
-    }
+    // See the comments in emitPtrauthBranch.
+    if (Callee == AddrDisc)
+      report_fatal_error("Call target is signed with its own value");
+    Register DiscReg = emitPtrauthDiscriminator(Disc, AddrDisc, ScratchReg,
+                                                /*MayUseAddrAsScratch=*/true);
 
-    const bool IsZero = DiscReg == AArch64::NoRegister;
+    const bool IsZero = DiscReg == AArch64::XZR;
     const unsigned Opcodes[2][2] = {{AArch64::BRAA, AArch64::BRAAZ},
                                     {AArch64::BRAB, AArch64::BRABZ}};
 
     MCInst TmpInst;
     TmpInst.setOpcode(Opcodes[Key][IsZero]);
-    TmpInst.addOperand(MCOperand::createReg(MI->getOperand(0).getReg()));
+    TmpInst.addOperand(MCOperand::createReg(Callee));
     if (!IsZero)
       TmpInst.addOperand(MCOperand::createReg(DiscReg));
     EmitToStreamer(*OutStreamer, TmpInst);
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index dfa3209928068b..8c019aed3e9793 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1840,28 +1840,28 @@ let Predicates = [HasPAuth] in {
   // materialization here), in part because they're handled in a safer way by
   // the kernel, notably on Darwin.
   def BLRA : Pseudo<(outs), (ins GPR64noip:$Rn, i32imm:$Key, i64imm:$Disc,
-                                 GPR64noip:$AddrDisc),
+                                 GPR64:$AddrDisc),
                     [(AArch64authcall GPR64noip:$Rn, timm:$Key, timm:$Disc,
-                                      GPR64noip:$AddrDisc)]>, Sched<[]> {
+                                      GPR64:$AddrDisc)]>, Sched<[]> {
     let isCodeGenOnly = 1;
     let hasSideEffects = 1;
     let mayStore = 0;
     let mayLoad = 0;
     let isCall = 1;
     let Size = 12; // 4 fixed + 8 variable, to compute discriminator.
-    let Defs = [X17,LR];
+    let Defs = [X16,X17,LR];
     let Uses = [SP];
   }
 
   def BLRA_RVMARKER : Pseudo<
         (outs), (ins i64imm:$rvfunc, GPR64noip:$Rn, i32imm:$Key, i64imm:$Disc,
-                     GPR64noip:$AddrDisc),
+                     GPR64:$AddrDisc),
         [(AArch64authcall_rvmarker tglobaladdr:$rvfunc,
                                    GPR64noip:$Rn, timm:$Key, timm:$Disc,
-                                   GPR64noip:$AddrDisc)]>, Sched<[]> {
+                                   GPR64:$AddrDisc)]>, Sched<[]> {
     let isCodeGenOnly = 1;
     let isCall = 1;
-    let Defs = [X17,LR];
+    let Defs = [X16,X17,LR];
     let Uses = [SP];
   }
 
@@ -1869,7 +1869,7 @@ let Predicates = [HasPAuth] in {
   // This directly manipulates x16/x17, which are the only registers the OS
   // guarantees are safe to use for sensitive operations.
   def BRA : Pseudo<(outs), (ins GPR64noip:$Rn, i32imm:$Key, i64imm:$Disc,
-                                GPR64noip:$AddrDisc), []>, Sched<[]> {
+                                GPR64:$AddrDisc), []>, Sched<[]> {
     let isCodeGenOnly = 1;
     let hasNoSchedulingInfo = 1;
     let hasSideEffects = 1;
@@ -1880,7 +1880,7 @@ let Predicates = [HasPAuth] in {
     let isBarrier = 1;
     let isIndirectBranch = 1;
     let Size = 12; // 4 fixed + 8 variable, to compute discriminator.
-    let Defs = [X17];
+    let Defs = [X16,X17];
   }
 
   let isReturn = 1, isTerminator = 1, isBarrier = 1 in {
@@ -1971,7 +1971,7 @@ let Predicates = [HasPAuth] in {
   // make sure at least one register is usable as a scratch one - for that
   // purpose, use tcGPRnotx16x17 register class for the second operand.
   let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1, Size = 16,
-      Uses = [SP] in {
+      Defs = [X16,X17], Uses = [SP] in {
     def AUTH_TCRETURN
       : Pseudo<(outs), (ins tcGPRnotx16x17:$dst, i32imm:$FPDiff, i32imm:$Key,
                             i64imm:$Disc, tcGPR64:$AddrDisc),
diff --git a/llvm/test/CodeGen/AArch64/ptrauth-call.ll b/llvm/test/CodeGen/AArch64/ptrauth-call.ll
index c46fe79f754fef..bf35cf8fecbdb7 100644
--- a/llvm/test/CodeGen/AArch64/ptrauth-call.ll
+++ b/llvm/test/CodeGen/AArch64/ptrauth-call.ll
@@ -188,6 +188,33 @@ define void @test_tailcall_omit_mov_x16_x16(ptr %objptr) #0 {
   ret void
 }
 
+define i32 @test_call_omit_extra_moves(ptr %objptr) #0 {
+; CHECK-LABEL: test_call_omit_extra_moves:
+; DARWIN-NEXT:   stp     x29, x30, [sp, #-16]!
+; ELF-NEXT:      str     x30, [sp, #-16]!
+; CHECK-NEXT:    ldr     x16, [x0]
+; CHECK-NEXT:    mov     x17, x0
+; CHECK-NEXT:    movk    x17, #6503, lsl #48
+; CHECK-NEXT:    autda   x16, x17
+; CHECK-NEXT:    ldr     x8, [x16]
+; CHECK-NEXT:    movk    x16, #34646, lsl #48
+; CHECK-NEXT:    blraa   x8, x16
+; CHECK-NEXT:    mov     w0, #42
+; DARWIN-NEXT:   ldp     x29, x30, [sp], #16
+; ELF-NEXT:      ldr     x30, [sp], #16
+; CHECK-NEXT:    ret
+  %vtable.signed = load ptr, ptr %objptr
+  %objptr.int = ptrtoint ptr %objptr to i64
+  %vtable.discr = tail call i64 @llvm.ptrauth.blend(i64 %objptr.int, i64 6503)
+  %vtable.signed.int = ptrtoint ptr %vtable.signed to i64
+  %vtable.int = tail call i64 @llvm.ptrauth.auth(i64 %vtable.signed.int, i32 2, i64 %vtable.discr)
+  %vtable = inttoptr i64 %vtable.int to ptr
+  %callee.signed = load ptr, ptr %vtable
+  %callee.discr = tail call i64 @llvm.ptrauth.blend(i64 %vtable.int, i64 34646)
+  %call.result = tail call i32 %callee.signed(ptr %objptr) [ "ptrauth"(i32 0, i64 %callee.discr) ]
+  ret i32 42
+}
+
 define i32 @test_call_ia_arg(ptr %arg0, i64 %arg1) #0 {
 ; DARWIN-LABEL: test_call_ia_arg:
 ; DARWIN-NEXT:    stp x29, x30, [sp, #-16]!

@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-discriminator branch from 54cb6c8 to 05c9852 Compare November 11, 2024 16:38
@atrosinenko
Copy link
Contributor Author

@kovdan01 Thank you for the review, I have just pushed the commit that should resolve the comments.

Additionally, I discovered that I misunderstood BRA pseudo to be kind of tail call instruction - it is actually used for computed goto. While my original changes are still harmless, it is not so natural to declare BRA to clobber both X16 and X17 as it is not actually a call instruction, so I adjusted the definition of BRA and emitPtrauthBranch function a bit.

@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from ec05d10 to a12ced5 Compare November 12, 2024 12:55
@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-discriminator branch from 05c9852 to 084f0b0 Compare November 12, 2024 12:56
Base automatically changed from users/atrosinenko/move-lr-checks-to-asmprinter to main November 12, 2024 15:27
@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-discriminator branch from 084f0b0 to 8728007 Compare November 12, 2024 15:57
Copy link
Contributor

@kovdan01 kovdan01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from one comment

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp Show resolved Hide resolved
As function calls do not generally preserve X16 and X17, it is beneficial
to allow AddrDisc operand of B(L)RA instruction to reside in these
registers and make use of this condition when computing the discriminator.

This can save up to two MOVs in cases such as loading a (signed) virtual
function pointer via a (signed) pointer to vtable, for example

    ldr   x9, [x16]
    mov   x8, x16
    mov   x17, x8
    movk  x17, #34646, lsl #48
    blraa x9, x17

can be simplified to

    ldr   x8, [x16]
    movk  x16, #34646, lsl #48
    blraa x8, x16
@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-discriminator branch from 8728007 to e70d3a9 Compare November 26, 2024 18:30
@atrosinenko
Copy link
Contributor Author

Rebased onto current main branch.

@atrosinenko atrosinenko merged commit 1fccba5 into main Nov 27, 2024
8 checks passed
@atrosinenko atrosinenko deleted the users/atrosinenko/pauth-discriminator branch November 27, 2024 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants