Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtin trap placed after frame pop on Arm 32-bit #113154

Closed
DavidSpickett opened this issue Oct 21, 2024 · 8 comments · Fixed by #113287
Closed

builtin trap placed after frame pop on Arm 32-bit #113154

DavidSpickett opened this issue Oct 21, 2024 · 8 comments · Fixed by #113287

Comments

@DavidSpickett
Copy link
Collaborator

Since #109628, we have had an lldb test failure on 32 bit Arm:
https://lab.llvm.org/buildbot/#/builders/18/builds/5545

# CHECK: frame #{{.*}}`g() at verbose_trap-in-stl.cpp
         ^
<stdin>:11:88: note: scanning from here
* thread #1, name = 'verbose_trap-in', stop reason = Bounds error: out-of-bounds access
                                                                                       ^
<stdin>:12:44: note: possible intended match here
 frame #2: 0x00410694 verbose_trap-in-stl.test.tmp.out`main at verbose_trap-in-stl.cpp:15:3
                                           ^
Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/Shell/Recognizer/verbose_trap-in-stl.test

-dump-input=help explains the following input dump.

This test checks that lldb can show the first frame of user written code when there's an error in the STL. So in this case we expect to see g() in the backtrace, rather than the operator[], or main as we now get.

This appears to be because the trap instruction generated by __builtin_trap is put after the frame information is reset:

(lldb) dis
verbose_trap-in-stl.test.tmp.out`std::__1::vector<int>::operator[]:
    0x400558 <+0>:  push   {r11, lr}
    0x40055c <+4>:  mov    r11, sp
    0x400560 <+8>:  sub    sp, sp, #8
    0x400564 <+12>: str    r0, [sp, #0x4]
    0x400568 <+16>: str    r1, [sp]
    0x40056c <+20>: mov    sp, r11
    0x400570 <+24>: pop    {r11, lr}
->  0x400574 <+28>: trap
    0x400578 <+32>: bx     lr

I can show this with another example: https://godbolt.org/z/jnGWh1Wqh

void fn() {
    __builtin_trap();
}

Produces:

fn():
  push {r11, lr}
  mov r11, sp
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr

fn has the trap after the frame pointer and link register is reset. This is not the case on RISC-V:

fn():
  addi sp, sp, -16
  sw ra, 12(sp)
  sw s0, 8(sp)
  addi s0, sp, 16
  unimp                          <<<<<<<<< trap is here
  lw ra, 12(sp)
  lw s0, 8(sp)
  addi sp, sp, 16
  ret

And I think changing the frame information before the trap is misleading because a debugger will see a PC within fn but the unwind information will look as if it is already in the caller of fn.

I don't know right now whether the linked change has caused the problem or merely exposed the problem by changing how leaf functions are handled.

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/issue-subscribers-backend-arm

Author: David Spickett (DavidSpickett)

Since https://github.com//pull/109628, we have had an lldb test failure on 32 bit Arm: https://lab.llvm.org/buildbot/#/builders/18/builds/5545
# CHECK: frame #{{.*}}`g() at verbose_trap-in-stl.cpp
         ^
&lt;stdin&gt;:11:88: note: scanning from here
* thread #<!-- -->1, name = 'verbose_trap-in', stop reason = Bounds error: out-of-bounds access
                                                                                       ^
&lt;stdin&gt;:12:44: note: possible intended match here
 frame #<!-- -->2: 0x00410694 verbose_trap-in-stl.test.tmp.out`main at verbose_trap-in-stl.cpp:15:3
                                           ^
Input file: &lt;stdin&gt;
Check file: /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/Shell/Recognizer/verbose_trap-in-stl.test

-dump-input=help explains the following input dump.

This test checks that lldb can show the first frame of user written code when there's an error in the STL. So in this case we expect to see g() in the backtrace, rather than the operator[], or main as we now get.

This appears to be because the trap instruction generated by __builtin_trap is put after the frame information is reset:

(lldb) dis
verbose_trap-in-stl.test.tmp.out`std::__1::vector&lt;int&gt;::operator[]:
    0x400558 &lt;+0&gt;:  push   {r11, lr}
    0x40055c &lt;+4&gt;:  mov    r11, sp
    0x400560 &lt;+8&gt;:  sub    sp, sp, #<!-- -->8
    0x400564 &lt;+12&gt;: str    r0, [sp, #<!-- -->0x4]
    0x400568 &lt;+16&gt;: str    r1, [sp]
    0x40056c &lt;+20&gt;: mov    sp, r11
    0x400570 &lt;+24&gt;: pop    {r11, lr}
-&gt;  0x400574 &lt;+28&gt;: trap
    0x400578 &lt;+32&gt;: bx     lr

I can show this with another example: https://godbolt.org/z/jnGWh1Wqh

void fn() {
    __builtin_trap();
}

Produces:

fn():
  push {r11, lr}
  mov r11, sp
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr

fn has the trap after the frame pointer and link register is reset. This is not the case on RISC-V:

fn():
  addi sp, sp, -16
  sw ra, 12(sp)
  sw s0, 8(sp)
  addi s0, sp, 16
  unimp                          &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; trap is here
  lw ra, 12(sp)
  lw s0, 8(sp)
  addi sp, sp, 16
  ret

And I think changing the frame information before the trap is misleading because a debugger will see a PC within fn but the unwind information will look as if it is already in the caller of fn.

I don't know right now whether the linked change has caused the problem or merely exposed the problem by changing how leaf functions are handled.

@DavidSpickett
Copy link
Collaborator Author

You can work around the issue by putting something after the trap:

void fn() {
    __builtin_trap();
    asm ("nop");
}

Then you get the expected order:

fn():
  push {r11, lr}
  mov r11, sp
  .inst 0xe7ffdefe
  mov r0, r0
  pop {r11, lr}
  bx lr

@DavidSpickett
Copy link
Collaborator Author

prologepilog for other architectures is inserting around the code of the function, but Arm is inserting before it:
https://godbolt.org/z/E3rorvnPG

ARM:

bb.0.entry:
  liveins: $lr
  $sp = frame-setup STMDB_UPD $sp(tied-def 0), 14, $noreg, killed $r11, killed $lr
  frame-setup CFI_INSTRUCTION def_cfa_offset 8
  frame-setup CFI_INSTRUCTION offset $lr, -4
  frame-setup CFI_INSTRUCTION offset $r11, -8
  $r11 = frame-setup MOVr killed $sp, 14, $noreg, $noreg
  frame-setup CFI_INSTRUCTION def_cfa_register $r11
  $sp = frame-destroy LDMIA_UPD $sp(tied-def 0), 14, $noreg, def $r11, def $lr; example.cpp:2:5

  TRAP; example.cpp:2:5

  BX_RET 14, $noreg; example.cpp:3:1

# End machine code for function fn().

RISC-V:

# Machine code for function fn(): NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
Frame Objects:
  fi#0: size=4, align=4, at location [SP-4]
  fi#1: size=4, align=4, at location [SP-8]

bb.0.entry:
  liveins: $x1
  $x2 = frame-setup ADDI $x2, -16
  frame-setup CFI_INSTRUCTION def_cfa_offset 16
  SW killed $x1, $x2, 12 :: (store (s32) into %stack.0)
  SW killed $x8, $x2, 8 :: (store (s32) into %stack.1)
  frame-setup CFI_INSTRUCTION offset $x1, -4
  frame-setup CFI_INSTRUCTION offset $x8, -8
  $x8 = frame-setup ADDI $x2, 16
  frame-setup CFI_INSTRUCTION def_cfa $x8, 0

  UNIMP; example.cpp:2:5

  $x1 = LW $x2, 12 :: (load (s32) from %stack.0)
  $x8 = LW $x2, 8 :: (load (s32) from %stack.1)
  $x2 = frame-destroy ADDI $x2, 16; example.cpp:3:1
  PseudoRET; example.cpp:3:1

However this does not happen if the only code in the body is asm ("nop"); or some other thing that is not builtin_trap. So perhaps there is someone special about that?

@DavidSpickett DavidSpickett changed the title builtin trap placed after frame pop on Arm 32 bit builtin trap placed after frame pop on Arm 32-bit Oct 21, 2024
DavidSpickett added a commit that referenced this issue Oct 21, 2024
…en issue

Since #109628 landed, this test
has been failing on 32-bit Arm.

This is due to a codegen problem (whether added or uncovered by the change,
not known) where the trap instruction is placed after the frame pointer
and link register are restored.

#113154

So the code was:
```
std::__1::vector<int>::operator[](unsigned int):
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  add sp, sp, #8
  .inst 0xe7ffdefe
  bx lr
```
When lldb saw the trap, the PC was inside operator[] but the frame
information actually pointed to g.

This bug only happens for leaf functions so adding a return type
works around it:
```
std::__1::vector<int>::operator[](unsigned int):
  push {r11, lr}
  mov r11, sp
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  mov sp, r11
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr
```
(and operator[] should return T& anyway)

Now the PC location and frame information should match and the
test passes.
@guoxin049
Copy link
Contributor

I'm not sure how you're building it, but if it's a leaf function, you could try eliminating the frame pointer (FP) by using the -momit-leaf-frame-pointer option. If you want to remove the frame pointer entirely, the -fomit-frame-pointer option can be used to eliminate it across the board. like this: https://godbolt.org/z/4vKnKvbj6

@DavidSpickett
Copy link
Collaborator Author

I'm building with: clang <file> -O0 -g

No explicit options related to frame pointers. Yes adding that flag helps, but in the case of the lldb test case I was able to add a return type to make it non-leaf and avoid this.

@guoxin049
Copy link
Contributor

#109628 The behavior for non-leaf functions remains consistent before and after the patch modification, so I don't believe this modification introduced a bug. like this : https://godbolt.org/z/vrqn71qKa

@DavidSpickett
Copy link
Collaborator Author

DavidSpickett commented Oct 21, 2024

Right, this has just revealed what was already there by adding frame information to the function in question. As long as no one backtraces it, it would work fine it's only because lldb inspected it that we found this.

Without frame pointers I can still see the trap being placed after stack pointer restore code:

void gn() {
    volatile int i = 5;
    // As long as the trap is last, it'll be placed
    // after the pop.
    __builtin_trap();
}

clang -O0 -g -fomit-frame-pointer:

gn:
        sub     sp, sp, #4
        mov     r0, #5
        str     r0, [sp]
        add     sp, sp, #4         << destroy frame
        .inst   0xe7ffdefe         << trap is after that
        bx      lr

From CE's pipeline viewer:

bb.0.entry:
  $sp = frame-setup SUBri killed $sp, 4, 14, $noreg, $noreg
  frame-setup CFI_INSTRUCTION def_cfa_offset 4
  renamable $r0 = MOVi 5, 14, $noreg, $noreg
  STRi12 killed renamable $r0, $sp, 0, 14, $noreg :: (volatile store (s32) into %ir.i); example.c:7:18
  $sp = frame-destroy ADDri killed $sp, 4, 14, $noreg, $noreg; example.c:10:5
  TRAP; example.c:10:5
  BX_RET 14, $noreg; example.c:11:1

# End machine code for function gn.

Again it's inserting the frame destroy before the trap. Does not do this for the RISC-V equivalent options:

gn:
        addi    sp, sp, -16
        li      a0, 5
        sw      a0, 12(sp)
        unimp                       << trap
        addi    sp, sp, 16       << destroy frame
        ret

And this happens prior to your PR.

@efriedma-quic
Copy link
Collaborator

I think the problem is that the Arm "trap" instruction is marked "isTerminator = 1" in ARMInstrInfo.td. No other target does this, as far as I can tell, precisely because it leads to this sort of weird result.

void fn() {
    __builtin_trap();
    asm ("nop");
}

This actually fails MachineVerifier: "Bad machine code: Non-terminator instruction after the first terminator".

DavidSpickett added a commit to DavidSpickett/llvm-project that referenced this issue Oct 22, 2024
Fixes llvm#113154

The encodings used for llvm.trap() on ARM were all marked as
barriers and terminators. This lead to stack frame destroy
code being inserted before the trap if the trap was the last
thing in the function and it had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        pop     {r11, lr}   << stack frame destroy
        .inst   0xe7ffdefe  << trap
        bx      lr
```
All the other targets don't mark them this way, instead
they mark them with isTrap. I've changed ARM to do this,
which fixes the code generation:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        .inst   0xe7ffdefe  << trap
        pop     {r11, lr}   << stack frame destroy
        bx      lr
```
I've updated the existing trap test to force the need for
a stack frame, then check that the instruction immediately
after the trap is resetting the stack pointer.
DavidSpickett added a commit to DavidSpickett/llvm-project that referenced this issue Oct 22, 2024
Fixes llvm#113154

The encodings used for llvm.trap() on ARM were all marked as
barriers and terminators. This lead to stack frame destroy
code being inserted before the trap if the trap was the last
thing in the function and it had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        .inst   0xe7ffdefe  << trap
        bx      lr
```
All the other targets don't mark them this way, instead
they mark them with isTrap. I've changed ARM to do this,
which fixes the code generation:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        .inst   0xe7ffdefe  << trap
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        bx      lr
```
I've updated the existing trap test to force the need for
a stack frame, then check that the instruction immediately
after the trap is resetting the stack pointer.
EricWF pushed a commit to efcs/llvm-project that referenced this issue Oct 22, 2024
…en issue

Since llvm#109628 landed, this test
has been failing on 32-bit Arm.

This is due to a codegen problem (whether added or uncovered by the change,
not known) where the trap instruction is placed after the frame pointer
and link register are restored.

llvm#113154

So the code was:
```
std::__1::vector<int>::operator[](unsigned int):
  sub sp, sp, llvm#8
  str r0, [sp, #4]
  str r1, [sp]
  add sp, sp, llvm#8
  .inst 0xe7ffdefe
  bx lr
```
When lldb saw the trap, the PC was inside operator[] but the frame
information actually pointed to g.

This bug only happens for leaf functions so adding a return type
works around it:
```
std::__1::vector<int>::operator[](unsigned int):
  push {r11, lr}
  mov r11, sp
  sub sp, sp, llvm#8
  str r0, [sp, #4]
  str r1, [sp]
  mov sp, r11
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr
```
(and operator[] should return T& anyway)

Now the PC location and frame information should match and the
test passes.
DavidSpickett added a commit that referenced this issue Oct 23, 2024
Fixes #113154

The encodings used for llvm.trap() on ARM were all marked as barriers
and terminators. This lead to stack frame destroy code being inserted
before the trap if the trap was the last thing in the function and it
had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        .inst   0xe7ffdefe  << trap
        bx      lr
```
All the other targets don't mark them this way, instead they mark them
with isTrap. I've changed ARM to do this, which fixes the code
generation:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        .inst   0xe7ffdefe  << trap
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        bx      lr
```
I've updated the existing trap test to force the need for a stack frame,
then check that the instruction immediately after the trap is resetting
the stack pointer.

debugtrap was already working but I've added the same checks for it
anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants