compiler_rt not respecting the CPU features on cross-compile #16957

Closed
aka-mj opened this issue Aug 25, 2023 · 5 comments
Labels: arch-arm (32-bit ARM), bug (Observed behavior contradicts documented or intended behavior), compiler-rt

aka-mj commented Aug 25, 2023

Zig Version

0.12.0-dev.47+0461a64a9

Steps to Reproduce and Observed Behavior

Initial troubleshooting can be found here:
https://ziggit.dev/t/cross-compile-zig-fails-on-target-with-illegal-instruction/1505

Create a project and build it for the target:

> mkdir armtest
> cd armtest
> zig init-exec
> zig build -verbose -Dtarget=arm-linux-gnueabihf -Dcpu=cortex_a5-neon
/home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/zig build-exe /home/mjohn/zig/workspace/armtest/src/main.zig --cache-dir /home/mjohn/zig/workspace/armtest/zig-cache --global-cache-dir /home/mjohn/.cache/zig --name armtest -target arm-linux-gnueabihf -mcpu cortex_a5-neon --listen=- 
> file zig-out/bin/armtest 
zig-out/bin/armtest: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped

(Copy the binary to the target and run it.)

~ # /armtest
Illegal instruction (core dumped)
~ # uname -a
Summit Linux V2470034 4.19.231 #1 PREEMPT none armv7l GNU/Linux
~ # strace /armtest
execve("/armtest", ["/armtest"], 0xbeafddb0 /* 14 vars */) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x98b10} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)

The core dump shows that the fault is in memset: it contains a NEON instruction (vdup.8) even though NEON was explicitly disabled with -mcpu cortex_a5-neon.

Core was generated by `/armtest'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
21                  if (n == 0) break;
(gdb) where
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) disas
Dump of assembler code for function memset:
   0x00098b48 <+0>:     push    {r4, r5, r11, lr}
   0x00098b4c <+4>:     add     r11, sp, #8
   0x00098b50 <+8>:     cmp     r2, #0
   0x00098b54 <+12>:    beq     0x98ba4 <memset+92>
   0x00098b58 <+16>:    cmp     r2, #16
   0x00098b5c <+20>:    bcs     0x98b6c <memset+36>
   0x00098b60 <+24>:    mov     r3, r2
   0x00098b64 <+28>:    mov     r12, r0
   0x00098b68 <+32>:    b       0x98b98 <memset+80>
   0x00098b6c <+36>:    bic     lr, r2, #15
=> 0x00098b70 <+40>:    vdup.8  q8, r1
   0x00098b74 <+44>:    add     r12, r0, lr
   0x00098b78 <+48>:    and     r3, r2, #15
   0x00098b7c <+52>:    mov     r4, lr
   0x00098b80 <+56>:    mov     r5, r0
   0x00098b84 <+60>:    vst1.8  {d16-d17}, [r5]!
   0x00098b88 <+64>:    subs    r4, r4, #16
   0x00098b8c <+68>:    bne     0x98b84 <memset+60>
   0x00098b90 <+72>:    cmp     lr, r2
   0x00098b94 <+76>:    beq     0x98ba4 <memset+92>
   0x00098b98 <+80>:    strb    r1, [r12], #1
   0x00098b9c <+84>:    subs    r3, r3, #1
   0x00098ba0 <+88>:    bne     0x98b98 <memset+80>
   0x00098ba4 <+92>:    pop     {r4, r5, r11, pc}
End of assembler dump.
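To check a build for this symptom without running it on the target, one can disassemble `memset` (the thread later uses `objdump main --disassemble=memset | grep vdup` for the same purpose) and look for instructions that only exist with NEON (Advanced SIMD), such as `vdup`, `vld1`/`vst1`, or anything touching a 128-bit `q` register. This is a minimal sketch under assumptions: the helper names `find_neon` and `check_memset` and the `OBJDUMP` variable are hypothetical, the pattern is a heuristic rather than a full NEON decoder, and plain VFP instructions such as `vadd.f64 d0, d1, d2` are deliberately not flagged, because VFP stays enabled for `cortex_a5-neon`.

```shell
#!/bin/sh
# Sketch: flag NEON (Advanced SIMD) instructions in ARM disassembly.
# find_neon reads disassembly text on stdin and prints the lines that use
# NEON-only mnemonics (vdup, vld1-4, vst1-4) or a 128-bit q register.
# Heuristic only, not a full NEON decoder. VFP-only instructions
# (e.g. vadd.f64 on d registers, vldr/vstr) are intentionally not matched.
find_neon() {
    grep -E '(vdup|vld[1-4]|vst[1-4])\.|[[:space:],{]q([0-9]|1[0-5])([^0-9]|$)'
}

# check_memset: disassemble the memset symbol of an ARM ELF binary and pipe
# it through find_neon. Assumption: OBJDUMP names an objdump that understands
# 32-bit ARM and supports --disassemble=SYMBOL; it defaults to "objdump".
# Exit status follows grep: 0 if NEON instructions were found, non-zero otherwise.
check_memset() {
    "${OBJDUMP:-objdump}" --disassemble=memset "$1" | find_neon
}
```

For example, `check_memset zig-out/bin/armtest` on the binary from this report would be expected to print the `vdup.8` and `vst1.8` lines seen in the gdb listing above.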

Andrew mentions this may be related to #13303.

Expected Behavior

The default init-exec project runs and prints its output to the console.

aka-mj added the bug label Aug 25, 2023
mikdusan (Member) commented

With #16981 applied, the following output shows that compiler_rt is built with the same CPU features as the main compilation (every feature list contains -neon):

stage4/bin/zig build-exe foo.zig -target arm-linux-gnueabihf -mcpu cortex_a5-neon --verbose-llvm-cpu-features
compilation: foo
  target: arm-linux.3.16...5.10.81-gnueabihf.2.19
  cpu: cortex_a5
  features: -32bit,-8msecext,-a76,-aapcs-frame-chain,-aapcs-frame-chain-leaf,+aclass,-acquire-release,-aes,-atomics-32,-avoid-movs-shop,-avoid-partial-cpsr,-bf16,-big-endian-instructions,-cde,-cdecp0,-cdecp1,-cdecp2,-cdecp3,-cdecp4,-cdecp5,-cdecp6,-cdecp7,-cheap-predicable-cpsr,-clrbhb,-crc,-crypto,+d32,+db,-dfb,-disable-postra-scheduler,-dont-widen-vmovs,-dotprod,+dsp,-execute-only,-expand-fp-mlx,-exynos,-fix-cmse-cve-2021-35465,-fix-cortex-a57-aes-1742098,+fp16,-fp16fml,+fp64,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fpao,+fpregs,-fpregs16,+fpregs64,-fullfp16,-fuse-aes,-fuse-literals,-harden-sls-blr,-harden-sls-nocomdat,-harden-sls-retbr,+v4t,+v5t,+v5te,+v6,+v6k,+v6m,+v6t2,+v7,+v7clrex,-v8,-v8.1a,-v8.1m.main,-v8.2a,-v8.3a,-v8.4a,-v8.5a,-v8.6a,-v8.7a,-v8.8a,-v8.9a,+v8m,-v8m.main,-v9.1a,-v9.2a,-v9.3a,-v9.4a,-v9a,-hwdiv,-hwdiv-arm,-i8mm,-iwmmxt,-iwmmxt2,-lob,-long-calls,-loop-align,-m3,-mclass,+mp,-muxed-units,-mve,-mve1beat,-mve2beat,-mve4beat,-mve.fp,-nacl-trap,-neon,-neon-fpmovs,-neonfp,-no-branch-predictor,-no-bti-at-return-twice,-no-movt,-no-neg-immediates,-noarm,-nonpipelined-vfp,-pacbti,+perfmon,-prefer-ishst,-prefer-vmovsr,-prof-unpr,-r4,-ras,-rclass,-read-tp-hard,-reserve-r9,+ret-addr-stack,-sb,-sha2,+slow-fp-brcc,-slow-load-D-subreg,-slow-odd-reg,-slow-vdup32,-slow-vgetlni32,+slowfpvfmx,+slowfpvmlx,-soft-float,-splat-vfp-neon,-strict-align,-swift,+thumb2,-thumb-mode,+trustzone,-use-mipipeliner,-use-misched,-armv4,-armv4t,-armv5t,-armv5te,-armv5tej,-armv6,-armv6j,-armv6k,-armv6kz,-armv6-m,-armv6s-m,-armv6t2,+armv7-a,-armv7e-m,-armv7k,-armv7-m,-armv7-r,-armv7s,-armv7ve,-armv8.1-a,-armv8.1-m.main,-armv8.2-a,-armv8.3-a,-armv8.4-a,-armv8.5-a,-armv8.6-a,-armv8.7-a,-armv8.8-a,-armv8.9-a,-armv8-a,-armv8-m.base,-armv8-m.main,-armv8-r,-armv9.1-a,-armv9.2-a,-armv9.3-a,-armv9.4-a,-armv9-a,+vfp2,+vfp2sp,+vfp3,+vfp3d16,+vfp3d16sp,+vfp3sp,+vfp4,+vfp4d16,+vfp4d16sp,+vfp4sp,-virtualization,-vldn-align,+vmlx-forwarding,-vmlx-hazards,-wide-stride-vfp,-xscale,-zcz
compilation: c
  target: arm-linux.3.16...5.10.81-gnueabihf.2.19
  cpu: cortex_a5
  features: -32bit,-8msecext,-a76,-aapcs-frame-chain,-aapcs-frame-chain-leaf,+aclass,-acquire-release,-aes,-atomics-32,-avoid-movs-shop,-avoid-partial-cpsr,-bf16,-big-endian-instructions,-cde,-cdecp0,-cdecp1,-cdecp2,-cdecp3,-cdecp4,-cdecp5,-cdecp6,-cdecp7,-cheap-predicable-cpsr,-clrbhb,-crc,-crypto,+d32,+db,-dfb,-disable-postra-scheduler,-dont-widen-vmovs,-dotprod,+dsp,-execute-only,-expand-fp-mlx,-exynos,-fix-cmse-cve-2021-35465,-fix-cortex-a57-aes-1742098,+fp16,-fp16fml,+fp64,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fpao,+fpregs,-fpregs16,+fpregs64,-fullfp16,-fuse-aes,-fuse-literals,-harden-sls-blr,-harden-sls-nocomdat,-harden-sls-retbr,+v4t,+v5t,+v5te,+v6,+v6k,+v6m,+v6t2,+v7,+v7clrex,-v8,-v8.1a,-v8.1m.main,-v8.2a,-v8.3a,-v8.4a,-v8.5a,-v8.6a,-v8.7a,-v8.8a,-v8.9a,+v8m,-v8m.main,-v9.1a,-v9.2a,-v9.3a,-v9.4a,-v9a,-hwdiv,-hwdiv-arm,-i8mm,-iwmmxt,-iwmmxt2,-lob,-long-calls,-loop-align,-m3,-mclass,+mp,-muxed-units,-mve,-mve1beat,-mve2beat,-mve4beat,-mve.fp,-nacl-trap,-neon,-neon-fpmovs,-neonfp,-no-branch-predictor,-no-bti-at-return-twice,-no-movt,-no-neg-immediates,-noarm,-nonpipelined-vfp,-pacbti,+perfmon,-prefer-ishst,-prefer-vmovsr,-prof-unpr,-r4,-ras,-rclass,-read-tp-hard,-reserve-r9,+ret-addr-stack,-sb,-sha2,+slow-fp-brcc,-slow-load-D-subreg,-slow-odd-reg,-slow-vdup32,-slow-vgetlni32,+slowfpvfmx,+slowfpvmlx,-soft-float,-splat-vfp-neon,-strict-align,-swift,+thumb2,-thumb-mode,+trustzone,-use-mipipeliner,-use-misched,-armv4,-armv4t,-armv5t,-armv5te,-armv5tej,-armv6,-armv6j,-armv6k,-armv6kz,-armv6-m,-armv6s-m,-armv6t2,+armv7-a,-armv7e-m,-armv7k,-armv7-m,-armv7-r,-armv7s,-armv7ve,-armv8.1-a,-armv8.1-m.main,-armv8.2-a,-armv8.3-a,-armv8.4-a,-armv8.5-a,-armv8.6-a,-armv8.7-a,-armv8.8-a,-armv8.9-a,-armv8-a,-armv8-m.base,-armv8-m.main,-armv8-r,-armv9.1-a,-armv9.2-a,-armv9.3-a,-armv9.4-a,-armv9-a,+vfp2,+vfp2sp,+vfp3,+vfp3d16,+vfp3d16sp,+vfp3sp,+vfp4,+vfp4d16,+vfp4d16sp,+vfp4sp,-virtualization,-vldn-align,+vmlx-forwarding,-vmlx-hazards,-wide-stride-vfp,-xscale,-zcz
compilation: compiler_rt
  target: arm-linux.3.16...5.10.81-gnueabihf.2.19
  cpu: cortex_a5
  features: -32bit,-8msecext,-a76,-aapcs-frame-chain,-aapcs-frame-chain-leaf,+aclass,-acquire-release,-aes,-atomics-32,-avoid-movs-shop,-avoid-partial-cpsr,-bf16,-big-endian-instructions,-cde,-cdecp0,-cdecp1,-cdecp2,-cdecp3,-cdecp4,-cdecp5,-cdecp6,-cdecp7,-cheap-predicable-cpsr,-clrbhb,-crc,-crypto,+d32,+db,-dfb,-disable-postra-scheduler,-dont-widen-vmovs,-dotprod,+dsp,-execute-only,-expand-fp-mlx,-exynos,-fix-cmse-cve-2021-35465,-fix-cortex-a57-aes-1742098,+fp16,-fp16fml,+fp64,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fpao,+fpregs,-fpregs16,+fpregs64,-fullfp16,-fuse-aes,-fuse-literals,-harden-sls-blr,-harden-sls-nocomdat,-harden-sls-retbr,+v4t,+v5t,+v5te,+v6,+v6k,+v6m,+v6t2,+v7,+v7clrex,-v8,-v8.1a,-v8.1m.main,-v8.2a,-v8.3a,-v8.4a,-v8.5a,-v8.6a,-v8.7a,-v8.8a,-v8.9a,+v8m,-v8m.main,-v9.1a,-v9.2a,-v9.3a,-v9.4a,-v9a,-hwdiv,-hwdiv-arm,-i8mm,-iwmmxt,-iwmmxt2,-lob,-long-calls,-loop-align,-m3,-mclass,+mp,-muxed-units,-mve,-mve1beat,-mve2beat,-mve4beat,-mve.fp,-nacl-trap,-neon,-neon-fpmovs,-neonfp,-no-branch-predictor,-no-bti-at-return-twice,-no-movt,-no-neg-immediates,-noarm,-nonpipelined-vfp,-pacbti,+perfmon,-prefer-ishst,-prefer-vmovsr,-prof-unpr,-r4,-ras,-rclass,-read-tp-hard,-reserve-r9,+ret-addr-stack,-sb,-sha2,+slow-fp-brcc,-slow-load-D-subreg,-slow-odd-reg,-slow-vdup32,-slow-vgetlni32,+slowfpvfmx,+slowfpvmlx,-soft-float,-splat-vfp-neon,-strict-align,-swift,+thumb2,-thumb-mode,+trustzone,-use-mipipeliner,-use-misched,-armv4,-armv4t,-armv5t,-armv5te,-armv5tej,-armv6,-armv6j,-armv6k,-armv6kz,-armv6-m,-armv6s-m,-armv6t2,+armv7-a,-armv7e-m,-armv7k,-armv7-m,-armv7-r,-armv7s,-armv7ve,-armv8.1-a,-armv8.1-m.main,-armv8.2-a,-armv8.3-a,-armv8.4-a,-armv8.5-a,-armv8.6-a,-armv8.7-a,-armv8.8-a,-armv8.9-a,-armv8-a,-armv8-m.base,-armv8-m.main,-armv8-r,-armv9.1-a,-armv9.2-a,-armv9.3-a,-armv9.4-a,-armv9-a,+vfp2,+vfp2sp,+vfp3,+vfp3d16,+vfp3d16sp,+vfp3sp,+vfp4,+vfp4d16,+vfp4d16sp,+vfp4sp,-virtualization,-vldn-align,+vmlx-forwarding,-vmlx-hazards,-wide-stride-vfp,-xscale,-zcz
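Because these feature strings are long, a small filter makes the comparison easier to see. The sketch below (the function name `neon_per_compilation` is hypothetical) reads `--verbose-llvm-cpu-features` output in the layout shown above and prints, for each `compilation:` entry, whether the exact feature `neon` is enabled (`+neon`) or disabled (`-neon`). It compares whole comma-separated items, so related features such as `-neon-fpmovs` and `-neonfp` are not mistaken for `neon` itself.

```shell
#!/bin/sh
# Sketch: summarize the "neon" feature per compilation from
# --verbose-llvm-cpu-features output read on stdin.
# Assumes the layout shown in this thread: a "compilation: NAME" line,
# later followed by a "features: a,b,c" line for the same compilation.
neon_per_compilation() {
    awk '
        $1 == "compilation:" { name = $2 }
        $1 == "features:" {
            flag = "unset"
            n = split($2, feats, ",")
            # Exact item match only, so -neon-fpmovs and -neonfp are ignored.
            for (i = 1; i <= n; i++)
                if (feats[i] == "+neon" || feats[i] == "-neon") flag = feats[i]
            print name, flag
        }
    '
}
```

Piping the output of the command above through this function should print `-neon` for `foo`, `c`, and `compiler_rt`, matching the lists shown; `2>&1` may be needed if the compiler writes this diagnostic output to stderr.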

aka-mj (Author) commented Sep 5, 2023

Thanks @mikdusan. It looks like the only remaining issue is the same as in #13303: memset is not replaced with the version from the C library.

alexrp (Member) commented Aug 28, 2024

#13303 aside, the fact that our compiler-rt memset ends up with NEON instructions for -mcpu cortex_a5-neon is a bug for sure.

alexrp (Member) commented Aug 28, 2024

zig4 build-exe main.zig -target arm-linux-gnueabihf -mcpu cortex_a5-neon -fcompiler-rt
objdump main --disassemble=memset | grep vdup
   f27b0:       eee01b90        vdup.8  q8, r1

zig4 build-obj main.zig -target arm-linux-gnueabihf -mcpu cortex_a5-neon -fcompiler-rt
zig4 build-exe main.o -target arm-linux-gnueabihf -mcpu cortex_a5-neon
objdump main --disassemble=memset | grep vdup

zig4 build-obj main.zig -target arm-linux-gnueabihf -mcpu cortex_a5-neon
zig4 build-exe main.o -target arm-linux-gnueabihf -mcpu cortex_a5-neon -fcompiler-rt
objdump main --disassemble=memset | grep vdup
   f27b0:       eee01b90        vdup.8  q8, r1

Fairly odd.

alexrp (Member) commented Nov 23, 2024

This was fixed by #21501.

alexrp closed this as completed Nov 23, 2024
alexrp added this to the 0.14.0 milestone Nov 23, 2024