Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM bug with StaticArray#sort_by! on aarch64-linux-musl and aarch64-darwin #11358

Open
straight-shoota opened this issue Oct 24, 2021 · 11 comments · May be fixed by #15018
Open

LLVM bug with StaticArray#sort_by! on aarch64-linux-musl and aarch64-darwin #11358

straight-shoota opened this issue Oct 24, 2021 · 11 comments · May be fixed by #15018
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. platform:aarch64 topic:compiler:codegen

Comments

@straight-shoota
Copy link
Member

Two of our aarch64 CI jobs consistently fail with an invalid memory access while building std_spec since #10889.

Latest master runs:

The invalid memory access comes from LLVM. With a debug compiler we get more info. The backtrace is identical for both targets:

Invalid memory access (signal 11) at address 0x2e
[0x5638f657f176] *Exception::CallStack::print_backtrace:Nil +118
[0x5638f654fcda] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7fd8e12603c0] ???
[0x7fd8e24f1110] ???
[0x7fd8e250baaf] _ZNK4llvm14TargetLowering11LowerCallToERNS0_16CallLoweringInfoE +6607
[0x7fd8e2528cfa] _ZN4llvm19SelectionDAGBuilder14lowerInvokableERNS_14TargetLowering16CallLoweringInfoEPKNS_10BasicBlockE +1034
[0x7fd8e2513aea] _ZN4llvm19SelectionDAGBuilder11LowerCallToENS_17ImmutableCallSiteENS_7SDValueEbPKNS_10BasicBlockE +2442
[0x7fd8e24feed1] _ZN4llvm19SelectionDAGBuilder9visitCallERKNS_8CallInstE +1441
[0x7fd8e24f41c9] _ZN4llvm19SelectionDAGBuilder5visitERKNS_11InstructionE +105
[0x7fd8e2577c10] _ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb +112
[0x7fd8e2577a07] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +6487
[0x7fd8e2575396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7fd8e222e5e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7fd8e2098d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7fd8e2098ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7fd8e20994a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7fd8e326f523] ???
[0x7fd8e326f35f] LLVMTargetMachineEmitToFile +175
[0x5638f736d019] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x5638f736d05b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x5638f7363c19] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +313
[0x5638f73637ce] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1774
[0x5638f736789c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x5638f760bab9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x5638f760ba64] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x5638f75fade2] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x5638f75f9fed] *Crystal::Command#run:(Bool | Nil) +413
[0x5638f75f9d19] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x5638f75f9cdd] *Crystal::Command::run:(Bool | Nil) +29
[0x5638f6530984] __crystal_main +2820
[0x5638f711fe16] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x5638f711fc25] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x5638f653c8e6] main +6
[0x7fd8e10070b3] __libc_start_main +243
[0x5638f652fdbe] _start +46
[0x0] ???

I've been able to reduce the code to this:

StaticArray["foo"].map {|e| {e, 1}}.sort!

The generic argument type must be a reference type. Inlining sort! as to_slice.sort! makes the error go away.
The backtrace is a little bit different from the one from static_array_spec, though:

Invalid memory access (signal 11) at address 0x28
[0x559a4371c176] *Exception::CallStack::print_backtrace:Nil +118
[0x559a436eccda] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7fad219ea3c0] ???
[0x7fad22bea02e] ???
[0x7fad22c74088] ???
[0x7fad22c7375c] _ZN4llvm18ScheduleDAGSDNodes12EmitScheduleERNS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEE +748
[0x7fad22d02466] _ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv +1702
[0x7fad22d01a07] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +6487
[0x7fad22cff396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7fad229b85e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7fad22822d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7fad22822ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7fad228234a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7fad239f9523] ???
[0x7fad239f935f] LLVMTargetMachineEmitToFile +175
[0x559a4450a019] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x559a4450a05b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x559a44500c19] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +313
[0x559a445007ce] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1774
[0x559a4450489c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x559a447a8ab9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x559a447a8a64] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x559a44797de2] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x559a44796fed] *Crystal::Command#run:(Bool | Nil) +413
[0x559a44796d19] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x559a44796cdd] *Crystal::Command::run:(Bool | Nil) +29
[0x559a436cd984] __crystal_main +2820
[0x559a442bce16] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x559a442bcc25] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x559a436d98e6] main +6
[0x7fad217910b3] __libc_start_main +243
[0x559a436ccdbe] _start +46
[0x0] ???
@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:codegen platform:aarch64 labels Oct 24, 2021
@straight-shoota straight-shoota changed the title LLVM bug with StaticArray sort on aarch64-linux-musl and aarch64-darwin LLVM bug with StaticArray#sort_by! on aarch64-linux-musl and aarch64-darwin Oct 24, 2021
@asterite
Copy link
Member

Did you mean with a debug LLVM?

@straight-shoota
Copy link
Member Author

No, a compiler build with debug info. Just a local make crystal vs. the distribution compiler.

@asterite
Copy link
Member

If you compile LLVM with debug mode on (assertions on) you'll find out the issue.

How would a compiler build with debug info help in this case? LLVM is dying when generating an obj file.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 24, 2021

Without debug info, the stacktrace is just this:

Invalid memory access (signal 11) at address 0x0
[0x7f3480cd30e6] ???
[0x7f3480cd30b3] ???
[0x7f3481d469c4] ???

So you wouldn't even know it's caused by LLVM.

@asterite
Copy link
Member

Oh, I missed that it's already using a compiler with debug info on.

That said, to understand the problem I suggest using an LLVM debug build.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 24, 2021

This does not reproduce with LLVM 13. So I'd assume it's a legitimate LLVM bug that has been fixed by now. I didn't check with 12 (original compiler was built with patched 11.1 from homebrew locally and 10.0 in CI). Maybe it's even a different face of one of the bugs we've already known about.

Anyways, I'm not sure we need to do much about this. Upcoming Crystal releases will presumablty be built with LLVM 13. When we release StaticArray#sort_by in the next release, the respective compiler will be able to handle that.

For the record, somewhen in between I also saw this: LLVM ERROR: Unknown mismatch in getCopyFromParts!

@beta-ziliani
Copy link
Member

Since this is now handled by the compiler, I think we can close it. WDYT @straight-shoota ?

@straight-shoota
Copy link
Member Author

Sorry, what is handled by the compiler? The workaround from #11359 is still in place:

# StaticArray#sort_by and #sort_by! don't compile on aarch64-darwin and
# aarch64-linux-musl due to a codegen error caused by LLVM < 13.0.0.
# See https://github.com/crystal-lang/crystal/issues/11358 for details.
{% unless compare_versions(Crystal::LLVM_VERSION, "13.0.0") < 0 && flag?(:aarch64) && (flag?(:musl) || flag?(:darwin)) %}

@HertzDevil
Copy link
Contributor

This is also reproducible for AArch64 Android, by the way: https://github.com/crystal-lang/crystal/actions/runs/4331137915/jobs/7562790343

@HertzDevil
Copy link
Contributor

HertzDevil commented Aug 1, 2024

For the record, everything here is fixed by llvm/llvm-project@e4ecd83. You get the getCopyFromParts error by:

p StaticArray[{1.0, true}]

The same code gives Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 276 for an LLVM build with assertions enabled, whereas this:

p StaticArray[1, 'a']

gives Assertion failed: (EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"), function LowerCallTo, file SelectionDAGBuilder.cpp, line 9680. Apparently these are exclusive to FastISel.

This is such a fundamental issue that I suggested in #14844 (comment) to simply disable targetting AArch64 if Crystal was built with LLVM 12 or below.

@straight-shoota
Copy link
Member Author

straight-shoota commented Aug 5, 2024

I suppose disabling aarch64 targets with LLVM <= 12 is fine.

However, this won't affect existing compiler builds. So I think it's even more important to document this properly.
I'm not sure where exactly to put this best, but https://crystal-lang.org/reference/1.13/man/required_libraries.html#compiler-dependencies and https://github.com/crystal-lang/crystal/blob/master/NOTICE.md look like a good place.
Currently we don't have any indication for supported library versions. But I figure it's probably not a bad idea to start this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. platform:aarch64 topic:compiler:codegen
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants