opaque_closure test segfaulting on aarch64-linux-gnu #54054

Closed
giordano opened this issue Apr 11, 2024 · 5 comments · Fixed by #54443
Labels: ci (Continuous integration), system:arm (ARMv7 and AArch64), system:linux (Affects only Linux)
Milestone: 1.12

Comments

giordano (Contributor) commented Apr 11, 2024

julia> Base.runtests("opaque_closure")
Running parallel tests with:
  getpid() = 390433
  nworkers() = 1
  nthreads() = 1
  Sys.CPU_THREADS = 36
  Sys.total_memory() = 477.556 GiB
  Sys.free_memory() = 442.172 GiB

Test       (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
opaque_closure  (1) |        started at 2024-04-11T21:05:15.712

[390433] signal 11 (2): Segmentation fault
in expression starting at /home/cceamgi/.julia/juliaup/julia-nightly/share/julia/test/opaque_closure.jl:369
jl_system_image_data at /home/cceamgi/.julia/juliaup/julia-nightly/lib/julia/sys.so (unknown line)
jl_system_image_data at /home/cceamgi/.julia/juliaup/julia-nightly/lib/julia/sys.so (unknown line)
Allocations: 14847564 (Pool: 14847364; Big: 200); GC: 14
ERROR: A test has failed. Please submit a bug report (https://github.com/JuliaLang/julia/issues)
including error messages above and the output of versioninfo():
Julia Version 1.12.0-DEV.325
Commit e9a24d4cee4 (2024-04-10 13:11 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 72 × unknown
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, neoverse-v2)
Threads: 1 default, 0 interactive, 1 GC (on 72 virtual cores)

This also fails in all CI jobs, but since aarch64-linux-gnu is allowed to fail tests because of #52434, nobody noticed.

giordano added the system:linux, system:arm, and ci labels on Apr 11, 2024

giordano (Contributor, Author) commented:
From a quick search through the Buildkite logs, I think this was introduced by 4ee1022 (#53878). CC: @Keno.

giordano added this to the 1.12 milestone on Apr 12, 2024

giordano (Contributor, Author) commented:
With a debug build with assertions enabled (at be3bc9a), I got only slightly more information:

Test       (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
opaque_closure  (1) |        started at 2024-04-12T17:37:36.108
fatal error: stack corruption detected

[482916] signal 6 (-6): Aborted
in expression starting at /home/cceamgi/repo/julia/usr/share/julia/test/opaque_closure.jl:369
__pthread_kill_implementation at /lib64/libc.so.6 (unknown line)
raise at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__stack_chk_fail at /home/cceamgi/repo/julia/src/rtutils.c:236
eval_value at /home/cceamgi/repo/julia/src/interpreter.c:346
jl_system_image_data at /home/cceamgi/repo/julia/usr/lib/julia/sys-debug.so (unknown line)
jl_system_image_data at /home/cceamgi/repo/julia/usr/lib/julia/sys-debug.so (unknown line)
Allocations: 16190713 (Pool: 16190388; Big: 325); GC: 13

Note the message

fatal error: stack corruption detected

which is printed by __stack_chk_fail in src/rtutils.c:

fprintf(stderr, "fatal error: stack corruption detected\n");

The previous frame points to src/interpreter.c:346, which is odd because that line is just the closing brace of a function definition. In GDB I get:

(gdb) where
#0  0x0000fffff7e34650 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x0000fffff7def86c in raise () from /lib64/libc.so.6
#2  0x0000fffff7dd7030 in abort () from /lib64/libc.so.6
#3  0x0000fffff73cf60c in __stack_chk_fail () at /home/cceamgi/repo/julia/src/rtutils.c:236
#4  0x0000fffff7349be0 in eval_value (e=0x0, s=0x0) at /home/cceamgi/repo/julia/src/interpreter.c:346
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

which also complains about a corrupt stack.

gbaraldi (Member) commented:
I think this might be caused by an incorrect gc_pop.

gbaraldi (Member) commented Apr 12, 2024

This code reproduces it:

ir = first(only(Base.code_ircode(sin, (Int,))))
oc = Core.OpaqueClosure(ir, do_compile=false)
oc(1)

but GDB gives no stack trace either; this is full-blown stack corruption.
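Since the reproducer takes down the whole process, it can be convenient to run it in a child Julia process and only inspect the exit status. The following is a minimal sketch, assuming a Julia version that provides `Base.code_ircode` (1.9 or later); `REPRO` and `repro_fails` are hypothetical names, and `Base.julia_cmd()` is an internal helper that returns a command for launching the currently running Julia binary.

```julia
# Hypothetical reproducer script, passed verbatim to a child process.
const REPRO = """
ir = first(only(Base.code_ircode(sin, (Int,))))
oc = Core.OpaqueClosure(ir; do_compile=false)
oc(1)
"""

# Hypothetical helper: returns true if the child process exits unsuccessfully,
# whether from a signal (SIGSEGV/SIGABRT) or an ordinary thrown error.
function repro_fails()
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $REPRO`
    # ignorestatus keeps `run` from throwing on a non-zero exit or a signal.
    p = run(pipeline(ignorestatus(cmd); stdout=devnull, stderr=devnull))
    return !success(p)
end

println("repro fails: ", repro_fails())
```

On an affected aarch64-linux-gnu build one would expect this to report a failure; note that a non-crashing error in the child is also reported as a failure, so check stderr if the distinction matters.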

gbaraldi (Member) commented May 17, 2024

This was fixed in #54443.
