Binary crashes when statically linked with LTO turned on #94564

Open
Tracked by #93740
elast0ny opened this issue Mar 3, 2022 · 24 comments
Labels
A-LTO Area: Link-time optimization (LTO) O-linux Operating system: Linux regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@elast0ny

elast0ny commented Mar 3, 2022

I tried this code:

Cargo.toml:

[profile.release]
lto = true

src/main.rs:

use std::process::Command;
fn main() {
    Command::new("ls").spawn();
}

Compiled with:
RUSTFLAGS="-C target-feature=+crt-static" cargo run --release

I expected to see this happen: Spawn & get output of child ls process

Instead, this happened: Program crashes with:

thread panicked while processing panic. aborting.
illegal hardware instruction (core dumped)

Disabling LTO or static linking removes the issue without any code changes.
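For reference, the expected behavior ("spawn and get the output of the child process") can be sketched as below. This is only an illustration and not a workaround: the crash reported here happens inside the spawn call itself, before any result is returned. `run_and_capture` is a hypothetical helper name, not something from the report.

```rust
use std::io;
use std::process::Command;

/// Runs `program` with `args`, waits for it to exit, and returns its stdout.
/// `run_and_capture` is a hypothetical helper used only for illustration.
fn run_and_capture(program: &str, args: &[&str]) -> io::Result<String> {
    // `output()` spawns the child, waits for it, and collects stdout/stderr.
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    match run_and_capture("ls", &[]) {
        Ok(listing) => print!("{listing}"),
        Err(err) => eprintln!("failed to run ls: {err}"),
    }
}
```

Unlike the reproducer, this checks the `Result` instead of discarding it, so an ordinary spawn failure (for example, a missing binary) is reported rather than silently ignored.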

Meta

rustc --version --verbose:

rustc --version --verbose
rustc 1.59.0 (9d1b2106e 2022-02-23)
binary: rustc
commit-hash: 9d1b2106e23b1abd32fce1f17267604a5102f57a
commit-date: 2022-02-23
host: x86_64-unknown-linux-gnu
release: 1.59.0
LLVM version: 13.0.0
Backtrace

#0  0x000000000043957f in std::panicking::rust_panic_with_hook ()
    at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ptr/mod.rs:188
#1  0x0000000000439046 in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:500
#2  0x0000000000438fe6 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure#0}, !> () at library/std/src/sys_common/backtrace.rs:139
#3  0x0000000000438fa2 in std::panicking::begin_panic_handler () at library/std/src/panicking.rs:498
#4  0x0000000000401230 in core::panicking::panic_fmt () at library/core/src/panicking.rs:116
#5  0x00000000004395d4 in std::sys::unix::rwlock::RWLock::read () at library/std/src/sys/unix/rwlock.rs:49
#6  std::sys_common::rwlock::StaticRWLock::read () at library/std/src/sys_common/rwlock.rs:23
#7  std::panicking::rust_panic_with_hook () at library/std/src/panicking.rs:595
#8  0x0000000000439046 in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:500
#9  0x0000000000438fe6 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure#0}, !> () at library/std/src/sys_common/backtrace.rs:139
#10 0x0000000000438fa2 in std::panicking::begin_panic_handler () at library/std/src/panicking.rs:498
#11 0x0000000000401230 in core::panicking::panic_fmt () at library/core/src/panicking.rs:116
--Type <RET> for more, q to quit, c to continue without paging--
#12 0x00000000004395d4 in std::sys::unix::rwlock::RWLock::read () at library/std/src/sys/unix/rwlock.rs:49
#13 std::sys_common::rwlock::StaticRWLock::read () at library/std/src/sys_common/rwlock.rs:23
#14 std::panicking::rust_panic_with_hook () at library/std/src/panicking.rs:595
#15 0x0000000000439046 in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:500
#16 0x0000000000438fe6 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure#0}, !> () at library/std/src/sys_common/backtrace.rs:139
#17 0x0000000000438fa2 in std::panicking::begin_panic_handler () at library/std/src/panicking.rs:498
#18 0x0000000000401230 in core::panicking::panic_fmt () at library/core/src/panicking.rs:116
#19 0x0000000000436687 in std::sys::unix::rwlock::RWLock::read () at library/std/src/sys/unix/rwlock.rs:49
#20 std::sys_common::rwlock::StaticRWLock::read () at library/std/src/sys_common/rwlock.rs:23
#21 std::sys::unix::os::env_read_lock () at library/std/src/sys/unix/os.rs:490
#22 std::sys::unix::process::process_common::Command::posix_spawn () at library/std/src/sys/unix/process/process_unix.rs:529
#23 std::sys::unix::process::process_common::Command::spawn () at library/std/src/sys/unix/process/process_unix.rs:55
#24 std::process::Command::spawn () at library/std/src/process.rs:868
#25 0x000000000043af8c in tester::main ()
--Type <RET> for more, q to quit, c to continue without paging--
#26 0x0000000000439da3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#27 0x000000000043a29c in main ()

@elast0ny elast0ny added the C-bug Category: This is a bug. label Mar 3, 2022
@bjorn3
Member

bjorn3 commented Mar 3, 2022

I can reproduce this issue.

@bjorn3 bjorn3 added I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. A-LTO Area: Link-time optimization (LTO) I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Mar 3, 2022
@saethlin
Member

saethlin commented Mar 6, 2022

I've kind of been hoping someone else will ask this: Are you sure this is a soundness bug? At a glance, this looks to me like a logic bug in an unfortunate location that double-panics.

@bjorn3
Member

bjorn3 commented Mar 6, 2022

It shouldn't panic, so I would expect the panic is caused by corruption. Also if it was a double panic there should at least be a panic message for the first panic.

@hkratz
Contributor

hkratz commented Mar 6, 2022

Reproducible with 1.59.0, 1.60.0-beta.2 and current nightly (c274e49 2022-03-05).

Stacktrace with current nightly:

(lldb) bt
* thread #1, name = 'ltocrash', stop reason = signal SIGABRT
  * frame #0: 0x00007ffff7f1682b ltocrash`gsignal + 203
    frame #1: 0x00007ffff7ecbe1e ltocrash`abort + 299
    frame #2: 0x00007ffff7edf127 ltocrash`std::sys::unix::abort_internal::hcaf26b2b5da2de51 at mod.rs:259:14
    frame #3: 0x00007ffff7f052ae ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff at panicking.rs:682:9
    frame #4: 0x00007ffff7f04ebb ltocrash`std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h6b37cfd8e0c1dd8a at panicking.rs:586:13
    frame #5: 0x00007ffff7f04e56 ltocrash`std::sys_common::backtrace::__rust_end_short_backtrace::hc59fb3f99f6cb43f at backtrace.rs:138:18
    frame #6: 0x00007ffff7f04e12 ltocrash`rust_begin_unwind at panicking.rs:584:5
    frame #7: 0x00007ffff7eca392 ltocrash`core::panicking::panic_fmt::h7275fb82410a6b0a at panicking.rs:143:14
    frame #8: 0x00007ffff7f05301 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff [inlined] std::sys::unix::rwlock::RWLock::read::hb91739d041bece46 at rwlock.rs:0
    frame #9: 0x00007ffff7f052b0 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff [inlined] std::sys_common::rwlock::StaticRWLock::read::hb0a5647500e474ee at rwlock.rs:23
    frame #10: 0x00007ffff7f052b0 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff at panicking.rs:687
    frame #11: 0x00007ffff7f04ebb ltocrash`std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h6b37cfd8e0c1dd8a at panicking.rs:586:13
    frame #12: 0x00007ffff7f04e56 ltocrash`std::sys_common::backtrace::__rust_end_short_backtrace::hc59fb3f99f6cb43f at backtrace.rs:138:18
    frame #13: 0x00007ffff7f04e12 ltocrash`rust_begin_unwind at panicking.rs:584:5
    frame #14: 0x00007ffff7eca392 ltocrash`core::panicking::panic_fmt::h7275fb82410a6b0a at panicking.rs:143:14
    frame #15: 0x00007ffff7f05301 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff [inlined] std::sys::unix::rwlock::RWLock::read::hb91739d041bece46 at rwlock.rs:0
    frame #16: 0x00007ffff7f052b0 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff [inlined] std::sys_common::rwlock::StaticRWLock::read::hb0a5647500e474ee at rwlock.rs:23
    frame #17: 0x00007ffff7f052b0 ltocrash`std::panicking::rust_panic_with_hook::hfa6a9afb1a6b2eff at panicking.rs:687
    frame #18: 0x00007ffff7f04ebb ltocrash`std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h6b37cfd8e0c1dd8a at panicking.rs:586:13
    frame #19: 0x00007ffff7f04e56 ltocrash`std::sys_common::backtrace::__rust_end_short_backtrace::hc59fb3f99f6cb43f at backtrace.rs:138:18
    frame #20: 0x00007ffff7f04e12 ltocrash`rust_begin_unwind at panicking.rs:584:5
    frame #21: 0x00007ffff7eca392 ltocrash`core::panicking::panic_fmt::h7275fb82410a6b0a at panicking.rs:143:14
    frame #22: 0x00007ffff7f0258b ltocrash`std::process::Command::spawn::h9eaebc0e504e67be [inlined] std::sys::unix::rwlock::RWLock::read::hb91739d041bece46 at rwlock.rs:49:13
    frame #23: 0x00007ffff7f02528 ltocrash`std::process::Command::spawn::h9eaebc0e504e67be [inlined] std::sys_common::rwlock::StaticRWLock::read::hb0a5647500e474ee at rwlock.rs:23
    frame #24: 0x00007ffff7f02528 ltocrash`std::process::Command::spawn::h9eaebc0e504e67be [inlined] std::sys::unix::os::env_read_lock::hb7ce90cb70e0dd99 at os.rs:487
    frame #25: 0x00007ffff7f02528 ltocrash`std::process::Command::spawn::h9eaebc0e504e67be [inlined] std::sys::unix::process::process_inner::_$LT$impl$u20$std..sys..unix..process..process_common..Command$GT$::posix_spawn::hf188deaf768f5228 at process_unix.rs:526
    frame #26: 0x00007ffff7f02528 ltocrash`std::process::Command::spawn::h9eaebc0e504e67be [inlined] std::sys::unix::process::process_inner::_$LT$impl$u20$std..sys..unix..process..process_common..Command$GT$::spawn::h27d6f54ab4e33566 at process_unix.rs:55
    frame #27: 0x00007ffff7f00af2 ltocrash`std::process::Command::spawn::h9eaebc0e504e67be at process.rs:868
    frame #28: 0x00007ffff7ecd03c ltocrash`ltocrash::main::h1695d9afc4b3cdf2 + 604
    frame #29: 0x00007ffff7ed4543 ltocrash`std::sys_common::backtrace::__rust_begin_short_backtrace::hf29c11af97155265 + 3
    frame #30: 0x00007ffff7ecd730 ltocrash`main + 1296
    frame #31: 0x00007ffff7f095b0 ltocrash`__libc_start_main + 1168
    frame #32: 0x00007ffff7ecc72e ltocrash`_start + 46

@hkratz
Contributor

hkratz commented Mar 6, 2022

It seems that static RWLocks are not working properly for some reason, which leads to the double panic because RWLocks are also used in the central panic dispatcher. No message is displayed for the first panic because the actual panic handler is never called.
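The nesting visible in the backtraces (three entries into `rust_panic_with_hook` before the abort) can be illustrated with a small model. This is a simplified sketch and not the std source: it assumes the panic entry point bumps a nesting counter, aborts once the counter exceeds a limit, and otherwise has to take a read lock before it can call the hook. `MAX_NESTED`, `Outcome`, and `model_panic` are hypothetical names, and the limit of 2 is chosen only to match the three nested entries seen above.

```rust
/// What the modeled panic entry point ends up doing.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The hook ran and would have printed the panic message.
    HookRan { depth: u32 },
    /// Nesting went too deep; the process aborts with this message.
    Aborted { depth: u32, message: &'static str },
}

/// Assumed nesting limit, picked to match the backtraces in this thread.
const MAX_NESTED: u32 = 2;

/// Simplified model of one entry into the panic machinery.
fn model_panic(depth: u32, lock_works: bool) -> Outcome {
    let depth = depth + 1;
    if depth > MAX_NESTED {
        return Outcome::Aborted {
            depth,
            message: "thread panicked while processing panic. aborting.",
        };
    }
    if lock_works {
        // The hook is reached, so the first panic's message would be shown.
        Outcome::HookRan { depth }
    } else {
        // Failing to take the lock is itself a panic, so we re-enter
        // without ever having reached the hook.
        model_panic(depth, lock_works)
    }
}

fn main() {
    println!("{:?}", model_panic(0, true));
    println!("{:?}", model_panic(0, false));
}
```

In the model, a broken lock means the hook is never reached at any depth, which is consistent with the observation that no message for the first panic is printed.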

Minimum rustc invocation to trigger it: rustc src/main.rs -C opt-level=1 -C lto=thin -C target-feature=+crt-static

It worked with Rust 1.51 and regressed in nightly-2021-02-14.

found 7 bors merge commits in the specified range
commit[0] 2021-02-12UTC: Auto merge of #81744 - rylev:overlapping-early-exit2, r=lcnr
commit[1] 2021-02-13UTC: Auto merge of #82045 - Dylan-DPC:rollup-244l0sb, r=Dylan-DPC
commit[2] 2021-02-13UTC: Auto merge of #82053 - JohnTitor:rollup-ymi9q0g, r=JohnTitor
commit[3] 2021-02-13UTC: Auto merge of #81854 - the8472:specialize-clone-slice, r=Mark-Simulacrum
commit[4] 2021-02-13UTC: Auto merge of #81666 - hyd-dev:miri-windows-test-fail, r=Mark-Simulacrum
commit[5] 2021-02-13UTC: Auto merge of #81494 - cuviper:btree-node-init, r=Mark-Simulacrum
commit[6] 2021-02-13UTC: Auto merge of #81238 - RalfJung:copy-intrinsics, r=m-ou-se

@rustbot rustbot added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. O-linux Operating system: Linux labels Mar 6, 2022
@apiraino
Contributor

apiraino commented Mar 9, 2022

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 9, 2022
@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 10, 2022
@pnkfelix pnkfelix self-assigned this Mar 10, 2022
@bjorn3
Member

bjorn3 commented Mar 10, 2022

@pnkfelix
Member

pnkfelix commented Mar 10, 2022

Quick note from Zulip discussion: We are seeing evidence that the ability to reproduce this issue is somewhat coupled with the version of glibc that one is using.

Specifically, people who have been able to reproduce the issue did so atop glibc versions 2.31 and 2.33.

People who have not been able to reproduce the issue were running atop glibc versions 2.34 and 2.35.

@tavianator
Contributor

Just to repeat what I found in the Zulip thread: I think what's happening is that, with LTO, enough dead code elimination runs that only some object files from libpthread.a are linked in. Notably, the static initializer that fills in the TID for the main thread doesn't get linked in, so it doesn't run. As a result, glibc thinks the main TID is zero, and the owner field of the rwlock is also zero, so this comparison thinks the same thread already has it locked and returns EDEADLK.
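The failing comparison can be modeled in a few lines. This is a simplified sketch of the logic described above, not glibc's code: it assumes the lock stores the owning writer's thread ID and uses 0 for "no writer". `ModelRwLock`, `LockError`, and `try_read` are hypothetical names.

```rust
/// Errors the modeled lock can report. `Deadlock` stands in for EDEADLK.
#[derive(Debug, PartialEq)]
enum LockError {
    Deadlock,
}

/// Simplified model of the lock state: 0 means "no writer holds the lock".
struct ModelRwLock {
    writer_tid: u32,
}

/// Models a read-lock attempt by the thread with ID `current_tid`.
fn try_read(lock: &ModelRwLock, current_tid: u32) -> Result<(), LockError> {
    // Self-deadlock check: "the calling thread already holds the write lock".
    if lock.writer_tid == current_tid {
        return Err(LockError::Deadlock);
    }
    Ok(())
}

fn main() {
    let unlocked = ModelRwLock { writer_tid: 0 };
    // Normal case: the main thread's TID was initialized to a nonzero value.
    println!("initialized TID:   {:?}", try_read(&unlocked, 4242));
    // Buggy case: the initializer never ran, so the TID is still zero and
    // matches the "no writer" value stored in the lock.
    println!("uninitialized TID: {:?}", try_read(&unlocked, 0));
}
```

The lock itself is in a perfectly valid unlocked state in both calls; only the caller's uninitialized thread ID makes the check misfire.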

This is an old glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=5784. In fact, on the same container I reproduced the Rust issue in, I can just as easily reproduce it in C:

[root@8b906bb3417f /]# cat foo.c
#include <assert.h>
#include <pthread.h>

static pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER;

int main() {
  assert(pthread_rwlock_rdlock(&rwl) == 0);
  return 0;
}
[root@8b906bb3417f /]# gcc -static -pthread ./foo.c -o foo
[root@8b906bb3417f /]# ./foo
foo: ./foo.c:7: main: Assertion `pthread_rwlock_rdlock(&rwl) == 0' failed.
Aborted (core dumped)

The usual workaround is to pass --whole-archive to the linker for libpthread which forces the whole archive to get linked in and the appropriate initializer to run. E.g.

[root@8b906bb3417f /]# gcc -static -pthread -Wl,--whole-archive,-lpthread,--no-whole-archive ./foo.c -o foo
[root@8b906bb3417f /]# ./foo
[root@8b906bb3417f /]#

but I don't know how to convince rustc to do that. -C link-arg didn't work and neither did #[link(name = "pthread", modifier = "+whole-archive")] extern {}

I believe this stopped reproducing with glibc 2.34 because libpthread was merged into libc.

@pnkfelix
Member

Some thoughts:

Assuming this is indeed a glibc bug, that strikes me as an argument that we should not be bending over backwards to fix this.

The main mitigations I can imagine here:

  1. Should we detect the glibc version and issue a compile-time warning if someone has LTO enabled along with glibc <= 2.33 and -C target-feature=+crt-static? That sounds like something we can reasonably land in the short term, and the warning could list mitigations for the end user (disable LTO or upgrade your glibc).

  2. Alternatively, we figure out how to get the linker invocation to include --whole-archive -lpthread --no-whole-archive. I will spend a little while digging into that question.
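The condition in the first mitigation can be sketched as a small predicate. This is a hypothetical illustration, not rustc code: `should_warn`, `parse_glibc_version`, and `leading_number` are made-up names, and how the glibc version string would actually be obtained is left open. Only the leading digits of each component are used, so distribution-style strings such as "2.33-7" are accepted.

```rust
/// Parses the leading decimal digits of `s`, e.g. "33-7" -> 33.
fn leading_number(s: &str) -> Option<u32> {
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Parses "major.minor[...]" into (major, minor).
fn parse_glibc_version(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = leading_number(parts.next()?)?;
    let minor = leading_number(parts.next()?)?;
    Some((major, minor))
}

/// Returns Some(true) when the proposed warning would apply:
/// LTO enabled, crt-static enabled, and glibc <= 2.33.
/// Returns None when the version string cannot be parsed.
fn should_warn(glibc_version: &str, lto: bool, crt_static: bool) -> Option<bool> {
    let version = parse_glibc_version(glibc_version)?;
    // Tuples compare lexicographically: major first, then minor.
    Some(lto && crt_static && version <= (2, 33))
}

fn main() {
    for version in ["2.31", "2.33-7", "2.34", "2.35"] {
        println!("glibc {version}: warn = {:?}", should_warn(version, true, true));
    }
}
```

The 2.33 cutoff comes from the versions reported in this thread; a real check would need to confirm the exact boundary.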

@DemiMarie
Contributor

Can this be closed now that (thanks to @m-ou-se) libstd uses its own locking implementation instead of using the buggy glibc rwlocks?

@m-ou-se
Member

m-ou-se commented Apr 14, 2022

Oh did I accidentally fix this bug? Nice :)

Can someone confirm that the issue is gone on the latest nightly Rust?

@apiraino
Contributor

apiraino commented Apr 14, 2022

I gave a quick spin with 1.62.0-nightly (34a6c9f26 2022-04-13) and it doesn't panic anymore:

$ ls
Cargo.lock  Cargo.toml  src  target  test.sh

$ RUSTFLAGS="-C target-feature=+crt-static" cargo +nightly run --release
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/issue-94564`
Cargo.lock  Cargo.toml	src  target  test.sh

@oli-obk oli-obk added E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. and removed I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. P-critical Critical priority labels Apr 14, 2022
@m-ou-se
Member

m-ou-se commented Apr 19, 2022

Looks like it's still segfaulting in CI: #96208 (comment)

@JohnTitor JohnTitor removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label May 13, 2022
@m-ou-se
Member

m-ou-se commented Jun 7, 2022

It fails with a different stack trace now:

(gdb) r
Starting program: rust/build/x86_64-unknown-linux-gnu/test/ui/issues/issue-94564/a 

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f0c44a in __dcigettext ()
(gdb) bt
#0  0x00007ffff7f0c44a in __dcigettext ()
#1  0x00007ffff7f0b692 in __assert_fail ()
#2  0x00007ffff7f550ce in _dl_relocate_static_pie ()
#3  0x00007ffff7f0a438 in __libc_start_main_impl ()
#4  0x00007ffff7ecfc65 in _start () at ../sysdeps/x86_64/start.S:115

@m-ou-se m-ou-se added C-bug Category: This is a bug. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 7, 2022
@apiraino
Contributor

apiraino commented Jun 7, 2022

I've tried with the latest nightly on a Debian/bookworm with libc 2.33-7 and it still works. I compiled the test code with:

$ rustc +nightly  --version
rustc 1.63.0-nightly (50b00252a 2022-06-06)
$ rustc +nightly 94564.rs -C opt-level=2 -C lto -C target-feature=+crt-static && ./94564

But is my test relevant? I mention the libc version because, from the Zulip thread, I understood that it was somehow related.

@apiraino
Contributor

apiraino commented Jun 8, 2022

Adjusting priority after further discussion on Zulip.

@rustbot label -P-high +P-medium -I-prioritize

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 8, 2022
@m-ou-se
Member

m-ou-se commented Jun 8, 2022

I compiled the test code with: [..]

@apiraino That works fine here too (glibc 2.35), but running ./x.py test src/test/ui/issues/issue-94564.rs here still fails. Looks like x.py test isn't running the exact same command.

@m-ou-se
Member

m-ou-se commented Jun 8, 2022

It breaks only with -Crpath.

@m-ou-se
Member

m-ou-se commented Jun 8, 2022

(gdb) disassemble
[...]
   0x00007ffff7f5582f <+3759>:	lea    0x6a8da(%rip),%rcx        # 0x7ffff7fc0110 <__PRETTY_FUNCTION__.1>
   0x00007ffff7f55836 <+3766>:	mov    $0x76,%edx
   0x00007ffff7f5583b <+3771>:	lea    0x6a738(%rip),%rsi        # 0x7ffff7fbff7a
   0x00007ffff7f55842 <+3778>:	lea    0x6a744(%rip),%rdi        # 0x7ffff7fbff8d
   0x00007ffff7f55849 <+3785>:	call   0x7ffff7f0bde0 <__assert_fail>
(gdb) x/s 0x7ffff7fbff8d
0x7ffff7fbff8d:	"info[DT_RUNPATH] == NULL"

Looks like there's simply an assert that checks that there's no rpath when statically linked:

      assert (info[DT_RUNPATH] == NULL);

https://github.com/bminor/glibc/blob/4f7b7d00e02e22acdda8c13e6db47d12a791c5e3/elf/get-dynamic-info.h#L133

@m-ou-se
Member

m-ou-se commented Jun 8, 2022

I've updated #96208 to include -Crpath=no in the test, to cancel out the -Crpath that bootstrap adds. Then it passes.

@m-ou-se m-ou-se added E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. and removed P-medium Medium priority C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. labels Jun 8, 2022
@Mark-Simulacrum
Member

Is there a reason that we're not statically forbidding (or at least linting?) on such a configuration? It seems like if it's going to (likely) lead to an assertion at runtime, it'd be better for rustc to avoid letting you compile in such a mode, rather than accepting it. In general I think we try to hide this kind of sharp edge from users with better compiler error messages.
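A check of this kind could look roughly like the sketch below. It is a hypothetical illustration and not how rustc parses its options: `codegen_options` and `conflicting_static_rpath` are made-up names, and the set of accepted boolean spellings for `-C rpath` is an assumption.

```rust
/// Collects the values of `-C <opt>` and `-C<opt>` style arguments.
fn codegen_options(args: &[&str]) -> Vec<String> {
    let mut opts = Vec::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "-C" {
            // Separate form: the option is the next argument.
            if let Some(value) = iter.next() {
                opts.push(value.to_string());
            }
        } else if let Some(value) = arg.strip_prefix("-C") {
            // Joined form, e.g. "-Crpath=no".
            opts.push(value.to_string());
        }
    }
    opts
}

/// Returns true when both crt-static and rpath end up enabled,
/// the combination that trips the glibc assertion discussed above.
fn conflicting_static_rpath(args: &[&str]) -> bool {
    let mut crt_static = false;
    let mut rpath = false;
    for opt in codegen_options(args) {
        match opt.split_once('=') {
            Some(("target-feature", features)) => {
                for feature in features.split(',') {
                    match feature {
                        "+crt-static" => crt_static = true,
                        "-crt-static" => crt_static = false,
                        _ => {}
                    }
                }
            }
            // Assumed boolean spellings; a later flag overrides an earlier one.
            Some(("rpath", value)) => rpath = matches!(value, "y" | "yes" | "on" | "true"),
            None if opt == "rpath" => rpath = true,
            _ => {}
        }
    }
    crt_static && rpath
}

fn main() {
    let flagged = ["-C", "target-feature=+crt-static", "-Crpath"];
    let fixed = ["-C", "target-feature=+crt-static", "-Crpath", "-Crpath=no"];
    println!("flagged: {}", conflicting_static_rpath(&flagged));
    println!("fixed:   {}", conflicting_static_rpath(&fixed));
}
```

The second invocation mirrors the test fix mentioned earlier, where a trailing `-Crpath=no` cancels the `-Crpath` that bootstrap adds.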

@DemiMarie
Contributor

This seems like a plain bug in glibc. At the very least it should not be segfaulting trying to print an assertion failure message.
