Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReferencePropagation exposed a latent miscompile, found in the wild with regex #114488

Closed
saethlin opened this issue Aug 4, 2023 · 6 comments · Fixed by #114904
Closed

ReferencePropagation exposed a latent miscompile, found in the wild with regex #114488

saethlin opened this issue Aug 4, 2023 · 6 comments · Fixed by #114904
Assignees
Labels
A-codegen Area: Code generation A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@saethlin
Copy link
Member

saethlin commented Aug 4, 2023

Better reproducer here: #114488 (comment)


I do not have proof that this isn't UB in regex but considering how much we have run that crate in Miri I think it is much more likely that this is a miscompilation of some kind.

fn main() {
    let _ = regex::Regex::new("");
}

Run with this pile of flags, as far as I can tell they are all required

$ RUSTFLAGS="-Cdebuginfo=2 -Zmir-opt-level=2 -Zmir-enable-passes=+DestinationPropagation -Cembed-bitcode=yes -Clto=fat -Cpanic=abort -Copt-level=3" cargo run
   Compiling memchr v2.5.0
   Compiling regex-syntax v0.7.4
   Compiling aho-corasick v1.0.2
   Compiling regex-automata v0.3.4
   Compiling regex v1.9.1
   Compiling scratch v0.1.0 (/tmp/scratch)
    Finished dev [unoptimized + debuginfo] target(s) in 16.62s
     Running `target/x86_64-unknown-linux-gnu/debug/scratch`
Segmentation fault (core dumped)

gdb says the segfault is here:

#0  core::sync::atomic::atomic_add<usize> (dst=0x1)
    at /rustc/474709a9a2a74a8bcf0055fadb335d0ca0d2d939/library/core/src/sync/atomic.rs:3217
#1  core::sync::atomic::AtomicUsize::fetch_add ()
    at /rustc/474709a9a2a74a8bcf0055fadb335d0ca0d2d939/library/core/src/sync/atomic.rs:2545
#2  alloc::sync::{impl#28}::clone<dyn regex_automata::util::prefilter::PrefilterI, alloc::alloc::Global> (self=<optimized out>)
    at /rustc/474709a9a2a74a8bcf0055fadb335d0ca0d2d939/library/alloc/src/sync.rs:2022
#3  regex_automata::util::prefilter::{impl#3}::clone () at src/util/prefilter/mod.rs:141
#4  core::option::{impl#5}::clone<regex_automata::util::prefilter::Prefilter> (self=<optimized out>)
    at /rustc/474709a9a2a74a8bcf0055fadb335d0ca0d2d939/library/core/src/option.rs:1968
#5  regex_automata::meta::wrappers::HybridEngine::new (info=0x7fff9afa5c90, pre=..., nfa=<optimized out>, 
    nfarev=<optimized out>) at src/meta/wrappers.rs:540
#6  regex_automata::meta::wrappers::Hybrid::new (info=0x7fff9afa5c90, pre=..., nfa=<optimized out>, nfarev=0x7fff9afa5e00)
    at src/meta/wrappers.rs:502
#7  0x0000559770000239 in regex_automata::meta::strategy::Core::new (info=..., pre=..., hirs=...) at src/meta/strategy.rs:517
#8  regex_automata::meta::strategy::new (info=<optimized out>, hirs=...) at src/meta/strategy.rs:151
#9  0x000055977002c439 in regex_automata::meta::regex::Builder::build_many_from_hir<regex_syntax::hir::Hir> (
    self=<optimized out>, hirs=...) at src/meta/regex.rs:3464
#10 regex_automata::meta::regex::Builder::build_many<&str> (self=<optimized out>, patterns=...) at src/meta/regex.rs:3352
#11 regex_automata::meta::regex::Builder::build (self=<optimized out>, pattern=...) at src/meta/regex.rs:3271
#12 0x000055976ffc4e88 in regex::builders::Builder::build_one_string (self=0x7fff9afa9488)
    at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-1.9.1/src/builders.rs:77
#13 regex::builders::string::RegexBuilder::build (self=0x7fff9afa9488)
    at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-1.9.1/src/builders.rs:223
#14 regex::regex::string::Regex::new (re=...)
    at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-1.9.1/src/regex/string.rs:181
#15 scratch::main () at src/main.rs:2

searched nightlies: from nightly-2023-07-06 to nightly-2023-07-19
regressed nightly: nightly-2023-07-15
searched commit range: 7bd81ee...ad96323
regressed commit: 079e544

bisected with cargo-bisect-rustc v0.6.6

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start 2023-07-06 --end 2023-07-19 --script script 

cc @cjgillot as the author of #109025

@saethlin saethlin added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. A-mir-opt Area: MIR optimizations labels Aug 4, 2023
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 4, 2023
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 4, 2023
@Noratrieb Noratrieb added the requires-nightly This issue requires a nightly compiler in some way. label Aug 5, 2023
@apiraino
Copy link
Contributor

apiraino commented Aug 8, 2023

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high +T-compiler

@rustbot rustbot added P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 8, 2023
@wesleywiser wesleywiser self-assigned this Aug 10, 2023
@apiraino apiraino added P-critical Critical priority and removed P-high High priority labels Aug 10, 2023
@wesleywiser wesleywiser added P-high High priority and removed P-critical Critical priority labels Aug 10, 2023
@narpfel
Copy link
Contributor

narpfel commented Aug 14, 2023

Different example that also starts to miscompile on nightly-2023-07-15 (in this case to a single ud2 instruction):

use std::collections::HashSet;

fn main() {
    let maybe_hash_set: Option<HashSet<()>> = None;
    for _ in maybe_hash_set.as_ref().unwrap_or(&HashSet::new()) {}
}

(Godbolt link)

Interestingly, for this code, only -C opt-level=3 -C debuginfo=2 are required to trigger the miscompilation. Also, it doesn’t miscompile with -Z mir-enable-passes=-Inline.

I found this while trying to build tree-sitter with debug info enabled for benchmarking. In that case, I get a segfault in anyhow, even though there should be no error:

Backtrace
$ gdb ../tree-sitter/target/release/tree-sitter
(gdb) r generate
[...]

Program received signal SIGSEGV, Segmentation fault.
anyhow::error::vtable (p=...) at src/error.rs:852
852         *(p.as_ptr() as *const &'static ErrorVTable)
(gdb) bt
#0  anyhow::error::vtable (p=...) at src/error.rs:852
#1  0x00005555555d585b in anyhow::Error::downcast_ref<std::io::error::Error> (self=0x7fffffffd5a0)
    at src/error.rs:500
#2  tree_sitter::main () at cli/src/main.rs:23
(gdb)

Manually reducing the code then yielded the above example.

@saethlin
Copy link
Member Author

saethlin commented Aug 14, 2023

Awesome work.

I think I know why this bisected to the RefProp PR. I'm trying to do a better bisection...

@saethlin
Copy link
Member Author

saethlin commented Aug 14, 2023

While trying to improve the bisection, I have run into an LLVM assertion, so I'm going to work on that and hope that it is the same problem.

Update: Looks like this is an unrelated problem with SROA + CopyProp + debuginfo

@saethlin
Copy link
Member Author

Minimized the above reproducer to a small program with no dependencies: https://godbolt.org/z/eeG6soTK1

use std::marker::PhantomData;

struct RawTable<T> {
    marker: PhantomData<T>,
}

impl<T> RawTable<T> {
    fn iter(&self) -> RawIter<T> {
        RawIter {
            marker: PhantomData,
        }
    }
}

struct RawIter<T> {
    marker: PhantomData<T>,
}

impl<T> Iterator for RawIter<T> {
    type Item = ();
    fn next(&mut self) -> Option<()> {
        None
    }
}

struct HashMap<T> {
    table: RawTable<T>,
}

struct Iter<T> {
    inner: RawIter<T>, // Removing this breaks the reproducer
}

impl<T> IntoIterator for &HashMap<T> {
    type Item = T;
    type IntoIter = Iter<T>;
    fn into_iter(self) -> Iter<T> {
        Iter {
            inner: self.table.iter(),
        }
    }
}

impl<T> Iterator for Iter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        None
    }
}

pub fn main() {
    let maybe_hash_set: Option<HashMap<()>> = None;
    for _ in maybe_hash_set.as_ref().unwrap_or(&HashMap {
        table: RawTable { marker: PhantomData },
    }) {}
}

Still bisects to the same PR.

Note that my minimization relies on having only Inline and ReferencePropagation enabled, so it does not reproduce with stable flags (most likely other MIR opts totally erase this program). But the HashSet example above does miscompile with stable flags and was hit in the wild, so:

@rustbot label -requires-nightly -P-high +P-critical

@rustbot rustbot added P-critical Critical priority and removed requires-nightly This issue requires a nightly compiler in some way. P-high High priority labels Aug 15, 2023
@nikic
Copy link
Contributor

nikic commented Aug 15, 2023

Input IR: https://godbolt.org/z/GofEbodbo

Contains:

  %self = alloca ptr, align 8
  ; ... no writes to %self ...
  call void @llvm.dbg.declare(metadata ptr %self, metadata !83, metadata !DIExpression()), !dbg !89
  %1 = load ptr, ptr %self, align 8, !nonnull !24, !align !91, !noundef !24
  store ptr %1, ptr %self.ref0.dbg.spill1, align 8

This claims uninitialized memory is !noundef, which is UB. So the issue is on the rustc side.

@tmiasko tmiasko added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-codegen Area: Code generation labels Aug 15, 2023
@wesleywiser wesleywiser removed their assignment Aug 17, 2023
@bors bors closed this as completed in ccc3ac0 Aug 18, 2023
@saethlin saethlin changed the title regex miscompiles with aggressive optimization flags ReferencePropagation exposed a latent miscompile, found in the wild with regex Aug 18, 2023
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 21, 2023
Remove references in VarDebugInfo

The codegen implementation is broken, and attempted to read uninitialized memory.

Fixes rust-lang/rust#114488
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants