Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race detector causes massive slowdown #1689

Closed
shepmaster opened this issue Jan 25, 2021 · 19 comments
Closed

Data race detector causes massive slowdown #1689

shepmaster opened this issue Jan 25, 2021 · 19 comments
Labels
A-concurrency Area: affects our concurrency (multi-thread) support C-bug Category: This is a bug. I-slow Impact: Makes Miri even slower than it already is

Comments

@shepmaster
Copy link
Member

Steps to reproduce:

Check out shepmaster/sxd-string-slab@8bf847e, then run Miri:

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows -Zmiri-disable-data-race-detector' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture
# Reports ~23 seconds on my machine

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-data-race-detector' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture
# Reports ~16.5 minutes on my machine

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture
# Reports ~4.25 minutes on my machine

Meta

  • macOS

  • rustc --version --verbose

    rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
    binary: rustc
    commit-hash: 1d0d76f8dd4f5f6ecbeab575b87edaf1c9f56bb8
    commit-date: 2021-01-24
    host: x86_64-apple-darwin
    release: 1.51.0-nightly
    LLVM version: 11.0.1
    
  • cargo miri --version --verbose

    miri 0.1.0 (1cf1a2e4 2021-01-15)
    
@RalfJung
Copy link
Member

Cc @JCTyblaidd any idea what could be happening here? This is a 10x slowdown due to the data race detector.

OTOH, it's also a 40x slowdown due to Stacked Borrows, which is much more than what I observed elsewhere.

@JCTyblaidd
Copy link
Contributor

I will look into this properly later:

My guess would be that criterion is spawning a large set of threads for which the data-race detector is enabled for, potentially a large set with delayed synchronization - so from the perspective of the data-race detector there is a very large set of live concurrent threads to detect races for (probably crossing the smallvec threshold). I need to try track thread usage for a proper analysis.

@shepmaster
Copy link
Member Author

that criterion is spawning a large set of threads

The code in question is not using criterion (the benchmarks are in a separate child crate). This code is using proptest. When I run the tests outside of Miri, I see that the CPU is pegged at 100%, indicating that this specific test isn't making heavy use of multiple threads (it may still be creating threads that are just chilling, however).

I also asked the test runner to use only one thread (--test-threads 1), with a small but not huge change:

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture --test-threads 1
# Reports ~3.5 minutes on my machine

Additionally, it doesn't look like there's much use of multithreading from a CPU usage perspective outside of Miri:

time cargo test -- --nocapture

# real	7.088	7088012us
# user	6.970	6970495us
# sys	0.258	257838us
# cpu	101%

@RalfJung
Copy link
Member

I also asked the test runner to use only one thread (--test-threads 1), with a small but not huge change:

--test-threads 1 should be the default under Miri anyway (when asked for how many CPUs the system has, Miri always returns 1).
But proptest itself might spawn threads. However, Miri will print "thread support is experimental" when the first non-main thread is spawned; do you see that message?

@shepmaster
Copy link
Member Author

do you see that message?

I do not
MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows -Zmiri-disable-data-race-detector' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture
   Compiling sxd-string-slab v0.1.0 (/Users/shep/Projects/sxd/string-slab)
    Finished test [unoptimized + debuginfo] target(s) in 0.24s
     Running target/miri/x86_64-apple-darwin/debug/deps/sxd_string_slab-d5887c0cf1c6b66d

running 5 tests
test test::all_interned_keys_equal_each_other ... ok
test test::interning_has_equal_pointer ... ok
test test::interning_preserves_string_values ... ok
test test::interning_twice_creates_equal_values ... ok
test test::interning_two_values_creates_non_equal_values ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests sxd-string-slab
Running doctests is not currently supported by Miri.

@JCTyblaidd
Copy link
Contributor

If there is always exactly 1 thread, then the main data-race detection code should be never run.

Then the other possibility is that there is a large difference, maybe due to lack of inlining, between the first two data-race detection skip checks (globally disabled vs only 1 active thread). Might want to extract:

if self.global.multi_threaded.get() {

from read and write/unique_access into an inline always function or mark read, write, unique_access as inline.

@thomcc
Copy link
Member

thomcc commented Feb 1, 2021

I can't repro performance differences between with/without -Zmiri-disable-data-race-detector on x86_64-apple-darwin on a build of miri master (39a7bd0).

A quick profile with both the race detector and stacked borrows on gives the following (inverted) call tree:

33.29 s   69.4%	33.29 s	 	  _$LT$core..iter..adapters..rev..Rev$LT$I$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::try_fold::hae1347c1e39f9ef1
33.29 s   69.3%	0 s	 	   miri::stacked_borrows::Stack::find_granting::h1bc045f8cfcb46a2
18.59 s   38.7%	0 s	 	    miri::stacked_borrows::Stack::grant::hd3f875dd7ebdcaa7
18.54 s   38.6%	0 s	 	     miri::stacked_borrows::Stacks::for_each::hdedf8d6a52690627
18.54 s   38.6%	0 s	 	      miri::helpers::EvalContextExt::visit_freeze_sensitive::hcd00899a5604a512
18.54 s   38.6%	0 s	 	       miri::stacked_borrows::EvalContextPrivExt::reborrow::h36398b4336eca798
18.54 s   38.6%	0 s	 	        miri::stacked_borrows::EvalContextPrivExt::retag_reference::hdb62d7d894f5f35d
18.54 s   38.6%	0 s	 	         miri::stacked_borrows::EvalContextExt::retag::h135215746e29b006
... <snip>
14.70 s   30.6%	0 s	 	    miri::stacked_borrows::Stack::access::h1e11b7d9a47fe816
14.69 s   30.6%	0 s	 	     miri::stacked_borrows::Stack::grant::hd3f875dd7ebdcaa7
14.64 s   30.5%	0 s	 	      miri::stacked_borrows::Stacks::for_each::hdedf8d6a52690627
14.64 s   30.5%	0 s	 	       miri::helpers::EvalContextExt::visit_freeze_sensitive::hcd00899a5604a512
14.64 s   30.5%	0 s	 	        miri::stacked_borrows::EvalContextPrivExt::reborrow::h36398b4336eca798
14.64 s   30.5%	0 s	 	         miri::stacked_borrows::EvalContextPrivExt::retag_reference::hdb62d7d894f5f35d
14.64 s   30.5%	0 s	 	          miri::stacked_borrows::EvalContextExt::retag::h135215746e29b006
14.64 s   30.5%	0 s	 	           rustc_mir::interpret::step::_$LT$impl$u20$rustc_mir..interpret..eval_context..InterpCx$LT$M$GT$$GT$::statement::h25ad5f561a1df15d

Which points at https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L248 (Stack::find_granting) via https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L522 (EvalContextPrivExt::reborrow). That said, I somewhat imagine it's not surprising that this is a slow part of stacked borrow eval.

@shepmaster
Copy link
Member Author

Perhaps it's been fixed in Miri already? I still see it with Miri 1cf1a2e:

Run log
/tmp % git clone https://github.com/shepmaster/sxd-string-slab/
Cloning into 'sxd-string-slab'...
remote: Enumerating objects: 67, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 67 (delta 24), reused 65 (delta 23), pack-reused 0
Receiving objects: 100% (67/67), 18.75 KiB | 6.25 MiB/s, done.
Resolving deltas: 100% (24/24), done.

/tmp % cd sxd-string-slab
sxd-string-slab % git checkout 8bf847eff2a8a1609c818cae68d227156f5f3dc1
Note: switching to '8bf847eff2a8a1609c818cae68d227156f5f3dc1'.

# Blah blah git

HEAD is now at 8bf847e Add basic performance and memory benchmarks

sxd-string-slab % time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows -Zmiri-disable-data-race-detector' \
  CARGO_TARGET_DIR=target/miri \
  cargo +nightly miri test -- --nocapture

# Initial build, I killed it to get a clean timing

sxd-string-slab % time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows -Zmiri-disable-data-race-detector' \
  CARGO_TARGET_DIR=target/miri \
  cargo +nightly miri test -- --nocapture
   Compiling sxd-string-slab v0.1.0 (/private/tmp/sxd-string-slab)
    Finished test [unoptimized + debuginfo] target(s) in 0.05s
     Running target/miri/x86_64-apple-darwin/debug/deps/sxd_string_slab-9c3b8d0214db5420

running 5 tests
test test::all_interned_keys_equal_each_other ... ok
test test::interning_has_equal_pointer ... ok
test test::interning_preserves_string_values ... ok
test test::interning_twice_creates_equal_values ... ok
test test::interning_two_values_creates_non_equal_values ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests sxd-string-slab
Running doctests is not currently supported by Miri.

real1:28.8988886665us
user1:24.2784273102us
sys2.3442344129us
cpu97%
mem570664 KiB

sxd-string-slab % time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows' \
  CARGO_TARGET_DIR=target/miri \
  cargo +nightly miri test -- --nocapture
    Finished test [unoptimized + debuginfo] target(s) in 0.15s
     Running target/miri/x86_64-apple-darwin/debug/deps/sxd_string_slab-9c3b8d0214db5420

running 5 tests
test test::all_interned_keys_equal_each_other ... ok
test test::interning_has_equal_pointer ... ok
test test::interning_preserves_string_values ... ok
test test::interning_twice_creates_equal_values ... ok
test test::interning_two_values_creates_non_equal_values ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests sxd-string-slab
Running doctests is not currently supported by Miri.

real2:15.65135650094us
user2:08.57128569996us
sys3.4883488009us
cpu97%
mem969060 KiB

sxd-string-slab % cargo +nightly miri --version
miri 0.1.0 (1cf1a2e4 2021-01-15)

sxd-string-slab % cargo +nightly --version
cargo 1.51.0-nightly (783bc43c6 2021-01-20)

@RalfJung
Copy link
Member

RalfJung commented Feb 1, 2021

Nothing major changed in Miri that would explain such a fix...

@thomcc
Copy link
Member

thomcc commented Feb 1, 2021

For clarity, what I meant by my inability to repro performance differences is: it still takes about 5 minutes (on a decently fast but not blistering 2019 macbook) to run with both -Zmiri-disable-stacked-borrows and -Zmiri-disable-data-race-detector.

Which is to say, I don't see the 23s. So, the opposite of "it got fixed on master" might be possible: perhaps whatever was making that fast in your case has regressed on master.

Or it could just be weird compiler shenanigans (this is actually somewhat plausible to me if the inlining theory here #1689 (comment) is true, although this would be a crazy amount of perf to get from extra inlining, tbh)

Unfortunately, I'm not 100% sure how to go back to rustup-managed miri/cargo-miri after running ./miri install so I can't easily check this.

@bjorn3
Copy link
Member

bjorn3 commented Feb 1, 2021

Removing ~/.cargo/bin/miri and ~/.cargo/bin/cargo-miri followed by rustup self update would work I think.

@thomcc
Copy link
Member

thomcc commented Feb 1, 2021

Removing ~/.cargo/bin/miri and ~/.cargo/bin/cargo-miri followed by rustup self update would work I think.

I just tried this and, erm, don't recommend it (even with rustup update and with manually reinstalling the nightly toolchain) since I think I may have somewhat messed up my setup trying to make it work. sigh

This is off topic for this discussion though.

@bjorn3
Copy link
Member

bjorn3 commented Feb 1, 2021

I mean rustup self update, not rustup update. rustup self update will re-install all rustup shims in ~/.cargo/bin.

@RalfJung RalfJung added A-concurrency Area: affects our concurrency (multi-thread) support C-bug Category: This is a bug. I-slow Impact: Makes Miri even slower than it already is labels Feb 17, 2021
@saethlin
Copy link
Member

The runtime of this example is extremely erratic. I just ran the tests a few times with SB and the data race detector off, and I'm seeing runtimes between 8 and 80 seconds. When I profile Miri with the data race detector on, all I can find is ~2% of CPU cycles in total associate with the concurrency module.

Under normal execution the run-to-run variance is 15% (from perf stat -r10 cargo test). With --jobs=1 -- --test-threads=1, the run-to-run variance is 1% which is what I would expect. If I pass those flags to Miri, the run-to-run variance is reduced but still huge, .

So I suspect that some somehow the interpreter is amplifying some randomization in proptest. If we're randomly generating test inputs there are probably some inputs which happen to encounter code paths in the interpreter which are very slow.

In any case, this slowdown would be a good thing to work on... If I could reproduce it reliably (I can't), and if I could diagnose it with a profile (I can't).

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2022

The serde2 benchmark was meant to capture some concurrency, but it doesn't help show this slowdown at all. That is possibly because there is only read-only state being shared across threads, no atomic accesses or anything like that. But I also don't know what a better concurrency benchmark would be.

@newpavlov

This comment was marked as off-topic.

@saethlin

This comment was marked as off-topic.

@RalfJung
Copy link
Member

RalfJung commented Jul 5, 2024

Since nobody seems to have been able to reproduce this... how sure are we that the slowdown is even real?

I tried two stress tests that have many (albeit short-lived) threads and got around a 20% slowdown in both cases, which I would not qualify as "massive". This was comparing -Zmiri-disable-stacked-borrows with -Zmiri-disable-stacked-borrows -Zmiri-disable-data-race-detector, so given the much larger slowdown of Stacked Borrows, this will be hardly noticeable when comparing the default with -Zmiri-disable-data-race-detector. (My stress tests were tests/pass/0weak_memory_consistency.rs from the repo and the scope_join_race test from the standard library.)

@saethlin
Copy link
Member

saethlin commented Jul 5, 2024

how sure are we that the slowdown is even real?

Considering that nobody has managed to reproduce such a slowdown on an issue that's over 3 years old I'm just going to close this. If anyone finds a program that spends more than half its runtime in the data race detector I would rather see a fresh issue filed about that program.

@saethlin saethlin closed this as not planned Won't fix, can't repro, duplicate, stale Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: affects our concurrency (multi-thread) support C-bug Category: This is a bug. I-slow Impact: Makes Miri even slower than it already is
Projects
None yet
Development

No branches or pull requests

7 participants