Data race detector causes massive slowdown #1689
Cc @JCTyblaidd, any idea what could be happening here? This is a 10x slowdown due to the data race detector. OTOH, it's also a 40x slowdown due to Stacked Borrows, which is much more than what I observed elsewhere. |
I will look into this properly later. My guess would be that criterion is spawning a large set of threads for which the data-race detector is enabled, potentially a large set with delayed synchronization, so from the perspective of the data-race detector there is a very large set of live concurrent threads to detect races for (probably crossing the smallvec threshold). I need to try to track thread usage for a proper analysis. |
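(For illustration: the cost model being described is roughly that of a vector clock, where every tracked access is compared against one entry per live thread. The sketch below is generic and is not Miri's actual data-race detector, but it shows why a large set of live threads makes each check more expensive.)

```rust
// Generic vector-clock sketch (not Miri's actual implementation): each
// thread has a logical timestamp, and a race check compares clocks
// component-wise, so its cost grows with the number of live threads.
#[derive(Clone, Default)]
struct VClock(Vec<u64>);

impl VClock {
    // `self` happens-before `other` iff every component is <=.
    fn happens_before(&self, other: &VClock) -> bool {
        self.0
            .iter()
            .zip(other.0.iter().chain(std::iter::repeat(&0)))
            .all(|(a, b)| a <= b)
    }
}

fn main() {
    let last_write = VClock(vec![1, 0]);
    let current_read = VClock(vec![2, 3]);
    // A read races with the last write unless that write happens-before it.
    let races = !last_write.happens_before(&current_read);
    println!("data race: {races}");
}
```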
The code in question is not using criterion (the benchmarks are in a separate child crate). This code is using proptest. When I run the tests outside of Miri, I see that the CPU is pegged at 100%, indicating that this specific test isn't making heavy use of multiple threads (it may still be creating threads that are just chilling, however). I also asked the test runner to use only one thread:

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture --test-threads 1
# Reports ~3.5 minutes on my machine

Additionally, it doesn't look like there's much use of multithreading from a CPU usage perspective outside of Miri:

time cargo test -- --nocapture
# real 7.088 7088012us
# user 6.970 6970495us
# sys 0.258 257838us
# cpu 101%
|
|
I do not
|
If there is always exactly 1 thread, then the main data-race detection code should never run. The other possibility is that there is a large difference, maybe due to lack of inlining, between the first two data-race detection skip checks (globally disabled vs. only 1 active thread). Might want to extract that check from read and write/unique_access into an #[inline(always)] function, or mark read, write, and unique_access as #[inline]. |
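A minimal sketch of what that refactor could look like, with made-up names (this is not Miri's actual code, just the shape of the suggestion): the cheap skip check lives in an #[inline(always)] helper so the fast path stays cheap even if read/write themselves are not inlined.

```rust
// Hypothetical sketch of the suggested refactor; the names are invented
// and do not correspond to Miri's real internals.
struct GlobalState {
    race_detection_enabled: bool,
    active_thread_count: usize,
}

impl GlobalState {
    // The cheap skip check, extracted so callers always inline it,
    // even when `read`/`write` themselves are not inlined.
    #[inline(always)]
    fn race_detection_needed(&self) -> bool {
        self.race_detection_enabled && self.active_thread_count > 1
    }

    fn read(&self, addr: usize, len: usize) {
        if !self.race_detection_needed() {
            return; // fast path: detector disabled or only one thread
        }
        self.detect_race(addr, len);
    }

    fn write(&self, addr: usize, len: usize) {
        if !self.race_detection_needed() {
            return;
        }
        self.detect_race(addr, len);
    }

    // Stand-in for the expensive per-access race check.
    fn detect_race(&self, _addr: usize, _len: usize) {}
}

fn main() {
    let gs = GlobalState { race_detection_enabled: true, active_thread_count: 1 };
    gs.read(0x1000, 8); // takes the fast path
    gs.write(0x1000, 8);
}
```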
I can't repro performance differences between with/without -Zmiri-disable-data-race-detector on x86_64-apple-darwin on a build of miri master (39a7bd0). A quick profile with both the race detector and stacked borrows on gives the following (inverted) call tree:
Which points at https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L248 (Stack::find_granting) via https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L522 (EvalContextPrivExt::reborrow). That said, I imagine it's not surprising that this is a slow part of Stacked Borrows evaluation. |
Perhaps it's been fixed in Miri already? I still see it with Miri 1cf1a2e: Run log
|
Nothing major changed in Miri that would explain such a fix... |
For clarity, what I meant by my inability to repro performance differences is: it still takes about 5 minutes (on a decently fast but not blistering 2019 MacBook) to run with both disable flags. Which is to say, I don't see the 23s. So the opposite of "it got fixed on master" might be possible: perhaps whatever was making that fast in your case has regressed on master. Or it could just be weird compiler shenanigans (this is actually somewhat plausible to me if the inlining theory in #1689 (comment) is true, although this would be a crazy amount of perf to get from extra inlining, tbh). Unfortunately, I'm not 100% sure how to go back to that older version to check. |
Removing |
I just tried this and, erm, don't recommend it (even with …). This is off topic for this discussion, though. |
I mean |
The runtime of this example is extremely erratic. I just ran the tests a few times with SB and the data race detector off, and I'm seeing runtimes between 8 and 80 seconds. When I profile Miri with the data race detector on, all I can find is ~2% of CPU cycles in total associated with the data-race code. Under normal execution the run-to-run variance is 15% (from …). So I suspect that somehow the interpreter is amplifying some randomization in proptest. If we're randomly generating test inputs, there are probably some inputs which happen to encounter code paths in the interpreter which are very slow. In any case, this slowdown would be a good thing to work on... if I could reproduce it reliably (I can't), and if I could diagnose it with a profile (I can't). |
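If the variance really does come from proptest generating different inputs on every run, one way to make Miri runs more comparable is to pin the number of generated cases, either via the PROPTEST_CASES environment variable or in the test itself. This assumes the tests use the proptest! macro; the property below is just a placeholder, not one of the real sxd-string-slab tests.

```rust
use proptest::prelude::*;

proptest! {
    // Cap the number of generated cases so interpreted runs stay short and
    // more comparable between invocations; PROPTEST_CASES can override this.
    #![proptest_config(ProptestConfig::with_cases(16))]

    #[test]
    fn placeholder_property(input in ".*") {
        // Hypothetical property; the real tests live in sxd-string-slab.
        let copy = input.clone();
        prop_assert_eq!(copy, input);
    }
}
```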
The serde2 benchmark was meant to capture some concurrency, but it doesn't help show this slowdown at all. That is possibly because there is only read-only state being shared across threads, no atomic accesses or anything like that. But I also don't know what a better concurrency benchmark would be. |
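For what it's worth, a benchmark that exercises the detector more would need threads that actually write shared state; a minimal sketch (hypothetical, not one of Miri's existing benchmarks) could look like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // A shared counter that every thread both reads and writes, so the
    // data-race detector has atomic accesses to track (unlike a benchmark
    // that only shares read-only state across threads).
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(counter.load(Ordering::Relaxed), 4_000);
}
```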
Since nobody seems to have been able to reproduce this... how sure are we that the slowdown is even real? I tried two stress tests that have many (albeit short-lived) threads and got around a 20% slowdown in both cases, which I would not qualify as "massive". This was comparing runs with and without -Zmiri-disable-data-race-detector. |
Considering that nobody has managed to reproduce such a slowdown on an issue that's over 3 years old, I'm just going to close this. If anyone finds a program that spends more than half its runtime in the data race detector, I would rather see a fresh issue filed about that program. |
Steps to reproduce:
Check out shepmaster/sxd-string-slab@8bf847e, then run Miri:
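(The exact command was not captured here; presumably it is the invocation quoted earlier in the thread, along the lines of the following.)

```
MIRIFLAGS='-Zmiri-disable-isolation' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test
```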
Meta
macOS
rustc --version --verbose
cargo miri --version --verbose