Data race detector causes massive slowdown #1689
Cc @JCTyblaidd, any idea what could be happening here? This is a 10x slowdown due to the data race detector. OTOH, it's also a 40x slowdown due to Stacked Borrows, which is much more than what I observed elsewhere. |
I will look into this properly later. My guess would be that criterion is spawning a large set of threads for which the data-race detector is enabled, potentially a large set with delayed synchronization, so from the perspective of the data-race detector there is a very large set of live concurrent threads to detect races for (probably crossing the smallvec threshold). I need to try to track thread usage for a proper analysis. |
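(For illustration: the cost model being described is roughly that of a vector clock, where every tracked access is compared against one entry per live thread. The sketch below is generic and is not Miri's actual data-race detector, but it shows why a large set of live threads makes each check more expensive.)

```rust
// Generic vector-clock sketch (not Miri's actual implementation): each
// thread has a logical timestamp, and a race check compares clocks
// component-wise, so its cost grows with the number of live threads.
#[derive(Clone, Default)]
struct VClock(Vec<u64>);

impl VClock {
    // `self` happens-before `other` iff every component is <=.
    fn happens_before(&self, other: &VClock) -> bool {
        self.0
            .iter()
            .zip(other.0.iter().chain(std::iter::repeat(&0)))
            .all(|(a, b)| a <= b)
    }
}

fn main() {
    let last_write = VClock(vec![1, 0]);
    let current_read = VClock(vec![2, 3]);
    // A read races with the last write unless that write happens-before it.
    let races = !last_write.happens_before(&current_read);
    println!("data race: {races}");
}
```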
The code in question is not using criterion (the benchmarks are in a separate child crate). This code is using proptest. When I run the tests outside of Miri, I see that the CPU is pegged at 100%, indicating that this specific test isn't making heavy use of multiple threads (it may still be creating threads that are just chilling, however). I also asked the test runner to use only one thread:

time MIRIFLAGS='-Zmiri-disable-isolation -Zmiri-disable-stacked-borrows' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test -- --nocapture --test-threads 1
# Reports ~3.5 minutes on my machine

Additionally, it doesn't look like there's much use of multithreading from a CPU usage perspective outside of Miri:

time cargo test -- --nocapture
# real 7.088 7088012us
# user 6.970 6970495us
# sys 0.258 257838us
# cpu 101%
|
|
I do not
|
If there is always exactly 1 thread, then the main data-race detection code should never run. The other possibility is that there is a large difference, maybe due to lack of inlining, between the first two data-race detection skip checks (globally disabled vs. only 1 active thread). Might want to extract that check from read and write/unique_access into an #[inline(always)] function, or mark read, write, and unique_access as #[inline]. |
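A minimal sketch of what that refactor could look like, with made-up names (this is not Miri's actual code, just the shape of the suggestion): the cheap skip check lives in an #[inline(always)] helper so the fast path stays cheap even if read/write themselves are not inlined.

```rust
// Hypothetical sketch of the suggested refactor; the names are invented
// and do not correspond to Miri's real internals.
struct GlobalState {
    race_detection_enabled: bool,
    active_thread_count: usize,
}

impl GlobalState {
    // The cheap skip check, extracted so callers always inline it,
    // even when `read`/`write` themselves are not inlined.
    #[inline(always)]
    fn race_detection_needed(&self) -> bool {
        self.race_detection_enabled && self.active_thread_count > 1
    }

    fn read(&self, addr: usize, len: usize) {
        if !self.race_detection_needed() {
            return; // fast path: detector disabled or only one thread
        }
        self.detect_race(addr, len);
    }

    fn write(&self, addr: usize, len: usize) {
        if !self.race_detection_needed() {
            return;
        }
        self.detect_race(addr, len);
    }

    // Stand-in for the expensive per-access race check.
    fn detect_race(&self, _addr: usize, _len: usize) {}
}

fn main() {
    let gs = GlobalState { race_detection_enabled: true, active_thread_count: 1 };
    gs.read(0x1000, 8); // takes the fast path
    gs.write(0x1000, 8);
}
```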
I can't repro performance differences between with/without -Zmiri-disable-data-race-detector on x86_64-apple-darwin on a build of miri master (39a7bd0). A quick profile with both the race detector and stacked borrows on gives the following (inverted) call tree:
Which points at https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L248 (Stack::find_granting) via https://github.com/rust-lang/miri/blob/master/src/stacked_borrows.rs#L522 (EvalContextPrivExt::reborrow). That said, I imagine it's not surprising that this is a slow part of Stacked Borrows evaluation. |
Perhaps it's been fixed in Miri already? I still see it with Miri 1cf1a2e: Run log
|
Nothing major changed in Miri that would explain such a fix... |
For clarity, what I meant by my inability to repro performance differences is: it still takes about 5 minutes (on a decently fast but not blistering 2019 MacBook) to run with both disable flags. Which is to say, I don't see the 23s. So the opposite of "it got fixed on master" might be possible: perhaps whatever was making that fast in your case has regressed on master. Or it could just be weird compiler shenanigans (this is actually somewhat plausible to me if the inlining theory in #1689 (comment) is true, although this would be a crazy amount of perf to get from extra inlining, tbh). Unfortunately, I'm not 100% sure how to go back to that older version to check. |
Removing |
I just tried this and, erm, don't recommend it (even with …). This is off topic for this discussion, though. |
I mean |
The runtime of this example is extremely erratic. I just ran the tests a few times with SB and the data race detector off, and I'm seeing runtimes between 8 and 80 seconds. When I profile Miri with the data race detector on, all I can find is ~2% of CPU cycles in total associated with the data-race code. Under normal execution the run-to-run variance is 15% (from …). So I suspect that somehow the interpreter is amplifying some randomization in proptest. If we're randomly generating test inputs, there are probably some inputs which happen to encounter code paths in the interpreter which are very slow. In any case, this slowdown would be a good thing to work on... if I could reproduce it reliably (I can't), and if I could diagnose it with a profile (I can't). |
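If the variance really does come from proptest generating different inputs on every run, one way to make Miri runs more comparable is to pin the number of generated cases, either via the PROPTEST_CASES environment variable or in the test itself. This assumes the tests use the proptest! macro; the property below is just a placeholder, not one of the real sxd-string-slab tests.

```rust
use proptest::prelude::*;

proptest! {
    // Cap the number of generated cases so interpreted runs stay short and
    // more comparable between invocations; PROPTEST_CASES can override this.
    #![proptest_config(ProptestConfig::with_cases(16))]

    #[test]
    fn placeholder_property(input in ".*") {
        // Hypothetical property; the real tests live in sxd-string-slab.
        let copy = input.clone();
        prop_assert_eq!(copy, input);
    }
}
```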
The serde2 benchmark was meant to capture some concurrency, but it doesn't help show this slowdown at all. That is possibly because there is only read-only state being shared across threads, no atomic accesses or anything like that. But I also don't know what a better concurrency benchmark would be. |
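For what it's worth, a benchmark that exercises the detector more would need threads that actually write shared state; a minimal sketch (hypothetical, not one of Miri's existing benchmarks) could look like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // A shared counter that every thread both reads and writes, so the
    // data-race detector has atomic accesses to track (unlike a benchmark
    // that only shares read-only state across threads).
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(counter.load(Ordering::Relaxed), 4_000);
}
```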
Since nobody seems to have been able to reproduce this... how sure are we that the slowdown is even real? I tried two stress tests that have many (albeit short-lived) threads and got around a 20% slowdown in both cases, which I would not qualify as "massive". This was comparing runs with and without -Zmiri-disable-data-race-detector. |
Considering that nobody has managed to reproduce such a slowdown on an issue that's over 3 years old, I'm just going to close this. If anyone finds a program that spends more than half its runtime in the data race detector, I would rather see a fresh issue filed about that program. |
Steps to reproduce:
Check out shepmaster/sxd-string-slab@8bf847e, then run Miri:
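(The exact command was not captured here; presumably it is the invocation quoted earlier in the thread, along the lines of the following.)

```
MIRIFLAGS='-Zmiri-disable-isolation' \
  CARGO_TARGET_DIR=target/miri \
  cargo miri test
```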
Meta
macOS
rustc --version --verbose
cargo miri --version --verbose