
Fixed worst-case miri performance with lossy string decoding #98592

Closed
wants to merge 1 commit into from

Conversation

Kixiron
Member

@Kixiron Kixiron commented Jun 27, 2022

Addresses miri/#2273 (I'm not sure I'd call it a fix since this situation could conceivably happen in other code, so it's more of a bandaid)

Reduces the runtime of the test case from miri/#2273 (comment) from ~forever to a little over 5 seconds on my machine (windows, could possibly be faster on linux)

cc @RalfJung
r? @saethlin

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jun 27, 2022
@RalfJung
Member

Why does this help Miri performance?
Are you sure this does not hurt runtime performance?

@Kixiron
Member Author

Kixiron commented Jun 27, 2022

@saethlin knows the "why" better than I do, but could you start a perf run to see if there's any performance impact?

@saethlin
Member

Phew, that's a bit bigger of a diff than I was hoping for. Since this is basically optimizing the debug codegen (but an even more dramatic version of that), I think it would be best if this PR were very small and had a clear comment explaining exactly why we want to write this code in a particular way. For example, I wonder if simply removing the len method call from the loop and storing the length in a local makes a huge dent in the runtime.

But the speedup is valuable as confirmation that we correctly identified the cause of the problem. I pointed out this pattern:

        while i < self.source.len() {
            let byte = unsafe { *self.source.get_unchecked(i) };

I have long suspected that something like this could cause Stacked Borrows blowup, but I have so far failed to build a tight reproducer of it. Perhaps one can be extracted based on this. I think this ends up with a large allocation where the stack-merging code totally fails because each byte has a different pattern of tags. Since this is a slice, each call to len and get_unchecked inserts at least one tag into every stack (it could be way more; at one point I was just printing borrow stacks and I recall some simple operations added a surprising number of items), but the granting tag is way at the bottom. So if I'm right, the linear search is truly worst-case behavior here.
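
For reference, a minimal standalone sketch of the two loop shapes under discussion (a plain byte-counting function with made-up names, not the actual Utf8LossyChunksIter code). The first goes through the slice reference on every iteration; the second hoists the length into a local, which is the small change suggested above (get_unchecked still reborrows each iteration, only the len calls go away):

    fn count_ascii_per_iter_len(source: &[u8]) -> usize {
        let mut count = 0;
        let mut i = 0;
        // Every `source.len()` and `source.get_unchecked(i)` call reborrows the
        // slice, so under Stacked Borrows each iteration pushes fresh tags onto
        // the borrow stack of every byte the slice covers.
        while i < source.len() {
            // SAFETY: `i < source.len()` per the loop condition.
            let byte = unsafe { *source.get_unchecked(i) };
            if byte < 128 {
                count += 1;
            }
            i += 1;
        }
        count
    }

    fn count_ascii_hoisted_len(source: &[u8]) -> usize {
        let mut count = 0;
        let mut i = 0;
        let len = source.len(); // hoisted: the bound is read through the reference only once
        while i < len {
            // SAFETY: `i < len == source.len()` per the loop condition.
            let byte = unsafe { *source.get_unchecked(i) };
            if byte < 128 {
                count += 1;
            }
            i += 1;
        }
        count
    }

    fn main() {
        let data = b"hello \xFF world";
        assert_eq!(count_ascii_per_iter_len(data), count_ascii_hoisted_len(data));
    }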

@RalfJung
Member

could you start a perf run to see if there's any performance impact?

Sure can, but that will just test perf of rustc itself, which I doubt is a heavy user of this code.
This will need more targeted micro-benchmarking. I am not sure if we have existing benchmarks for that.
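
Something along these lines would be a more targeted micro-benchmark (a sketch only; the bench name and input are made up, using the nightly `test` harness that the in-tree library benches use):

    #![feature(test)]
    extern crate test;

    use test::{black_box, Bencher};

    #[bench]
    fn from_utf8_lossy_mixed(b: &mut Bencher) {
        // Mostly-ASCII input with a few invalid bytes at the end, so both the
        // ASCII fast path and the replacement path get exercised.
        let mut data = b"Lorem ipsum dolor sit amet ".repeat(64);
        data.extend_from_slice(&[0xF0, 0x90, 0x80, 0xFF]);
        b.iter(|| String::from_utf8_lossy(black_box(&data)));
    }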

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 27, 2022
@bors
Contributor

bors commented Jun 27, 2022

⌛ Trying commit 0633c81 with merge dc2897de64ccc546d1d9a10d8d93ac2bd641ef30...

@JakobDegen
Contributor

JakobDegen commented Jun 27, 2022

I don't think this will show on a compiler perf run. Imo we should be benchmarking the implementation directly. Edit: Agh, didn't see Ralf's message

@Kixiron
Member Author

Kixiron commented Jun 27, 2022

String's from_utf8_lossy() benches should cover this, I think; it uses Utf8LossyChunksIter under the hood
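
For context, what the lossy path does (a quick illustration, not taken from the benches themselves): invalid byte sequences are replaced with U+FFFD while valid UTF-8 runs pass through unchanged.

    fn main() {
        let bytes = b"foo\xFF\xFEbar";
        let decoded = String::from_utf8_lossy(bytes);
        assert_eq!(decoded, "foo\u{FFFD}\u{FFFD}bar");
    }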

@RalfJung
Member

I think this ends up with a large allocation where the stack-merging code totally fails because each byte has a different pattern of tags.

Ah, that makes a lot of sense.
We could save memory by merging common tails of stacks in something like a linked list but that won't help with the time complexity.
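
A rough sketch of that "merge common tails" idea (purely illustrative, not Miri's actual borrow-stack representation): each per-byte stack keeps only its unique top entries plus an Rc link to a shared tail, which deduplicates memory, but a search for a granting tag still walks the whole chain, so the time complexity stays linear in stack depth.

    use std::rc::Rc;

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct Tag(u64);

    enum Stack {
        Empty,
        Node { tag: Tag, rest: Rc<Stack> },
    }

    impl Stack {
        // Pushing never copies the tail; it is shared between stacks.
        fn push(rest: &Rc<Stack>, tag: Tag) -> Rc<Stack> {
            Rc::new(Stack::Node { tag, rest: Rc::clone(rest) })
        }

        // Linear search from the top: still O(depth), which is the real problem.
        fn grants(&self, needle: Tag) -> bool {
            let mut cur = self;
            while let Stack::Node { tag, rest } = cur {
                if *tag == needle {
                    return true;
                }
                cur = &**rest;
            }
            false
        }
    }

    fn main() {
        let base = Stack::push(&Rc::new(Stack::Empty), Tag(0)); // shared tail
        let a = Stack::push(&base, Tag(1));
        let b = Stack::push(&base, Tag(2)); // shares `base` with `a` instead of copying it
        assert!(a.grants(Tag(0)) && b.grants(Tag(0)));
        assert!(!b.grants(Tag(1)));
    }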

@Kixiron
Member Author

Kixiron commented Jun 27, 2022

I ran some benches and it looks like there's little to no difference between the two

Current:

test str::str_from_lossy_emoji  ... bench:       2,679 ns/iter (+/- 91) = 2114 MB/s
test str::str_from_lossy_en     ... bench:       1,409 ns/iter (+/- 47) = 3821 MB/s
test str::str_from_lossy_ru     ... bench:       2,693 ns/iter (+/- 85) = 1886 MB/s
test str::str_from_lossy_zh     ... bench:       3,393 ns/iter (+/- 235) = 1391 MB/s

This PR:

test str::str_from_lossy_emoji  ... bench:       2,623 ns/iter (+/- 123)
test str::str_from_lossy_en     ... bench:       1,423 ns/iter (+/- 96)
test str::str_from_lossy_ru     ... bench:       1,929 ns/iter (+/- 38)
test str::str_from_lossy_zh     ... bench:       3,367 ns/iter (+/- 59)

@saethlin
Member

Did you try running the benchmarks in alloc? `x bench library/alloc --test-args=lossy` is what I ran, and I got a different set of benchmarks.

@bors
Contributor

bors commented Jun 27, 2022

☀️ Try build successful - checks-actions
Build commit: dc2897de64ccc546d1d9a10d8d93ac2bd641ef30

@rust-timer
Collaborator

Queued dc2897de64ccc546d1d9a10d8d93ac2bd641ef30 with parent 8e52fa8, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (dc2897de64ccc546d1d9a10d8d93ac2bd641ef30): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regression found

                              mean¹    max     count²
  Regressions 😿 (primary)     N/A      N/A     0
  Regressions 😿 (secondary)   0.4%     0.4%    1
  Improvements 🎉 (primary)    N/A      N/A     0
  Improvements 🎉 (secondary)  N/A      N/A     0
  All 😿🎉 (primary)           N/A      N/A     0

Max RSS (memory usage)

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found

                              mean¹    max     count²
  Regressions 😿 (primary)     N/A      N/A     0
  Regressions 😿 (secondary)   4.6%     6.0%    3
  Improvements 🎉 (primary)    N/A      N/A     0
  Improvements 🎉 (secondary)  N/A      N/A     0
  All 😿🎉 (primary)           N/A      N/A     0

Cycles

This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 28, 2022
@Kixiron
Member Author

Kixiron commented Jun 28, 2022

Ran the alloc from_utf8_lossy benches
Current:

test string::from_utf8_lossy_100_ascii                   ... bench:          40 ns/iter (+/- 2)
test string::from_utf8_lossy_100_invalid                 ... bench:       1,459 ns/iter (+/- 54)
test string::from_utf8_lossy_100_multibyte               ... bench:          78 ns/iter (+/- 5)
test string::from_utf8_lossy_invalid                     ... bench:         199 ns/iter (+/- 5)

This PR:

test string::from_utf8_lossy_100_ascii                   ... bench:          36 ns/iter (+/- 2)
test string::from_utf8_lossy_100_invalid                 ... bench:       1,273 ns/iter (+/- 26)
test string::from_utf8_lossy_100_multibyte               ... bench:          55 ns/iter (+/- 5)
test string::from_utf8_lossy_invalid                     ... bench:         181 ns/iter (+/- 5)

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@Dylan-DPC
Member

@bors delegate=saethlin

@bors
Contributor

bors commented Jul 24, 2022

✌️ @saethlin can now approve this pull request

Comment on lines -60 to -65
-        while i < self.source.len() {
-            // SAFETY: `i < self.source.len()` per previous line.
-            // For some reason the following are both significantly slower:
-            // while let Some(&byte) = self.source.get(i) {
-            // while let Some(byte) = self.source.get(i).copied() {
-            let byte = unsafe { *self.source.get_unchecked(i) };
+        let length = self.source.len();
+        let mut current = self.source.as_ptr();
+        // SAFETY: current + length is one past the end of the allocation
+        let (start, end, mut valid_up_to) = unsafe { (current, current.add(length), current) };
+
+        while current < end {
+            // SAFETY: `current < end` per previous line.

This PR as-written adds a lot of unsafe. I do not think that we should do that without sufficient justification.

How much of a perf impact do we see on Miri backtrace printing and the standard library benchmarks from only adjusting these lines? The specific pattern we want to avoid is this: https://github.com/rust-lang/miri/blob/8fdb720329d7674a878a8252fe4b79ef93d6ffec/bench-cargo-miri/slice-get-unchecked/src/main.rs#L8-L9

    while i < x.len() {
        let _element = unsafe { *x.get_unchecked(i) };

I'm not completely opposed to the pervasive sort of changes that you've implemented in the rest of this function, but we need benchmarking that makes the case for those changes, as opposed to just tweaking the ASCII fast path.

@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2022
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. A-miri Area: The miri tool labels Aug 2, 2022
@JohnCSimon
Member

@Kixiron
Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with it.
Note: if you do, please reopen the PR BEFORE you push to it, or else you won't be able to reopen it; this is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this Jan 29, 2023
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Jan 29, 2023