Optimize BufWriter #79930

tgnottingham · 2020-12-11T08:31:39Z

No description provided.

rust-highfive · 2020-12-11T08:31:42Z

(rust-highfive has picked a reviewer for you, use r? to override)

tgnottingham · 2020-12-11T08:41:06Z

This improves BufWriter performance by inlining the hot paths of write and write_all, not calling expensive Vec methods, and optimizing for the case where the input size is less than the buffer size.

Optimization was guided by rustc benchmarks. I have a local change that uses BufWriter more heavily in rustc. It changes code that wrote to a Vec, then wrote that Vec to a File, to code that writes directly to a BufWriter<File>.

Performance was initially completely unworkable.

Result of switching from baseline to current BufWriter

Here is the effect of adding each optimization to the current BufWriter.

Optimization 1: inline hot paths
Optimization 2: avoid expensive Vec methods
Optimization 3: optimize for input size < buffer size
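As a rough illustration of optimizations 1 and 3, the sketch below splits a buffered `write_all` into a small inlinable fast path and a separate slow path. `SketchWriter` and its method names are hypothetical and this is not the std implementation. The fast path still uses `Vec::extend_from_slice` for simplicity, so optimization 2 is not shown, and the attributes are the ones the thread eventually settles on.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a buffered writer; not the std implementation.
pub struct SketchWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> SketchWriter<W> {
    pub fn with_capacity(capacity: usize, inner: W) -> Self {
        SketchWriter { inner, buf: Vec::with_capacity(capacity) }
    }

    /// Hot path: one comparison and one copy, small enough to inline.
    #[inline]
    pub fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        // Strict `<`: a write that would fill the buffer exactly is treated
        // as the uncommon case and handled by the cold path.
        if data.len() < self.buf.capacity() - self.buf.len() {
            self.buf.extend_from_slice(data);
            Ok(())
        } else {
            self.write_all_cold(data)
        }
    }

    /// Cold path: flushing and oversized writes live out of line.
    #[cold]
    fn write_all_cold(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() > self.buf.capacity() - self.buf.len() {
            self.flush_buf()?;
        }
        if data.len() >= self.buf.capacity() {
            // Too large to buffer at all: write straight through.
            self.inner.write_all(data)
        } else {
            self.buf.extend_from_slice(data);
            Ok(())
        }
    }

    fn flush_buf(&mut self) -> io::Result<()> {
        self.inner.write_all(&self.buf)?;
        self.buf.clear();
        Ok(())
    }

    /// Flushes any buffered bytes and returns the underlying writer.
    pub fn into_inner(mut self) -> io::Result<W> {
        self.flush_buf()?;
        Ok(self.inner)
    }
}
```

Keeping the flush logic out of the inlined function is what keeps the per-call-site code short. Whether that pays off depends on the workload, which is why the changes were guided by benchmarks.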

Result of Optimization 1 versus current BufWriter

Result of Optimization 1 + 2 versus Optimization 1

Result of Optimization 1 + 2 + 3 versus Optimization 1 + 2

Keep in mind that these benchmarks measure the entirety of compilation, not just BufWriter code. So a performance increase in the benchmarks represents a much larger improvement to BufWriter itself, at least for rustc's use of it.

Obviously optimization 2 is the most impactful, optimization 1 is good, and optimization 3 would be good any other day of the week, but after the other two, it's like, whatever. :)

After all was said and done, the toll of using BufWriter heavily in rustc was far more tolerable:

Result of switching from baseline to optimized BufWriter

Still isn't quite where I want it to be though. See #79921 for details.

tgnottingham · 2020-12-11T08:48:59Z

I'm aware that #78551 is also making changes in BufWriter. I'm hoping that this PR and @Lucretiel's can be reconciled. I'm pretty certain the change to avoid using Vec methods can be, but I'm not as sure about the inlining and common case optimization parts.

library/std/src/io/buffered/bufwriter.rs

the8472 · 2020-12-14T18:30:15Z

Out of curiosity, have you experimented with a smaller hammer by using #[cold] or annotating the branches with likely/unlikely instead of #[inline(never)]?

Kogia-sima · 2020-12-15T05:20:20Z

I have a concern that this optimization is unsound when buf.len() >= isize::MAX. The addition may overflow, take the fast path, and perform a copy that exceeds the buffer capacity.

tgnottingham · 2020-12-15T09:04:03Z

> Out of curiosity, have you experimented with a smaller hammer by using #[cold] or annotating the branches with likely/unlikely instead of #[inline(never)]?

I don't think I did--definitely not #[cold] anyway. I'll play with them.

Now that you mention it, I'm not quite sure what the right thing to do is here. If I don't separate the functions out, the rustc benchmarks take a pretty big hit. So I know that, for my modified rustc's use of BufWriter, it's better if they don't get inlined.

But maybe #[inline(never)] is too extreme? Maybe there are some situations where the optimizer can see that it would be better to inline them.

I could just remove the annotations, as currently, the "cold" functions don't get inlined in rustc, even without #[inline(never)]. But if they start getting inlined again as a result of some other change (and the lack of annotation), perf could take a hit for workloads where the inlining is harmful.

Anyway, I'm open to suggestions.

tgnottingham · 2020-12-15T09:09:06Z

> I have a concern that this optimization is unsound when buf.len() >= isize::MAX. The addition may overflow, take the fast path, and perform a copy that exceeds the buffer capacity.

Ah, thank you. Looks like there was a correctness issue already, if I understand correctly, but it didn't lead to unsoundness until my change. I'll fix the problem.

Update: after looking into this, there's mostly no concern about overflow, as Vecs, slices, and objects in general are limited to isize::MAX bytes. Although there's no official spec for the language mandating this, I think it's de facto official, based on things like this:

> The isize type is a signed integer type with the same number of bits as the platform's pointer type. The theoretical upper bound on object and array size is the maximum isize value. This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.

Even so, I tried using buf.len() < self.buf.capacity() - self.buf.len() to remove the question of addition overflow entirely. But the performance took a hit (this benchmark is super sensitive), and I'd like to avoid that.

There is potential for overflow in write_vectored, though, as it sums many slice lengths together. I'll try to address it.

Update again: disregard the strike-through text. I've learned that there's no guarantee yet that the maximum slice size won't increase beyond isize::MAX in the future. I've changed the code to safeguard against that possibility.
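The difference between the two forms of the capacity check can be shown with plain integers. This is a minimal sketch: the function names are hypothetical, and `wrapping_add` is used only to make the wrap-around explicit (a plain `+` wraps the same way when overflow checks are disabled, and panics when they are enabled).

```rust
/// Addition-based check: `len + incoming < capacity`.
/// If `incoming` is huge, the sum can wrap around and look small.
fn fits_by_addition(len: usize, capacity: usize, incoming: usize) -> bool {
    len.wrapping_add(incoming) < capacity
}

/// Subtraction-based check: `incoming < capacity - len`.
/// Assumes the invariant `len <= capacity`, so the subtraction cannot
/// underflow and no input length can make the comparison wrap.
fn fits_by_subtraction(len: usize, capacity: usize, incoming: usize) -> bool {
    incoming < capacity - len
}
```

With `len = 10`, `capacity = 16`, and `incoming = usize::MAX - 5`, the additive form wraps to `4 < 16` and wrongly reports a fit, while the subtractive form compares against the 6 spare bytes and rejects it.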

tgnottingham · 2020-12-15T23:28:14Z

I think I've addressed the possibility of overflow in write_vectored. It was a pre-existing issue, and so a bit outside the original scope of this PR. But I'm hoping that's alright, as fixing it is important for the soundness of the optimizations added by this PR.
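The thread does not show the code of the `write_vectored` fix. One way to guard a sum of many slice lengths against overflow is a saturating fold, sketched below with hypothetical helper names. A saturated total of `usize::MAX` can never compare as smaller than a real buffer's spare capacity, so an overflowing total cannot reach a fast path.

```rust
use std::io::IoSlice;

/// Sums lengths, clamping at `usize::MAX` instead of wrapping.
fn saturating_sum(lens: impl Iterator<Item = usize>) -> usize {
    lens.fold(0usize, |acc, n| acc.saturating_add(n))
}

/// Total length of all slices in a vectored write, saturating on overflow.
fn saturated_total_len(bufs: &[IoSlice<'_>]) -> usize {
    saturating_sum(bufs.iter().map(|b| b.len()))
}
```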

tgnottingham · 2020-12-17T00:59:12Z

I've been told by @RalfJung that there's no guarantee yet that the maximum slice size won't increase beyond isize::MAX in the future. I'll update this PR to avoid the potential for overflow, even though it's not possible with the current language implementation.

See rust-lang/rust#79930 (comment) for more details.

tgnottingham · 2020-12-17T21:23:43Z

> > Out of curiosity, have you experimented with a smaller hammer by using #[cold] or annotating the branches with likely/unlikely instead of #[inline(never)]?
>
> I don't think I did--definitely not #[cold] anyway. I'll play with them.

Btw, recombining the hot/cold code and annotating the branches wasn't helpful. And using #[inline(never)] versus #[cold] versus no annotation at all didn't change the inlining decisions, at least for the case I'm testing. Still not sure what the best course of action is there.

the8472 · 2020-12-17T21:28:44Z

#[cold] feels more appropriate semantically, but other places in the standard library also use #[inline(never)] as an optimization against code bloat so I guess it doesn't really matter.

library/std/src/io/buffered/bufwriter.rs

mzabaluev · 2021-01-04T09:57:36Z

#[cold] and #[inline] serve different purposes. #[cold] hints the code generator to set branch prediction bias, if supported by the target architecture. I think the _cold functions should have both #[cold] and #[inline(never)].

mzabaluev · 2021-01-04T10:07:43Z

> I think the _cold functions should have both #[cold] and #[inline(never)].

Or perhaps, just #[cold] and let the optimizer decide about inlining. Same applies to #[inline(always)]; this attribute is prone to be misused.

tgnottingham · 2021-01-05T05:16:32Z

Okay, I've rebased, added a comment about the overflow edge case discussed above, and changed #[inline(never)] and #[inline(always)] to #[cold] and #[inline] respectively.

library/std/src/io/buffered/bufwriter.rs

tgnottingham · 2021-03-18T03:30:35Z

Rebased on latest master. Will address comments soon.

JohnCSimon · 2021-04-04T17:34:39Z

@rustbot label: +S-waiting-on-review -S-waiting-on-author

Ensure that `write` and `write_all` can be inlined and that their commonly executed fast paths can be as short as possible. `write_vectored` would likely benefit from the same optimization, but I omitted it because its implementation is more complex, and I don't have a benchmark on hand to guide its optimization.

We use a Vec as our internal, constant-sized buffer, but the overhead of using methods like `extend_from_slice` can be enormous, likely because they don't get inlined, because `Vec` has to repeat bounds checks that we've already done, and because it makes considerations for things like reallocating, even though they should never happen.

Optimize for the common case where the input write size is less than the buffer size. This slightly increases the cost for pathological write patterns that commonly fill the buffer exactly, but if a client is doing that frequently, they're already paying the cost of frequent flushing, etc., so the cost of this optimization to them is relatively small.

tgnottingham · 2021-04-14T04:47:31Z

> This change makes me wonder whether we should try to extend the interface of Vec to support use cases like this where you don't want it to grow (e.g. Vec::try_push or Vec::extend_from_slice_unchecked, etc.). Or if we should change Vec::spare_capacity_mut (which is still unstable) to return some wrapper type that doesn't only give you a &mut [MaybeUninit<T>], but also allows updating len somehow. Or maybe a separate BoundedVec type would make more sense.
>
> What do you think?

In this case, I think it's easy enough to get by with a Box<[MaybeUninit<u8>]>. In general, I think a non-growing Vector might be nice to have, but it would have to be judged against whether or not it's worth adding to the Vec API (which is not something I think I'm qualified to evaluate! :)). Seems a bit fringe, but it does come up from time to time.
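To make the `Box<[MaybeUninit<u8>]>` idea concrete, here is a minimal sketch of a non-growing byte buffer. `FixedBuf` and its methods are hypothetical names chosen for illustration. This is not what the PR merged, which kept the `Vec`-backed buffer.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity byte buffer that never reallocates.
struct FixedBuf {
    storage: Box<[MaybeUninit<u8>]>,
    /// Number of initialized bytes at the front of `storage`.
    len: usize,
}

impl FixedBuf {
    fn with_capacity(capacity: usize) -> Self {
        let storage = vec![MaybeUninit::<u8>::uninit(); capacity].into_boxed_slice();
        FixedBuf { storage, len: 0 }
    }

    /// Appends `src` if it fits; returns `false` and changes nothing otherwise.
    fn try_extend(&mut self, src: &[u8]) -> bool {
        let spare = &mut self.storage[self.len..];
        if src.len() > spare.len() {
            return false;
        }
        for (dst, &byte) in spare.iter_mut().zip(src) {
            dst.write(byte);
        }
        self.len += src.len();
        true
    }

    /// Returns the initialized prefix of the buffer.
    fn as_slice(&self) -> &[u8] {
        // SAFETY: the first `len` bytes were initialized by `try_extend`,
        // and `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe { std::slice::from_raw_parts(self.storage.as_ptr() as *const u8, self.len) }
    }
}
```

Because growth is impossible by construction, the append path needs only the single capacity comparison, with no reallocation logic to inline or skip.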

m-ou-se · 2021-05-05T12:38:17Z

library/std/src/io/buffered/bufwriter.rs

+        let old_len = self.buf.len();
+        let buf_len = buf.len();
+        let src = buf.as_ptr();
+        let dst = self.buf.as_mut_ptr().add(old_len);
+        ptr::copy_nonoverlapping(src, dst, buf_len);
+        self.buf.set_len(old_len + buf_len);


(Note that you could also implement this with self.buf.spare_capacity_mut() and MaybeUninit::write_slice, which are both still unstable.)
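The alternative mentioned in that note might look roughly like the sketch below, assuming a toolchain where `Vec::spare_capacity_mut` and `MaybeUninit::write` are available. The helper name is hypothetical, and an element-wise `MaybeUninit::write` loop stands in for the slice-copy method, which was unstable at the time of this review.

```rust
/// Appends `data` into the already-allocated spare capacity of `buf`.
/// Returns `false` without modifying `buf` if `data` does not fit, so the
/// vector never reallocates.
fn append_within_capacity(buf: &mut Vec<u8>, data: &[u8]) -> bool {
    let spare = buf.spare_capacity_mut();
    if data.len() > spare.len() {
        return false;
    }
    for (dst, &byte) in spare.iter_mut().zip(data) {
        dst.write(byte);
    }
    let new_len = buf.len() + data.len();
    // SAFETY: the loop above initialized exactly `data.len()` bytes directly
    // after the old length, and `new_len` does not exceed the capacity.
    unsafe { buf.set_len(new_len) };
    true
}
```

Compared with the raw-pointer version in the diff, the only `unsafe` operation left is the final `set_len`. Whether the loop optimizes as well as `ptr::copy_nonoverlapping` would need to be measured.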

m-ou-se · 2021-05-05T12:56:13Z

Thanks a lot for doing this!

@bors r+

bors · 2021-05-05T12:56:14Z

📌 Commit 01e7018 has been approved by m-ou-se


Mark-Simulacrum · 2021-05-06T15:49:58Z

@bors rollup=never - performance PR

bors · 2021-05-06T20:04:37Z

⌛ Testing commit 01e7018 with merge 676ee14...

bors · 2021-05-06T22:29:30Z

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 676ee14 to master...

rust-highfive assigned m-ou-se Dec 11, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 11, 2020

tgnottingham mentioned this pull request Dec 11, 2020

Implement bufwriter_raw_buffer feature #79921

Closed

pickfire reviewed Dec 11, 2020

library/std/src/io/buffered/bufwriter.rs

the8472 reviewed Dec 11, 2020

library/std/src/io/buffered/bufwriter.rs

tgnottingham force-pushed the bufwriter_performance branch from 5d0dbb4 to 0a8863b Compare December 12, 2020 04:24

pickfire reviewed Dec 14, 2020

library/std/src/io/buffered/bufwriter.rs

tgnottingham force-pushed the bufwriter_performance branch from ebdda3f to 6b68ae4 Compare December 15, 2020 23:21


tgnottingham force-pushed the bufwriter_performance branch from 6b68ae4 to 157ede8 Compare December 15, 2020 23:49

Kogia-sima added a commit to rust-sailfish/sailfish that referenced this pull request Dec 17, 2020

security: Properly handle slices with size greater than isize::MAX

5961a74

See rust-lang/rust#79930 (comment) for more details.

tgnottingham force-pushed the bufwriter_performance branch from 157ede8 to 36844de Compare December 17, 2020 19:33

mzabaluev reviewed Jan 4, 2021

library/std/src/io/buffered/bufwriter.rs

tgnottingham force-pushed the bufwriter_performance branch from 36844de to 5bfbe41 Compare January 5, 2021 05:13

tgnottingham force-pushed the bufwriter_performance branch from 5bfbe41 to a84518a Compare January 5, 2021 05:17

mzabaluev reviewed Jan 5, 2021

library/std/src/io/buffered/bufwriter.rs

tgnottingham force-pushed the bufwriter_performance branch from 9a5792c to 76094fa Compare March 18, 2021 03:29

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author labels Apr 4, 2021

tgnottingham added 7 commits April 13, 2021 09:48

BufWriter: handle possibility of overflow

72aecbf

BufWriter: use #[cold] and less aggressive #[inline] hints

85bc88d

BufWriter: simplify buffer draining

0f29dc4

BufWriter: improve safety comment

01e7018

tgnottingham force-pushed the bufwriter_performance branch from 76094fa to 01e7018 Compare April 14, 2021 04:21

crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 1, 2021

m-ou-se reviewed May 5, 2021


bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 5, 2021

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 6, 2021

Rollup merge of rust-lang#79930 - tgnottingham:bufwriter_performance,…

b02d15d

… r=m-ou-se Optimize BufWriter

Dylan-DPC-zz mentioned this pull request May 6, 2021

Rollup of 16 pull requests #84975

Closed

bors added the merged-by-bors This PR was explicitly merged by bors. label May 6, 2021

bors merged commit 676ee14 into rust-lang:master May 6, 2021

rustbot added this to the 1.54.0 milestone May 6, 2021
