Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write!(wr,"foo") is 10% to 72% slower than wr.write("foo".as_bytes()) #10761

Open
erickt opened this issue Dec 2, 2013 · 15 comments
Open

write!(wr,"foo") is 10% to 72% slower than wr.write("foo".as_bytes()) #10761

erickt opened this issue Dec 2, 2013 · 15 comments
Labels
A-fmt Area: `std::fmt` C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@erickt
Copy link
Contributor

erickt commented Dec 2, 2013

This example demonstrates that the trivial case of write!(wr, "foo") is much slower than calling wr.write("foo".as_bytes()):

extern mod extra;

use std::io::mem::MemWriter;
use extra::test::BenchHarness;

#[bench]
fn bench_write_value(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        for _ in range(0, 1000) {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            write!(wr, "{}", "abc");
        }
    });
}

With no optimizations:

running 4 tests
test bench_write_macro1 ... bench:    280153 ns/iter (+/- 73615)
test bench_write_macro2 ... bench:    322462 ns/iter (+/- 24886)
test bench_write_ref    ... bench:     79974 ns/iter (+/- 3850)
test bench_write_value  ... bench:     78709 ns/iter (+/- 4003)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

With --opt-level=3:

running 4 tests
test bench_write_macro1 ... bench:     62397 ns/iter (+/- 5485)
test bench_write_macro2 ... bench:     80203 ns/iter (+/- 3355)
test bench_write_ref    ... bench:     55275 ns/iter (+/- 5156)
test bench_write_value  ... bench:     56273 ns/iter (+/- 7591)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

Is there anything we can do to improve this? I can think of a couple options, but I bet there are more:

  • Special case no-argument write! to compile down into wr.write("foo".as_bytes()). If we go this route, it'd be nice to also convert a series of str write!("foo {} {}", "bar", "baz").
  • Revive wr.write_str("foo"). From what I understand, that's being blocked on String encoding/decoding and I/O #6164.
  • Figure out why llvm isn't able to optimize away the write! overhead. Are there functions that should be getting inlined that are not? My scatter shot attempt didn't get any results.
@TeXitoi
Copy link
Contributor

TeXitoi commented Dec 2, 2013

I suspect that fn bench_write_ref does not make virtual calls. Adding this

#[inline(never)]
fn writer_write(w: &mut Writer, b: &[u8]) {
    w.write(b);
}

#[bench]
fn bench_write_virt(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            writer_write(wr, "abc".as_bytes());
        }
    });
}

I have, with --opt-level 3

running 5 tests
test bench_write_macro1 ... bench:    680823 ns/iter (+/- 34497)
test bench_write_macro2 ... bench:    950790 ns/iter (+/- 72309)
test bench_write_ref    ... bench:    505846 ns/iter (+/- 41965)
test bench_write_value  ... bench:    511815 ns/iter (+/- 36681)
test bench_write_virt   ... bench:    553466 ns/iter (+/- 43716)

So, the virtual call have an impact, but it does not explain totally the slowness of write!() shown by this bench.

@nham
Copy link
Contributor

nham commented Aug 6, 2014

Visiting for bug triage. This seems to still be present. The code to run the benchmarks is now:

extern crate test;

use std::io::MemWriter;
use test::Bencher;

#[bench]
fn bench_write_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        for _ in range(0u, 1000) {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            write!(wr, "{}", "abc");
        }
    });
}

Without optimizations:

running 4 tests
test bench_write_macro1 ... bench:   1470468 ns/iter (+/- 291966)
test bench_write_macro2 ... bench:   1799612 ns/iter (+/- 316293)
test bench_write_ref    ... bench:   1336574 ns/iter (+/- 251664)
test bench_write_value  ... bench:   1317880 ns/iter (+/- 254668)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

With --opt-level=3:

running 4 tests
test bench_write_macro1 ... bench:    127671 ns/iter (+/- 1452)
test bench_write_macro2 ... bench:    196158 ns/iter (+/- 2053)
test bench_write_ref    ... bench:     43881 ns/iter (+/- 453)
test bench_write_value  ... bench:     43859 ns/iter (+/- 336)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

@emberian emberian self-assigned this Mar 25, 2015
@ghost
Copy link

ghost commented Oct 30, 2015

Still an issue, using the following code:

#![allow(unused_must_use)]
#![feature(test)]

extern crate test;

use std::io::Write;
use std::vec::Vec;

use test::Bencher;

#[bench]
fn bench_write_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        for _ in 0..1000 {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            write!(wr, "{}", "abc");
        }
    });
}

Typically giving something like the following:

$ cargo bench
running 4 tests
test bench_write_macro1 ... bench:      21,604 ns/iter (+/- 82)
test bench_write_macro2 ... bench:      29,273 ns/iter (+/- 85)
test bench_write_ref    ... bench:       1,396 ns/iter (+/- 387)
test bench_write_value  ... bench:       1,391 ns/iter (+/- 163)

@emberian emberian removed their assignment Jan 5, 2016
bluss added a commit to bluss/rust that referenced this issue Feb 29, 2016
`format_args!` uses one string literal per used `{}`, in many cases, the
separating string pieces are empty.

For example, `write!(w, "{}", x);` will attempt to write one empty
string before the `{}` format, and `write!(w, "{}{}{}", x, y, z);" will
write three empty string pieces between formats.

Simply skip writing if the string is empty. It is a cheap branch
compared to the virtual Write::write_str call that it makes possible to
skip.

It does not solve issue rust-lang#10761 in any way, yet that's where I noticed
this. The testcase in the issue shows a performance difference between
`write!(w, "abc")` and `write!(w, "{}", "abc")`, and this change halves
the size of the difference.
@ariasuni
Copy link
Contributor

I got approximately the same results with rust nightly.

@ritiek
Copy link
Member

ritiek commented Jul 14, 2017

Just for the record, running with v1.20.0-nightly:

$ cargo bench
running 4 tests
test bench_write_macro1 ... bench:      36,556 ns/iter (+/- 69)
test bench_write_macro2 ... bench:      54,377 ns/iter (+/- 958)
test bench_write_ref    ... bench:      13,730 ns/iter (+/- 24)
test bench_write_value  ... bench:      13,755 ns/iter (+/- 81)

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 20, 2017
@steveklabnik
Copy link
Member

Today:

running 4 tests
test bench_write_macro1 ... bench:      16,220 ns/iter (+/- 982)
test bench_write_macro2 ... bench:      25,542 ns/iter (+/- 2,220)
test bench_write_ref    ... bench:       4,889 ns/iter (+/- 314)
test bench_write_value  ... bench:       4,819 ns/iter (+/- 956)

Centril added a commit to Centril/rust that referenced this issue Jan 15, 2019
Small perf improvement for fmt

Added benchmark is based on rust-lang#10761
Centril added a commit to Centril/rust that referenced this issue Jan 15, 2019
Small perf improvement for fmt

Added benchmark is based on rust-lang#10761
Centril added a commit to Centril/rust that referenced this issue Jan 22, 2019
Small perf improvement for fmt

Added benchmark is based on rust-lang#10761
@jonas-schievink jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 12, 2020
@steveklabnik
Copy link
Member

Two years later:

running 4 tests
test bench_write_macro1 ... bench:      17,561 ns/iter (+/- 174)
test bench_write_macro2 ... bench:      23,285 ns/iter (+/- 2,771)
test bench_write_ref    ... bench:       3,234 ns/iter (+/- 194)
test bench_write_value  ... bench:       3,238 ns/iter (+/- 123)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out

❯ rustc +nightly --version
rustc 1.47.0-nightly (30f0a0768 2020-08-18)

@workingjubilee
Copy link
Member

I was curious if the difference was in the implementation of io::Write vs. fmt::Write, or in the details of write_fmt requiring format_args!, which is relevant since write! can be asked to "serve" either case.

To discover more information on this, I compared Vec to String, a seemingly "apples to apples...ish" comparison. https://gist.github.com/workingjubilee/2d2e3a7fded1c2101aafb51dc79a7ec5

running 10 tests
test string_write_fmt    ... bench:      10,053 ns/iter (+/- 1,141)
test string_write_macro1 ... bench:      10,177 ns/iter (+/- 2,363)
test string_write_macro2 ... bench:      17,499 ns/iter (+/- 1,847)
test string_write_ref    ... bench:       2,270 ns/iter (+/- 265)
test string_write_value  ... bench:       2,333 ns/iter (+/- 126)
test vec_write_fmt       ... bench:      15,722 ns/iter (+/- 1,673)
test vec_write_macro1    ... bench:      15,767 ns/iter (+/- 1,638)
test vec_write_macro2    ... bench:      23,968 ns/iter (+/- 8,942)
test vec_write_ref       ... bench:       2,296 ns/iter (+/- 178)
test vec_write_value     ... bench:       2,230 ns/iter (+/- 235)

test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured; 0 filtered out

@workingjubilee
Copy link
Member

workingjubilee commented Sep 7, 2020

I found it interesting that this example revealed that String actually had less overhead than Vec on write! (the benchmark had high variance but I considered it indicative enough).

Then I looked and saw:

    fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
        // Create a shim which translates an io::Write to a fmt::Write and saves
        // off I/O errors. instead of discarding them
        struct Adaptor<'a, T: ?Sized + 'a> {
            inner: &'a mut T,
            error: Result<()>,
        }

        /* More code related to implementing and using the resulting shim,
         * seemingly involving a lot of poking a reference at runtime??? */
    }

So I am no longer surprised.
I believe it is possible to fast-path some cases, but this is not actually an apples-to-apples comparison here. The orange is write_fmt, which has to use a variety of formatting machinery. What needs to happen is that write! should skip write_fmt in simple cases where the formatting machinery is not required, or the formatting machinery should itself be much faster in the case of the obvious fast path.

@CAD97
Copy link
Contributor

CAD97 commented Jul 22, 2022

(checking oldest issues for fun and profit)

Did fmt::Arguments::as_str improve the situation here?

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 25, 2022
…ykril

feat: hide type inlay hints for initializations of closures

![hide_closure_initialization](https://user-images.githubusercontent.com/12008103/168470158-6cb77b18-068e-4431-a8b5-e2b22d50d263.gif)

This PR adds an option to hide the inlay hints for `let IDENT_PAT = CLOSURE_EXPR;`, which is a somewhat common coding pattern. Currently the inlay hints for the assigned variable and the closure expression itself are both displayed, making it rather repetitive.

In order to be consistent with closure return type hints, only closures with block bodies will be hid by this option.

Personally I'd feel comfortable making it always enabled (or at least when closure return type hints are enabled), but considering the precedent set in rust-lang#10761, I introduced an off-by-default option for this.

changelog feature: option to hide type inlay hints for initializations of closures
@Kobzol
Copy link
Contributor

Kobzol commented Aug 10, 2022

Not really (nightly 2022-07-23):

running 5 tests
test string_write_fmt    ... bench:       6,953 ns/iter (+/- 161)
test string_write_macro1 ... bench:       7,034 ns/iter (+/- 227)
test string_write_macro2 ... bench:       9,308 ns/iter (+/- 760)
test string_write_ref    ... bench:         688 ns/iter (+/- 32)
test string_write_value  ... bench:         673 ns/iter (+/- 34)

@CAD97
Copy link
Contributor

CAD97 commented Aug 11, 2022

It seems to be just because nobody bothered to yet. https://gist.github.com/CAD97/2922d3132777310c1e22564f6c6dabdf

running 8 tests
test new_string_write_macro1 ... bench:       1,875 ns/iter (+/- 228)
test new_string_write_macro2 ... bench:      13,594 ns/iter (+/- 999)
test new_vec_write_macro1    ... bench:       1,637 ns/iter (+/- 175)
test new_vec_write_macro2    ... bench:      13,808 ns/iter (+/- 1,320)
test old_string_write_macro1 ... bench:      10,080 ns/iter (+/- 1,618)
test old_string_write_macro2 ... bench:      13,880 ns/iter (+/- 1,327)
test old_vec_write_macro1    ... bench:      10,037 ns/iter (+/- 779)
test old_vec_write_macro2    ... bench:      14,783 ns/iter (+/- 1,280)
--- old/src/std/io/mod.rs
+++ new/src/std/io/mod.rs
@@ -1650,8 +1650,12 @@
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
+        if let Some(s) = fmt.as_str() {
+            return self.write_all(s.as_bytes());
+        }
+
         // Create a shim which translates a Write to a fmt::Write and saves
         // off I/O errors. instead of discarding them
         struct Adapter<'a, T: ?Sized + 'a> {
             inner: &'a mut T,
             error: Result<()>,
         }
--- old/src/core/fmt/mod.rs
+++ new/src/core/fmt/mod.rs
@@ -186,4 +186,8 @@
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write_fmt(mut self: &mut Self, args: Arguments<'_>) -> Result {
+        if let Some(s) = args.as_str() {
+            self.write_str(s)
+        } else {
-        write(&mut self, args)
+            write(&mut self, args)
+        }
     }

@Kobzol
Copy link
Contributor

Kobzol commented Aug 11, 2022

It doesn't solve all the cases, but this definitely sounds like a nice improvement, that could be done in parallel with the ongoing fmt rewrite attempts. Will you send a PR? :)

flip1995 pushed a commit to flip1995/rust that referenced this issue May 20, 2023
Ignore `borrow_deref_ref` warnings in code from procedural macros.

Don't linting `borrow_deref_ref` if code was generated by procedural macro.

This PR fixes rust-lang/rust-clippy#8971

changelog: [`borrow_deref_ref`] avoiding warnings in macro-generated code
@kekeimiku
Copy link

ten years later:
rustc 1.76.0-nightly (f967532 2023-12-08)

test bench_write_macro1 ... bench:       6,062 ns/iter (+/- 41)
test bench_write_macro2 ... bench:       6,070 ns/iter (+/- 233)
test bench_write_ref    ... bench:         838 ns/iter (+/- 49)
test bench_write_value  ... bench:         813 ns/iter (+/- 7)

@Kobzol
Copy link
Contributor

Kobzol commented Dec 9, 2023

The macro1 and macro2 versions should be mostly identical now, thanks to format args flattening. Sadly, it's still not as fast as the non-macro version. Mara has recently posted an awesome blog post, where she explains the different trade-offs of format args macros. It's not practical to perfectly expand the macros everywhere, because while it could improve runtime performance, it could also hurt compile times and binary size a lot. That being said, in this simple case, where only a string is being written and only Display is used, it would be great if we could optimize it better.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 15, 2024
perf: improve write_fmt to handle simple strings

Per `@dtolnay` suggestion in serde-rs/serde#2697 (comment) - attempt to speed up performance in the cases of a simple string format without arguments:

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
+ #[inline]
  pub fn write_fmt(&mut self, f: fmt::Arguments) -> fmt::Result {
+     if let Some(s) = f.as_str() {
+         self.buf.write_str(s)
+     } else {
          write(self.buf, f)
+     }
  }
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 20, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 21, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 5, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Area: `std::fmt` C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests