Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

pczarn · 2014-08-21T23:36:43Z

Based on an observation that strings and arguments are always interleaved, thanks to #15832. Additionally optimize invocations where formatting parameters are unspecified for all arguments, e.g. "{} {:?} {:x}", by emptying the __STATIC_FMTARGS array. Next, Arguments::new replaces an empty slice with None so that passing empty __STATIC_FMTARGS generates slightly less machine code when Arguments::new is inlined. Furthermore, formatting itself treats these cases separately without making redundant copies of formatting parameters.

All in all, this adds a single mov instruction per write! in most cases. That's why code size has increased.

brson · 2014-08-21T23:38:18Z

cc @alexcrichton

brson · 2014-08-21T23:40:23Z

Awesome wins!

pczarn · 2014-08-22T00:15:42Z

Example

Expansion of old println!("x is: {} foobar", 5);:

fn main() {
    let x = 5i;
    match (&x,) {
        (__arg0,) => {
            static __STATIC_FMTSTR: [Piece, ..3] = [
                String("x is: "),
                Argument(Argument {
                    position: ArgumentNext,
                    format: FormatSpec {
                        fill: ' ',
                        align: AlignUnknown,
                        flags: 0u,
                        precision: CountImplied,
                        width: CountImplied,
                    },
                },
                String(" foobar"),
            ];
            let __args_vec = &[argument(secret_string, __arg0)];
            let __args = unsafe { Arguments::new(__STATIC_FMTSTR, __args_vec) };

            println_args(&__args)
        }
    };
}

With this PR it becomes:

fn main() {
    let x = 5i;
    match (&x,) {
        (__arg0,) => {
            static __STATIC_FMTSTR: [&'static str, ..2u] = [
                "x is: ",
                " foobar"
            ];
            static __STATIC_FMTARGS: [rt::Argument<'static>, ..0u] = []; // optimized out
            let __args_vec = &[argument(secret_show, __arg0)];
            let __args = unsafe {
                Arguments::new(__STATIC_FMTSTR, __STATIC_FMTARGS, __args_vec)
            };
            println_args(&__args)
        }
    };
}

Wins

before
   text    data     bss     dec   filename
  15017   43168       8     58193 x86_64-unknown-linux-gnu/stage3/lib/libarena-4e7c5e5c.so
  66210   71260       4    137474 x86_64-unknown-linux-gnu/stage3/lib/libdebug-4e7c5e5c.so
  31659    3754       4     35417 x86_64-unknown-linux-gnu/stage3/lib/libflate-4e7c5e5c.so
  20138   45502       8     65648 x86_64-unknown-linux-gnu/stage3/lib/libfmt_macros-4e7c5e5c.so
  76981   83547       4    160532 x86_64-unknown-linux-gnu/stage3/lib/libgetopts-4e7c5e5c.so
  10269   44437       8     54714 x86_64-unknown-linux-gnu/stage3/lib/libgraphviz-4e7c5e5c.so
  33546   40371     200     74117 x86_64-unknown-linux-gnu/stage3/lib/liblog-4e7c5e5c.so
 341427  166384      24    507835 x86_64-unknown-linux-gnu/stage3/lib/libnative-4e7c5e5c.so
  47783  207122       4    254909 x86_64-unknown-linux-gnu/stage3/lib/librbml-4e7c5e5c.so
13324534 6303187     368 19628089 x86_64-unknown-linux-gnu/stage3/lib/librustc-4e7c5e5c.so
 236885  111322       4    348211 x86_64-unknown-linux-gnu/stage3/lib/librustc_back-4e7c5e5c.so
20742937 1035619   70464 21849020 x86_64-unknown-linux-gnu/stage3/lib/librustc_llvm-4e7c5e5c.so
1182102  508505    5280   1695887 x86_64-unknown-linux-gnu/stage3/lib/librustrt-4e7c5e5c.so
 242924  690433       4    933361 x86_64-unknown-linux-gnu/stage3/lib/libserialize-4e7c5e5c.so
 834141 1656905     344   2491390 x86_64-unknown-linux-gnu/stage3/lib/libstd-4e7c5e5c.so
 151137  671324       4    822465 x86_64-unknown-linux-gnu/stage3/lib/libsync-4e7c5e5c.so
3322627 3880814       8   7203449 x86_64-unknown-linux-gnu/stage3/lib/libsyntax-4e7c5e5c.so
 226532  107796       4    334332 x86_64-unknown-linux-gnu/stage3/lib/libterm-4e7c5e5c.so
  46988   66785       4    113777 x86_64-unknown-linux-gnu/stage3/lib/libtime-4e7c5e5c.so

after
   text    data     bss     dec      filename
  15472   43201       4      58677   x86_64-unknown-linux-gnu/stage3/lib/libarena-4e7c5e5c.so
  66250   68591       8     134849   x86_64-unknown-linux-gnu/stage3/lib/libdebug-4e7c5e5c.so
  31659    3752       8      35419   x86_64-unknown-linux-gnu/stage3/lib/libflate-4e7c5e5c.so
  20322   43650       4      63976   x86_64-unknown-linux-gnu/stage3/lib/libfmt_macros-4e7c5e5c.so
  78261   78171       4     156436   x86_64-unknown-linux-gnu/stage3/lib/libgetopts-4e7c5e5c.so
  10221   44019       4      54244   x86_64-unknown-linux-gnu/stage3/lib/libgraphviz-4e7c5e5c.so
  34034   33703     200      67937   x86_64-unknown-linux-gnu/stage3/lib/liblog-4e7c5e5c.so
 345751  153198      24     498973   x86_64-unknown-linux-gnu/stage3/lib/libnative-4e7c5e5c.so
  48403  196039       8     244450   x86_64-unknown-linux-gnu/stage3/lib/librbml-4e7c5e5c.so
13491702  5253560     368 18745630   x86_64-unknown-linux-gnu/stage3/lib/librustc-4e7c5e5c.so
 239161   99363       4     338528   x86_64-unknown-linux-gnu/stage3/lib/librustc_back-4e7c5e5c.so
20742865  1035050   70464 21848379   x86_64-unknown-linux-gnu/stage3/lib/librustc_llvm-4e7c5e5c.so
1186130  480401    5280    1671811   x86_64-unknown-linux-gnu/stage3/lib/librustrt-4e7c5e5c.so
 245928  668157       8     914093   x86_64-unknown-linux-gnu/stage3/lib/libserialize-4e7c5e5c.so
 838093 1597231     344    2435668   x86_64-unknown-linux-gnu/stage3/lib/libstd-4e7c5e5c.so
 152177  653764       4     805945   x86_64-unknown-linux-gnu/stage3/lib/libsync-4e7c5e5c.so
3349091 3563279       8    6912378   x86_64-unknown-linux-gnu/stage3/lib/libsyntax-4e7c5e5c.so
 229488  101380       4     330872   x86_64-unknown-linux-gnu/stage3/lib/libterm-4e7c5e5c.so
  48156   54777       4     102937   x86_64-unknown-linux-gnu/stage3/lib/libtime-4e7c5e5c.so

Difference (negative is better)

   text      data    bss     dec   filename
    455        33     -4       484 libarena-4e7c5e5c.so
     40     -2669      4     -2625 libdebug-4e7c5e5c.so
      0        -2      4         2 libflate-4e7c5e5c.so
    184     -1852     -4     -1672 libfmt_macros-4e7c5e5c.so
   1280     -5376      0     -4096 libgetopts-4e7c5e5c.so
    -48      -418     -4      -470 libgraphviz-4e7c5e5c.so
    488     -6668      0     -6180 liblog-4e7c5e5c.so
   4324    -13186      0     -8862 libnative-4e7c5e5c.so
    620    -11083      4    -10459 librbml-4e7c5e5c.so
  167168 -1049627      0   -882459 librustc-4e7c5e5c.so
   2276    -11959      0     -9683 librustc_back-4e7c5e5c.so
    -72      -569      0      -641 librustc_llvm-4e7c5e5c.so
   4028    -28104      0    -24076 librustrt-4e7c5e5c.so
   3004    -22276      4    -19268 libserialize-4e7c5e5c.so
   3952    -59674      0    -55722 libstd-4e7c5e5c.so
   1040    -17560      0    -16520 libsync-4e7c5e5c.so
  26464 -  317535      0   -291071 libsyntax-4e7c5e5c.so
   2956     -6416      0     -3460 libterm-4e7c5e5c.so
   1168    -12008      0    -10840 libtime-4e7c5e5c.so

    -1356 liballoc-4e7c5e5c.rlib
     1540 libarena-4e7c5e5c.rlib
      160 libarena-4e7c5e5c.so
   -74062 libcollections-4e7c5e5c.rlib
        0 libcompiler-rt.a
  -448244 libcore-4e7c5e5c.rlib
   -58910 libdebug-4e7c5e5c.rlib
      294 libdebug-4e7c5e5c.so
      120 libflate-4e7c5e5c.rlib
       -8 libflate-4e7c5e5c.so
   -58534 libgetopts-4e7c5e5c.rlib
    -4443 libgetopts-4e7c5e5c.so
    -1992 libglob-4e7c5e5c.rlib
     -767 libglob-4e7c5e5c.so
    -4686 libgraphviz-4e7c5e5c.rlib
     -384 libgraphviz-4e7c5e5c.so
   -39866 libgreen-4e7c5e5c.rlib
    -4547 libgreen-4e7c5e5c.so
        0 liblibc-4e7c5e5c.rlib
   -66714 liblog-4e7c5e5c.rlib
    -9066 liblog-4e7c5e5c.so
        0 libmorestack.a
  -140108 libnative-4e7c5e5c.rlib
   -10987 libnative-4e7c5e5c.so
   -56826 libnum-4e7c5e5c.rlib
    -2697 libnum-4e7c5e5c.so
   -31722 librand-4e7c5e5c.rlib
  -146418 librbml-4e7c5e5c.rlib
   -10939 librbml-4e7c5e5c.so
  -261958 libregex-4e7c5e5c.rlib
   -22534 libregex-4e7c5e5c.so
        0 librlibc-4e7c5e5c.rlib
  -187366 librustrt-4e7c5e5c.rlib
   -23967 librustrt-4e7c5e5c.so
  -291182 librustuv-4e7c5e5c.rlib
   -19400 librustuv-4e7c5e5c.so
    -4722 libsemver-4e7c5e5c.rlib
     2388 libsemver-4e7c5e5c.so
  -284034 libserialize-4e7c5e5c.rlib
   -19235 libserialize-4e7c5e5c.so
  -655282 libstd-4e7c5e5c.rlib
   -56009 libstd-4e7c5e5c.so
  -328036 libsync-4e7c5e5c.rlib
   -16513 libsync-4e7c5e5c.so
   -74328 libterm-4e7c5e5c.rlib
    -3352 libterm-4e7c5e5c.so
  -198478 libtest-4e7c5e5c.rlib
   -16490 libtest-4e7c5e5c.so
  -148164 libtime-4e7c5e5c.rlib
   -10166 libtime-4e7c5e5c.so
    -1050 libunicode-4e7c5e5c.rlib
   -38478 liburl-4e7c5e5c.rlib
    -5015 liburl-4e7c5e5c.so
   -35978 libuuid-4e7c5e5c.rlib
    -3100 libuuid-4e7c5e5c.so
 -3873611 total

DSTs could further reduce code bloat, in a simple way and without harmful indirection:

struct Pieces<'a> {
    args: &'a [Argument],
    strings: [&'a str]
}

static __STATIC_PIECES: Pieces = // ...
let __args = Arguments::new(__STATIC_PIECES, __args_vec);
// ...

I'm wondering whether to use vtables or store secret_* fn pointers in static arrays, too, if possible.

alexcrichton · 2014-08-22T16:24:20Z

src/libcore/fmt/mod.rs

+                try!(formatter.run(arg));
+            }
+        }
+    }


Out of curiosity, have you measured a runtime improvement with these two paths?

Not yet. By the way, these unwraps seem wrong, because formatting is used from fail!.

Without a clear difference between them, could we unify the two paths? Part of the contract of calling this function is that Arguments is valid where the number of string pieces is very close to the number of argument pieces. This is always valid when generated by the syntax extension, so unwrap should be ok.

One iterates over arguments, the other over placeholder specs.

As for the improvement:

#[bench] fn show_hashmap(b: &mut Bencher) { let mut map = HashMap::new(); for i in range(100u, 500) { map.insert(i, i + 1); } let mut buf = [0, ..4096]; b.iter(|| { let mut w = BufWriter::new(buf); (write!(w, "{}", map)).unwrap() }) }

before test show_hashmap ... bench: 49661 ns/iter (+/- 6403) after test show_hashmap ... bench: 46245 ns/iter (+/- 6816)

alexcrichton · 2014-08-22T16:25:46Z

Awesome wins @pczarn, nice work!

huonw · 2014-08-23T12:33:25Z

always interleaved

Really? Even format!("{}{}", ...)?

pczarn · 2014-08-23T12:39:36Z

@huonw, in these cases string pieces can be empty: ["", ""]

Edit: or __STATIC_FMTSTR could be None, but I've measured that this opportunity for optimization is rare. I'll reconsider it after DST.

huonw · 2014-08-23T12:42:05Z

src/libsyntax/ext/format.rs

@@ -54,6 +54,8 @@ struct Context<'a, 'b> {

    /// Collection of the compiled `rt::Piece` structures
    pieces: Vec<Gc<ast::Expr>>,
+    str_pieces: Vec<Gc<ast::Expr>>,
+    all_pieces_simple: bool,


Can you add doc comments for these fields? (e.g. explaining what 'simple' means here.)

huonw · 2014-08-23T12:47:26Z

in these cases string pieces can be empty: ["", ""]

Can be? or are? I'm asking because this (adjacent formatting) seems like something that breaks your stated assumption and thus means this code is broken. E.g. println!("foo{}{}baz", 'X', 'Y') should print fooXYbaz, not fooXbazY.

pczarn · 2014-08-23T12:52:18Z

They are, your example will print ["foo", "X", "", "Y", "baz"]. Writers can handle zero-length writes.

huonw · 2014-08-23T13:08:39Z

Ok, I don't quite see where the change to include the empty "" happened in your code. The current formatting code doesn't include them, e.g. the example I gave above expands to

            #[inline]
            #[allow(dead_code)]
            static __STATIC_FMTSTR: [::std::fmt::rt::Piece<'static>, ..4u] =
                [::std::fmt::rt::String("foo"),
                 ::std::fmt::rt::Argument(::std::fmt::rt::Argument{ ... }),
                 ::std::fmt::rt::Argument(::std::fmt::rt::Argument{ ... }),
                 ::std::fmt::rt::String("baz")];
            let __args_vec =
                &[::std::fmt::argument(::std::fmt::secret_show, __arg0),
                  ::std::fmt::argument(::std::fmt::secret_show, __arg1)];
            let __args =
                unsafe {
                    ::std::fmt::Arguments::new(__STATIC_FMTSTR, __args_vec)
                };
            ::std::io::stdio::println_args(&__args)

alexcrichton · 2014-08-24T21:09:14Z

src/libcore/fmt/mod.rs

-        secret_lower_hex::<uint>(&(*self as uint), f)
+        let res = secret_lower_hex::<uint>(&(*self as uint), f);
+        f.flags = 0;
+        res


This change worries me and makes me realize that this is a breaking change due to the differing semantics. It didn't look like there was much of a performance improvement over just calling fmt.arg, could there perhaps be a static instance of Argument representing the default formatting?

Yes, should I make a static instance or rather an impl of Default for Argument? Another thing that worries me is the change of the signature of public-facing Arguments::new.

Either way is fine, and it's ok to not worry about Arguments::new as it shouldn't be used/public anyway.

Done with a static

alexcrichton · 2014-08-25T12:26:05Z

Looks great, thanks @pczarn! Just a few more minor comments, and then I think this is good to go with some squashing of the intermediate commits.

pczarn · 2014-08-25T17:04:19Z

This is good to go

Format specs are ignored and not stored in case they're all default. Restore default formatting parameters during iteration. Pass `None` instead of empty slices of format specs to take advantage of non-nullable pointer optimization. Generate a call to one of two functions of `fmt::Argument`.

Based on an observation that strings and arguments are always interleaved, thanks to #15832. Additionally optimize invocations where formatting parameters are unspecified for all arguments, e.g. `"{} {:?} {:x}"`, by emptying the `__STATIC_FMTARGS` array. Next, `Arguments::new` replaces an empty slice with `None` so that passing empty `__STATIC_FMTARGS` generates slightly less machine code when `Arguments::new` is inlined. Furthermore, formatting itself treats these cases separately without making redundant copies of formatting parameters. All in all, this adds a single mov instruction per `write!` in most cases. That's why code size has increased.

Add test explorer This PR implements the vscode testing api similar to rust-lang#14589, this time using a set of lsp extensions in order to make it useful for clients other than vscode, and make the vscode client side logic simpler (its now around ~100 line of TS code) Fix rust-lang#3601

Some minor changes in the test explorer lsp extension followup rust-lang#16662 cc `@nemethf` `@ShuiRuTian`

alexcrichton reviewed Aug 22, 2014
View reviewed changes

huonw reviewed Aug 23, 2014
View reviewed changes

alexcrichton reviewed Aug 24, 2014
View reviewed changes

pczarn force-pushed the format-fmtstr-opt branch from 0b8cd02 to 11afef3 Compare August 25, 2014 13:34

pczarn added 3 commits September 9, 2014 20:34

Decouple string and argument pieces

696367f

coretest: Ensure that pointer formatting flags are cleaned up

fcf88b8

pczarn force-pushed the format-fmtstr-opt branch from 11afef3 to fcf88b8 Compare September 9, 2014 19:34

bors closed this Sep 10, 2014

bors merged commit fcf88b8 into rust-lang:master Sep 10, 2014

matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request Mar 10, 2024

Auto merge of rust-lang#16794 - HKalbasi:test-explorer, r=lnicola

574e23e

Some minor changes in the test explorer lsp extension followup rust-lang#16662 cc `@nemethf` `@ShuiRuTian`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

pczarn commented Aug 21, 2014

brson commented Aug 21, 2014

brson commented Aug 21, 2014

pczarn commented Aug 22, 2014

alexcrichton Aug 22, 2014

pczarn Aug 22, 2014

alexcrichton Aug 22, 2014

pczarn Aug 22, 2014

alexcrichton commented Aug 22, 2014

huonw commented Aug 23, 2014

pczarn commented Aug 23, 2014

huonw Aug 23, 2014

pczarn Aug 23, 2014

huonw commented Aug 23, 2014

pczarn commented Aug 23, 2014

huonw commented Aug 23, 2014

alexcrichton Aug 24, 2014

pczarn Aug 24, 2014

alexcrichton Aug 24, 2014

pczarn Aug 25, 2014

alexcrichton commented Aug 25, 2014

pczarn commented Aug 25, 2014

Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

Conversation

pczarn commented Aug 21, 2014

brson commented Aug 21, 2014

brson commented Aug 21, 2014

pczarn commented Aug 22, 2014

Example

Wins

Difference (negative is better)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Aug 22, 2014

huonw commented Aug 23, 2014

pczarn commented Aug 23, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

huonw commented Aug 23, 2014

pczarn commented Aug 23, 2014

huonw commented Aug 23, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Aug 25, 2014

pczarn commented Aug 25, 2014