Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use String::with_capacity in format! #39356

Merged
merged 4 commits into from
Feb 3, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion src/libcollections/fmt.rs
Original file line number Diff line number Diff line change
Expand Up @@ -539,7 +539,8 @@ use string;
/// [format!]: ../macro.format.html
#[stable(feature = "rust1", since = "1.0.0")]
pub fn format(args: Arguments) -> string::String {
let mut output = string::String::new();
let capacity = args.estimated_capacity();
let mut output = string::String::with_capacity(capacity);
let _ = output.write_fmt(args);
output
}
28 changes: 28 additions & 0 deletions src/libcore/fmt/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -265,6 +265,34 @@ impl<'a> Arguments<'a> {
args: args
}
}

/// Estimates the length of the formatted text.
///
/// This is intended to be used for setting initial `String` capacity
/// when using `format!`. Note: this is neither the lower nor upper bound.
#[doc(hidden)] #[inline]
#[unstable(feature = "fmt_internals", reason = "internal to format_args!",
issue = "0")]
pub fn estimated_capacity(&self) -> usize {
// Using wrapping arithmetics in this function, because
// wrong result is highly unlikely and doesn't cause unsafety.
use ::num::Wrapping as W;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is wrapping really necessary here? If we're wrapping around the max value of usize we surely want to just skip estimation entirely, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd really wrap, then either the program would fail from OOM anyway, or we'd just underestimate the capacity (in case of multiplication by 2). I've used Wrapping to avoid sprinkling the code with early returns, so it's easier to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was just struck by the oddity of reaching for a wrapping type. If we wanted to be "correct" I'd expect saturating arithmetic here rather than wrapping. In both cases though something has gone seriously wrong if we wrap at all.

In that sense I'd probably argue that normal arithmetic is what I'd expect here: panic in debug and don't worry about it in optimized mode.


let pieces_length: W<usize> = self.pieces.iter()
.map(|x| W(x.len())).sum();

// If they are any arguments to format, the string will most likely
// double in size. So we're pre-doubling it here.
let multiplier = if self.args.is_empty() { W(1) } else { W(2) };

let capacity = multiplier * pieces_length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why's this multiplying by the number of pieces? Those all have a statically known size, and we're guessing to the size of args, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's multiplying overall length of all pieces by either 1 or 2. The idea is that if there are any arguments, the string buffer will grow on the next push. To avoid this reallocation, we're pre-doubling the capacity. I've changed the code so it's more clear. Should I change pieces_length to something else too?

if multiplier == W(2) && (W(1)..W(8)).contains(capacity) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is .contains here just serving as an alternative to <=? If so, could that be used instead?

// Allocations smaller than 8 don't really make sense for String.
8
} else {
capacity.0
}
}
}

/// This structure represents a safely precompiled version of a format string
Expand Down
8 changes: 8 additions & 0 deletions src/libcoretest/fmt/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,11 @@ fn test_pointer_formats_data_pointer() {
assert_eq!(format!("{:p}", s), format!("{:p}", s.as_ptr()));
assert_eq!(format!("{:p}", b), format!("{:p}", b.as_ptr()));
}

#[test]
fn test_estimated_capacity() {
assert_eq!(format_args!("{}", "").estimated_capacity(), 0);
assert_eq!(format_args!("Hello").estimated_capacity(), 5);
assert_eq!(format_args!("Hello, {}!", "").estimated_capacity(), 16);
assert_eq!(format_args!("{}, hello!", "World").estimated_capacity(), 16);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test for

format_args!("{} {} {}", "12characters", "13_characters", "14__characters")

I am curious what estimated capacity will.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also measure time.

Copy link
Contributor Author

@krdln krdln Feb 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous algorithm would set the capacity to 8 (two spaces times two, rounded up to eight). I've updated the algorithm so it doesn't try to be smart in your case and also added some more benchmarks.

Also measure time.

You mean to add the benches to repo? I'm not sure where exactly they belong.

1 change: 1 addition & 0 deletions src/libcoretest/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@
#![feature(ordering_chaining)]
#![feature(result_unwrap_or_default)]
#![feature(ptr_unaligned)]
#![feature(fmt_internals)]

extern crate core;
extern crate test;
Expand Down