Improve speed of `fmt::Debug` for `str` and `char` #28662

semmaz · 2015-09-25T15:48:54Z

fixes #26920

rust-highfive · 2015-09-25T15:49:00Z

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

semmaz · 2015-09-25T15:51:36Z

Gist for benchmark and my results.

Oh, and props to @bluss for the idea of how this could be implemented!

bluss · 2015-09-25T18:21:58Z

This looks great all around. I guess the problematic part is the duplication of the string escaping logic from the char module.

I love the speedup on the debug output of Old Norse texts 👍

bluss · 2015-09-25T18:28:07Z

I'd like to solve this by adding something on EscapeDefault to query if the char needed escaping or not. This way the logic is not duplicated.

bluss · 2015-09-25T18:29:14Z

or if not on EscapeDefault, a method next to escape_default().

semmaz · 2015-09-25T19:07:54Z

@bluss Yeah, I'm not a fan of the code duplication I introduced.
Perhaps I should add a FIXME saying that this should be refactored if/when such a thing exists on EscapeDefault or as a method of CharExt?

semmaz · 2015-09-25T19:12:19Z

Also, it would be great if something like is_printable existed.
I'd like to make a follow-up pull request (for #24588) that will change the Debug output of str, char and OsStr to not escape valid Unicode code points, except those in Cc (and the ', ", \ chars).
Such a thing would be handy for that purpose.

As a side effect, this will further improve the speed of Debug output, although that is not the main concern.

Do those changes need an RFC?
cc @alexcrichton

alexcrichton · 2015-09-28T15:52:03Z

I'm a little worried about the duplication here with the existing escape_default, especially if this is going to be expanded to a number of other locations as well. I don't think it's really that critical that Debug is super fast, so there's not necessarily a huge amount of urgency to deal with this! That being said, the idea here is totally fine by me and I'd love to see an improvement here.

New methods don't need to go through an RFC, but this shouldn't be too ambitious in adding any new features. If possible no API surface area should be added to the standard library, and only if absolutely necessary should an unstable API be added.

semmaz · 2015-09-28T22:49:11Z

Refactored the escaping logic into a needs_escape method of CharExt.

@alexcrichton Thank you for clarifying this! I wasn't too worried about Debug performance when I opened this pull request. It's more to close an issue I discovered while working on Debug escaping, which will be a separate PR.

bluss · 2015-09-29T10:20:53Z

I think the implementation is nice and clean now. The crucial point is that we need some kind of char method to not repeat the needs_escape logic, and if we can get that allowed :-)

By the way, there is not just one escaping mode, so needs_escape should be named & documented to be explicitly connected with the escaping done by escape_default.

bluss · 2015-09-29T10:42:49Z

Speedup looks good. Look at the example foobarbazqux, which needs no escaping: it improves from 278 ns to 84 ns. This corresponds to far fewer write calls on the underlying writer, calls that go through the Write trait object.
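To make the "fewer write calls" point concrete, here is a minimal sketch, not the code from this PR. `CountingWriter`, `needs_escape`, `write_per_char` and `write_batched` are hypothetical names introduced for illustration; `needs_escape` simply mirrors the documented `escape_default` rules. The call counts it produces say nothing about the 278 ns / 84 ns benchmark figures, only about how many times the writer is invoked.

```rust
use std::fmt::{self, Write};

/// Hypothetical test writer: records output and counts `write_str` calls.
#[derive(Default)]
struct CountingWriter {
    out: String,
    calls: usize,
}

impl Write for CountingWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.calls += 1;
        self.out.push_str(s);
        Ok(())
    }
}

/// Hypothetical predicate mirroring the `escape_default` rules:
/// printable ASCII passes through, except `\`, `'` and `"`.
fn needs_escape(c: char) -> bool {
    !matches!(c, ' '..='~') || matches!(c, '\\' | '\'' | '"')
}

/// Naive approach: every character goes to the writer separately.
fn write_per_char(w: &mut dyn Write, s: &str) -> fmt::Result {
    w.write_char('"')?;
    for c in s.chars() {
        for e in c.escape_default() {
            w.write_char(e)?;
        }
    }
    w.write_char('"')
}

/// Batched approach: runs that need no escaping are written as one slice.
fn write_batched(w: &mut dyn Write, s: &str) -> fmt::Result {
    w.write_char('"')?;
    let mut from = 0; // start of the pending run of unescaped text
    for (i, c) in s.char_indices() {
        if needs_escape(c) {
            if from < i {
                w.write_str(&s[from..i])?; // flush the backlog
            }
            for e in c.escape_default() {
                w.write_char(e)?;
            }
            from = i + c.len_utf8();
        }
    }
    if from < s.len() {
        w.write_str(&s[from..])?; // flush the tail
    }
    w.write_char('"')
}
```

In this sketch, a string with nothing to escape costs one call per character plus two for the quotes in the naive version, and three calls in total in the batched version.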

bluss · 2015-09-29T10:45:55Z

I can't find a test for debug formatting for strings. Can we add one in src/libcoretest/fmt/ ?

semmaz · 2015-09-29T11:39:46Z

> By the way, there is not just one escaping mode, so needs_escape should be named & documented to be explicitly connected with the escaping done by escape_default.

Yeah, I just wanted to keep changes to a minimum. The needs_escape name is partly because of that, and partly because of my lack of creativity at the moment ;).

Still, renaming needs_escape to needs_escape_default (or needs_default_escape?), adding needs_escape_unicode (for symmetry), marking them as unstable and documenting them in librustc_unicode wouldn't hurt, I guess?

> I can't find a test for debug formatting for strings

They are in run-pass/ifmt.rs, I believe. Or do you mean tests specific to libcore?

bluss · 2015-09-29T12:08:21Z

@semmaz Don't add more methods. I just didn't find those tests, but I guess it wouldn't hurt to extend them a bit, to make sure nonprintables, specials like \n and unicode are all escaped.

I think this is correct:

format!("{:?}", "foo\n \"\0\x01 \\ \u{3b1}") -> r#""foo\n \"\u{0}\u{1} \\ \u{3b1}""#
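The expected output above follows from applying the `escape_default` rules to each character. A small sketch of that check is below; `debug_like` is a hypothetical helper, and it calls `str::escape_default` directly rather than `{:?}`, because later Rust releases changed `Debug` to leave printable non-ASCII characters such as `α` unescaped, so the `{:?}` form of this assertion is tied to the compiler version of this PR.

```rust
/// Hypothetical helper: quote `s` and escape it with the `escape_default`
/// rules, which is what `Debug` for `str` produced at the time of this PR.
fn debug_like(s: &str) -> String {
    // `str::escape_default` returns a value that implements `Display`.
    format!("\"{}\"", s.escape_default())
}
```

With this helper, `debug_like("foo\n \"\0\x01 \\ \u{3b1}")` yields the raw string quoted in the comment above.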

semmaz · 2015-09-29T14:28:04Z

Added test for string escaping.

bluss · 2015-09-29T15:58:27Z

This looks good to me.

alexcrichton · 2015-09-29T16:49:36Z

Ah, so when I was thinking of a helper function to deduplicate logic, I was thinking that the helper would be shared among escape_default as well as these Debug implementations. If the "helper" is just called from the Debug implementation then I'd prefer it live in a private location in the formatting module, but could the escape_default function leverage it as well? That should help centralize the logic about what should be escaped and what shouldn't.

semmaz · 2015-09-29T18:27:35Z

@alexcrichton Thanks for pointing that out! It somehow didn't even occur to me.
See the latest commit. If leaving it in CharExt isn't good enough, I'll move it to a private location in the fmt module instead.

alexcrichton · 2015-09-29T23:33:41Z

Thanks @semmaz! After thinking a little more on this I wonder if we can actually get by without creating a method? Perhaps the Debug implementation could always call escape_default to get an iterator, and then inspect the return value of size_hint? If the upper bound is Some(1) then it knows that no escaping was necessary, and otherwise it could assume that the escape should happen.

This would involve providing at least a small implementation of size_hint today, but that'd at least help prevent adding another method!
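A minimal sketch of that idea, assuming `size_hint` on the `escape_default` iterator reports an exact upper bound (which is what this PR goes on to implement). `needs_default_escape` is a hypothetical name used only for illustration, not an API added by this PR.

```rust
/// Hypothetical helper: a char needs escaping exactly when its
/// `escape_default` iterator will yield something other than one char.
fn needs_default_escape(c: char) -> bool {
    // Upper bound of Some(1) means the char is emitted unchanged.
    c.escape_default().size_hint().1 != Some(1)
}
```

No new method is needed on `char` or `EscapeDefault`; the cost is that the iterator has to be constructed just to ask the question.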

semmaz · 2015-09-30T15:09:54Z

Implemented size_hint and updated the gist. It is somewhat slower for longer strings than my initial implementation (but still faster than the current Debug for str).

I guess after a squash it's good for merge?

@alexcrichton Thanks for the size_hint hint :-)
I first thought about implementing count instead, but it would consume the EscapeDefault, so it's not that useful here.

ranma42 · 2015-09-30T16:21:46Z

src/libcore/char.rs

+            EscapeDefaultState::Char(_) => (1, Some(1)),
+            EscapeDefaultState::Backslash(_) => (2, Some(2)),
+            EscapeDefaultState::Unicode(_) => (0, Some(10)),
+            _ => (0, Some(0))


Would it be better to have EscapeDefaultState::Done here?
It would make it easier to understand why (0, Some(0)) is the correct return value and in the unlikely case that the enum is extended it would promptly point out the problem.

Yep, thanks for pointing that out.
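The suggestion amounts to replacing the wildcard arm with an explicit one. A sketch under stated assumptions: `State` is a hypothetical stand-in for the private `EscapeDefaultState` enum, and its `Unicode` variant is simplified to carry only the number of remaining chars.

```rust
/// Hypothetical stand-in for the private `EscapeDefaultState` enum.
enum State {
    Char(char),
    Backslash(char),
    Unicode(usize), // simplification: remaining chars of a \u{...} escape
    Done,
}

fn size_hint(state: &State) -> (usize, Option<usize>) {
    match *state {
        State::Char(_) => (1, Some(1)),
        State::Backslash(_) => (2, Some(2)),
        State::Unicode(remaining) => (remaining, Some(remaining)),
        // Explicit arm instead of `_`: the reader sees why (0, Some(0)) is
        // right, and adding a new variant becomes a compile error here.
        State::Done => (0, Some(0)),
    }
}
```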

alexcrichton · 2015-09-30T21:14:11Z

src/libcore/char.rs

+        match self.state {
+            EscapeDefaultState::Char(_) => (1, Some(1)),
+            EscapeDefaultState::Backslash(_) => (2, Some(2)),
+            EscapeDefaultState::Unicode(ref iter) => iter.size_hint(),


For now, if you want to retain the speed of your original implementation, this could perhaps just be (0, None) (i.e. the default return value); that way you don't have to bother calculating the width of a character.

I actually tested this with (0, Some(10)) as the return value here, before adding size_hint to EscapeUnicode and using it here. I didn't notice any significant difference, so I guess the slowdown comes from iterator construction. I'll double-check that though.

And it is the iterator construction.

Ah ok, thanks for checking!

alexcrichton · 2015-09-30T21:14:34Z

src/libcore/fmt/mod.rs

+        for (i, c) in self.char_indices() {
+            let esc = c.escape_default();
+            // If char needs escaping, flush backlog so far and write, else skip
+            if esc.size_hint().0 != 1 {


It may be best to check the upper bound here against Some(1) because that may be a better signal that only this one character will be emitted.

I'm not super happy about this unofficial communication through the size hint. A method on the iterator itself would be much cleaner; the only problem is that it needs to be public to some degree (but unstable).

@alexcrichton Do you mean esc.size_hint().1 != Some(1) or the more explicit esc.size_hint() != (1, Some(1))?

Like @bluss, I would prefer it if we could add a method to the iterator (well, to EscapeDefault actually). That might communicate the intention more clearly and be simpler to implement than size_hint.

Or maybe just revert to 025ca11 to avoid the unnecessary construction of the iterator?

@semmaz ah right, indeed I do mean that! I also think that a check against (1, Some(1)) would make it more explicit here. @bluss how do you feel about checking the whole size hint return value? That seems relatively more palatable to me at least.

And yeah the point of leveraging size_hint would be to avoid expansion of the API surface area for a function that's unlikely to ever be stable.

Any way of checking the size_hint is as good as any other IMO. I just prefer an explicit method, maybe a static method on EscapeDefault to make the connection quite clear. I don't see it as impossible for it to be part of a public API down the line; as we can see here, it might be quite useful to know whether a char is default-escaped or not. But that's not the goal here: it should be unstable, and it's pub just so that we can use it across modules.

A static method is a good API level since it ties the behavior tightly to the EscapeDefault iterator, without having to create the iterator to ask the escape yes / no question, a question that doesn't make very much sense on the stateful iterator value itself (because the iterator "forgets" the input char, depending on its state).
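The point about the iterator "forgetting" its input can be seen with a small sketch. `looks_unescaped` is a hypothetical helper standing in for the size_hint check from this review thread; it is not part of the PR.

```rust
use std::char::EscapeDefault;

/// Hypothetical helper: the size_hint-based "no escaping needed" check.
fn looks_unescaped(it: &EscapeDefault) -> bool {
    it.size_hint() == (1, Some(1))
}

/// Returns the check's answer for a fresh iterator over `c`, and again
/// after one item has been consumed from it.
fn before_and_after_one_step(c: char) -> (bool, bool) {
    let mut it = c.escape_default();
    let before = looks_unescaped(&it);
    it.next();
    (before, looks_unescaped(&it))
}
```

For `'\n'`, the fresh iterator reports that escaping is needed, but once the leading backslash has been consumed the remaining `n` is indistinguishable from a fresh iterator over a plain `'n'`. The answer is only meaningful on a freshly constructed iterator, which is the argument for tying the question to the input char instead.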

@alexcrichton Updated.
I agree with @bluss. I'm leaning more towards using it more directly (CharExt), but I can understand the concern about unclear ties and API.

I'm fine if it merges as it is right now, although I really think that having a less taxing way to query whether a char needs escaping would be useful. The question is, should those methods be in scope for this PR?

alexcrichton · 2015-10-02T16:53:04Z

@bors: r+ 0294098

Ok, I'm gonna approve this as-is for now, but I'd be open to discussing adding a method on CharExt which avoids constructing the iterator as a separate PR. I'm pretty wary of adding unstable functionality like this to libcore without a clear path to stabilization, and I'm not sure it's worth its weight (this is a pretty niche function).

bors · 2015-10-02T22:49:38Z

⌛ Testing commit 0294098 with merge bfb2603...

bors · 2015-10-03T00:36:49Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and `EscapeDefault`, but neither was marked as `ExactSizeIterator`.

rust-highfive assigned aturon Sep 25, 2015

semmaz force-pushed the fmt-debug branch from e6f5cf0 to f059c41 Compare September 28, 2015 22:06

semmaz added 2 commits September 29, 2015 15:24

Improve speed of fmt::Debug for str and char

24b5d3a

fixes rust-lang#26920

Add fmt::Debug string escape tests

025ca11

semmaz force-pushed the fmt-debug branch from f059c41 to 025ca11 Compare September 29, 2015 12:56

Implement size_hint for EscapeDefault

d2d0872

semmaz force-pushed the fmt-debug branch from 6abdffc to d2d0872 Compare September 30, 2015 15:08

ranma42 reviewed Sep 30, 2015

semmaz force-pushed the fmt-debug branch from 44e9382 to aca0b7c Compare September 30, 2015 17:40

alexcrichton reviewed Sep 30, 2015

Implement size_hint for EscapeUnicode

0294098

semmaz force-pushed the fmt-debug branch from aca0b7c to 0294098 Compare October 1, 2015 17:38

bors added a commit that referenced this pull request Oct 2, 2015

Auto merge of #28662 - semmaz:fmt-debug, r=alexcrichton

bfb2603

fixes #26920

bors merged commit 0294098 into rust-lang:master Oct 3, 2015

semmaz deleted the fmt-debug branch October 4, 2015 21:24

ranma42 added a commit to ranma42/rust that referenced this pull request Jan 20, 2016

EscapeUnicode and EscapeDefault are ExactSizeIterators

7f5eae7

In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and `EscapeDefault`, but neither was marked as `ExactSizeIterator`.

ranma42 added a commit to ranma42/rust that referenced this pull request May 26, 2016

EscapeUnicode and EscapeDefault are ExactSizeIterators

c30fa92

In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and `EscapeDefault`, but neither was marked as `ExactSizeIterator`.
