Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of fmt::Debug for str and char #28662

Merged
merged 4 commits into from
Oct 3, 2015
Merged

Conversation

semmaz
Copy link
Contributor

@semmaz semmaz commented Sep 25, 2015

fixes #26920

@rust-highfive
Copy link
Collaborator

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@semmaz
Copy link
Contributor Author

semmaz commented Sep 25, 2015

Gist for benchmark and my results.

Oh, and props to @bluss for the idea how this could be implemented!

@bluss
Copy link
Member

bluss commented Sep 25, 2015

This looks great all around, I guess the problematic part is duplication of the string escaping logic from the char module.

I love the speedup on the debug output of old norse texts 👍

@bluss
Copy link
Member

bluss commented Sep 25, 2015

I'd like to solve this by adding something on EscapeDefault to query if the char needed escaping or not. This way the logic is not duplicated.

@bluss
Copy link
Member

bluss commented Sep 25, 2015

or if not on EscapeDefault, a method next to escape_default().

@semmaz
Copy link
Contributor Author

semmaz commented Sep 25, 2015

@bluss Yeah, not a fan of this code dupe that I did.
Perhaps I should add FIXME saying that this should be refactored when/if such thing would exist on EscapeDefault or as a method of CharExt?

@semmaz
Copy link
Contributor Author

semmaz commented Sep 25, 2015

Also, It would be great if something like is_printable would exist.
I'd like to make a follow-up pull request (for #24588) that will change Debug output of str, char and OsStr to not escape valid unicode code points, except those in Cc (and ', ", \ chars).
Such thing would be handy for that purpose.

And as a side effect this will further improve speed of Debug output, although this is not main concern.

Are those changes need RFC?
cc @alexcrichton

@alexcrichton
Copy link
Member

I'm a little worried about the duplication here with the existing escape_default, especially if this is going to be expanded to a number of other locations as well. I don't think it's really that critical that Debug is super fast, so there's not necessarily a huge amount of urgency to deal with this! That being said, the idea here is totally fine by me and I'd love to see an improvement here.

New methods don't need to go through an RFC, but this shouldn't be too ambitious in adding any new features. If possible no API surface area should be added to the standard library, and only if absolutely necessary should an unstable API be added.

@semmaz
Copy link
Contributor Author

semmaz commented Sep 28, 2015

Refactored escaping logic into needs_escape method of CharExt.

@alexcrichton Thank you for clarifying this! I wasn't worried about debug performance too much when opened this pull request. It's more just to close issue I discovered when worked on Debug escaping, with separate PR.

@bluss
Copy link
Member

bluss commented Sep 29, 2015

I think the implementation is nice and clean now. The crucial point is that we need some kind of char method to not repeat the needs_escape logic, and if we can get that allowed :-)

By the way, there is not just one escaping mode, so needs_escape should be named & documented to be explicitly connected with the escaping done by escape_default.

@bluss
Copy link
Member

bluss commented Sep 29, 2015

Speedup looks good. Look at the example foobarbazqux which needs no escaping, it improves from 278 ns to 84 ns. This corresponds to much less write calls on the underlying writer, calls that go through the Write trait object.

@bluss
Copy link
Member

bluss commented Sep 29, 2015

I can't find a test for debug formatting for strings. Can we add one in src/libcoretest/fmt/ ?

@semmaz
Copy link
Contributor Author

semmaz commented Sep 29, 2015

By the way, there is not just one escaping mode, so needs_escape should be named & documented to be explicitly connected with the escaping done by escape_default.

Yeah, I just wanted to keep changes at minimum. I blame needs_escape name partly because of that, partly because my lack of creativity at the moment ;).

Still renaming needs_escape to needs_escape_default (or needs_default_escape?), adding needs_escape_unicode (for symmetry), marking them as unstable and documenting in librustc_unicode wouldn’t hurt I guess?

I can't find a test for debug formatting for strings

They are in run-pass/ifmt.rs I believe, or do you mean tests specific to libcore?

@bluss
Copy link
Member

bluss commented Sep 29, 2015

@semmaz Don't add more methods. I just didn't find those tests, but I guess it wouldn't hurt to extend them a bit, to make sure nonprintables, specials like \n and unicode are all escaped.

I think this is correct:

format!("{:?}", "foo\n \"\0\x01 \\ \u{3b1}") -> r#""foo\n \"\u{0}\u{1} \\ \u{3b1}""#

@semmaz
Copy link
Contributor Author

semmaz commented Sep 29, 2015

Added test for string escaping.

@bluss
Copy link
Member

bluss commented Sep 29, 2015

This looks good to me.

@alexcrichton
Copy link
Member

Ah so when I was thinking of a helper function to deduplicate logic, I was thinking that the helper would be shared among escape_default as well as these Debug implementations. If the "helper" is just called from the Debug implementation then I'd prefer it live in a private location in the formatting module, but could the escape_default function leverage it as well? That should help centralize the logic about what should be escaped and what shouldn't

@semmaz
Copy link
Contributor Author

semmaz commented Sep 29, 2015

@alexcrichton Thanks for pointing that out! It somehow didn't even occur to me.
See latest commit. If it's not enough to leave it in CharExt I'll move it to private location of fmt module instead.

@alexcrichton
Copy link
Member

Thanks @semmaz! After thinking a little more on this I wonder if we can actually get by without creating a method? Perhaps the Debug implementation could always call escape_default to get an iterator, and then it inspected the return value of size_hint? If the upper bound is Some(1) then it knows that no escaping was necessary, and otherwise it could assume that the escape should happen.

This would involve providing at least a small implementation of size_hint today, but that'd at least help prevent adding another method!

@semmaz
Copy link
Contributor Author

semmaz commented Sep 30, 2015

Implemented size_hint and updated gist. Somewhat slower for longer strings then my initial implementation (still faster than Debug for str is now).

I guess after a squash it's good for merge?

@alexcrichton Thanks for size_hint hint :-)
I first thought about implementing count instead, but it will consume EscapeDefault, so, it's not that useful here.

EscapeDefaultState::Char(_) => (1, Some(1)),
EscapeDefaultState::Backslash(_) => (2, Some(2)),
EscapeDefaultState::Unicode(_) => (0, Some(10)),
_ => (0, Some(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have EscapeDefaultState::Done here?
It would make it easier to understand why (0, Some(0)) is the correct return value and in the unlikely case that the enum is extended it would promptly point out the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thanks for pointing out.

match self.state {
EscapeDefaultState::Char(_) => (1, Some(1)),
EscapeDefaultState::Backslash(_) => (2, Some(2)),
EscapeDefaultState::Unicode(ref iter) => iter.size_hint(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now if you want to retain the same speed as the original implementation you had this can possibly just be (0, None) (e.g. the default return value) and that way you don't have to bother with the calculations about the width of a character perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tested this with (0, Some(10)) as return value here, before adding size_hint to EscapeUnicode and using it here. Didn't noticed any significant difference, so I guess slowdown comes from iterator construction. I'll double check that though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it is iterator construction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, thanks for checking!

for (i, c) in self.char_indices() {
let esc = c.escape_default();
// If char needs escaping, flush backlog so far and write, else skip
if esc.size_hint().0 != 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be best to check the upper bound here against Some(1) because that may be a better signal that only this one character will be emitted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy about this unofficial communication through the size hint. A method on the iterator itself would be much cleaner, only problem is that it needs to be some degree of public (but unstable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Do you mean esc.size_hint().1 != Some(1) or more explicit esc.size_hint() != (1, Some(1))?

As @bluss I would prefer if we could add method to iterator (well, to EscapeDefault actually). That might communicate intention clearer and be simpler in implementation than size_hint.

Or maybe just revert to 025ca11 to avoid unnecessary construction of iterator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@semmaz ah right, indeed I do mean that! I also think that a check against (1, Some(1)) would make it more explicit here. @bluss how do you feel about checking the whole size hint return value? That seems relatively more palatable to me at least.

And yeah the point of leveraging size_hint would be to avoid expansion of the API surface area for a function that's unlikely to ever be stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way of checking the size_hint is as good as any other IMO. I just prefer an explicit method, maybe a static method on EscapeDefault to make the connection quite clear. I don't see it as impossible to be part of a public api down the line, it might quite useful as we can see here to know if a char is default-escaped or not. But it's not the goal here, it should be unstable, and it's pub just so that we can use it cross modules.

A static method is a good API level since it ties the behavior tightly to the EscapeDefault iterator, without having to create the iterator to ask the escape yes / no question, a question that doesn't make very much sense on the stateful iterator value itself (because the iterator "forgets" the input char, depending on its state).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Updated.
I agree with @bluss. I'm leaning more towards using it more directly (CharExt), but can understand concern about unclear ties and API.

I'm fine if it merges as it is right now. Although I really think that having less taxing way to query if char needs escape would be useful. Question is, should those methods be in scope of this PR?

@alexcrichton
Copy link
Member

@bors: r+ 0294098

Ok, I'm gonna approve this as-is for now, but I'd be open to discussing adding a method on CharExt which avoids constructing the iterator as a separate PR. I'm pretty wary of adding unstable functionality like this to libcore without a clear path to stabilization, and I'm not sure it's worth its weight (this is a pretty niche function).

bors added a commit that referenced this pull request Oct 2, 2015
@bors
Copy link
Contributor

bors commented Oct 2, 2015

⌛ Testing commit 0294098 with merge bfb2603...

@bors bors merged commit 0294098 into rust-lang:master Oct 3, 2015
@semmaz semmaz deleted the fmt-debug branch October 4, 2015 21:24
ranma42 added a commit to ranma42/rust that referenced this pull request Jan 20, 2016
In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and
`EscapeDefault`, but neither was marked as `ExactSizeIterator`.
ranma42 added a commit to ranma42/rust that referenced this pull request May 26, 2016
In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and
`EscapeDefault`, but neither was marked as `ExactSizeIterator`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Debug for str writes one byte at a time
7 participants