-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve speed of fmt::Debug
for str
and char
#28662
Conversation
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
This looks great all around, I guess the problematic part is duplication of the string escaping logic from the char module. I love the speedup on the debug output of old norse texts 👍 |
I'd like to solve this by adding something on |
or if not on EscapeDefault, a method next to |
@bluss Yeah, not a fan of this code dupe that I did. |
Also, It would be great if something like And as a side effect this will further improve speed of Are those changes need RFC? |
I'm a little worried about the duplication here with the existing New methods don't need to go through an RFC, but this shouldn't be too ambitious in adding any new features. If possible no API surface area should be added to the standard library, and only if absolutely necessary should an unstable API be added. |
Refactored escaping logic into @alexcrichton Thank you for clarifying this! I wasn't worried about debug performance too much when opened this pull request. It's more just to close issue I discovered when worked on |
I think the implementation is nice and clean now. The crucial point is that we need some kind of char method to not repeat the By the way, there is not just one escaping mode, so |
Speedup looks good. Look at the example foobarbazqux which needs no escaping, it improves from 278 ns to 84 ns. This corresponds to much less |
I can't find a test for debug formatting for strings. Can we add one in src/libcoretest/fmt/ ? |
Yeah, I just wanted to keep changes at minimum. I blame Still renaming
They are in run-pass/ifmt.rs I believe, or do you mean tests specific to libcore? |
@semmaz Don't add more methods. I just didn't find those tests, but I guess it wouldn't hurt to extend them a bit, to make sure nonprintables, specials like \n and unicode are all escaped. I think this is correct:
|
Added test for string escaping. |
This looks good to me. |
Ah so when I was thinking of a helper function to deduplicate logic, I was thinking that the helper would be shared among |
@alexcrichton Thanks for pointing that out! It somehow didn't even occur to me. |
Thanks @semmaz! After thinking a little more on this I wonder if we can actually get by without creating a method? Perhaps the This would involve providing at least a small implementation of |
Implemented I guess after a squash it's good for merge? @alexcrichton Thanks for |
EscapeDefaultState::Char(_) => (1, Some(1)), | ||
EscapeDefaultState::Backslash(_) => (2, Some(2)), | ||
EscapeDefaultState::Unicode(_) => (0, Some(10)), | ||
_ => (0, Some(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have EscapeDefaultState::Done
here?
It would make it easier to understand why (0, Some(0))
is the correct return value and in the unlikely case that the enum is extended it would promptly point out the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, thanks for pointing out.
match self.state { | ||
EscapeDefaultState::Char(_) => (1, Some(1)), | ||
EscapeDefaultState::Backslash(_) => (2, Some(2)), | ||
EscapeDefaultState::Unicode(ref iter) => iter.size_hint(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now if you want to retain the same speed as the original implementation you had this can possibly just be (0, None)
(e.g. the default return value) and that way you don't have to bother with the calculations about the width of a character perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually tested this with (0, Some(10))
as return value here, before adding size_hint
to EscapeUnicode
and using it here. Didn't noticed any significant difference, so I guess slowdown comes from iterator construction. I'll double check that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it is iterator construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, thanks for checking!
for (i, c) in self.char_indices() { | ||
let esc = c.escape_default(); | ||
// If char needs escaping, flush backlog so far and write, else skip | ||
if esc.size_hint().0 != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be best to check the upper bound here against Some(1)
because that may be a better signal that only this one character will be emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super happy about this unofficial communication through the size hint. A method on the iterator itself would be much cleaner, only problem is that it needs to be some degree of public (but unstable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton Do you mean esc.size_hint().1 != Some(1)
or more explicit esc.size_hint() != (1, Some(1))
?
As @bluss I would prefer if we could add method to iterator (well, to EscapeDefault
actually). That might communicate intention clearer and be simpler in implementation than size_hint
.
Or maybe just revert to 025ca11 to avoid unnecessary construction of iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@semmaz ah right, indeed I do mean that! I also think that a check against (1, Some(1))
would make it more explicit here. @bluss how do you feel about checking the whole size hint return value? That seems relatively more palatable to me at least.
And yeah the point of leveraging size_hint
would be to avoid expansion of the API surface area for a function that's unlikely to ever be stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way of checking the size_hint is as good as any other IMO. I just prefer an explicit method, maybe a static method on EscapeDefault
to make the connection quite clear. I don't see it as impossible to be part of a public api down the line, it might quite useful as we can see here to know if a char is default-escaped or not. But it's not the goal here, it should be unstable, and it's pub
just so that we can use it cross modules.
A static method is a good API level since it ties the behavior tightly to the EscapeDefault
iterator, without having to create the iterator to ask the escape yes / no question, a question that doesn't make very much sense on the stateful iterator value itself (because the iterator "forgets" the input char, depending on its state).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton Updated.
I agree with @bluss. I'm leaning more towards using it more directly (CharExt
), but can understand concern about unclear ties and API.
I'm fine if it merges as it is right now. Although I really think that having less taxing way to query if char needs escape would be useful. Question is, should those methods be in scope of this PR?
Ok, I'm gonna approve this as-is for now, but I'd be open to discussing adding a method on |
In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and `EscapeDefault`, but neither was marked as `ExactSizeIterator`.
In rust-lang#28662, `size_hint` was made exact for `EscapeUnicode` and `EscapeDefault`, but neither was marked as `ExactSizeIterator`.
fixes #26920