Escape combining characters in char::Debug #49283
LGTM, although the character ranges should probably be generated with a script instead of hard-coded, as they can change across Unicode versions.
@clarcharr: have you got any ideas for how to detect ranges programmatically? Apart from parsing the Unicode symbol names (which does not seem robust), I couldn't find any reasonable suggestion for how to do this automatically.
r? @SimonSapin
This PR is modifying the However this method is also used in Regarding this being a breaking change, I think I’d be ok with documenting that the exact set of code points being escaped or not is not stable outside of the ASCII range. The current doc-comment is already out of sync with the implementation: it claims that anything non-ASCII is escaped, which hasn’t been the case since #34485. I also agree with not hard-coding code point ranges, and instead extracting more Unicode data in
Then there’s the
However #41922 suggests that only nonspacing marks are problematic?
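For illustration, here is a minimal sketch of how such ranges could be derived from Unicode data rather than hard-coded, assuming "nonspacing marks" means General_Category `Mn` and that the input follows the semicolon-separated `UnicodeData.txt` layout (code point in field 0, category in field 2). The function name `nonspacing_mark_ranges` is hypothetical, the input is assumed to be sorted by code point, and `First`/`Last` range pairs are not handled. This is not necessarily how the PR or the existing table generator does it.

```rust
/// Hypothetical helper: collects inclusive ranges of code points whose
/// General_Category is `Mn` from text in the `UnicodeData.txt` format.
/// Assumes lines are sorted by code point, as in the published file.
fn nonspacing_mark_ranges(unicode_data: &str) -> Vec<(u32, u32)> {
    let mut ranges: Vec<(u32, u32)> = Vec::new();
    for line in unicode_data.lines() {
        let mut fields = line.split(';');
        // Field 0: the code point, written in hexadecimal.
        let code = match fields
            .next()
            .and_then(|f| u32::from_str_radix(f.trim(), 16).ok())
        {
            Some(c) => c,
            None => continue, // skip blank or malformed lines
        };
        // Skip field 1 (the name) and read field 2 (the general category).
        let category = match fields.nth(1) {
            Some(c) => c.trim(),
            None => continue,
        };
        if category != "Mn" {
            continue;
        }
        // Extend the previous range when this code point is adjacent to it.
        if let Some(last) = ranges.last_mut() {
            if last.1 + 1 == code {
                last.1 = code;
                continue;
            }
        }
        ranges.push((code, code));
    }
    ranges
}
```

Feeding it the entries for U+0300, U+0301 and U+0483 would yield the two ranges `(0x300, 0x301)` and `(0x483, 0x483)`, which could then be emitted as a generated table.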
I'd be fine with just going for nonspacing marks. As for
Yeah, I'm not sure about this. It'd be nicer if there was an extra argument (
Which doc comment are you referring to here?
I have a change that uses
@SimonSapin: as far as I can tell, there's an issue using
Wasn't the goal of having Personally I really dislike the
#49698 merges It might be interesting to replace
☔ The latest upstream changes (presumably #49698) made this pull request unmergeable. Please resolve the merge conflicts.
@SimonSapin: #49698 makes everything much more pleasant :) I've rebased on top of it (and also cleaned up
@varkor Thanks for your work on this. However I’m still not sure what the right thing to do here is, in terms of what characters exactly to escape or not in which context. Do you know what other languages do for debug-printing Unicode strings and characters?
@SimonSapin: Swift does something similar to the proposed change here. I haven't tried working out exactly which characters they choose to escape, but it seems reasonable to assume they choose either the same, or a similar category.
Ah, so Swift actually escapes any non-ASCII character, which is what Rust used to do before #24588:
@SimonSapin: Friendly triage ping :)
@bors r+
📌 Commit b653937 has been approved by
Escape combining characters in char::Debug
Although combining characters are technically printable, they make little sense to print on their own with `Debug`: it'd be better to escape them like non-printable characters.
This is a breaking change, but since `escape_debug` is rarely used, and almost certainly primarily for debugging, I imagine this is an acceptable change.
Resolves #41922.
r? @alexcrichton
cc @clarcharr
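As a rough illustration of the behaviour described above, the sketch below wraps the standard `char::escape_debug` and `char::escape_default` methods in small helpers (the helper names are hypothetical, chosen only for this example). It assumes a toolchain that includes this change, where a lone combining character such as U+0301 is escaped by `Debug`, while a precomposed printable character such as U+00E9 is still printed as-is.

```rust
/// Hypothetical helper: the `Debug` rendering of a single `char`,
/// including the surrounding single quotes.
fn debug_repr(c: char) -> String {
    format!("{:?}", c)
}

/// Hypothetical helper: the output of `char::escape_debug` as a `String`.
fn escape_debug_string(c: char) -> String {
    c.escape_debug().collect()
}

/// Hypothetical helper: the output of `char::escape_default`, which
/// escapes every non-ASCII character, as a `String`.
fn escape_default_string(c: char) -> String {
    c.escape_default().collect()
}
```

With these helpers, `debug_repr('\u{301}')` gives `'\u{301}'` (the escape sequence, not the bare combining accent), `escape_debug_string('\u{e9}')` gives `é`, and `escape_default_string('\u{e9}')` gives `\u{e9}`.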
☀️ Test successful - status-appveyor, status-travis