Normalise Hex and unicode escape sequences in string #9280
/// * `\u`, `\U` and `\x`: To use lower case for the characters `a-f`.
/// * `\N`: To use uppercase letters
fn normalize(self, input: &str) -> Option<Cow<str>> {
    let mut normalised = String::new();
I tried to avoid the allocation here when the string needs to be normalized, but making it work with the outer `normalize_string` was somewhat complicated (relative string indices, different iterators, etc.). That's why I decided to accept the allocation here, considering that we only allocate for non-normalised (unformatted) escape sequences, which are rare.
There's also the case that we need to be able to recover if the escape sequence is invalid, which means we can't perform the normalisation in place (and push directly to the outer output `String`).
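The trade-off described in this comment can be illustrated with a minimal sketch. It is not the PR's code: `lowercase_hex` is a hypothetical helper, and it assumes the caller has already sliced out the hex digits of a single escape sequence. It borrows when the digits are already normalised, allocates only when a change is needed, and returns `None` so the caller can recover and leave an invalid sequence untouched.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from the PR): lowercases the hex digits of one
/// escape sequence, e.g. the `AF` in `\xAF`.
///
/// Returns `None` for invalid input so that the caller can fall back to
/// copying the original text unchanged.
fn lowercase_hex(digits: &str) -> Option<Cow<'_, str>> {
    // Reject empty or non-hexadecimal payloads before touching any output.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }

    if digits.bytes().any(|b| b.is_ascii_uppercase()) {
        // Only the rare, non-normalised case pays for an allocation.
        Some(Cow::Owned(digits.to_ascii_lowercase()))
    } else {
        // Already normalised: hand back the input slice as-is.
        Some(Cow::Borrowed(digits))
    }
}
```

Because the result is built separately from the outer output buffer, nothing has to be rolled back when the function returns `None`.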
fn normalize(self, input: &str) -> Option<Cow<str>> {
    let mut normalised = String::new();

    let len = match self {
This turned out more complicated than I hoped it would. Maybe a candidate for using a Regex? But I also think it's not worth bothering much. It's relatively straightforward code and I don't expect many changes to it.
1401dbd to 4a49cdb
4a49cdb to 14c1cc3
_ => {
    // not a valid escape sequence
    return None;
}
Can this happen? I'm getting syntax errors with invalid input (both Ruff and CPython).
Summary
This PR implements Black's `hex_codes_in_unicode_sequences` preview style that normalises escape sequences:
- `\u`, `\U` in non-byte literals: To use lower-case characters (`A-F` -> `a-f`)
- `\x`: To use lower-case characters (`A-F` -> `a-f`)
- `\N` in non-byte literals: To use uppercase characters

Using lowercase characters for hex escape sequences is consistent with Python's `repr`. But it has the downside that it is inconsistent with how we (and Black) format numbers in hexadecimal notation: `0xAF`.
Using lowercase characters is the right choice in my view and I'm leaning towards changing number literals to use lower case too.
But we need to be careful with this because it seems Black changed their preference a few times in the past.
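The rules listed in this summary can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: `normalize_escape` and `lower_hex` are hypothetical names, `rest` is assumed to be the text immediately after a backslash, and `is_bytes` is an assumed flag that marks byte literals.

```rust
/// Hypothetical sketch (not the PR's code): normalises the single escape
/// sequence at the start of `rest`, where `rest` begins right after the
/// backslash. Returns `None` when no rule applies or the sequence is malformed.
fn normalize_escape(rest: &str, is_bytes: bool) -> Option<String> {
    let mut chars = rest.chars();
    let kind = chars.next()?;
    let body = chars.as_str();

    match kind {
        // `\x` takes two hex digits and is normalised in all literals.
        'x' => lower_hex(kind, body, 2),
        // `\u` and `\U` are only escapes in non-byte literals.
        'u' if !is_bytes => lower_hex(kind, body, 4),
        'U' if !is_bytes => lower_hex(kind, body, 8),
        // `\N{name}` is upper-cased, again only in non-byte literals.
        'N' if !is_bytes => {
            let name = body.strip_prefix('{')?;
            let end = name.find('}')?;
            Some(format!("N{{{}}}", name[..end].to_ascii_uppercase()))
        }
        _ => None,
    }
}

/// Lowercases exactly `len` hex digits, or returns `None` if they are
/// missing or not hexadecimal.
fn lower_hex(kind: char, body: &str, len: usize) -> Option<String> {
    let digits = body.get(..len)?;
    if !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!("{}{}", kind, digits.to_ascii_lowercase()))
}
```

The returned string omits the leading backslash, so a caller would write the backslash itself and then either the normalised text or, on `None`, the original characters.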
Part of #8678
Test Plan
cargo test