Format numeric constants #5972
Oh wow nice! Excellent work.
What I understand from the code is that the formatter now uses dynamic_text for every number constant, even if it is already correctly formatted. This is, unfortunately, somewhat expensive because every dynamic_text allocates a String when writing the text.
Would you be interested in taking a stab at implementing an optimisation similar to what we did in normalize_string?
ruff/crates/ruff_python_formatter/src/expression/string.rs
Lines 413 to 481 in 33657d3
fn normalize_string(input: &str, quotes: StringQuotes) -> (Cow<str>, ContainsNewlines) {
    // The normalized string if `input` is not yet normalized.
    // `output` must remain empty if `input` is already normalized.
    let mut output = String::new();
    // Tracks the last index of `input` that has been written to `output`.
    // If `last_index` is `0` at the end, then the input is already normalized and can be returned as is.
    let mut last_index = 0;
    let mut newlines = ContainsNewlines::No;
    let style = quotes.style;
    let preferred_quote = style.as_char();
    let opposite_quote = style.invert().as_char();
    let mut chars = input.char_indices();
    while let Some((index, c)) = chars.next() {
        if c == '\r' {
            output.push_str(&input[last_index..index]);
            // Skip over the '\r' character, keep the `\n`
            if input.as_bytes().get(index + 1).copied() == Some(b'\n') {
                chars.next();
            }
            // Replace the `\r` with a `\n`
            else {
                output.push('\n');
            }
            last_index = index + '\r'.len_utf8();
            newlines = ContainsNewlines::Yes;
        } else if c == '\n' {
            newlines = ContainsNewlines::Yes;
        } else if !quotes.triple {
            if c == '\\' {
                if let Some(next) = input.as_bytes().get(index + 1).copied().map(char::from) {
                    #[allow(clippy::if_same_then_else)]
                    if next == opposite_quote {
                        // Remove the escape by ending before the backslash and starting again with the quote
                        chars.next();
                        output.push_str(&input[last_index..index]);
                        last_index = index + '\\'.len_utf8();
                    } else if next == preferred_quote {
                        // Quote is already escaped, skip over it.
                        chars.next();
                    } else if next == '\\' {
                        // Skip over escaped backslashes
                        chars.next();
                    }
                }
            } else if c == preferred_quote {
                // Escape the quote
                output.push_str(&input[last_index..index]);
                output.push('\\');
                output.push(c);
                last_index = index + preferred_quote.len_utf8();
            }
        }
    }
    let normalized = if last_index == 0 {
        Cow::Borrowed(input)
    } else {
        output.push_str(&input[last_index..]);
        Cow::Owned(output)
    };
    (normalized, newlines)
}
The trick is to use a Cow and return Cow::Borrowed if the number is already correctly formatted, in which case we can print the code from the source using source_text_slice (extremely fast). The implementation returns a Cow::Owned if it reformatted the number, and it then uses dynamic_text.
The benefit of this optimisation is that we only pay the cost of allocating when formatting the file for the first time. Checking the formatting in subsequent passes (when there are no changes) takes the fast path instead.
Let me know if that's something you're interested in and, if so, whether you want to tackle this as part of this PR or do a follow-up instead.
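To make the idea concrete, here is a minimal sketch for the hex case only. `normalize_hex` is a hypothetical name and not code from this PR; it assumes the target style is a lowercase `0x` prefix with uppercase digits, as in the diff under review. The caller would still need to pick source_text_slice for the borrowed case and dynamic_text for the owned case.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the PR's code): normalizes a hex integer literal
/// to a lowercase `0x` prefix and uppercase digits, allocating only when the
/// input is not already in that form.
fn normalize_hex(input: &str) -> Cow<'_, str> {
    // Assumption: inputs without a `0x`/`0X` prefix are returned unchanged.
    let Some(digits) = input
        .strip_prefix("0x")
        .or_else(|| input.strip_prefix("0X"))
    else {
        return Cow::Borrowed(input);
    };

    let prefix_ok = input.starts_with("0x");
    let digits_ok = !digits.bytes().any(|b| b.is_ascii_lowercase());

    if prefix_ok && digits_ok {
        // Fast path: already normalized, so nothing is allocated.
        Cow::Borrowed(input)
    } else {
        // Slow path: build the normalized text once.
        let mut output = String::with_capacity(input.len());
        output.push_str("0x");
        output.push_str(&digits.to_ascii_uppercase());
        Cow::Owned(output)
    }
}
```

On a second formatting pass over already formatted code, every literal would take the borrowed branch, which is where the saving comes from.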
if content.starts_with("0x") || content.starts_with("0X") {
    let hex = content.get(2..).unwrap().to_ascii_uppercase();
    write!(f, [text("0x"), dynamic_text(&hex, None)])
We should pass range.start() as the second argument to all dynamic_text usages. The position is used in the generated source map and we'll use the source map for range formatting.
I'll implement optimization like |
Alright. I'll assign this back to you. Feel free to ping me if you have any questions and request another review once you're done.
…r is already normalized
+x = (0B1011).conjugate()
+x = (0O777).real
+x = 123456789.123456789j.__add__((0b1011).bit_length())
+x = (0xB1ACC).conjugate()
Now that we have no more trailing dots in floats, we can also fix the parentheses rules for them (maybe better in a follow-up PR so that the scope of this PR doesn't get too large).
for (index, c) in chars {
    if matches!(c, 'e' | 'E') {
        has_exponent = true;
        if input.as_bytes().get(index - 1).copied() == Some(b'.') {
I believe this could, at least in theory, result in indexing between two character boundaries if the text starts with [e10]. I don't think this is a problem in practice, because the preceding character can't be a Unicode character: the number would then be part of an identifier. I still recommend fixing this just to be sure (e.g. by assigning `is_float` on line 147).
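One way to avoid the `index - 1` lookup altogether is to remember the previous character while iterating. The sketch below is only an illustration under that assumption; `has_dot_before_exponent` is a hypothetical name and not the PR's code. It also stays well defined when the exponent marker is the very first character, where `index - 1` on a `usize` of `0` would overflow.

```rust
/// Hypothetical helper (not the PR's code): reports whether `input` contains
/// an exponent marker (`e` or `E`) that is directly preceded by a `.`,
/// as in `1.e10`.
fn has_dot_before_exponent(input: &str) -> bool {
    // Track the previous character instead of indexing backwards by byte.
    let mut previous: Option<char> = None;
    for c in input.chars() {
        if matches!(c, 'e' | 'E') && previous == Some('.') {
            return true;
        }
        previous = Some(c);
    }
    false
}
```

Because the loop only ever compares whole `char` values, it involves no byte offsets and therefore no character-boundary questions.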
Summary
Format int, float and complex constants.
Test Plan
Existing snapshots.