Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format numeric constants #5972

Merged
merged 6 commits into from
Jul 24, 2023
Merged

Format numeric constants #5972

merged 6 commits into from
Jul 24, 2023

Conversation

lkh42t
Copy link
Contributor

@lkh42t lkh42t commented Jul 22, 2023

Summary

Format int, float and complex constants.

Test Plan

Existing snapshots.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 22, 2023

PR Check Results

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.04      9.7±0.14ms     4.2 MB/sec    1.00      9.3±0.10ms     4.4 MB/sec
formatter/numpy/ctypeslib.py               1.04  1927.2±30.10µs     8.6 MB/sec    1.00   1854.3±4.19µs     9.0 MB/sec
formatter/numpy/globals.py                 1.05    213.5±0.56µs    13.8 MB/sec    1.00    203.0±0.59µs    14.5 MB/sec
formatter/pydantic/types.py                1.03      4.1±0.01ms     6.2 MB/sec    1.00      4.0±0.02ms     6.4 MB/sec
linter/all-rules/large/dataset.py          1.00     13.2±0.23ms     3.1 MB/sec    1.00     13.2±0.18ms     3.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.3±0.04ms     5.0 MB/sec    1.00      3.3±0.04ms     5.0 MB/sec
linter/all-rules/numpy/globals.py          1.00    430.9±1.10µs     6.8 MB/sec    1.00    429.2±0.55µs     6.9 MB/sec
linter/all-rules/pydantic/types.py         1.01      5.9±0.10ms     4.3 MB/sec    1.00      5.9±0.07ms     4.3 MB/sec
linter/default-rules/large/dataset.py      1.00      6.6±0.05ms     6.2 MB/sec    1.00      6.6±0.06ms     6.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.01   1411.3±2.00µs    11.8 MB/sec    1.00   1399.0±3.99µs    11.9 MB/sec
linter/default-rules/numpy/globals.py      1.02    157.9±1.19µs    18.7 MB/sec    1.00    155.0±0.87µs    19.0 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.01ms     8.6 MB/sec    1.01      3.0±0.07ms     8.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.07     14.4±0.83ms     2.8 MB/sec    1.00     13.4±0.67ms     3.0 MB/sec
formatter/numpy/ctypeslib.py               1.03      2.8±0.20ms     5.9 MB/sec    1.00      2.7±0.21ms     6.1 MB/sec
formatter/numpy/globals.py                 1.07   316.7±29.17µs     9.3 MB/sec    1.00   295.9±26.09µs    10.0 MB/sec
formatter/pydantic/types.py                1.02      5.8±0.27ms     4.4 MB/sec    1.00      5.7±0.29ms     4.5 MB/sec
linter/all-rules/large/dataset.py          1.01     19.3±0.90ms     2.1 MB/sec    1.00     19.1±1.17ms     2.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.7±0.30ms     3.6 MB/sec    1.09      5.1±0.24ms     3.3 MB/sec
linter/all-rules/numpy/globals.py          1.00   593.9±46.89µs     5.0 MB/sec    1.03   610.7±32.02µs     4.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      9.0±0.55ms     2.8 MB/sec    1.03      9.2±0.54ms     2.8 MB/sec
linter/default-rules/large/dataset.py      1.03     10.2±0.54ms     4.0 MB/sec    1.00      9.9±0.49ms     4.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.03      2.1±0.10ms     7.9 MB/sec    1.00      2.0±0.15ms     8.2 MB/sec
linter/default-rules/numpy/globals.py      1.07   246.7±18.02µs    12.0 MB/sec    1.00   230.5±14.44µs    12.8 MB/sec
linter/default-rules/pydantic/types.py     1.02      4.5±0.20ms     5.7 MB/sec    1.00      4.4±0.20ms     5.9 MB/sec

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow nice! Excellent work.

What I understand from the code is that the formatted now uses dynamic_text for every number constant even if it is already correctly formatted. This is, unfortunately, somewhat expensive because every dynamic_text allocates a String when writing the text.

Would you be interested in taking a stab at implementing an optimisation similar to what we did in normalize_string

fn normalize_string(input: &str, quotes: StringQuotes) -> (Cow<str>, ContainsNewlines) {
// The normalized string if `input` is not yet normalized.
// `output` must remain empty if `input` is already normalized.
let mut output = String::new();
// Tracks the last index of `input` that has been written to `output`.
// If `last_index` is `0` at the end, then the input is already normalized and can be returned as is.
let mut last_index = 0;
let mut newlines = ContainsNewlines::No;
let style = quotes.style;
let preferred_quote = style.as_char();
let opposite_quote = style.invert().as_char();
let mut chars = input.char_indices();
while let Some((index, c)) = chars.next() {
if c == '\r' {
output.push_str(&input[last_index..index]);
// Skip over the '\r' character, keep the `\n`
if input.as_bytes().get(index + 1).copied() == Some(b'\n') {
chars.next();
}
// Replace the `\r` with a `\n`
else {
output.push('\n');
}
last_index = index + '\r'.len_utf8();
newlines = ContainsNewlines::Yes;
} else if c == '\n' {
newlines = ContainsNewlines::Yes;
} else if !quotes.triple {
if c == '\\' {
if let Some(next) = input.as_bytes().get(index + 1).copied().map(char::from) {
#[allow(clippy::if_same_then_else)]
if next == opposite_quote {
// Remove the escape by ending before the backslash and starting again with the quote
chars.next();
output.push_str(&input[last_index..index]);
last_index = index + '\\'.len_utf8();
} else if next == preferred_quote {
// Quote is already escaped, skip over it.
chars.next();
} else if next == '\\' {
// Skip over escaped backslashes
chars.next();
}
}
} else if c == preferred_quote {
// Escape the quote
output.push_str(&input[last_index..index]);
output.push('\\');
output.push(c);
last_index = index + preferred_quote.len_utf8();
}
}
}
let normalized = if last_index == 0 {
Cow::Borrowed(input)
} else {
output.push_str(&input[last_index..]);
Cow::Owned(output)
};
(normalized, newlines)
}

The trick is to use a Cow and return Cow::Borrowed if the number is already correctly formatted (in which case we can print the code from the source using source_text_slice (extremely fast). The implementation returns a Cow::Owned if it reformatted the number and it then uses dynamic_text.

The benefit of this optimisation is that we only pay the cost of allocating when formatting the file for the first time. Checking the formatting in subsequent passes (when there are no changes) takes the fast path instead.

Let me know if that's something you're interested in and if so, if you want to tackle this as part of this PR or want to do a follow up instead.


if content.starts_with("0x") || content.starts_with("0X") {
let hex = content.get(2..).unwrap().to_ascii_uppercase();
write!(f, [text("0x"), dynamic_text(&hex, None)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pass range.start() as the second argument to all dynamic_text usages. The position is used in the generated source map and we'll use the source map for range formatting.

@lkh42t
Copy link
Contributor Author

lkh42t commented Jul 22, 2023

I'll implement optimization like normalize_string. Thanks!

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I'll assign this back to you. Feel free to ping me if you have any questions and request another review once your done.

@lkh42t lkh42t requested a review from MichaReiser July 22, 2023 12:48
crates/ruff_python_formatter/src/expression/number.rs Outdated Show resolved Hide resolved
+x = (0B1011).conjugate()
+x = (0O777).real
+x = 123456789.123456789j.__add__((0b1011).bit_length())
+x = (0xB1ACC).conjugate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we have no more trailing dots in floats we can also fix the parentheses rules for them (maybe better in a follow-up PR so that the scope of this PR doesn't get too large)

@konstin konstin added the formatter Related to the formatter label Jul 23, 2023
for (index, c) in chars {
if matches!(c, 'e' | 'E') {
has_exponent = true;
if input.as_bytes().get(index - 1).copied() == Some(b'.') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this could, at least in theory, result in indexing between two character boundaries if the text starts with [e10]. I don't think this is a problem in practice because it's impossible that the preceding character is a unicode character, because the number would then be part of an identifier. I still recommend fixing this just to be sure (e.g. by assigning `is_float on line 147)

@MichaReiser MichaReiser enabled auto-merge (squash) July 24, 2023 06:58
@MichaReiser MichaReiser mentioned this pull request Jul 24, 2023
72 tasks
@MichaReiser MichaReiser merged commit dfa81b6 into astral-sh:main Jul 24, 2023
@lkh42t lkh42t deleted the format-number branch July 29, 2023 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants