Fix formatter instability for lines only consisting of zero-width characters #11748
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #11724 (comment)
Code from the linked issue that produces instable formatting:
This turned out to be something entirely different than I expected.
I initially thought that it was an issue with the docstring rendering, which uses
trim
instead ofPythonWhitespace::trim
.But I learned the better when I compared the generated IR for the above snippet and the same snippet but with
\u{15}
(the non visible character in the snippet above) witha
.Both snippets generate the same IR! That means this is a bug in the
Printer
and not in the Python formatter.It turns out that the fix is relatively simple.
The
Printer
omitshard_line_break
IR elements or only renders a single\n
for aempty_line
IR elements when thecurrent line is all empty. The problem is that the
Printer
uses the line width to determine if the line is empty.However, using the line width here isn't correct because a line can be non-empty but has a zero width if it only consists of zero-width characters, which is exactly what we have in the reported instability.
I considered removing this behavior but we rely heavily on it in
suite.rs
and I'm not going to touch this to fix this bug!So what I did instead is to change the
Printer
to keep track of the start of the current line and test if the current formatted line is empty.I don't suspect that this changes performance much because it only adds a single write after a newline character. Testing whether the line gets a bit more expensive
because it now needs to dereference into the buffer, but I think in total, this is just noise.
And I was wrong lol. THis has a huge perf impact. Let's see if I can do better
Backwards compatibility
It's hard to be a 100% sure, but I'm confident that it doesn't have widespread impact because zero-width characters are rare, even more so zero-width lines only consisting of zero-width characters.
The only zero-width character that Python allows outside strings and comments is the form-feed character. Ruff does not preserve line-feed characters except in suppressed ranges. The existing implementation used a work-around to avoid this very specific printer behavior. This PR allows to remove the work around and the tests keep passing.
For strings, we write the entire string content as a single text, preserving line breaks as is.
For comments: That's the issue discovered here. Existing code would already have the trailing line stripped and reformatting the same code wouldn't insert the trailing line after the zero-width characters line.
So we should be good
Test Plan
Added test. All existing tests keep passing.
Thanks
Thanks to @jasikpark for finding the instability and helping me to reproduce the issue.