Consider line continuation character for re-lexing #12008
if let Some(second_slash) = reverse_chars.next_if_eq(&'\\') {
    // Line continuation character has been escaped: `\\\n`
    newline_position = Some(current_position);
    // Set the newline position before updating the current position.
    current_position -= first_slash.text_len() - second_slash.text_len();
} else {
I think it's still more complicated than this. What about `\\\`?
Here, we have an escaped backslash followed by a continuation :(
I see. I guess we'd need to count the number of backslashes and make a decision based on whether it's odd or even.
let mut backslash_count = 0;
while reverse_chars.next_if_eq(&'\\').is_some() {
    backslash_count += 1;
}
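To make the odd/even rule concrete, here is a minimal standalone sketch. It is not the code from this PR: the function name `is_newline_escaped` is hypothetical, and it works on a plain `&str` prefix (the text before the newline) rather than on the lexer's cursor. It also assumes, as the discussion above does, that two adjacent backslashes cancel each other out.

```rust
/// Hypothetical helper, not part of the PR: returns `true` when the newline
/// that follows `prefix` would be escaped, i.e. when `prefix` ends with an
/// odd number of backslashes.
fn is_newline_escaped(prefix: &str) -> bool {
    // Walk backwards from the end and count the consecutive backslashes.
    let backslash_count = prefix
        .chars()
        .rev()
        .take_while(|&c| c == '\\')
        .count();
    // Pairs of backslashes escape each other; one left over escapes the newline.
    backslash_count % 2 == 1
}
```

With this rule, a prefix ending in `\` or `\\\` counts as escaping the newline, while a prefix ending in `\\` or in no backslash at all does not.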
Last comment :) Do we need to restrict the escape handling to cases where we know we're inside a string? Or wouldn't that work in case of an unterminated string literal?
I'm not sure if it matters because the parser is already in an error recovery state when encountering an escaped `\` outside of a string, but it might be worth adding a test for it:
test + a \\\
more
I'm not sure I follow here. Do you mean to ask whether this logic needs to be restricted to only recovering within a string or not? I don't think that is necessary. I'll add a test case for a line continuation character encountered while re-lexing outside of a string.
## Summary

This PR fixes a bug introduced in #12008 which didn't consider the two-character newline after the line continuation character.

For example, consider the following code, with whitespace characters made visible:

```py
call(foo # comment \\r\n
\r\n
def bar():\r\n
....pass\r\n
```

The lexer is at `def` when it's running the re-lexing logic and trying to move back to a newline character. It encounters `\n` and checks whether it's being escaped (incorrect), but it's `\r` that is being escaped, so it moves the lexer to the `\n` character. This creates an overlap in token ranges which causes the panic.

```
Name 0..4
Lpar 4..5
Name 5..8
Comment 9..20
NonLogicalNewline 20..22 <-- overlap between
Newline 21..22           <-- these two tokens
NonLogicalNewline 22..23
Def 23..26
...
```

fixes: #12028

## Test Plan

Add a test case with a line continuation and a Windows-style newline character.
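The two-character newline case can be illustrated with a small sketch that treats `\r\n` as a single unit before looking for a preceding backslash. This is only an illustration under stated assumptions, not the lexer's implementation: both function names are hypothetical, and the input is a plain `&str` ending at the newline in question.

```rust
/// Hypothetical helper: removes one trailing newline sequence, trying the
/// two-character `\r\n` first so that it is never split into `\r` and `\n`.
fn strip_trailing_newline(text: &str) -> Option<&str> {
    text.strip_suffix("\r\n")
        .or_else(|| text.strip_suffix('\n'))
        .or_else(|| text.strip_suffix('\r'))
}

/// Hypothetical helper: returns `true` when `text` ends with a newline that
/// is preceded by an odd number of backslashes.
fn ends_with_escaped_newline(text: &str) -> bool {
    match strip_trailing_newline(text) {
        Some(rest) => {
            // Count the backslashes immediately before the whole newline sequence.
            let backslashes = rest.chars().rev().take_while(|&c| c == '\\').count();
            backslashes % 2 == 1
        }
        // No trailing newline, so there is nothing to escape.
        None => false,
    }
}
```

A check that looks only at the character directly before `\n` would see `\r` and miss the backslash in front of it; stripping the full newline sequence first avoids that.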
## Summary

This PR fixes a bug where the re-lexing logic didn't consider the line continuation character being present before the newline character. This meant that the lexer was being moved back to the newline character which is actually ignored via `\`.

Consider the following code:
The old token stream is:
Notice how the ranges are overlapping between the `FStringMiddle` token and the tokens emitted after moving the lexer backwards.

After this fix, the new token stream, which is produced without moving the lexer backwards in this scenario:
fixes: #12004
## Test Plan
Add a test case and update the snapshots.