Fix curly brace escape handling in f-strings #7331

dhruvmanila · 2023-09-13T09:45:55Z

Summary

This PR fixes the escape handling of curly braces inside an f-string. There are 2
main changes:

Lexer

The lexer change is actually a bug fix. Instead of breaking as soon as we find a
curly brace after the \ character, we now continue and let the next iteration
handle it in the curly brace branch. This fixes the following case:

f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
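
To illustrate the difference between breaking and continuing, here is a minimal Python sketch of the literal-scanning loop. It is a toy model, not the Rust lexer from this PR: the function `scan_literal` and its `break_after_backslash` flag are hypothetical names introduced only for this example.

```python
def scan_literal(body, *, break_after_backslash):
    """Return the literal text at the start of `body` and the index where scanning stopped.

    Hypothetical helper: `break_after_backslash=True` models the old behaviour,
    `False` models the behaviour described above.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            out.append(ch)
            i += 1
            if i < len(body) and body[i] in "{}":
                if break_after_backslash:
                    break  # old: stop right before the brace
                continue  # new: let the curly brace branch handle it
            if i < len(body):
                out.append(body[i])  # any other character after the backslash
                i += 1
        elif ch in "{}":
            if i + 1 < len(body) and body[i + 1] == ch:
                out.append(ch)  # doubled brace stands for one literal brace
                i += 2
            else:
                break  # single brace: a replacement field starts or ends here
        else:
            out.append(ch)
            i += 1
    return "".join(out), i


fixed = scan_literal(r"\{{foo}}", break_after_backslash=False)
buggy = scan_literal(r"\{{foo}}", break_after_backslash=True)
```

With the old behaviour the scan stops at index 1, so the following `{` would be treated as the start of a replacement field. With the new behaviour the `{{` and `}}` pairs are consumed as literal braces.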

Parser

We can encounter a \ as the last character in a FStringMiddle token which is
valid in this context¹. For example,

f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` tokens

Here, the FStringMiddle token contents will be "\" and " \", which are invalid as
regular string literals. However, they're valid here because each one is a substring
of an f-string. Even though curly braces cannot be escaped with a backslash, this is
valid syntax.

Test Plan

Verified that existing test cases pass and added new test cases for the lexer and parser.

¹ Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

dhruvmanila · 2023-09-13T09:46:11Z

Current dependencies on/for this PR:

main
- PR Add a NotebookError type to avoid returning Diagnostics on error #7035
  - PR Make SourceKind a required parameter #7013
    - PR Add support for the new f-string tokens per PEP 701 #6659
      - PR Add support for parsing f-string as per PEP 701 #7041
        - PR Use narrow type for string parsing patterns #7211
        - PR Disallow non-parenthesized lambda expr in f-string #7263
        - PR Fix curly brace escape handling in f-strings #7331 👈
        - PR Update Indexer to use new f-string tokens #7325
        - PR Detect noqa directives for multi-line f-strings #7326
        - PR Update F541 to use new f-string tokens #7327
        - PR Update Stylist quote detection with new f-string token #7328
        - PR Update W605 to check in f-strings #7329

This comment was auto-generated by Graphite.

crates/ruff_python_parser/src/lexer.rs

MichaReiser · 2023-09-13T11:46:09Z

crates/ruff_python_parser/src/string.rs

+                // valid here because it's a substring of a f-string. Even though
+                // curly braces cannot be escaped, it's a valid syntax.
+                //
+                // Refer to point 3: https://peps.python.org/pep-0701/#rejected-ideas


This section doesn't explain why it is valid. It only explains why it isn't an escape sequence. I assume it is valid because Python allows invalid escape sequences in general (Do we have a lint rule for that, does it need updating?)

dhruvmanila · 2023-09-13

I actually noticed this while working on https://github.com/astral-sh/ruff/pull/7327/files#diff-739ef20efc60adce8f2acfaeda1005a932c58536433cbb895b323d335fffdd63, where the following was mentioned at the end of the test file:

# To be fixed
# Error: f-string: single '}' is not allowed at line 41 column 8
# f"\{{x}}"

So I started looking into it and realized that we don't parse this as valid syntax. Python parses it as valid syntax before 3.12 as well, although 3.12 produces a syntax warning:

$ python3.12-dev fstring.py
/Users/dhruv/playground/ruff/fstring.py:2: SyntaxWarning: invalid escape sequence '\{'
  f"\{foo}"

(Do we have a lint rule for that, does it need updating?)

We do have W605 (tracking issue) but as we didn't parse the syntax before, we didn't raise any warning. Now, we have the ability to raise that warning and I think we should do it.
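
For reference, the interpreter's own warning can be observed programmatically. This is a minimal sketch of that check in plain Python, not the W605 implementation; `invalid_escape_warnings` is a hypothetical helper name, and it matches on the message text because the warning category is not the same in every Python version.

```python
import warnings


def invalid_escape_warnings(source):
    """Compile `source` and return the messages of any invalid-escape warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [
        str(w.message)
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


# Raw strings keep the backslash intact in the source under test.
messages = invalid_escape_warnings(r'f"\{foo}"')
clean = invalid_escape_warnings(r'f"{foo}"')
```

The first call reports at least one warning for `\{`, and the second reports none.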


codspeed-hq · 2023-09-14T02:26:16Z

CodSpeed Performance Report

Merging #7331 will degrade performances by 9.55%

⚠️ No base runs were found

Falling back to comparing dhruv/fstring-parser-4 (a8e4218) with main (04183b0)

Summary

❌ 8 regressions
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `main` | `dhruv/fstring-parser-4` | Change |
|---|---|---|---|---|
| ❌ | `lexer[unicode/pypinyin.py]` | 620.2 µs | 672.2 µs | -7.73% |
| ❌ | `lexer[large/dataset.py]` | 9.8 ms | 10.7 ms | -8.22% |
| ❌ | `lexer[numpy/globals.py]` | 233.2 µs | 252.8 µs | -7.74% |
| ❌ | `lexer[numpy/ctypeslib.py]` | 2 ms | 2.1 ms | -7.55% |
| ❌ | `parser[large/dataset.py]` | 68.4 ms | 70.2 ms | -2.56% |
| ❌ | `parser[numpy/ctypeslib.py]` | 12.4 ms | 12.7 ms | -2.57% |
| ❌ | `parser[unicode/pypinyin.py]` | 4.3 ms | 4.4 ms | -2.78% |
| ❌ | `lexer[pydantic/types.py]` | 4.1 ms | 4.5 ms | -9.55% |


dhruvmanila requested a review from MichaReiser September 13, 2023 09:47

dhruvmanila added parser Related to the parser python312 Related to Python 3.12 labels Sep 13, 2023

MichaReiser requested a review from charliermarsh September 13, 2023 11:40

MichaReiser approved these changes Sep 13, 2023


dhruvmanila force-pushed the dhruv/fstring-parser-4 branch from d184850 to 6c4f8c2 Compare September 13, 2023 12:48

dhruvmanila added 12 commits September 14, 2023 07:02

Add support for the new f-string tokens per PEP 701

0cc8221

Emit empty FStringMiddle token for special case

12a7c3a

Update comment from code review

72f9a21

Avoid tracking parentheses nesting multiple times

0159ae0

Add test for empty FStringMiddle tok in lambda expr

63f43dc

Code review changes

bfa0296

Emit empty token only in format spec, handle SingleRBrace error

5db6b20

Fix incorrect position to start the f-string token

be4a3eb

Update snapshots

adb7a05

Pass in the source code to the parser

22cfd95

This will be used to extract the `leading` and `trailing` text for f-string debug expressions.

Add support for parsing f-strings as per PEP 701

408c58f

Skip the empty FStringMiddle token emitted by lexer

b70048a

dhruvmanila added 8 commits September 14, 2023 07:02

Add some more f-string test cases

ab09524

Code review changes

cdc7b45

Pass arguments in the correct order

19653a3

Use narrow types for string parsing patterns

9d8012d

Remove single string literal pattern

18188d0

Fix typo (@L -> @R)

47c3215

Disallow non-parenthesized lambda expr in f-string

06ba358

Fix curly brace escape handling for f-strings

44f3155

dhruvmanila force-pushed the dhruv/fstring-parser-3 branch from 25fad0c to 06ba358 Compare September 14, 2023 01:37

dhruvmanila force-pushed the dhruv/fstring-parser-4 branch from 6c4f8c2 to 44f3155 Compare September 14, 2023 01:37

Base automatically changed from dhruv/fstring-parser-3 to dhruv/pep-701 September 14, 2023 02:15

Merge branch 'dhruv/pep-701' into dhruv/fstring-parser-4

a8e4218

dhruvmanila merged commit 4df8e0a into dhruv/pep-701 Sep 14, 2023
10 of 14 checks passed

dhruvmanila deleted the dhruv/fstring-parser-4 branch September 14, 2023 02:18

dhruvmanila mentioned this pull request Sep 14, 2023

Add support for PEP 701 #7376

Merged

dhruvmanila mentioned this pull request Sep 15, 2023

Add support for PEP 701 in the parser #7043

Closed

This was linked to issues Sep 15, 2023

Add support for PEP 701 in the parser #7043

Closed

Add support for PEP 701 in the lexer #7042

Closed
