
Consider emitting buffered DEDENT tokens on the last line #104976

Closed
pablogsal opened this issue May 26, 2023 · 6 comments


pablogsal commented May 26, 2023

In Python 3.12, porting the tokenize module to use the C tokenizer underneath to support PEP 701 now has a documented change in docs.python.org/3.12/whatsnew/3.12.html#changes-in-the-python-api:

Some final DEDENT tokens are now emitted within the bounds of the input. This means that for a file containing 3 lines, the old version of the tokenizer returned a DEDENT token in line 4 whilst the new version returns the token in line 3.

Apparently, this negatively affects some formatting tools (see PyCQA/pycodestyle#1142). Let's consider what options we have and see if we can fix this without adding a lot of maintenance burden to the C tokenizer or slowing everything down.
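The behaviour in question can be observed directly with the tokenize module; a minimal sketch (the reported positions of the final DEDENT depend on the Python version in use):

```python
import io
import tokenize

# A short input ending in an indented block, so the tokenizer must
# emit a buffered DEDENT at end of file.
src = "if x:\n    pass\n"

# Print each token's name and start position; on 3.11 (and on 3.12
# after the fix discussed here) the final DEDENT lands one line past
# the input, alongside the ENDMARKER.
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], tok.start)
```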


pablogsal commented May 26, 2023

We can do something like this:

diff --git a/Lib/tokenize.py b/Lib/tokenize.py
index 911f0f12f9..63dc44b0dc 100644
--- a/Lib/tokenize.py
+++ b/Lib/tokenize.py
@@ -452,7 +452,9 @@ def _tokenize(rl_gen, encoding):
         yield token
     if token is not None:
         last_line, _ = token.start
-        yield TokenInfo(ENDMARKER, '', (last_line + 1, 0), (last_line + 1, 0), '')
+        if token.type != DEDENT:
+            last_line += 1
+        yield TokenInfo(ENDMARKER, '', (last_line, 0), (last_line, 0), '')


 def generate_tokens(readline):
diff --git a/Python/Python-tokenize.c b/Python/Python-tokenize.c
index 88087c1256..2a09dfd94a 100644
--- a/Python/Python-tokenize.c
+++ b/Python/Python-tokenize.c
@@ -214,6 +214,10 @@ tokenizeriter_next(tokenizeriterobject *it)
     }

     if (it->tok->tok_extra_tokens) {
+        if (type == DEDENT && it->tok->done == E_EOF) {
+            lineno = end_lineno = lineno + 1;
+            col_offset = end_col_offset = 0;
+        }
         // Necessary adjustments to match the original Python tokenize
         // implementation
         if (type > DEDENT && type < OP) {

but I think this forces us to handle the ENDMARKER internally somehow. Maybe that's a possible solution, but I fear it still has some side effects.
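For illustration, the line adjustment in the C diff above can be sketched as a pure-Python post-processing pass over a token list (`shift_trailing_dedents` is a hypothetical helper for this sketch, not part of CPython):

```python
import tokenize

def shift_trailing_dedents(tokens):
    """Move end-of-file DEDENT tokens one line past the input,
    mimicking the pre-3.12 tokenize behaviour. Illustrative only."""
    out = list(tokens)
    for i, tok in enumerate(out):
        # A trailing DEDENT is followed only by DEDENTs and the ENDMARKER.
        if tok.type == tokenize.DEDENT and all(
            t.type in (tokenize.DEDENT, tokenize.ENDMARKER) for t in out[i:]
        ):
            line = tok.start[0] + 1
            out[i] = tok._replace(start=(line, 0), end=(line, 0))
    return out
```

The actual fix does this on the C side in Python-tokenize.c, guarded by `tok_extra_tokens` and `done == E_EOF`, so the core C tokenizer is untouched.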

@pablogsal

@lysnikolaou thoughts?

@pablogsal

CC: @mgmacias95

pablogsal self-assigned this May 26, 2023
@pablogsal

Opened #104980 to test this idea

@lysnikolaou

Yeah, I think that if we want to support doing the same thing as 3.11, the only way is to special-case it in Python-tokenize.c and not in the C tokenizer itself.

pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023
@pablogsal

Yeah, I think that if we want to support doing the same thing as 3.11, the only way is to special-case it in Python-tokenize.c and not in the C tokenizer itself.

OK, then check whether you like #104980

pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2023
…vious tokenizer (pythonGH-104980)

(cherry picked from commit 46b52e6)

Co-authored-by: Pablo Galindo Salgado <[email protected]>
Signed-off-by: Pablo Galindo <[email protected]>
pablogsal added a commit that referenced this issue May 26, 2023