tokenizer: skip lines that are just slash and whitespace #4343
Ah, it affects multiline strings. Should be an easy fix...
src/blib2to3/pgen2/tokenize.py
# skip lines that are just a slash, to avoid storing that line's
# indent information.
if not contstr and line.rstrip("\n").strip(" \t") == "\\":
Is this equivalent to just `line.strip() == "\\"`? Or do we need to care about exotic whitespace characters that are not newline/space/tab?
I'm not entirely sure, so I tried to be conservative. Does Python normally treat characters like `\r` or `\v` as part of indentation?
Not too sure; there is code in our tokenizer that deals with `\f` at least. In CPython (`Parser/lexer/lexer.c`) I see some code dealing with `\r` and with `\f` (`\014`).
$ printf 'def foo():\n \t pass' | python -m tokenize
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'foo'
1,7-1,8: OP '('
1,8-1,9: OP ')'
1,9-1,10: OP ':'
1,10-1,11: NEWLINE '\n'
2,0-2,3: INDENT ' \t '
2,3-2,7: NAME 'pass'
2,7-2,8: NEWLINE ''
3,0-3,0: DEDENT ''
3,0-3,0: ENDMARKER ''
$ printf 'def foo():\n \f pass' | python -m tokenize
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'foo'
1,7-1,8: OP '('
1,8-1,9: OP ')'
1,9-1,10: OP ':'
1,10-1,11: NEWLINE '\n'
2,0-2,3: INDENT ' \x0c '
2,3-2,7: NAME 'pass'
2,7-2,8: NEWLINE ''
3,0-3,0: DEDENT ''
3,0-3,0: ENDMARKER ''
$ printf 'def foo():\n \r pass' | python -m tokenize
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'foo'
1,7-1,8: OP '('
1,8-1,9: OP ')'
1,9-1,10: OP ':'
1,10-1,11: NEWLINE '\n'
2,1-2,3: OP '\r '
2,3-2,7: NAME 'pass'
2,7-2,8: NEWLINE ''
3,0-3,0: ENDMARKER ''
So `\f` is in fact legitimate indentation, how interesting.
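The same check can be run from Python rather than the shell. This is a minimal sketch using the standard library `tokenize` module; `indent_tokens` and `form_feed_source` are names made up for this illustration, and the exact token strings may vary between Python versions.

```python
import io
import tokenize


def indent_tokens(source):
    """Return the text of every INDENT token the stdlib tokenizer emits."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.INDENT
    ]


# Hypothetical sample: the body is indented with space, form feed, space.
form_feed_source = "def foo():\n \f pass\n"

# An INDENT token is produced, so the form feed counts as indentation.
print(indent_tokens(form_feed_source))
```

The `\v` case is left out on purpose, since how the tokenizer reports that error depends on the Python version.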
$ python -c 'print("def foo():\n \v pass")' | python -m tokenize
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'foo'
1,7-1,8: OP '('
1,8-1,9: OP ')'
1,9-1,10: OP ':'
1,10-1,11: NEWLINE '\n'
2,0-2,1: INDENT ' '
<stdin>:2:2: error: invalid non-printable character U+000B
`\v` is unparseable.
So editing the PR to do `.lstrip(' \t\f')` should take care of all cases, I believe.
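As a rough sketch of what that suggestion amounts to (not the code that was merged), the check could look like the helper below. `is_backslash_only_line` is a hypothetical name, and the sketch assumes lines end in a plain `\n`.

```python
def is_backslash_only_line(line):
    """True if the line is optional indentation followed by a lone backslash.

    Only space, tab, and form feed are stripped from the left, since those
    are the characters shown above to be accepted as indentation. Nothing
    except the newline is stripped from the right, so trailing whitespace
    after the backslash makes the check fail.
    """
    return line.rstrip("\n").lstrip(" \t\f") == "\\"


print(is_backslash_only_line("  \t\\\n"))  # indented lone backslash
print(is_backslash_only_line("\\ \n"))     # space after the backslash
```

A vertical tab before the backslash is deliberately not stripped, so such a line is not treated as backslash-only.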
Made changes. The reason I can't do `.strip()` is that `\` must be at the end of the line. If there are spaces after the backslash, it no longer escapes the newline:
$ printf 'print(2 + \\\n3)'
print(2 + \
3)
$ printf 'print(2 + \\\n3)' | python3
5
$ printf 'print(2 + \\ \n3)' | python3
File "<stdin>", line 1
print(2 + \
^
SyntaxError: unexpected character after line continuation character
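The shell session above can also be reproduced with the built-in `compile()`. This is an illustrative sketch; `compiles`, `continued`, and `trailing_space` are names invented here.

```python
def compiles(source):
    """Return True if Python accepts the source, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


continued = "print(2 + \\\n3)\n"        # backslash directly before the newline
trailing_space = "print(2 + \\ \n3)\n"  # a space between backslash and newline

print(compiles(continued))       # True: valid line continuation
print(compiles(trailing_space))  # False: rejected by the parser

# A bare str.strip() would hide the difference, because it also removes
# the trailing space: both lines below reduce to a single backslash.
print("\\\n".strip() == "\\ \n".strip())  # True
```

This is why the right-hand side of the line is only stripped of its newline: a line such as `"\\ \n"` is a syntax error, not a continuation, and should not be skipped as if it were one.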
`form_feeds.py` contains a `\f\` (line 42) that is now preserved, whereas previously the line was deleted entirely. I think that is fine.
@@ -156,6 +156,7 @@ def something(self):
#
This is just a `\n`.
Description
Resolves #4261