Update proof of concept with respect to PEP 701 changes #22
Comments
To clarify, I wasn't proposing …
@rmorshea We just chatted about it, so good timing. Should we find a Discord somewhere to keep you in the loop?
@pauleveritt, I'm happy to hop on Discord.
@rmorshea I'm sitting in the Python Discord ATM as …
I am slowly working through the update (starting from scratch, but I expect to start copying from #1 soon). See https://github.com/python/cpython/compare/main...gvanrossum:tag-strings-v2?expand=1
Ready, see python/cpython#103835. Have fun!
My first reaction, from only looking at the PR (without trying out the tag string examples), is that using the recent PEP 701 changes has very nicely simplified the code needed for tag strings. So this validates our thinking that PEP 701's formalization could help support tag strings and remove hacky code, even though it wasn't explicitly written to support them.
Most of the examples in this repo do work, but I saw two problems with the current PR. First, a useful helper function:
So in general, it works as expected:
But one assertion failure:
And it's unclear to me how we should handle Unicode escapes now (I guess I will have to read the code here more closely):
Thanks for flagging those. I'll try to figure out what's going on with the crash. For \N, what do "rf" strings do?
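To answer that question on released CPython (a quick sketch, no tag strings involved): in a non-raw f-string, \N{...} is a named-character escape, while in a raw f-string the backslash stays literal and the braces become an interpolation field.

```python
# Non-raw f-string: \N{...} is the named-character escape.
s1 = f"\N{GRINNING FACE}"
print(s1)  # U+1F600

# Raw f-string: the backslash is literal, and {...} is an
# interpolation field, so the variable `name` is substituted.
name = "x"
s2 = rf"\N{name}"
print(s2)  # \Nx
```

So "rf" strings never see \N as an escape at all; the braces belong to the f-string machinery.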
I have pushed a fix for the crash. It was a bit weird. I'm still pondering the \N question.
Hm, "sanely" may be an exaggeration. In the old branch …
Speaking of the treatment of something like …: there have been some changes since then. Let's look at the following:
So at the very least we should expect it to behave like …. As to which one is sane, it's an interesting corner case. The argument for processing it like …
Pushed a fix for ….
>>> tag"\N{GRINNING FACE}"
  File "<stdin>", line 1
    tag"\N{GRINNING FACE}"
       ^^^^^^^^^^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?
>>>
The workaround is to double the curlies:
>>> tag"\N{{GRINNING FACE}}"
('\\N{', 'GRINNING FACE}')
>>>
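The doubled-curly workaround is consistent with standard f-string brace escaping, where {{ and }} denote literal braces. A quick check on released CPython:

```python
text = "GRINNING FACE"
# Doubled braces are escaped literal braces, not an interpolation.
assert f"{{x}}" == "{x}"
# Triple braces: one escaped brace plus one interpolation field.
assert f"{{{text}}}" == "{GRINNING FACE}"
print("brace escaping works as expected")
```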
This is probably the right way to do it, given that we get …
@gvanrossum Never mind for now: I tried running on another Linux box and it does work. It's always possible I have an out-of-date header for SQLite, or something else in the toolchain. But it's still strange that it only happens with this usage. All examples in the examples directory now work (for some definition of "work"; I need to file a tests issue so some of them can be turned into automated tests).
Hm, interesting. It's not in the parser or bytecode compiler, since it prints the message about pip-installing sqlalchemy when I try to run it. Note: the instructions contain a bug -- you need to quote the …
Hm, interesting that you have it working on a Linux box. Maybe sqlalchemy doesn't depend on greenlet there?
So I looked at the failing tests in my draft PR and tried to fix them. Unfortunately, one test seems to be failing due to an issue with resetting the tokenizer; I'll probably have to get @pablogsal or @lysnikolaou to help me with that. Most of the failures were of a simpler nature, which we will need to mention in the PEP and docs: where previously an incorrect letter or letters immediately preceding a string literal (e.g. …
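The behavior change described above can be contrasted with released CPython, where an unknown letter glued to a string literal is rejected at compile time. Here `bf'x'` is just an arbitrary invalid prefix chosen for illustration:

```python
# On released CPython, an invalid string prefix is a SyntaxError at
# compile time; under the tag-string branch the same source would
# instead look up the name before the quote at run time.
try:
    compile("bf'x'", "<demo>", "eval")
    raise AssertionError("expected a SyntaxError")
except SyntaxError:
    print("invalid prefix rejected at compile time")
```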
@lysnikolaou Any chance you can help me with getting this last test passing? See the commit in python/cpython#103835: python/cpython@e37d679
Sure I can help! From the commit message of python/cpython@e37d679:
I don't think that the parser backtracks; rather, the tokenizer produces TAGSTRING_START instead of FSTRING_START. My understanding is that the conversion character …
❯ ./python.exe
Python 3.12.0a7+ (heads/tag-strings-v2:e37d679a79, Apr 26 2023, 22:15:21) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> src = "f'{3!s'"
>>> for token in tokenize._generate_tokens_from_c_tokenizer(src):
... print(token)
...
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f'{3!s'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 2), end=(1, 3), line="f'{3!s'\n")
TokenInfo(type=2 (NUMBER), string='3', start=(1, 3), end=(1, 4), line="f'{3!s'\n")
TokenInfo(type=54 (EXCLAMATION), string='!', start=(1, 4), end=(1, 5), line="f'{3!s'\n")
TokenInfo(type=62 (TAGSTRING_START), string="s'", start=(1, 5), end=(1, 7), line="f'{3!s'\n")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 688, in _generate_tokens_from_c_tokenizer
for info in c_tokenizer.TokenizerIter(source):
File "<string>", line 1
f'{3!s'
^
SyntaxError: unterminated f-string literal (detected at line 1)
I think what we will need to do is emit NAME after an exclamation point in an f-string interpolation even if it's followed by a quote. I can work on that if you want. By the way, congrats to everyone involved. This is indeed very exciting!
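For context on the `!s` in the session above: in standard f-strings, the conversion character after `!` must be one of s, r, or a, which apply str(), repr(), or ascii() to the value before formatting:

```python
value = "café"
assert f"{value!s}" == "café"        # str()
assert f"{value!r}" == "'café'"      # repr()
assert f"{value!a}" == "'caf\\xe9'"  # ascii(): non-ASCII escaped
print("conversion characters behave as documented")
```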
There's also the slightly more difficult case of something like …
With the current implementation the syntax error message reads …
Thanks! You're likely right. Interestingly, you can get the main branch to emit the unexpected error too:
>>> f'{3!f'
File "<stdin>", line 1
f'{3!f'
^^
SyntaxError: f-string: invalid conversion character
(should be: "expecting '}'"). So your fix for this should be applied to main, not just to my branch. :-) The same argument applies to the other problem case:
>>> f'{f'
File "<stdin>", line 1
f'{f'
^
SyntaxError: unterminated f-string literal (detected at line 1)
>>>
And why is that treated differently than this?
>>> f'{b'
File "<stdin>", line 1
f'{b'
^
SyntaxError: f-string: expecting '}'
>>>
Honestly I don't care that much about getting the right error in every case -- mismatched string quotes can lead to many different errors. (E.g. what if a user accidentally puts an unescaped quote inside a string?) But the tests insist on certain error messages, so something needs to be done.
I think that the error message on the main branch is okay, since …
The difference is that string prefixes that were there before (…) …

There's another point that I'd like to bring up, in case you haven't considered it for the PEP. I was reminded the other day of the time I tried to add a specific syntax error for invalid string prefixes. People started complaining because it was backwards-incompatible for cases like this:

def f():
    return'something' or 'something' if expression else'something else'

Basically, all code that does not add spaces between a keyword and a string breaks. And that's true for this change as well. On main:
❯ ./python.exe
Python 3.12.0a7+ (heads/main:1d99e9e46e, May 3 2023, 13:07:40) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
... return'1'
...
>>> f()
'1'
>>> 'something' if 1 else'something else'
'something'
On the tag-string branch:
❯ ./python.exe
Python 3.12.0a7+ (heads/tag-strings-v2:e37d679a79, May 3 2023, 13:11:32) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
... return'1'
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
NameError: name 'return' is not defined
>>> 'something' if 1 else'something else'
File "<stdin>", line 1
'something' if 1 else'something else'
^^^^^^^^^^^^^^^^
SyntaxError: expected 'else' after 'if' expression
We ended up reverting that commit. There might be ways around this (one of which is moving keyword handling into the tokenizer, which probably isn't a good idea), and I'd be up for doing the work, even if it's just writing a section in the PEP, in case there isn't one already.
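The incompatibility is easy to reproduce on any released CPython, where a keyword glued to a string literal still parses fine:

```python
# Omitting the space between a keyword and a string literal is legal
# today, which is what makes treating NAME-quote as a tag call
# backwards-incompatible for keywords.
def f():
    return'1'  # no space after `return`

assert f() == '1'
assert ('a' if 1 else'b') == 'a'  # no space after `else`
print("keyword-adjacent strings parse on released CPython")
```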
Yeah, there are endless variations on the theme of invalid conversion characters. Latest fun (on main):
>>> f'{3!r'''
...
File "<stdin>", line 1
f'{3!r'''
^
SyntaxError: unterminated triple-quoted string literal (detected at line 1)
>>>
(There's an invisible ….)
Personally I think the lexer probably has to special-case the character after !.
About …
Following up on this issue, there are two specific changes we need to clarify in the PEP. Other details are with respect to the implementation and can be deferred for now.
whereas on current CPython main the expected result is:
The straightforward solution proposed in #22 (comment) is probably what we need here.
From a PEP perspective, we should follow what the user has written: exactly one string (specifically …
and this cooks to …
@pauleveritt let's add a subsection to …
Just sent @jimbaker a PR for review on this. |
PEP 701, as tracked in the CPython implementation and recently merged by @pablogsal (python/cpython#102856), required changes to the tokenizer and parser for the cleaned-up implementation of f-strings. This impacts the initial proof-of-concept work in #1 written by @gvanrossum.
First, we have discussed two possible syntaxes, as seen in the examples below:
- html'<li>{name}</li>' (this is what we have generally done, with the tag not being dotted and no whitespace between it and the string part)
- html @ f'<li>{name}</li>', as with the transpiler work by @rmorshea in Interim Transpiler #20

In my reading of the PEP, the easiest/most straightforward change is to leverage PEP 701 f-strings directly, producing these tokens in either case. (So the scheme in #20 maps to this more explicitly.)

In other words, this is the exact tokenization produced now, except that the first token is a NAME. This tokenization should then be straightforward to parse, including any necessary updates to the POC (possibly even removing more of the changes that were done, one can hope!).

The advantage of this approach (at least at first glance) is that we avoid duplicating a lot of the new work done for PEP 701, such as handling brace nesting and the subsequent tokenization of interpolations with potential nesting.
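To make the first syntax concrete, here is a deliberately naive, hypothetical source-level sketch of the desugaring. The real change lives in the tokenizer, and the `desugar` helper and its regex are illustrative only: they ignore real string prefixes such as f/r/b, brace nesting, and triple quotes.

```python
import re

def desugar(src: str) -> str:
    # Naive sketch: rewrite  tag'...'  into  tag(f'...'), i.e. a NAME
    # token followed by ordinary PEP 701 f-string tokens.
    return re.sub(r"\b([A-Za-z_]\w*)'((?:[^'\\]|\\.)*)'", r"\1(f'\2')", src)

print(desugar("html'<li>{name}</li>'"))  # html(f'<li>{name}</li>')
```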