Example tags #8

jimbaker · 2022-05-18T03:23:11Z

gvanrossum · 2022-05-18T04:40:55Z

We could also have one that removes indentation from a triple-quoted string, like textwrap.dedent(), so you can write

from textwrap import dd
def main():
    code = dd"""
        def f(x, y):
            print(x, x+y)
    """
    exec(code)

instead of having to place the text flush left (breaking the visual indentation) or call textwrap.dedent() manually on the code.

ericsnowcurrently · 2022-05-23T15:51:58Z

We could also have one that removes indentation from a triple-quoted string, like textwrap.dedent()

This could be especially helpful when you have multi-line text you are inserting. Here's an example with complex regular expressions:

import re
import textwrap

VERSION = textwrap.dedent(r'''
    (?:
        ( \d+ )  # <major>
        \.
        ( \d+ )  # <minor>
        (?:
            \.
            ( \d+ )  # <micro>
         )?
        (
            ( a | b | c | rc | f )  # <level>
            ( \d+ )  # <serial>
         )?
     )
''')
DATE = textwrap.dedent(r'''
    (?:
        ( \d{4} )  # <year>
        -
        ( \d\d )  # <month>
        -
        ( \d\d )  # <day>
     )
''')
REGEX = re.compile(rf'''
    ^
    (?:
        (?:
            ( v )?  # <prefix>
            {VERSION}
         )
        |
        (?:
            {DATE}
         )
     )
    $
''', re.VERBOSE)

I've had to deal with this in a number of projects. Without any extra effort, the resulting patterns are harder to read, which matters a lot when you are trying to debug a regex. A textwrap.dd could easily apply the indent to the interpolated multi-line string.
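Applying the indent to an interpolated multi-line string can be sketched as a plain function, since tag syntax is not available today. The name dd, the rule that only identifier-shaped {name} placeholders are replaced, and aligning continuation lines to the placeholder's column are all assumptions of this sketch, not the proposal's design. Because only identifier-shaped placeholders are touched, a quantifier such as {4} written in the template itself would be left alone.

```python
import re
import textwrap

# Matches {name} where name is a Python identifier; {4} or {2,3} are skipped,
# so regex quantifiers written in the template survive untouched.
_FIELD = re.compile(r"\{([A-Za-z_]\w*)\}")


def dd(template: str, **values: str) -> str:
    """Hypothetical stand-in for a textwrap.dd tag.

    Dedents the template, then replaces each {name} with the dedented value;
    continuation lines of a multi-line value are indented to the column where
    the placeholder started.
    """

    def fill(match: re.Match) -> str:
        value = textwrap.dedent(values[match.group(1)]).strip("\n")
        return value.replace("\n", "\n" + " " * match.start())

    lines = textwrap.dedent(template).strip("\n").splitlines()
    return "\n".join(_FIELD.sub(fill, line) for line in lines)


DATE = r"""
    (?:
        ( \d{4} )  # <year>
        -
        ( \d\d )  # <month>
        -
        ( \d\d )  # <day>
    )
"""

src = dd(r"""
    ^
    (?:
        {DATE}
    )
    $
""", DATE=DATE)
print(src)  # the DATE fragment is nested one level deeper than "(?:"
print(re.compile(src, re.VERBOSE).fullmatch("2022-05-23").groups())  # ('2022', '05', '23')
```

In VERBOSE mode the extra indentation does not change what the pattern matches; it only keeps the combined pattern readable when debugging.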

Another interesting variation would be a similar tag in the re module, for verbose patterns like the ones above.

jimbaker · 2022-05-24T11:58:16Z

There's potential utility for a built-in textwrap.dd that does interpolations, since that's often why we have quoted code fragments: we are building up code for eval. One possible gotcha in usage: in the above example, DATE defines a regex fragment that uses \d{4} (common for years, and perhaps the most common use of quantifiers to specify repetitions), and an interpolating tag would read that {4} as an interpolation. On the other hand, as we see with REGEX, format-style '{}' interpolations in regexes are very common in the stdlib, especially in tests. This distinction seems to work fine in practice.

This is the nature of having metacharacters: we have to distinguish them from the code that's using them (whether Python, the regex minilanguage, or \LaTeX). Doubling the braces to get the behavior of the underlying language, as opposed to interpolation, seems reasonable, and hopefully everyone is used to it by now from using f-strings.
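For comparison, this is how the doubling convention already works in a raw f-string today, with no tag involved: {{4}} reaches the regex engine as the quantifier {4}, while {sep} is interpolated by Python. The variable names here are just for illustration.

```python
import re

sep = "-"
# Doubled braces become literal braces for the regex engine;
# single braces are interpolated by Python.
DATE = rf"(\d{{4}}){sep}(\d\d){sep}(\d\d)"

print(DATE)  # (\d{4})-(\d\d)-(\d\d)
print(re.fullmatch(DATE, "2022-05-24").groups())  # ('2022', '05', '24')
```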

jimbaker · 2022-06-01T04:19:31Z

@gvanrossum See https://github.com/jimbaker/tagstr/blob/main/examples/code.py, which implements a fairly minimal Python code templating tag, code. However, so far I haven't figured out good ergonomics for this tag. In any event, it can be used like so:

from code import code

def main():
    my_code = code"""
        def f(x, y):
            print(x, x+y)
    """
    exec(str(my_code), globals())
    # f is now available in globals
    f(1, 2)

main()

(Note that exec is needed here, not eval.)
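The reason is that eval() only accepts a single expression, while a def is a statement. A minimal check with plain strings (no tag involved):

```python
src = "def f(x, y):\n    print(x, x + y)\n"

namespace = {}
exec(src, namespace)  # exec runs statements, so the def succeeds
namespace["f"](1, 2)  # prints: 1 3

try:
    eval(src)  # eval only accepts a single expression
    eval_accepted = True
except SyntaxError:
    eval_accepted = False
print("eval accepted the def:", eval_accepted)  # eval accepted the def: False
```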

It does the auto-dedent and also supports interpolation. Interestingly, it can do things like this, where we have code inserted into code. I think it's sufficiently whitespace-aware, although clearly Python is not a Lisp with lots of parentheses.

from code import code

y = 7

def main():
    my_code = code"""
        def f(x, y):
            print(x, x+y)
    """
    exec(str(my_code), globals())
    f(1, 2)

    some_code = code"""
        def outer(x):
            {my_code}
            f(x, y)
    """

    print(some_code)
    exec(str(some_code), globals())    
    outer(5)

main()

It feels like there's something potentially useful here for templating code. I would like to cover examples like the ones seen in dataclasses, such as in its _create_fn - https://github.com/python/cpython/blob/3.10/Lib/dataclasses.py#L412

gvanrossum · 2022-06-01T04:38:59Z

Okay, so the weird thing is that you have to write exec(str(my_code)) rather than just exec(my_code), so that the second example works, where {my_code} occurs inside another code string. Maybe we can make that smoother by making the tag return a subclass of str? Then exec(code"...") will just work, but the subclass can be special-cased for interpolations.
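A minimal sketch of the str-subclass idea, using str.format as a stand-in for tag interpolation. Code and render are hypothetical names, and the rule that non-Code values are inserted via repr() is an assumption made for illustration.

```python
class Code(str):
    """Hypothetical marker type for Python code fragments.

    As a str subclass it can be passed straight to exec(); as a distinct type
    a renderer can recognize it and splice it in verbatim.
    """

    __slots__ = ()


def render(template: str, **values: object) -> Code:
    # Code values are inserted as-is; all other values become Python literals via repr().
    parts = {name: value if isinstance(value, Code) else repr(value)
             for name, value in values.items()}
    return Code(template.format(**parts))


body = Code("print(greeting, n + 1)")
src = render("n = {n}\ngreeting = {greeting}\n{body}", n=41, greeting="hi", body=body)

namespace = {}
exec(src, namespace)  # no str(...) needed; prints: hi 42
```

The isinstance() check is what lets the subclass be "special-cased for interpolations" while ordinary values are still treated as data.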

I wouldn't get too tied to dataclasses here, they're stdlib magic and we're more concerned about performance there than about readability of the implementation. :-)

jimbaker · 2022-06-01T04:44:49Z

@ericsnowcurrently It would be interesting to implement a minimal re tag that could provide some useful templating for regular expressions. Maybe this is just the equivalent of fr and textwrap.dedent. Another thought is that there might be a useful tag for a subset of regular expressions. So this could be something to support tags like this one:

version = globish"{major:\d+}.{minor:\d+}(.{micro:\d+})?({level:a|b|c|rc|f}{serial:\d+})?"

as we can see with the zeroth implementation of globish:

def globish(*args: str | Thunk):
    print(args)

which results in something like the following:

((<function <lambda> at 0x101b00520>, 'major', None, '\\d+'), '.', (<function <lambda> at 0x101b02780>, 'minor', None, '\\d+'), '(.', (<function <lambda> at 0x101b02af0>, 'micro', None, '\\d+'), ')?(', (<function <lambda> at 0x101b01c80>, 'level', None, 'a|b|c|rc|f'), (<function <lambda> at 0x101b8d860>, 'serial', None, '\\d+'), ')?')

In particular, we can ignore getvalue in the thunk, and instead use raw and formatspec for some very creative DSL construction. 😁 It might also enable unapply schemes in structural pattern matching, as well as recursive construction of the globish matchers.
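One way to use raw and formatspec this way, sketched under assumptions: thunks are modeled as (getvalue, raw, conv, formatspec) tuples matching the printed output above, each thunk becomes a named regex group, and literal strings pass through unchanged as regex text (so "." matches any character). The thunk helper is hypothetical.

```python
import re


def globish(*args) -> re.Pattern:
    """Hypothetical: build a regex from tag-style args.

    Literal strings are used as regex text; each thunk becomes a named group
    where raw is the group name and formatspec its pattern. getvalue is never called.
    """
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)
        else:
            _getvalue, raw, _conv, formatspec = arg
            parts.append(f"(?P<{raw}>{formatspec})")
    return re.compile("".join(parts))


def thunk(raw, formatspec):
    # Hypothetical helper mimicking the tuples printed above.
    return (lambda: None, raw, None, formatspec)


version = globish(
    thunk("major", r"\d+"), ".", thunk("minor", r"\d+"),
    "(.", thunk("micro", r"\d+"), ")?(",
    thunk("level", "a|b|c|rc|f"), thunk("serial", r"\d+"), ")?",
)
print(version.fullmatch("3.11.2rc1").groupdict())
# {'major': '3', 'minor': '11', 'micro': '2', 'level': 'rc', 'serial': '1'}
```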

jimbaker · 2022-06-01T05:02:35Z

Okay, so the weird thing is that you have to write exec(str(my_code)) rather than just exec(my_code), so that the second example works, where {my_code} occurs inside another code string. Maybe we can make that smoother by making the tag return a subclass of str? Then exec(code"...") will just work, but the subclass can be special-cased for interpolations.

Yes, that sounds like a very nice simplification in usage! Most tags will have a natural string representation and corresponding usage (with additional methods for the rest). Note that I had a workaround for this in the shell example, where I used the fact that subprocess.run would work with an __iter__ but still accept shell=True; subclassing str would have been better. I'm sure it will come up in other tags too.

I wouldn't get too tied to dataclasses here, they're stdlib magic and we're more concerned about performance there than about readability of the implementation. :-)

Hah. Now arguably there might be a suitable code tag that could have a C implementation and be even faster than the tuned dataclasses implementation. But we are getting ahead of ourselves!

gvanrossum · 2022-06-01T05:05:01Z

Maybe there could be a built-in ‘re’ tag that compiles a regex. Someone could do `re"a.*z".search(line)`. (Hm, that might look like line noise to some. Then again that is in regex’s nature. :-)


rmorshea · 2022-06-01T05:05:35Z

One that could be interesting would be a tag that parses an RST or Markdown string and produces a tree of docutils nodes. This could be useful for writing Sphinx extensions. I ran into some cases where I found it easier to just write RST template strings than to figure out how to construct the nodes properly myself. This strategy of just using normal string formatting turned out to be problematic when I introduced the MyST parser extension to my Sphinx project, since that changes the underlying rendering machinery. Not something for the standard library, but useful nonetheless.

jimbaker · 2022-06-01T05:28:52Z

One that could be interesting would be a tag that parses an RST or Markdown string and produces a tree of docutils nodes...

This makes sense to me - basically if we have a DSL that would benefit from interpolation and/or recursive construction, it seems like the tag string support is actually quite nice. Let's work through an example!

Not something for the standard library, but useful nonetheless.

So far I don't think we have any tags that would be candidates for the stdlib. I was initially thinking we might need taglib in the stdlib, but given the simplifications that we reached at PyCon, it is small so far and probably should evolve separately as a third-party library on PyPI. (Some functionality that should be added to it would include 1) template compilation; 2) caching support; 3) reconstruction of the entire raw string, including taking into account conv.)
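Item 3 could look roughly like this. It is a sketch that assumes thunks are (getvalue, raw, conv, formatspec) tuples, as in the globish output earlier in this thread, that conv and formatspec are None when absent, and that literal braces need to be re-doubled to round-trip; reconstruct is a hypothetical name, not part of taglib.

```python
def reconstruct(*args) -> str:
    """Hypothetical helper: rebuild the template source from tag-style args."""
    out = []
    for arg in args:
        if isinstance(arg, str):
            # Literal braces have to be doubled again to round-trip.
            out.append(arg.replace("{", "{{").replace("}", "}}"))
        else:
            _getvalue, raw, conv, formatspec = arg
            field = raw
            if conv:
                field += "!" + conv
            if formatspec:
                field += ":" + formatspec
            out.append("{" + field + "}")
    return "".join(out)


args = (
    (lambda: None, "major", None, r"\d+"), ".", (lambda: None, "minor", None, r"\d+"),
    "(.", (lambda: None, "micro", None, r"\d+"), ")?(",
    (lambda: None, "level", None, "a|b|c|rc|f"), (lambda: None, "serial", None, r"\d+"), ")?",
)
print(reconstruct(*args))
# {major:\d+}.{minor:\d+}(.{micro:\d+})?({level:a|b|c|rc|f}{serial:\d+})?
print(reconstruct("x = ", (None, "x", "r", None)))  # x = {x!r}
```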

jimbaker · 2022-06-02T14:05:29Z

I updated the example shell and code tags to use the marker string approach (subclassing str in the usual way). This enables the following usage, with no str(...) required for exec:

from code import code

y = 7

def main():
    my_code = code"""
        def f(x, y):
            print(x, x+y)
    """
    exec(my_code, globals())
    f(1, 2)

    some_code = code"""
        def outer(x):
            {my_code}
            f(x, y)
    """

    print(some_code)
    exec(some_code, globals())    
    outer(5)

main()

jimbaker · 2022-06-05T06:17:56Z

Maybe there could be a built-in ‘re’ tag that compiles a regex. Someone could do re"a.*z".search(line). (Hm, that might look like line noise to some. Then again that is in regex’s nature. :-)

I implemented the re tag in https://github.com/jimbaker/tagstr/blob/main/examples/linenoise.py

It's possible this might actually be useful! Although calling it re here in linenoise.py is a bit much I think 😀
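For readers without the tag-string branch, here is a rough sketch of what an re tag might do, written as a plain function: interpolated values go through re.escape so they match literally, and the result is compiled. The name rx and the escaping policy are assumptions, not necessarily what linenoise.py does. As with str.format, regex braces in the template must be doubled.

```python
import re


def rx(template: str, **values: str) -> re.Pattern:
    """Hypothetical stand-in for an re tag that compiles its result.

    Interpolated values are passed through re.escape, so they match literally;
    regex braces in the template (e.g. quantifiers) must be doubled.
    """
    escaped = {name: re.escape(value) for name, value in values.items()}
    return re.compile(template.format(**escaped))


key = "a.b"
pattern = rx(r"^{key}\s*=\s*(\d{{1,3}})$", key=key)
print(pattern.match("a.b = 42").group(1))  # 42
print(pattern.match("axb = 42"))           # None: "." in key is matched literally
```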

ericsnowcurrently · 2022-06-06T22:18:02Z

The catch with a tag for regular expressions is that the existing "r" prefix is usually important for regular expressions.

rmorshea · 2022-06-06T22:54:38Z

If it's correct to assert "\n".encode("unicode_escape").decode() == r"\n" (this particular example works, but I'm not sure about others), then there could be an rer tag which would make it as if the string were raw.
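A quick check suggests the round trip holds only in some cases, because escape processing is lossy (several spellings cook to the same string) and unicode_escape escapes characters that may have been typed literally:

```python
# Works for a control character: the cooked newline maps back to its escape.
print("\n".encode("unicode_escape").decode() == r"\n")  # True

# "\x41" cooks to "A", so the original spelling cannot be recovered.
print("\x41".encode("unicode_escape").decode())  # A

# A non-raw "\d" cooks to backslash + d; re-escaping doubles the backslash,
# so the result is not r"\d".
print("\\d".encode("unicode_escape").decode())  # \\d

# Non-ASCII characters written literally come back escaped.
print("é".encode("unicode_escape").decode())  # \xe9
```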

gvanrossum · 2022-06-06T23:35:12Z

The catch with a tag for regular expressions is that the existing "r" prefix is usually important for regular expressions.

That's why the current proposal always passes the "raw mode" string to the tag function, i.e. rer"-\n-" is the same as rer(r"-\n-").

rmorshea · 2022-06-07T04:04:17Z

Maybe a path tag for pathlib could be interesting? The tagged version seems a bit more readable to me:

from pathlib import Path

some_dir = "dir1"
list_of_dirs = ["dir3", "dir4"]
something = "special.txt"

assert (
    path"/{some_dir}/dir2/{list_of_dirs}/this_is_{something}.txt"
    == Path("/", some_dir, "dir2", *list_of_dirs, f"this_is_{something}.txt")
)
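How such a tag might splice a list into several path components, sketched as a plain function. build_path and its rules (split the template on "/", expand a whole-segment placeholder whose value is a list, fill everything else with str.format) are assumptions for illustration; PurePosixPath is used so the output is the same on every OS.

```python
from pathlib import PurePosixPath


def build_path(template: str, **values) -> PurePosixPath:
    """Hypothetical stand-in for a path tag.

    A segment that is exactly one {name} whose value is a list expands into
    several path components; every other segment is filled in with str.format.
    """
    parts = ["/"] if template.startswith("/") else []
    for segment in template.split("/"):
        if not segment:
            continue  # skip the empty piece before a leading "/"
        name = segment[1:-1]
        if segment.startswith("{") and segment.endswith("}") and isinstance(values.get(name), list):
            parts.extend(values[name])
        else:
            parts.append(segment.format(**values))
    return PurePosixPath(*parts)


p = build_path("/{some_dir}/dir2/{list_of_dirs}/this_is_{something}.txt",
               some_dir="dir1", list_of_dirs=["dir3", "dir4"], something="special")
print(p)  # /dir1/dir2/dir3/dir4/this_is_special.txt
```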

gvanrossum · 2022-06-07T04:57:20Z

That example is only somewhat compelling because pathlib's / operator doesn't support list_of_dirs. But it looks like you could write it this way:

Path("/") / some_dir / "dir2" / Path(*list_of_dirs) / f"this_is_{something}.txt"

It's longer but such complicated paths are uncommon.

Though maybe the example would be more compelling if it referred to a real use case? Spam and ham aside, abstract examples don't speak to the reader as much as examples that look like something you'd actually use. E.g. an example using a Point class with an x and y coordinate is more interesting than one that uses MyClass with attr1 and attr2 attributes. (This is why database examples always used to be written using Employee tables. :-)

jimbaker · 2022-06-07T07:25:32Z

I added an example sql tag in https://github.com/jimbaker/tagstr/blob/main/examples/sql.py

So given

    table_name = 'lang'
    name = 'C'
    date = 1972

    names = ['Fortran', 'Python', 'Go']
    dates = [1957, 1991, 2009]

    with sqlite3.connect(':memory:') as conn:
        sql = Qmarkify(conn)
        sql'create table {table_name} (name, first_appeared)'
        sql'insert into {table_name} values ({name}, {date})'
        sql'insert into {table_name} values ({names}, {dates})'

it does the following:

Quotes identifiers like {table_name}.
Substitutes placeholders (specifically qmarks for the SQLite3 dialect it supports), and arranges for these interpolation values to be passed in as values to execute or executemany.

This particular tag would need more work for real usage, but it seems possibly useful.
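The placeholder half of this can be sketched without tag syntax using string.Formatter().parse to split a template into literal text and fields. qmarkify here is a hypothetical function, not the Qmarkify class from sql.py, and the rule "all list values means executemany rows" is an assumption. The table name is written literally because identifiers cannot be bound as parameters.

```python
import sqlite3
from string import Formatter


def qmarkify(template: str, **values):
    """Hypothetical helper: each {name} becomes a ? placeholder.

    Returns (sql, params) for execute(); if every interpolated value is a
    list, params becomes a list of row tuples for executemany() instead.
    """
    sql, fields = [], []
    for literal, field, _spec, _conv in Formatter().parse(template):
        sql.append(literal)
        if field is not None:
            sql.append("?")
            fields.append(values[field])
    if fields and all(isinstance(v, list) for v in fields):
        return "".join(sql), list(zip(*fields))
    return "".join(sql), tuple(fields)


conn = sqlite3.connect(":memory:")
conn.execute("create table lang (name, first_appeared)")
conn.execute(*qmarkify("insert into lang values ({name}, {date})", name="C", date=1972))
conn.executemany(*qmarkify("insert into lang values ({names}, {dates})",
                           names=["Fortran", "Python", "Go"], dates=[1957, 1991, 2009]))
rows = conn.execute("select * from lang order by rowid").fetchall()
conn.close()
print(rows)  # [('C', 1972), ('Fortran', 1957), ('Python', 1991), ('Go', 2009)]
```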

gvanrossum · 2022-06-07T15:49:14Z

Cool. It looks like it would have to have complete knowledge of SQL grammar to be able to decide how to do the quoting, right?

I am a little hesitant about sql'blah blah' executing the code. And the way to construct the sql tag on the fly so it incorporates the connection is a little odd (all these things violate the initial impression "this is a string literal with some extra stuff").

How would you do a "prepare" style command? I guess you wouldn't.

ericvsmith · 2022-06-07T16:01:32Z

I think the only way you'd want to use this is to build a parameterized query. For example, I can't imagine wanting to allow

x = 'e t'
sql'creat{x}able {table_name} (name, first_appeared)'

Most if not all database engines cache parameterized queries, so I'm not sure how useful "prepare" is any more.

jimbaker · 2022-06-07T16:34:57Z

Cool. It looks like it would have to have complete knowledge of SQL grammar to be able to decide how to do the quoting, right?

I don't believe so. See for example @ericvsmith's example:

    with sqlite3.connect(':memory:') as conn:
        x = 'e t'
        sql'creat{x}able {table_name} (name, first_appeared)'

results in this syntactically invalid SQL:

creat"e t"able "lang" (name, first_appeared)

which raises this error:

sqlite3.OperationalError: near "creat": syntax error

Arguably this is a correct interpolation.

I am a little hesitant about sql'blah blah' executing the code.

For sure. This tag really gives access to all of SQL. So there's no need for a standard Bobby Tables SQL injection attack; this tag directly enables:

drop table {table_name}

and having the table_name be interpolated in (even if quoted). Why this madness? 😀 Because we sometimes want to write data definitions (DDL), not just data modification (DML).

A real tag built around this should make it clear what is being enabled, and where any specific interpolations can happen. I'm pretty sure the placeholder rewrite is safe because it uses the SQL values keyword to identify this location. TBD.

And the way to construct the sql tag on the fly so it incorporates the connection is a little odd (all these things violate the initial impression "this is a string literal with some extra stuff").

For this example tag, it might make more sense to have it support __str__ (so that it renders insert into "lang" values (?, ?), creat"e t"able "lang" (name, first_appeared), etc.), and then tie it in with a cursor/connection model. Regardless, it seemed like an interesting way to implement the tag.

How would you do a "prepare" style command? I guess you wouldn't.

This concern certainly does not apply to SQLite. I'm not certain about any other SQL dialects at this point (I would agree with @ericvsmith on this in general, except that SQL is so varied in its implementations).

jimbaker · 2022-06-08T04:44:08Z

I updated the example sql tag:

There's no longer any implicit cursor management. However, one does need to know when to use execute or executemany.
The sql and sql_unsafe tag functions let us choose between identifier quoting (unsafe) and placeholders only (safe).

So one can now write code like this:

def demo():
    table_name = 'lang'
    name = 'C'
    date = 1972

    names = ['Fortran', 'Python', 'Go']
    dates = [1957, 1991, 2009]

    with sqlite3.connect(':memory:') as conn:
        cur = conn.cursor()
        cur.execute(*sql_unsafe'create table {table_name} (name, first_appeared)')
        cur.execute(*sql_unsafe'insert into {table_name} values ({name}, {date})')
        cur.executemany(*sql'insert into lang values ({names}, {dates})')

        # FIXME time to write proper unit tests!
        # NOTE assumes that SQLite maintains insertion order (as it apparently does)
        assert list(cur.execute('select * from lang')) == \
            [('C', 1972),  ('Fortran', 1957), ('Python', 1991), ('Go', 2009)]

        try:
            cur.execute(*sql'drop table {table_name}')
        except ValueError:
            pass  # expected: table_name cannot be interpolated in safe mode
        else:
            raise AssertionError('Did not raise error')

I also identified some additional changes that can be done, such as supporting recursive construction of the statement.

jimbaker · 2022-06-08T04:55:34Z

One rather cool thing is being able to see this, given the support for the raw expression text:

ValueError: Cannot interpolate 'table_name' in safe mode

jimbaker · 2022-06-27T01:27:09Z

I completely revamped the sql tag example such that identifiers have to be explicitly marked.

So this looks like the following:

cur.execute(*sql'create table {Identifier(table_name)} (name, first_appeared)')

This was motivated in part because if one looks at the sqlite3 reference manual, it becomes apparent that there is no easy way to identify where a placeholder vs. an identifier could go without writing a parser for the SQL dialect. While there are a few SQL parsers on PyPI that could potentially be used here, with suitable interpolation points placed to work with them, I didn't want to rely on them for the PEP.
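The explicit-marking approach can be sketched without tag syntax: a hypothetical Identifier str subclass marks values to be quoted inline, and every other value becomes a ? placeholder. In this stand-in the wrapping happens at the keyword argument rather than inside the braces, and render_sql is a hypothetical name. Quoting follows the standard SQL rule of wrapping in double quotes and doubling embedded ones, which SQLite accepts.

```python
import sqlite3
from string import Formatter


class Identifier(str):
    """Hypothetical marker: interpolate this value as a quoted SQL identifier."""

    __slots__ = ()


def quote_identifier(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double embedded ones.
    return '"' + name.replace('"', '""') + '"'


def render_sql(template: str, **values: object) -> tuple[str, tuple]:
    """Identifier values are quoted inline; everything else becomes a ? placeholder."""
    sql, params = [], []
    for literal, field, _spec, _conv in Formatter().parse(template):
        sql.append(literal)
        if field is None:
            continue
        value = values[field]
        if isinstance(value, Identifier):
            sql.append(quote_identifier(value))
        else:
            sql.append("?")
            params.append(value)
    return "".join(sql), tuple(params)


table = Identifier("lang")
conn = sqlite3.connect(":memory:")
conn.execute(*render_sql("create table {table} (name, first_appeared)", table=table))
conn.execute(*render_sql("insert into {table} values ({name}, {date})",
                         table=table, name="C", date=1972))
rows = conn.execute(*render_sql("select * from {table}", table=table)).fetchall()
conn.close()
print(rows)  # [('C', 1972)]
```

An unmarked table name is treated as data, so render_sql("drop table {t}", t="lang") produces a ? placeholder in identifier position, which SQLite will reject rather than execute.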

I also removed the support for executemany, since it is problematic with the approach I'm trying here to support subqueries (presumably one would use temporary tables for such things).

jimbaker · 2023-04-26T22:38:57Z

Closing this out. We have written some interesting examples. Future work can look at internationalization, LaTeX, etc., in separate issues.

EmilStenstrom mentioned this issue Jan 26, 2023

Language injections for controlling syntax highlighting in string literals microsoft/pylance-release#3874 (Closed)

jimbaker closed this as completed Apr 26, 2023
