Interim Transpiler #20

Closed
rmorshea opened this issue Feb 8, 2023 · 26 comments

@rmorshea
Collaborator

rmorshea commented Feb 8, 2023

I've been wishing I could use tag strings lately and so to satisfy that craving I thought it would be cool to create an import-time transpiler that would rewrite:

my_tag @ f"my {custom} string"
#      ^ or any other operator not typically used with strings

To be:

my_tag("my ", (lambda: custom, "custom", None, None), " string")

The syntax seems clever in a few different ways:

  • it's valid Python
  • syntax highlighters will highlight expressions in the string
  • the transpiler will be straightforward to implement since the AST's JoinedStr already splits out the expressions
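As a rough illustration of the last bullet, here is a minimal sketch of how such a rewrite might be done with `ast.NodeTransformer`. The names `TagStringTransformer` and `transpile` are hypothetical, not part of any published library, and the sketch uses `ast.unparse`, so it assumes Python 3.9+ and makes no attempt to preserve line numbers or formatting.

```python
import ast


class TagStringTransformer(ast.NodeTransformer):
    """Rewrite `tag @ f"..."` into `tag("text", (lambda: expr, "expr", conv, spec), ...)`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # handle nested tag strings first
        if not (isinstance(node.op, ast.MatMult) and isinstance(node.right, ast.JoinedStr)):
            return node
        args = []
        for part in node.right.values:
            if isinstance(part, ast.Constant):
                args.append(part)  # literal text between interpolations
            elif isinstance(part, ast.FormattedValue):
                args.append(self._thunk(part))
        return ast.copy_location(ast.Call(func=node.left, args=args, keywords=[]), node)

    def _thunk(self, part: ast.FormattedValue) -> ast.Tuple:
        no_args = ast.arguments(
            posonlyargs=[], args=[], vararg=None,
            kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
        )
        lazy = ast.Lambda(args=no_args, body=part.value)  # defers evaluation
        raw = ast.Constant(ast.unparse(part.value))  # normalized expression text
        conv = ast.Constant(chr(part.conversion) if part.conversion != -1 else None)
        # The format spec is itself a JoinedStr node, so it evaluates to a string.
        spec = part.format_spec if part.format_spec is not None else ast.Constant(None)
        return ast.Tuple(elts=[lazy, raw, conv, spec], ctx=ast.Load())


def transpile(source: str) -> str:
    tree = TagStringTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Calling `transpile` on a module containing `my_tag @ f"my {custom} string"` should yield source in which that expression has become an ordinary call to `my_tag`.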

Something like this seems like a rather natural extension of @pauleveritt's work in viewdom.

Implementation Details

Some potential issues I've thought of and ways to solve them.

Static Typing

To deal with static type analyzers complaining about the unsupported @ operator, tag functions can be decorated with:

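from __future__ import annotations

from typing import Generic, Protocol, TypeVar, cast

T = TypeVar("T")
# Thunk: the interpolation tuple type from the tag string spec (not defined in this snippet)

def tag(func: TagFunc[T]) -> Tag[T]:
    # convince MyPy that our new syntax is valid
    return cast(Tag, func)

class TagFunc(Protocol[T]):
    def __call__(self, *args: Thunk) -> T: ...

class Tag(Generic[T]):
    def __call__(self, *args: Thunk) -> T: ...
    def __matmul__(self, other: str) -> T: ...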

An alternate syntax of my_tag(f"...") would not require this typing hack since tag functions must already accept *args: str | Thunk. The main problem here is that there are probably many times where some_function(f"...") would show up in a normal code-base. Thus it would be hard to determine whether any given instance of that syntax ought to be transpiled. Solving this would require users to mark which identifiers should be treated as tags - perhaps with a line like set_tags("my_tag"). This seems more inconvenient than having the tag author add the aforementioned decorator though.

Performance

To avoid transpiling every module, users would need to indicate which ones should be rewritten at import-time. This could be done by having the user import this library somewhere at the top of their module. At import time, the transpiler, before parsing the file, would then scan it for a line like import <this_library> or from <this_library> import ....
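A minimal sketch of that pre-parse scan, assuming a simple line-anchored regular expression is good enough. The function name `has_marker_import` is hypothetical, and the library name is passed in as a parameter because the thread has not settled on one at this point.

```python
import re


def has_marker_import(source: str, library: str) -> bool:
    """Return True if `source` has a line starting with an import of `library`."""
    name = re.escape(library)
    pattern = rf"^[ \t]*(?:import[ \t]+{name}\b|from[ \t]+{name}[ \t]+import\b)"
    return re.search(pattern, source, re.MULTILINE) is not None
```

Because the scan is purely textual, it can be fooled by a matching line inside a multi-line string. That may be an acceptable trade-off for a cheap check that runs before parsing.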

@gvanrossum
Collaborator

Cool, and good that you've already thought through some alternatives in the design space. @ seems brilliant because, indeed, it's not something one would do with an f-string on the RHS.

How were you planning to trigger the transpiler? As an import hook, or as a codec?

@rmorshea
Collaborator Author

rmorshea commented Feb 8, 2023

My plan was to use an import hook, but I wasn't aware codecs could be used for this purpose, so that could be a good alternative as well.

@gvanrossum
Collaborator

See https://peps.python.org/pep-0263/; you can register a codec with an arbitrary name. This would take the place of your "marker import". IIRC there are some issues with getting the codec registered when your package is installed though.

@rmorshea
Collaborator Author

rmorshea commented Feb 8, 2023

I think a codec could be better for a lot of reasons:

  • I suspect that the codec only has to be re-run if a file has changed
  • As you mention, we already need some way to mark files for transpilation, which the # coding=... comment would achieve
  • Import hooks are pretty complicated to write and they could conflict with other hooks like the one PyTest uses.
  • Python 3.9 comes with a handy ast.unparse function that would make it easy to write back the modified AST

IIRC there are some issues with getting the codec registered when your package is installed though.

Would the approach be to use a .pth file to call codecs.register(my_codec) before the user's code is imported?
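For reference, a minimal sketch of what registering such a codec might look like. The codec name `tagstr_sketch` and the placeholder `rewrite` function are made up for illustration. The sketch only supplies the stateless `decode` function; a real source codec would likely also need incremental and stream decoders, which are omitted here.

```python
import codecs

CODEC_NAME = "tagstr_sketch"  # hypothetical codec name


def rewrite(text: str) -> str:
    """Placeholder transform; a real codec would rewrite tag strings here."""
    return text.replace("__GREETING__", "'hello'")


def _decode(data, errors="strict"):
    # Decode as UTF-8 first, then apply the source-to-source rewrite.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return rewrite(text), consumed


def _search(name):
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(name=CODEC_NAME, encode=utf8.encode, decode=_decode)


codecs.register(_search)
```

A `.pth` file could then contain a single line starting with `import` that imports the module performing this registration, since such lines are executed at interpreter startup.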

@gvanrossum
Collaborator

Yeah, if the .py file and the .pyc file match, the source is never read, so the codec isn't run. Pure win!

I suspect that there are some problems with ast.unparse though, since the AST doesn't preserve comments or whitespace. Notably the line numbers after the unparsing will differ, which will make tracebacks hugely confusing.

We used the codec trick at Dropbox for pyxl3, and IIRC the rewrite was done very differently, to ensure that the line numbers matched. I think the "parsing" was probably done with a regular expression. That should work here too.
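The line-number drift is easy to reproduce. This small sketch (the helper `line_of` is just for illustration) round-trips a snippet through `ast.parse` and `ast.unparse` and compares where a statement ends up:

```python
import ast

source = "x = 1\n\n# a comment\ny = 2\n"
roundtrip = ast.unparse(ast.parse(source))  # comments and blank lines are dropped


def line_of(text: str, prefix: str) -> int:
    """Return the 1-based number of the first line starting with `prefix`."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(prefix):
            return lineno
    raise ValueError(f"no line starts with {prefix!r}")


original_line = line_of(source, "y")
unparsed_line = line_of(roundtrip, "y")
```

Here `y = 2` sits on line 4 of the original but on line 2 after the round trip, so a traceback computed from the rewritten code would point at the wrong line of the file on disk.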

@pauleveritt
Collaborator

What would be the story for tracebacks and getting back to lines in source?

@gvanrossum
Collaborator

The traceback code looks up the line number in the untranslated source (linecache.py just opens the file in text mode, no encoding parameter). So pyxl ensures that the translated line numbers match the original line numbers (but the column offsets don't). I think the translation Ryan proposes should be able to preserve line numbers as well.

@gvanrossum gvanrossum changed the title from "Interum Transpiler" to "Interim Transpiler" Feb 9, 2023
@rmorshea
Collaborator Author

rmorshea commented Feb 9, 2023

Crazy idea, but what if...

my_tag @ f"my {super} {custom} string"

Became

my_tag((       super,  custom), raw=("super", "custom"), conv=(None, None), formatspec=(None, None), strings=("my ", " ", " string"))

Where the @tag decorator would zip and merge things as necessary to resolve differences between this new interface and the one defined for tag strings. Doing this might seem convoluted, but the neat thing about it is that you'd be able to move the expressions from the string around to match their column offsets as needed. This would even work for multi-line strings:

my_tag @ f"""
my
extra {super}
{custom}
string
"""

Would become:

my_tag((

       super,
 custom), raw=("super", "custom"), conv=(None, None), formatspec=(None, None), strings=("\nmy\nextra ", "\n", "\nstring\n"))

@rmorshea
Collaborator Author

rmorshea commented Feb 9, 2023

Shoot! The evaluation of the expressions would no longer be lazy.
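To make the difference concrete, here is a small sketch contrasting a plain tuple of expressions with a tuple of lambdas. The `expensive` function and the `calls` log are stand-ins invented for the example.

```python
calls = []


def expensive():
    calls.append("evaluated")
    return 42


# Eager: the expression runs while the tuple is being built.
eager = (expensive(),)
calls_after_eager = len(calls)

# Lazy: wrapping the expression in a lambda defers it until the tag asks for it.
lazy = (lambda: expensive(),)
calls_after_lazy = len(calls)

value = lazy[0]()  # the tag function decides when (or whether) to evaluate
```

With the eager form, the tag function has no say over evaluation. With the lambda form, nothing runs until the thunk is called.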

@Archmonger Archmonger mentioned this issue Feb 9, 2023
@rmorshea
Collaborator Author

rmorshea commented Feb 9, 2023

The last way I can think of to preserve column offsets in tracebacks is by passing information about the location of expressions in the original source and using that to modify tracebacks which arise within the tag function itself.

my_tag("my ", (lambda: custom, "custom", None, None, 1, 16), " string")

Where the @tag decorator would modify my_tag by doing something like:

def tag(func):

    def wrapper(*args):
        new_args = []
        for a in args:
            match a:
                case str():
                    new_args.append(a)
                case getvalue, src, conv, spec, *src_info:
                    # strip the trailing (lineno, col_offset) before calling the tag
                    getvalue = modify_tracebacks(getvalue, *src_info)
                    new_args.append((getvalue, src, conv, spec))
        return func(*new_args)

    return wrapper

def modify_tracebacks(getvalue, lineno=None, col_offset=None):
    # compare against None so that a col_offset of 0 is still handled
    if lineno is None or col_offset is None:
        return getvalue

    def wrapper():
        try:
            return getvalue()
        except Exception as error:
            # modify the traceback with the appropriate lineno and col_offset somehow
            error.with_traceback(...)
            raise

    return wrapper

If this worked, it'd mean that the transpiler wouldn't even need to worry about preserving line numbers in the areas of code it modified.

@gvanrossum
Collaborator

IMO it's not worth worrying about column offsets for the initial prototype.

@pauleveritt
Collaborator

@rmorshea If there's a way for me to join in with what you're doing and re-parent my stuff on your interim transpiler, let me know.

@rmorshea
Collaborator Author

rmorshea commented Feb 9, 2023

Will do. I might have time to create a repo for this tonight, but otherwise I won't be able to do much until next week.

@jimbaker
Owner

@rmorshea while I like the syntax, it's problematic as I mention here (#3 (comment)) - we need to preserve thunks because they give the tag function control over interpolation.

@rmorshea
Collaborator Author

@jimbaker the intention here is to transpile the tag @ f'...' syntax such that it conforms to the tag string spec as explained here. I think this could be a useful tool for us as we work on this PEP, but also as a way to backport tag strings to older versions of Python.

@rmorshea
Collaborator Author

rmorshea commented Apr 17, 2023

So, I managed to create a custom tagstr encoding. Unfortunately though, this doesn't play well with Black since it decodes the file before reformatting it. Thus, the version that gets saved is the transformed version, not the one the user authored. Anyone have ideas on how this could be avoided?

@rmorshea
Collaborator Author

Ok, the hack I came up with involves stuffing the original source at the end of the file. The tagstr encoder then searches for the original source and returns that. This solves the problem of black saving the transformed text, but it doesn't allow black to do its job. The only way I can think of to work around this is to allow users to set an environment variable TAGSTR=off before running black.

It would be nice if there were a way to tell if the codec was running while formatting code so users didn't have to set the environment variable, but this works for now I suppose.

@gvanrossum
Collaborator

Wow, the last time I used the encoding hack, things like Black weren't an issue...

I guess an import hook might be better.

@rmorshea
Collaborator Author

rmorshea commented Apr 18, 2023

Welp, it's published.

pip install tagstr

The hook expects there to be an import tagstr statement at the top of any file that should be transformed:

import tagstr

@tagstr.tagfunc
def func(*args):
    print(args)

name = "world"
func @ f"hello {name}!"

I'll work on adding an IPython cell magic so this can be used in Jupyter Notebooks/Lab. Not really sure if there's a similar way to inject the transformer into the standard Python REPL though.

@rmorshea
Collaborator Author

I threw this together pretty quickly so there's definitely gonna be some bugs and rough edges.

@pauleveritt
Collaborator

Quite interesting, @rmorshea. Any chance you're at PyCon? I'm sprinting the first day.

@rmorshea
Collaborator Author

Unfortunately I am not. Would love to participate remotely if that's possible. Feel free to email me: [email protected]

@jimbaker
Owner

I will also be in person at the sprints through Monday afternoon. This will be a chance for me to get back into this work - I have been very busy with other things. Fortunately I feel like discussing another issue has started to page back into my mind what we have been trying to do here 😁

@rmorshea
Collaborator Author

rmorshea commented Apr 22, 2023

Published another release of the tagstr transpiler. Includes a number of fixes/changes:

  • The previous version didn't transpile files other than the entry point for some reason
  • Fixed a couple issues with the translation logic itself
  • Allow a # tagstr: on comment instead of import tagstr statement as the file marker

I still feel like I'm doing something wrong in the import hook so I suspect there are probably other latent issues to be fixed.

It's also worth noting that the tagfunc decorator I used in the earlier example is not technically required. Rather, it exists purely to satisfy type checkers:

# tagstr: on
name = "world"
print @ f"hello {name}!"
hello  (<function <lambda> at 0x7f7a8586b920>, 'name', None, None) !
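The printed output above shows the shape of what a tag function receives: plain strings interleaved with 4-tuples of (getvalue, raw, conv, formatspec). As a sketch of how a tag might consume them, here is a hypothetical `html` tag that escapes interpolated values. It is illustrative only and not part of the tagstr package.

```python
from html import escape

# Conversions mirror the !r, !s and !a flags of f-strings.
_CONVERSIONS = {"r": repr, "s": str, "a": ascii}


def html(*args):
    """Join literal text as-is and HTML-escape every interpolated value."""
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)  # literal text written by the template author
            continue
        getvalue, raw, conv, spec = arg
        value = getvalue()  # evaluate the thunk only when it is needed
        if conv is not None:
            value = _CONVERSIONS[conv](value)
        parts.append(escape(format(value, spec or "")))
    return "".join(parts)


name = "<b>world</b>"
greeting = html("hello ", (lambda: name, "name", None, None), "!")
```

In a transpiled file, the same call would be written as `html @ f"hello {name}!"`.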

@pauleveritt
Collaborator

Now that @rmorshea has published the transpiler, can this ticket get closed?

@rmorshea
Collaborator Author

I think so


4 participants