Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewriting expressions, using their lexical scope #10

Closed
jimbaker opened this issue May 26, 2022 · 5 comments
Closed

Rewriting expressions, using their lexical scope #10

jimbaker opened this issue May 26, 2022 · 5 comments

Comments

@jimbaker
Copy link
Owner

Rewriting expressions could support the popular numexpr library, which rewrites expressions on NumPy arrays to use its own bytecode interpreter for vectorized ops. Because this rewriting uses sys._getframe, it necessarily is using dynamic scope instead of lexical scope, which exposes that sharp edge to users who might expect to be able to use standard lexical scoping.

I have implemented the core of such support in https://gist.github.com/jimbaker/619de23e34a28affc14e4fdfb1b99996. It supports a new tag, numexpr, which returns for each thunk a new thunk with a code object that returns a mapping of variable names to their values. (Arguably the ergonomics of this rewriting is that it should return the old raw code text instead of what has been wrapped.)

The key piece here in the gist is this wrapping:

    wrapped = f"""
def outer({param_list(code.co_freevars)}):
    def inner():
        return \\
{textwrap.indent(name_bindings(all_names), '   ' * 3)}
"""

which implements part of what is done in lambda lifting, namely
"eliminating free variables in the function by adding parameters." So that's directly translated to, put the co_freevars in the outer function. (This is a variation - we still need to keep the inner function with the same params so the code object that is compiled conforms to the old code object, and therefore can be used in the same way.) Python's scope compilation then works as usual - global variables and free variables are appropriately analyzed and corresponding code is generated. For CPython specifically, we can see this in the emitted bytecode, with comparable LOAD_DEREF and LOAD_GLOBAL to what is in the original lambda-wrapped expression.

To demonstrate how it could be used, I'm thinking of showing how to construct NumExpr3's NumExpr directly from such a tag (ignoring some subtleties, such as indicating an out variable). This will be added as an example tag in the examples folder.

@jimbaker jimbaker mentioned this issue May 26, 2022
11 tasks
@jimbaker
Copy link
Owner Author

jimbaker commented Jun 1, 2022

@gvanrossum
Copy link
Collaborator

I'm sorry, but this seems too esoteric to be a motivational example. Certainly if you want this to be considered motivational you'd have to explain it to someone who is not already familiar with lambda lifting, numexpr (whose docs are just weird), and "rewriting" (rewriting how?).

@jimbaker
Copy link
Owner Author

jimbaker commented Jun 1, 2022

Yeah, this one is very powerful, and interesting to PL nerds like myself, but definitely esoteric. I agree that it's outside the scope of a tutorial, even with respect to the NumExpr library. Having said that, I will try to complete this example here to show it could be used there.

The fact that we lambda-wrap is still useful for other usages, especially lazy f-strings. Another possible example is constructing logging's LogRecord instances directly.

@jimbaker
Copy link
Owner Author

jimbaker commented Jun 28, 2022

There's now another way to bind the free variables in an expression, without doing the outer function trick. Not surprisingly, this is necessary for PEP 649 - python/cpython#92204

This is a good reason to update the branch we are working against, since it was pushed into main just after @gvanrossum started the work in #1 - the usual activity around PyCon sprints.

It was also mentioned in this recent thread that David Mertz started, https://mail.python.org/archives/list/[email protected]/thread/DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF/#DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF , which is where I became aware of it.

It's likely that the rewrite using a lambda-wrapped expression that captures its lexical scope + its source (so two elements of Thunk, getvalue and raw) is the same idea that's necessary for an implementation of later that can could enable some form of "execution plan rewrite". In particular, the Dask example looks roughly similar at a high level to what is done with the NumExpr library, but extended to an expression graph in Python code.

@jimbaker
Copy link
Owner Author

This is interesting, but not relevant to the main work on tag string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants