Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag for inline typechecking #208

Closed
marcelroed opened this issue Oct 26, 2021 · 57 comments
Closed

Flag for inline typechecking #208

marcelroed opened this issue Oct 26, 2021 · 57 comments

Comments

@marcelroed
Copy link

Is your feature request related to a problem? Please describe.
I often need to typecheck calls to functions that are outside of my control, since they cause bugs further down the line.
For example, calling external functions with multiple return types I sometimes need to force one of them.

def has_inline_typecheck():
    x: float = function_produces_float_or_int(100)  # Should error if the result isn't a float
    print(x)

My specific issue relates to using torchtyping with PyTorch.

def inline_typecheck_tensor():
    x: torchtyping.TensorType['batch', 'width', 'height'] = produce_images()

In this context, the type matters at every step, and also considerably helps with documenting the code, as currently I need to add comments for the shape of the tensors everywhere.

Describe the solution you'd like
I would like to add a flag to the typechecked decorator that adds support for inline typechecking. It could look like this.

from typeguard import typechecked
@typeguard(typecheck_inline=True)
def has_inline_typecheck():
    x: float = function_produces_float_or_int(100)  # Should error if the result isn't a float
    print(x)

Describe alternatives you've considered
I've considered

  • wrapping the function I'm calling in a typechecked decorator. This doesn't work for functions that don't have type hints, and doesn't allow me to force checking against a certain type.
  • using the check_type function. Results in a lot of boilerplate and still requires me to add an inline type hint for my IDE to understand the type.

I'm willing to submit an implementation if this is approved. Thanks!

@agronholm
Copy link
Owner

I'm curious as to how you would implement this without the import hook.

@marcelroed
Copy link
Author

marcelroed commented Nov 6, 2021

I'm curious as to how you would implement this without the import hook.

My idea is to get the source of the function, parse it using ast and look for ast.AnnAssign. Replace these statements with a wrapper that checks the type, or an assignment and an assert for the type of the variable.

@agronholm
Copy link
Owner

That still sounds like you would need to do it in the import hook. Otherwise how would you change the code after the fact?

@marcelroed
Copy link
Author

I'm not sure I understand what you mean. Here's a minimal prototype of my implementation.

@agronholm
Copy link
Owner

Hm, this actually looks pretty clever. You're basically recompiling the function on import. Nested functions decorated this way could be problematic, but right now I don't see a problem for top level functions and methods. Surely this only works from py3.9 onwards (I didn't know about ast.unparse()), but with proper documentation of that limitation I don't see a problem. I'm interested in including this in v3.0 if you would like to polish this for a PR.

@marcelroed
Copy link
Author

marcelroed commented Nov 7, 2021

Great! astunparse also exists as a library, so ast.unparse can be backported to earlier versions of Python.

Do you have a timeline for v3.0 release?

@agronholm
Copy link
Owner

Hi, it's been a while. I'm preparing the 3.0 release (at beta 2 as of now), so if you'd still like to get this into typeguard, you could make a PR as a starting point.

@marcelroed
Copy link
Author

Sure, I'm happy to contribute it. Do you think the above approach is reasonable and would work with the rest of the library?

@agronholm
Copy link
Owner

I think it would. The parameter name could possibly be named differently, like check_local_assignments because typecheck_inline doesn't really have any obvious meaning.

@agronholm
Copy link
Owner

It looks like the astunparse package only supports up to Python 3.5, leaving 3.7 and 3.8 users hanging.

@marcelroed
Copy link
Author

Ah, this is because it's supported natively in the ast module from Python 3.6 and on. It's a backport of newer functionality.

@agronholm
Copy link
Owner

But ast.unparse() is new to Python 3.9, as we have discussed earlier? How would you do that on Python 3.8?

@marcelroed
Copy link
Author

Oh wait, I meant from 3.9. Where did you find that astunparse only supports up to 3.5? From the PyPI page it seems to support up to 3.8. https://pypi.org/project/astunparse/

@agronholm
Copy link
Owner

Oh, I got stuck in the first paragraph of the README which states: This library is single-source compatible with Python 2.6 through Python 3.5. The changelog, though, says it's 3.8 compatible.

@agronholm
Copy link
Owner

I would actually like to get rid of the @typechecked wrapper entirely, in favor of the import hook (on which I have done significant work as of late). Ideally I would leave just the import hook and check_type() function as the user-facing entry points to Typeguard. Do you see a compelling argument for not ditching @typechecked?

@marcelroed
Copy link
Author

Hmmm, that's interesting. I think if you only care about typechecking certain classes or functions, the decorator is still the best way of doing so, no?

Additionally, from what I can tell, quite a few library users aren't so keen on monkey patching since a small statement can have such global and potentially unpredictable effects, as well as it being hard to predict how several monkey patches will interact with each other.

As I see it I think offering both would still be useful. Maybe instead we could try to maximize the amount of shared code between these approaches?

@agronholm
Copy link
Owner

Yes, if @typechecked can be implemented properly w/o the wrapper, then it would be acceptable to keep it in Typeguard. But check_argument_types() and check_return_type() have to go from the public API. Those are too tricky to use properly otherwise. Fishing for function objects in the garbage collector is dirty and I want to get rid of that. The generator wrappers need to go too.

@agronholm
Copy link
Owner

I think I'm missing something here. So first, we get the source file with inspect.getsource(func) (assuming the source file exists), parse it to AST, modify it and recompile it. Where does ast.unparse() come in?

@marcelroed
Copy link
Author

Yeah, I agree with getting rid of check_argument_types() and check_return_type() and generators.

So ast.unparse takes in the AST and produces actual source code from it. We need the actual source code to run it using exec.

@agronholm
Copy link
Owner

Do we really need to exec() the code? Wouldn't compiling the AST to code work? I'm experimenting with that route.

@agronholm
Copy link
Owner

agronholm commented Jan 25, 2023

I'm pretty far along. The only problem I have left is that the memo creation in the injected code is raising NameError because it can't find the function definition, but I believe that's a solvable problem. I'm calling it a day. Tomorrow I can probably get this working.

@marcelroed
Copy link
Author

How do you do that? Can you compile the AST directly to bytecode? Awesome! Could you share a draft PR or a branch showing your progress?

@agronholm
Copy link
Owner

As it says in the documentation of the compile() built-in:

Compile the source into a code or AST object. Code objects can be executed by exec() or eval(). source can either be a normal string, a byte string, or an AST object (emphasis mine).

@agronholm
Copy link
Owner

So to be clear: I am compiling the AST to a module, exec()ing it and then retrieving the function from that faux module.

@agronholm
Copy link
Owner

The juicy bits:

class Instrumentor(ast.NodeVisitor):
    def __init__(self, target_name: str):
        target_name = target_name.replace(".<locals>", "")
        self.target_name = target_name
        self.path: list[str] = []
        self.transformer = TypeguardTransformer("_call_memo")
        self.modified_function: ast.FunctionDef | ast.AsyncFunctionDef | None = None

    def visit_Module(self, node: ast.Module) -> Any:
        self.generic_visit(node)
        if self.modified_function:
            node.body[:] = [self.modified_function]
            ast.fix_missing_locations(node)

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        self.path.append(node.name)
        self.generic_visit(node)
        del self.path[-1]

    def visit_FunctionDef(self, node: ast.FunctionDef | ast.AsyncFunctionDef) -> None:
        self.path.append(node.name)
        full_path = ".".join(self.path)
        self.generic_visit(node)
        if full_path == self.target_name:
            self.modified_function: ast.FunctionDef = self.transformer.visit(node)
            self.modified_function.decorator_list.clear()
            self.modified_function.body.insert(0, ast.Import([ast.alias("typeguard")]))

        del self.path[-1]

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        return self.visit_FunctionDef(node)


def instrument(f: T_CallableOrType) -> Callable | str:
    if not getattr(f, "__annotations__", None):
        return "no type annotations present"
    elif not getattr(f, "__code__", None):
        return "no code associated"
    elif not getattr(f, "__module__", None):
        return "__module__ attribute is not set"

    module = sys.modules[f.__module__]
    module_source = inspect.getsource(sys.modules[f.__module__])
    module_ast = TypeguardTransformer("_call_memo").visit(ast.parse(module_source))
    instrumentor = Instrumentor(f.__qualname__)
    instrumentor.visit(module_ast)
    if instrumentor.modified_function is None:
        return "cannot find function in AST"

    code = compile(module_ast, module.__file__, "exec", dont_inherit=True)
    new_globals = {}
    exec(code, new_globals)
    new_function = new_globals[f.__name__]
    update_wrapper(new_function, f)
    new_function.__globals__.clear()
    new_function.__globals__.update(f.__globals__)
    return new_function

@agronholm
Copy link
Owner

agronholm commented Jan 26, 2023

I'm down to 20 failures. Most of them are due to either the code not yet supporting [Async]Iterator or [Async]Iterable, or more NameErrors due to the function closure not being accessible. Some others are due to failing Self checks. At least one failure was due to lack of missing type checks for generator.send().

@agronholm
Copy link
Owner

I'm having trouble with closures. Like here:

    def test_classmethod_return_valid(self):
        class Foo:
            @classmethod
            @typechecked
            def method(cls) -> Self:
                return Foo()

        Foo.method()

The injected code in method() raises NameError because it can't find Foo which is part of the closure. I'm missing something. I tried to create a new function using just the new code, but the code object seems incompatible with the closure. I'm trying to investigate further.

@agronholm
Copy link
Owner

Ok, looks like the problem is that the newly generated code object doesn't have the free variables the original one didn't, and that makes the closure incompatible with the new code. I have to trick the compiler to give it the same free variables.

@agronholm
Copy link
Owner

agronholm commented Jan 27, 2023

I managed to fix the test I was focusing on, but a bunch of other tests broke in the process 😓
But, I managed to refactor the code to avoid exec() entirely! I figured out how to fish the function code object directly from the module code object, even when it's deeply nested.

@agronholm
Copy link
Owner

Agh, you can construct a types.CellType() in Python 3.8+, but not in 3.7. I'm not sure how to deal even with Python 3.8 in this regard, let alone 3.7 where cells can't be constructed explicitly.

@agronholm
Copy link
Owner

Ok, so turns out _Cell.cell_contents is writable even on py3.7. I managed to fix all of the related test failures. I'm down to 15 failures now, with most errors being either Self or generator/iterator related.

@marcelroed
Copy link
Author

Sweet! Good progress! Are the issues with Self a problem in 3.11, or just in older versions with typing-extensions?

@agronholm
Copy link
Owner

The problems with Self lie in how the call memo binds the value of Self. I'm looking into it right now. As for generators/iterators, those tests broke due to how type checking is now done there (the semantics changed quite a bit), and one test fails because there is no checking for send values yet.

@agronholm
Copy link
Owner

Ok, the problem with Self is that my new Instrumentor only uses TypeguardTransformer to do the code injection specifically on the target function, but TypeguardTransformer uses information gathered during node tree traversal to determine whether the function is a method or not. Since it doesn't have that info, it assumes that the function is not a method, and the injected code passes has_self_arg=False to CallMemo.

@agronholm
Copy link
Owner

Managed to fix the Self failures. Now, 8 failures remain, all of them generator/iterator issues.

@agronholm
Copy link
Owner

I fixed all the test failures, except on PyPy where I still get 3 regex mismatches on the error messages, like:

E   AssertionError: Regex pattern did not match.
E    Regex: 'argument "another" is not an instance of the self type \\(tests.test_typechecked\\.TestSelf\\.test_classmethod_arg_invalid\\.<locals>\\.Foo\\)'
E    Input: 'argument "another" is not an instance of the self type (module)'

@agronholm
Copy link
Owner

I've pushed my changes so far and opened a draft PR as #289.

@agronholm
Copy link
Owner

I didn't mark the PR as fixing this issue since the actual feature you asked for is still missing. This new PR is just the foundation for that work.

@agronholm
Copy link
Owner

Strangely, all the failing tests seem to work just fine when run individually. Looks like a test isolation problem.

@marcelroed
Copy link
Author

Oh, interesting. Is this a known problem with pytest in PyPy?

@agronholm
Copy link
Owner

I don't think the problem has anything to do with pytest, but is, rather, an artifact of PyPy's garbage collection. Curiously, if I run pytest tests/test_typechecked.py with PyPy, I get 4 errors instead of 3! As I mentioned earlier, this is most likely caused by the caching done in the CallMemo class.

@agronholm
Copy link
Owner

Looks like my initial diagnosis was incorrect. The problem seems to lie in the way CallMemo does argument binding. It incorrectly picks the typeguard import as the self argument, but only sometimes. Once I fix this flaky behavior, that should fix the rest of the failures.

@agronholm
Copy link
Owner

I managed to fix the test flakiness. The PR still needs quite a bit of polishing before I'm ready to merge it to master, but I think it's safe to say that the hardest part is over now.

@agronholm
Copy link
Owner

agronholm commented Jan 29, 2023

A bit of rubber ducking here 😃

As I continue to refactor the PR, I encountered a tricky issue. If there's a static method in a class within a function, like:

def container():
    class Foo:
        @typechecked
        @staticmethod
        def mymethod(x: int) -> None:
            pass

This would be rewritten as:

def container():
    class Foo:
        @staticmethod
        def mymethod(x: int) -> None:
            from typeguard import CallMemo, check_argument_types
            _call_memo = CallMemo(Foo.mymethod)
           check_argument_types(_call_memo)

In its current state, the instrumented code fails because it creates a new cell for Foo but places mymethod in that cell. The only reason the previous version worked (the one you currently see in the PR branch) is because CallMemo() got mymethod as its first argument, and since it was added to the closure too, it worked.

Now I have two options: either I create a non-conflicting local variable that holds a reference to the function object, or I create a chain of SimpleNamespace objects that would let me keep the reference as it is. Then again, this would be invisible to the end user and would just incur an extra cost in the form of attribute lookups, so maybe I should stick to the method I accidentally discovered and keep adding the function directly to the closure. I just need to tell the AST transformer to change the reference to match if the target function is nested.

@marcelroed
Copy link
Author

Is this completely solved now? I have some more time this week.

@agronholm
Copy link
Owner

No, it's not solved yet. This work exhausted me and another project required my attention. I will be able to continue work next Saturday, at the earliest.

@agronholm
Copy link
Owner

The basic idea is to find a non-conflicting name and assign the function object to the closure under that name so it can be used to create a CallMemo.

@agronholm
Copy link
Owner

I managed to fix the problem now, and pushed the changes. I'm still not quite done with the PR though, but soon.

@agronholm
Copy link
Owner

@marcelroed I marked the PR as ready. There's room for optimization but I'm pretty satisfied with the results thus far. Would you like to review it before I merge? I realize it's a large, complicated changeset so I don't expect you to do any in-depth inspection, just maybe playing around with it a bit.

@marcelroed
Copy link
Author

Awesome, I'll have a look! Are there any aspects you feel like don't have enough tests?

@agronholm
Copy link
Owner

If there are bugs, they would most likely triggered by a pathological combination of nested functions, classes in functions and other abominations like that. I'm not 100% confident about generators (vanilla or async) but there are some tests trying to ensure that they work right.

@agronholm
Copy link
Owner

agronholm commented Feb 6, 2023

BTW, I've already started work on the actual feature you requested. AST transformation checks are now working in 4 cases:

  1. Annotated assignment (x: int = foo())
  2. Later assignment to local annotated variable (x: int; x = foo())
  3. Multiple assignments to annotated variables (x: int; y: str; x, y = foo(); checks only locally annotated variables)
  4. Assignment expressions (x: int; if x := foo(): ...)

What's still missing:

  1. Checking of assignments to variables defined as function arguments
  2. Checking of augmented assignments (x: int; x += 1)

@agronholm
Copy link
Owner

I've created a new draft PR that builds on the previous one. I resolved the issues I wrote about in my previous comment, and all forms of assignments to local variables and arguments should be guarded by type checks now.

@agronholm
Copy link
Owner

I need to improve the type checking error message emitted by bad variable assignments. Something to be done later!

@agronholm
Copy link
Owner

@marcelroed could I get some comments on the PR? Does it do what you want?

@marcelroed
Copy link
Author

Yep, working on it now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants