Flag for inline typechecking #208

marcelroed · 2021-10-26T12:07:18Z

Is your feature request related to a problem? Please describe.
I often need to typecheck calls to functions that are outside of my control, since they cause bugs further down the line.
For example, calling external functions with multiple return types I sometimes need to force one of them.

def has_inline_typecheck():
    x: float = function_produces_float_or_int(100)  # Should error if the result isn't a float
    print(x)

My specific issue relates to using torchtyping with PyTorch.

def inline_typecheck_tensor():
    x: torchtyping.TensorType['batch', 'width', 'height'] = produce_images()

In this context, the type matters at every step, and also considerably helps with documenting the code, as currently I need to add comments for the shape of the tensors everywhere.

Describe the solution you'd like
I would like to add a flag to the typechecked decorator that adds support for inline typechecking. It could look like this.

from typeguard import typechecked
@typeguard(typecheck_inline=True)
def has_inline_typecheck():
    x: float = function_produces_float_or_int(100)  # Should error if the result isn't a float
    print(x)

Describe alternatives you've considered
I've considered

wrapping the function I'm calling in a typechecked decorator. This doesn't work for functions that don't have type hints, and doesn't allow me to force checking against a certain type.
using the check_type function. Results in a lot of boilerplate and still requires me to add an inline type hint for my IDE to understand the type.

I'm willing to submit an implementation if this is approved. Thanks!

The text was updated successfully, but these errors were encountered:

agronholm · 2021-11-06T11:26:37Z

I'm curious as to how you would implement this without the import hook.

marcelroed · 2021-11-06T13:14:55Z

I'm curious as to how you would implement this without the import hook.

My idea is to get the source of the function, parse it using ast and look for ast.AnnAssign. Replace these statements with a wrapper that checks the type, or an assignment and an assert for the type of the variable.

agronholm · 2021-11-06T13:19:31Z

That still sounds like you would need to do it in the import hook. Otherwise how would you change the code after the fact?

marcelroed · 2021-11-06T16:15:08Z

I'm not sure I understand what you mean. Here's a minimal prototype of my implementation.

agronholm · 2021-11-06T21:33:05Z

Hm, this actually looks pretty clever. You're basically recompiling the function on import. Nested functions decorated this way could be problematic, but right now I don't see a problem for top level functions and methods. Surely this only works from py3.9 onwards (I didn't know about ast.unparse()), but with proper documentation of that limitation I don't see a problem. I'm interested in including this in v3.0 if you would like to polish this for a PR.

marcelroed · 2021-11-07T05:49:55Z

Great! astunparse also exists as a library, so ast.unparse can be backported to earlier versions of Python.

Do you have a timeline for v3.0 release?

agronholm · 2023-01-11T20:54:49Z

Hi, it's been a while. I'm preparing the 3.0 release (at beta 2 as of now), so if you'd still like to get this into typeguard, you could make a PR as a starting point.

marcelroed · 2023-01-11T21:02:52Z

Sure, I'm happy to contribute it. Do you think the above approach is reasonable and would work with the rest of the library?

agronholm · 2023-01-11T21:08:24Z

I think it would. The parameter name could possibly be named differently, like check_local_assignments because typecheck_inline doesn't really have any obvious meaning.

agronholm · 2023-01-23T23:42:50Z

It looks like the astunparse package only supports up to Python 3.5, leaving 3.7 and 3.8 users hanging.

marcelroed · 2023-01-23T23:46:52Z

Ah, this is because it's supported natively in the ast module from Python 3.6 and on. It's a backport of newer functionality.

agronholm · 2023-01-23T23:48:20Z

But ast.unparse() is new to Python 3.9, as we have discussed earlier? How would you do that on Python 3.8?

marcelroed · 2023-01-23T23:50:15Z

Oh wait, I meant from 3.9. Where did you find that astunparse only supports up to 3.5? From the PyPI page it seems to support up to 3.8. https://pypi.org/project/astunparse/

agronholm · 2023-01-23T23:51:18Z

Oh, I got stuck in the first paragraph of the README which states: This library is single-source compatible with Python 2.6 through Python 3.5. The changelog, though, says it's 3.8 compatible.

agronholm · 2023-01-23T23:54:41Z

I would actually like to get rid of the @typechecked wrapper entirely, in favor of the import hook (on which I have done significant work as of late). Ideally I would leave just the import hook and check_type() function as the user-facing entry points to Typeguard. Do you see a compelling argument for not ditching @typechecked?

marcelroed · 2023-01-24T03:32:34Z

Hmmm, that's interesting. I think if you only care about typechecking certain classes or functions, the decorator is still the best way of doing so, no?

Additionally, from what I can tell, quite a few library users aren't so keen on monkey patching since a small statement can have such global and potentially unpredictable effects, as well as it being hard to predict how several monkey patches will interact with each other.

As I see it I think offering both would still be useful. Maybe instead we could try to maximize the amount of shared code between these approaches?

agronholm · 2023-01-24T09:09:23Z

Yes, if @typechecked can be implemented properly w/o the wrapper, then it would be acceptable to keep it in Typeguard. But check_argument_types() and check_return_type() have to go from the public API. Those are too tricky to use properly otherwise. Fishing for function objects in the garbage collector is dirty and I want to get rid of that. The generator wrappers need to go too.

agronholm · 2023-01-24T09:29:10Z

I think I'm missing something here. So first, we get the source file with inspect.getsource(func) (assuming the source file exists), parse it to AST, modify it and recompile it. Where does ast.unparse() come in?

marcelroed · 2023-01-24T18:58:13Z

Yeah, I agree with getting rid of check_argument_types() and check_return_type() and generators.

So ast.unparse takes in the AST and produces actual source code from it. We need the actual source code to run it using exec.

agronholm · 2023-01-24T23:00:21Z

Do we really need to exec() the code? Wouldn't compiling the AST to code work? I'm experimenting with that route.

agronholm · 2023-01-25T00:10:03Z

I'm pretty far along. The only problem I have left is that the memo creation in the injected code is raising NameError because it can't find the function definition, but I believe that's a solvable problem. I'm calling it a day. Tomorrow I can probably get this working.

marcelroed · 2023-01-25T02:43:27Z

How do you do that? Can you compile the AST directly to bytecode? Awesome! Could you share a draft PR or a branch showing your progress?

agronholm · 2023-01-25T07:54:36Z

As it says in the documentation of the compile() built-in:

Compile the source into a code or AST object. Code objects can be executed by exec() or eval(). source can either be a normal string, a byte string, or an AST object (emphasis mine).

agronholm · 2023-01-25T07:57:47Z

So to be clear: I am compiling the AST to a module, exec()ing it and then retrieving the function from that faux module.

agronholm · 2023-01-25T08:41:42Z

The juicy bits:

class Instrumentor(ast.NodeVisitor):
    def __init__(self, target_name: str):
        target_name = target_name.replace(".<locals>", "")
        self.target_name = target_name
        self.path: list[str] = []
        self.transformer = TypeguardTransformer("_call_memo")
        self.modified_function: ast.FunctionDef | ast.AsyncFunctionDef | None = None

    def visit_Module(self, node: ast.Module) -> Any:
        self.generic_visit(node)
        if self.modified_function:
            node.body[:] = [self.modified_function]
            ast.fix_missing_locations(node)

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        self.path.append(node.name)
        self.generic_visit(node)
        del self.path[-1]

    def visit_FunctionDef(self, node: ast.FunctionDef | ast.AsyncFunctionDef) -> None:
        self.path.append(node.name)
        full_path = ".".join(self.path)
        self.generic_visit(node)
        if full_path == self.target_name:
            self.modified_function: ast.FunctionDef = self.transformer.visit(node)
            self.modified_function.decorator_list.clear()
            self.modified_function.body.insert(0, ast.Import([ast.alias("typeguard")]))

        del self.path[-1]

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        return self.visit_FunctionDef(node)


def instrument(f: T_CallableOrType) -> Callable | str:
    if not getattr(f, "__annotations__", None):
        return "no type annotations present"
    elif not getattr(f, "__code__", None):
        return "no code associated"
    elif not getattr(f, "__module__", None):
        return "__module__ attribute is not set"

    module = sys.modules[f.__module__]
    module_source = inspect.getsource(sys.modules[f.__module__])
    module_ast = TypeguardTransformer("_call_memo").visit(ast.parse(module_source))
    instrumentor = Instrumentor(f.__qualname__)
    instrumentor.visit(module_ast)
    if instrumentor.modified_function is None:
        return "cannot find function in AST"

    code = compile(module_ast, module.__file__, "exec", dont_inherit=True)
    new_globals = {}
    exec(code, new_globals)
    new_function = new_globals[f.__name__]
    update_wrapper(new_function, f)
    new_function.__globals__.clear()
    new_function.__globals__.update(f.__globals__)
    return new_function

agronholm · 2023-01-26T09:06:20Z

I'm down to 20 failures. Most of them are due to either the code not yet supporting [Async]Iterator or [Async]Iterable, or more NameErrors due to the function closure not being accessible. Some others are due to failing Self checks. At least one failure was due to lack of missing type checks for generator.send().

agronholm · 2023-01-26T23:20:36Z

I'm having trouble with closures. Like here:

    def test_classmethod_return_valid(self):
        class Foo:
            @classmethod
            @typechecked
            def method(cls) -> Self:
                return Foo()

        Foo.method()

The injected code in method() raises NameError because it can't find Foo which is part of the closure. I'm missing something. I tried to create a new function using just the new code, but the code object seems incompatible with the closure. I'm trying to investigate further.

agronholm · 2023-01-26T23:32:16Z

Ok, looks like the problem is that the newly generated code object doesn't have the free variables the original one didn't, and that makes the closure incompatible with the new code. I have to trick the compiler to give it the same free variables.

agronholm · 2023-01-27T14:12:15Z

I managed to fix the test I was focusing on, but a bunch of other tests broke in the process 😓
But, I managed to refactor the code to avoid exec() entirely! I figured out how to fish the function code object directly from the module code object, even when it's deeply nested.

agronholm · 2023-01-27T16:39:45Z

Agh, you can construct a types.CellType() in Python 3.8+, but not in 3.7. I'm not sure how to deal even with Python 3.8 in this regard, let alone 3.7 where cells can't be constructed explicitly.

agronholm · 2023-01-27T19:38:41Z

Ok, so turns out _Cell.cell_contents is writable even on py3.7. I managed to fix all of the related test failures. I'm down to 15 failures now, with most errors being either Self or generator/iterator related.

marcelroed · 2023-01-27T20:34:46Z

Sweet! Good progress! Are the issues with Self a problem in 3.11, or just in older versions with typing-extensions?

agronholm · 2023-01-27T20:37:47Z

The problems with Self lie in how the call memo binds the value of Self. I'm looking into it right now. As for generators/iterators, those tests broke due to how type checking is now done there (the semantics changed quite a bit), and one test fails because there is no checking for send values yet.

agronholm · 2023-01-27T21:02:16Z

Ok, the problem with Self is that my new Instrumentor only uses TypeguardTransformer to do the code injection specifically on the target function, but TypeguardTransformer uses information gathered during node tree traversal to determine whether the function is a method or not. Since it doesn't have that info, it assumes that the function is not a method, and the injected code passes has_self_arg=False to CallMemo.

agronholm · 2023-01-28T11:27:53Z

Managed to fix the Self failures. Now, 8 failures remain, all of them generator/iterator issues.

agronholm · 2023-01-28T20:24:48Z

I fixed all the test failures, except on PyPy where I still get 3 regex mismatches on the error messages, like:

E   AssertionError: Regex pattern did not match.
E    Regex: 'argument "another" is not an instance of the self type \\(tests.test_typechecked\\.TestSelf\\.test_classmethod_arg_invalid\\.<locals>\\.Foo\\)'
E    Input: 'argument "another" is not an instance of the self type (module)'

agronholm · 2023-01-28T20:38:58Z

I've pushed my changes so far and opened a draft PR as #289.

agronholm · 2023-01-28T20:48:34Z

I didn't mark the PR as fixing this issue since the actual feature you asked for is still missing. This new PR is just the foundation for that work.

agronholm · 2023-01-28T20:54:02Z

Strangely, all the failing tests seem to work just fine when run individually. Looks like a test isolation problem.

marcelroed · 2023-01-28T22:43:09Z

Oh, interesting. Is this a known problem with pytest in PyPy?

agronholm · 2023-01-28T22:47:10Z

I don't think the problem has anything to do with pytest, but is, rather, an artifact of PyPy's garbage collection. Curiously, if I run pytest tests/test_typechecked.py with PyPy, I get 4 errors instead of 3! As I mentioned earlier, this is most likely caused by the caching done in the CallMemo class.

agronholm · 2023-01-29T11:22:10Z

Looks like my initial diagnosis was incorrect. The problem seems to lie in the way CallMemo does argument binding. It incorrectly picks the typeguard import as the self argument, but only sometimes. Once I fix this flaky behavior, that should fix the rest of the failures.

agronholm · 2023-01-29T11:47:01Z

I managed to fix the test flakiness. The PR still needs quite a bit of polishing before I'm ready to merge it to master, but I think it's safe to say that the hardest part is over now.

agronholm · 2023-01-29T23:39:48Z

A bit of rubber ducking here 😃

As I continue to refactor the PR, I encountered a tricky issue. If there's a static method in a class within a function, like:

def container():
    class Foo:
        @typechecked
        @staticmethod
        def mymethod(x: int) -> None:
            pass

This would be rewritten as:

def container():
    class Foo:
        @staticmethod
        def mymethod(x: int) -> None:
            from typeguard import CallMemo, check_argument_types
            _call_memo = CallMemo(Foo.mymethod)
           check_argument_types(_call_memo)

In its current state, the instrumented code fails because it creates a new cell for Foo but places mymethod in that cell. The only reason the previous version worked (the one you currently see in the PR branch) is because CallMemo() got mymethod as its first argument, and since it was added to the closure too, it worked.

Now I have two options: either I create a non-conflicting local variable that holds a reference to the function object, or I create a chain of SimpleNamespace objects that would let me keep the reference as it is. Then again, this would be invisible to the end user and would just incur an extra cost in the form of attribute lookups, so maybe I should stick to the method I accidentally discovered and keep adding the function directly to the closure. I just need to tell the AST transformer to change the reference to match if the target function is nested.

marcelroed · 2023-02-02T00:21:40Z

Is this completely solved now? I have some more time this week.

agronholm · 2023-02-02T11:37:38Z

No, it's not solved yet. This work exhausted me and another project required my attention. I will be able to continue work next Saturday, at the earliest.

agronholm · 2023-02-02T11:38:20Z

The basic idea is to find a non-conflicting name and assign the function object to the closure under that name so it can be used to create a CallMemo.

agronholm · 2023-02-06T00:16:32Z

I managed to fix the problem now, and pushed the changes. I'm still not quite done with the PR though, but soon.

agronholm · 2023-02-06T23:00:29Z

@marcelroed I marked the PR as ready. There's room for optimization but I'm pretty satisfied with the results thus far. Would you like to review it before I merge? I realize it's a large, complicated changeset so I don't expect you to do any in-depth inspection, just maybe playing around with it a bit.

marcelroed · 2023-02-06T23:02:27Z

Awesome, I'll have a look! Are there any aspects you feel like don't have enough tests?

agronholm · 2023-02-06T23:04:12Z

If there are bugs, they would most likely triggered by a pathological combination of nested functions, classes in functions and other abominations like that. I'm not 100% confident about generators (vanilla or async) but there are some tests trying to ensure that they work right.

agronholm · 2023-02-06T23:56:26Z

BTW, I've already started work on the actual feature you requested. AST transformation checks are now working in 4 cases:

Annotated assignment (x: int = foo())
Later assignment to local annotated variable (x: int; x = foo())
Multiple assignments to annotated variables (x: int; y: str; x, y = foo(); checks only locally annotated variables)
Assignment expressions (x: int; if x := foo(): ...)

What's still missing:

Checking of assignments to variables defined as function arguments
Checking of augmented assignments (x: int; x += 1)

Closes #208.

agronholm · 2023-02-07T23:04:39Z

I've created a new draft PR that builds on the previous one. I resolved the issues I wrote about in my previous comment, and all forms of assignments to local variables and arguments should be guarded by type checks now.

agronholm · 2023-02-07T23:13:03Z

I need to improve the type checking error message emitted by bad variable assignments. Something to be done later!

agronholm · 2023-02-09T10:10:02Z

@marcelroed could I get some comments on the PR? Does it do what you want?

marcelroed · 2023-02-09T16:10:30Z

Yep, working on it now!

marcelroed added the enhancement label Oct 26, 2021

agronholm added a commit that referenced this issue Feb 7, 2023

Check variable assignments in instrumented functions

0123f4e

Closes #208.

agronholm closed this as completed in e9a5247 Feb 25, 2023

Flag for inline typechecking #208

Flag for inline typechecking #208

Comments

marcelroed commented Oct 26, 2021

agronholm commented Nov 6, 2021

marcelroed commented Nov 6, 2021 • edited Loading

agronholm commented Nov 6, 2021

marcelroed commented Nov 6, 2021

agronholm commented Nov 6, 2021

marcelroed commented Nov 7, 2021 • edited Loading

agronholm commented Jan 11, 2023

marcelroed commented Jan 11, 2023

agronholm commented Jan 11, 2023

agronholm commented Jan 23, 2023

marcelroed commented Jan 23, 2023

agronholm commented Jan 23, 2023

marcelroed commented Jan 23, 2023

agronholm commented Jan 23, 2023

agronholm commented Jan 23, 2023

marcelroed commented Jan 24, 2023

agronholm commented Jan 24, 2023

agronholm commented Jan 24, 2023

marcelroed commented Jan 24, 2023

agronholm commented Jan 24, 2023

agronholm commented Jan 25, 2023 • edited Loading

marcelroed commented Jan 25, 2023

agronholm commented Jan 25, 2023

agronholm commented Jan 25, 2023

agronholm commented Jan 25, 2023

agronholm commented Jan 26, 2023 • edited Loading

agronholm commented Jan 26, 2023

agronholm commented Jan 26, 2023

agronholm commented Jan 27, 2023 • edited Loading

agronholm commented Jan 27, 2023

agronholm commented Jan 27, 2023

marcelroed commented Jan 27, 2023

agronholm commented Jan 27, 2023

agronholm commented Jan 27, 2023

agronholm commented Jan 28, 2023

agronholm commented Jan 28, 2023

agronholm commented Jan 28, 2023

agronholm commented Jan 28, 2023

agronholm commented Jan 28, 2023

marcelroed commented Jan 28, 2023

agronholm commented Jan 28, 2023

agronholm commented Jan 29, 2023

agronholm commented Jan 29, 2023

agronholm commented Jan 29, 2023 • edited Loading

marcelroed commented Feb 2, 2023

agronholm commented Feb 2, 2023

agronholm commented Feb 2, 2023

agronholm commented Feb 6, 2023

agronholm commented Feb 6, 2023

marcelroed commented Feb 6, 2023

agronholm commented Feb 6, 2023

agronholm commented Feb 6, 2023 • edited Loading

agronholm commented Feb 7, 2023

agronholm commented Feb 7, 2023

agronholm commented Feb 9, 2023

marcelroed commented Feb 9, 2023

marcelroed commented Nov 6, 2021 •

edited

Loading

marcelroed commented Nov 7, 2021 •

edited

Loading

agronholm commented Jan 25, 2023 •

edited

Loading

agronholm commented Jan 26, 2023 •

edited

Loading

agronholm commented Jan 27, 2023 •

edited

Loading

agronholm commented Jan 29, 2023 •

edited

Loading

agronholm commented Feb 6, 2023 •

edited

Loading