Allow subclassing NamedTuple as a nested class #4419

elliott-beach · 2018-01-02T20:46:56Z

This 1-line change allows subclassing NamedTuple inside a function.

The current problem is that inner NamedTuples aren't added to the symbol table.
Inner classes get added in self.setup_class_def_analysis(defn) , so we just call that method before calling analyze_namedtuple_classdef, which looks up the node in the symbol table.

As far as I can tell, setup_class_def_analysis does the work that would be done in pass 1's visit_class_defn, which, for inner classes, adds the defn to the local symbol table and the global symbol table, under a mangled name. That should also happen for NamedTuples.

I added some semaneval unit tests for this change to verify the change in behavior, but that may not be the right place. I might want to add a test to prove that local NamedTuples need to be go in the global symbol table as well.

JelleZijlstra · 2018-01-03T01:04:10Z

Thanks! Could you also add a direct test for local namedtuples (i.e., not a semanal test)?

elliott-beach · 2018-01-03T02:26:59Z

done!

elliott-beach · 2018-01-07T20:23:00Z

Friendly ping on this.

gvanrossum · 2018-01-07T20:42:21Z

I hope someone else can approve/merge this; I'd be lying if I said I understand the repercussions of moving self.setup_class_def_analysis(defn) up. Sorry. :-(

emmatyping · 2018-01-07T20:58:01Z

mypy/semanal.py

@@ -671,6 +671,7 @@ def analyze_class_body(self, defn: ClassDef) -> Iterator[bool]:
            if self.analyze_typeddict_classdef(defn):
                yield False
                return
+            self.setup_class_def_analysis(defn)


While this works, I'm not sure if this is the best change that could be made. As we can see in analyze_namedtuple_classdef:
(referencing this TODO),
the root of the issue is that the analysis and processing of namedtuples is unconditionally global. If I understand things correctly, your change adds the node early to the symbol table so it is coerced into being a local definition by setup_class_def_analysis. That works, but is rather brittle.

I agree that this is incomplete because we need to handle local definitions of NamedTuple in a few other places, but I would still argue that we should call setup_class_def_analysis for NamedTuples in this way. Local classes are added to the symbol table in setup_class_def_analysis, and there is no reason not to add local NamedTuples at that point as well.

This doesn't seem early to me because semantic analysis appears to me to proceed in two steps of (1) adding nodes with basic info to the symbol table and then (2) adding more complex type info. I think here is the right place to do step (1) for NamedTuples.

The TODO comments by @JukkaL appear to be referring to assigning node.kind as GDEF or LDEF, which I think would be simple to fix.

Okay I think this is acceptable.

Great, Let me fix the todos.

Actually, my understanding is LDEF and GDEF for NamedTuples, TypedDicts, and Enums is a longstanding issue that is not the cause of the specific problem this PR is solving, and I am not too confident on the solution. So I won't address those.

elliott-beach · 2018-01-12T02:51:11Z

@JukkaL Would you look at this? I think you could merge this / request changes with confidence.

JukkaL · 2018-01-12T10:49:02Z

I'll have a look. This could have implications for incremental and fine-grained incremental modes. In particular, it's possible for a local named tuple to leak to the surrounding class, and this might plausibly cause trouble:

class A:
    def f(self) -> None:
        class N(NamedTuple):
            ...
        self.n = N(...)

elliott-beach · 2018-01-13T02:15:57Z

it's possible for a local named tuple to leak to the surrounding class, and this might plausibly cause trouble

I'll be looking at what you've been doing on dmypy to gain a better understanding of this.

gvanrossum · 2018-01-13T04:33:06Z

I'll be looking at what you've been doing on dmypy to gain a better understanding of this.

I don't think dmypy has anything to do with this, does it?

elliott-beach · 2018-01-13T04:35:09Z

My understanding was @JukkaL was referring to dmypy when talking about fine-grained incremental mode.

…

On Fri, Jan 12, 2018 at 10:33 PM Guido van Rossum ***@***.***> wrote: I'll be looking at what you've been doing on dmypy to gain a better understanding of this. I don't think dmypy has anything to do with this, does it? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#4419 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ANBOAkW008CELeQx92l08xkRWk2raKEFks5tKDIFgaJpZM4RRDyF> .

gvanrossum · 2018-01-13T05:05:23Z

Not quite -- dmypy is a recent experimental "daemon" mode that is separate from incremental (-i, not new) and fine-grained (super experimental) modes.

The problem Jukka was referring to can occur with regular incremental mode. If there's a NamedTuple definition local to a function that's used to declare the function's return type, then (IIRC) the JSON cache file written by incremental mode will contain a reference to that type by name, which won't be resolvable when the cache file is read back in on a subsequent incremental run.

In any case, you nerd-sniped me, and I constructed the following example (with your patch applied):

Create file a.py:

from typing import NamedTuple
class C:
    def __init__(self) -> None:
        class A(NamedTuple):
            x: int
        self.a = A(1)

Run mypy a.py
Create file main.py:

from a import C
reveal_type(C().a)

Run mypy -i a.py -- crash!

Without your patch, the line self.a = A(1) fails.

Note that if instead of class A... we use A = NamedTuple('A', [('x', int)]) things have a happy ending -- that's because class definitions are processed differently than NamedTuple(...) calls.

In case you're interested, here's the traceback:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/guido/src/mypy/mypy/__main__.py", line 11, in <module>
    main(None)
  File "/Users/guido/src/mypy/mypy/main.py", line 66, in main
    res = type_check_only(sources, bin_dir, options)
  File "/Users/guido/src/mypy/mypy/main.py", line 119, in type_check_only
    options=options)
  File "/Users/guido/src/mypy/mypy/build.py", line 218, in build
    graph = dispatch(sources, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2001, in dispatch
    process_graph(graph, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2298, in process_graph
    process_fresh_scc(graph, prev_scc, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2375, in process_fresh_scc
    graph[id].fix_cross_refs()
  File "/Users/guido/src/mypy/mypy/build.py", line 1701, in fix_cross_refs
    self.manager.options.quick_and_dirty)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 22, in fixup_module_pass_one
    node_fixer.visit_symbol_table(tree.names)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 84, in visit_symbol_table
    self.quick_and_dirty)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 284, in lookup_qualified_stnode
    assert key in names, "Cannot find %s for %s" % (key, name)
AssertionError: Cannot find A for a.C.A

elliott-beach · 2018-01-14T02:03:38Z

I'm grateful for the very detailed explanation on this! I'm reading through the related discussion (like #2855) while thinking about a solution.

elliott-beach · 2018-01-14T20:08:42Z

Sorry that this is so long, but I wanted to document my thought process as I debugged this. I'm planning on going with approach (1) below.

The problem is local namedtuples are stored under the classe's symbol table but local classes are stored under the global symbol table, and subclasses of namedtuple under my patch have inconsistent behavior that combines both.

Consider the file a.py:

from typing import NamedTuple
class C:
    def __init__(self) -> None:
        class A(NamedTuple):
            x: int
        class B:
            def __init__(self, x):
                self.x = x
        D = NamedTuple('D', [('x', int)])
        self.a = A(1)
        self.b = B(1)
        self.d = D(1)

B and D work, but A is broke for incremental.

Inner NamedTuples defined with D = NamedTuple('D', ...) expect to be stored under the containing class because the _fullname attribute of the symbol table node is a.C.D, local to the class.

Whereas inner classes expect to be stored in the global symbol table because the _fullname for classes is module.LocalClass, e.g., a.B.

This matters because if the node with fullname a.B were stored in a.C.B (a.C's symbol table), then serialization will assume that this is a cross-reference and not store the full definition of B:

mypy/mypy/nodes.py

Lines 2418 to 2421 in 6fce774

    
           if (fullname is not None and '.' in fullname and 
        
                   fullname != prefix + '.' + name): 
        
               data['cross_ref'] = fullname 
        
               return data

Likewise, under my patch, A has the fullname a.C.A but it is stored in the symbol table for a, not a.C, so the cache does not write the definition.

I see three possible approaches:

Store A under the C's namespace as a.C.A. This requires the least changes and would be the path I am inclined to, but I wanted to ask about the other approaches. I feel like it may leave technical debt because the behavior is different for NamedTuples and TypeDicts than inner classes.
Change the fullname for A to be a.A and store it under the global namespace. This would require modifying build_namedtuple_typeinfo and require the same change for D, as well as typedicts. However I think it makes more sense to store inner namedtuples under the class scope.
Change the fullname for A and B to a.C.A and a.C.B respectively, storing inner classes and inner NamedTuple classes under the class symbol table. This makes sense to me because this is how D as well as inner TypedDicts are stored.

I believe the reason inner classes are stored under the global symbol table but inner namedtuples/typedicts are stored in the class symbol table is that the code was written by different developers. ilevkivskyi implemented caching local classes in #2855, whereas gvanrossum implemented caching inner TypedDicts and D-style NamedTuples in #2553. My insight here is not 20/20 though.

elliott-beach · 2018-01-15T03:28:14Z

Also related: #2931

Possible fix would be to store all "anonymous" classes in globals.

elliott-beach · 2018-01-16T03:01:16Z

mypy/semanal.py

@@ -964,10 +968,14 @@ def analyze_namedtuple_classdef(self, defn: ClassDef) -> Optional[TypeInfo]:
                if base_expr.fullname == 'typing.NamedTuple':
                    node = self.lookup(defn.name, defn)
                    if node is not None:
+                        if self.type or self.is_func_scope():
+                            name = defn.name + '@' + str(defn.line)


The name has to be mangled so that, for incremental mode, it matches the name of type (but it makes sense anyway so it does not leak outside the containing scope).

This is different from how we do name mangling for other kinds of types, such as enums. We should only do it in a function scope.

elliott-beach · 2018-01-20T20:17:26Z

Bump for feedback. @JukkaL I did fix the type leaking issue by storing NamedTuples class definitions under the outer class, if it exists.

JukkaL

This is basically reasonable, modulo some details. If the details can't be worked out cleanly, it might make sense to do some refactoring before fixing the underlying issue. It's not a problem with this PR specifically -- I'm a little worried that this adds extra complexity to code that is already very messy (and kind of broken even).

Also, it seems like TypedDict classes have the same issues, but this only addresses NamedTuple classes. This brings some asymmetry. At least a TODO comment about this would be helpful.

JukkaL · 2018-02-06T15:41:46Z

mypy/semanal.py

@@ -964,10 +968,14 @@ def analyze_namedtuple_classdef(self, defn: ClassDef) -> Optional[TypeInfo]:
                if base_expr.fullname == 'typing.NamedTuple':
                    node = self.lookup(defn.name, defn)
                    if node is not None:
+                        if self.type or self.is_func_scope():
+                            name = defn.name + '@' + str(defn.line)


This is different from how we do name mangling for other kinds of types, such as enums. We should only do it in a function scope.

JukkaL · 2018-02-06T15:48:51Z

mypy/semanal.py

@@ -1046,7 +1054,11 @@ def setup_class_def_analysis(self, defn: ClassDef) -> None:
                local_name = defn.info._fullname + '@' + str(defn.line)
                defn.info._fullname = self.cur_mod_id + '.' + local_name
                defn.fullname = defn.info._fullname
-                self.globals[local_name] = node
+                if self.type and self.is_namedtuple_classdef(defn):


This special casing of named tuples here is kind ugly. Why do we need this?

The problem is that, inside a class method, local namedtuples are stored under the classes's symbol table (and are expected to be stored there in other parts of the code) but local classes are stored under the global symbol table. Storing subclasses of namedtuple in the global table breaks incremental mode.

As you suggested a little refactoring might be more effective than this haphazard fix, I would suggest storing local classes under the class symbol table. Then we wouldn't need this. IIRC local TypedDicts and other special types (???) are also stored in the class symbol table, so it be most consistent to always store local classes under the local symbol table.

I went a little it more into the technical details in #4419 (comment)

I would be delighted to make that change if you agree.

JukkaL · 2018-02-06T15:50:00Z

test-data/unit/semanal-namedtuple.test

+[case testNamedTupleDirectSubclass]
+from typing import NamedTuple
+class A(NamedTuple):
+	x: int


Style nit: indent should be 4 spaces.

JukkaL · 2018-02-06T15:50:36Z

test-data/unit/semanal-namedtuple.test

+from typing import NamedTuple
+def foo():
+    class A(NamedTuple):
+            x: int


Style nit: indent should be 4 spaces.

JukkaL · 2018-02-06T16:12:18Z

mypy/semanal.py

@@ -956,6 +956,10 @@ def get_all_bases_tvars(self, defn: ClassDef, removed: List[int]) -> TypeVarList
                tvars.extend(base_tvars)
        return remove_dups(tvars)

+    def is_namedtuple_classdef(self, defn: ClassDef) -> bool:
+        base_exprs = defn.base_type_exprs
+        return any(getattr(b, 'fullname', None) == 'typing.NamedTuple' for b in base_exprs)


This duplicates similar logic in analyze_namedtuple_classdef. There should be only one implementation of looking for a typing.NamedTuple base class.

Style nit: We avoid the use of getattr if there's a reasonable way around it, since it can't be type checked.

emmatyping · 2018-05-29T05:03:35Z

So is the decision here to do the refactor first?

gvanrossum · 2018-09-24T19:21:29Z

This is now quite stale -- there was no response to the code review from February, and there's now (of course) a merge conflict. Should we just close it? (That would make me sad given the effort put into this, but sometimes it's the only alternative.)

elliott-beach · 2018-09-25T01:30:28Z

I would say there was actually a response to the code review. Look! #4419 (comment)
I did snub Ethan in May.

Anyway, after getting a real job I view this whole affair as nonessential tinkering that may benefit no-one, which is probably why Jukka doesn't have time to respond. I also think expert work should be left to experts. So long, and thanks for all the fish.

gvanrossum · 2018-09-25T03:29:41Z

Thanks for clarifying, and sorry for misreading the issue.

elliott-beach added 2 commits January 2, 2018 13:29

tests for local NamedTuple subclass (python#4349)

a295319

Allow subclassing NamedTuple in function (python#4349)

dc180e7

direct test for inner namedtuple

ea573e2

emmatyping requested changes Jan 7, 2018

View reviewed changes

emmatyping approved these changes Jan 12, 2018

View reviewed changes

elliott-beach changed the title ~~Allow NamedTuple inside function~~ Allow subclassing NamedTuple inside function Jan 14, 2018

elliott-beach changed the title ~~Allow subclassing NamedTuple inside function~~ Allow subclassing NamedTuple as a nested class Jan 14, 2018

elliott-beach added 3 commits January 14, 2018 15:56

add test for incremental namedtuple

6eae524

handle caching namedtuple

abf5a0a

fix lint errors

b398542

elliott-beach commented Jan 16, 2018

View reviewed changes

JukkaL reviewed Feb 6, 2018

View reviewed changes

elliott-beach closed this Sep 25, 2018

Allow subclassing NamedTuple as a nested class #4419

Allow subclassing NamedTuple as a nested class #4419

Conversation

elliott-beach commented Jan 2, 2018 • edited Loading

JelleZijlstra commented Jan 3, 2018

elliott-beach commented Jan 3, 2018

elliott-beach commented Jan 7, 2018 • edited Loading

gvanrossum commented Jan 7, 2018

emmatyping Jan 7, 2018 • edited Loading

Choose a reason for hiding this comment

elliott-beach Jan 7, 2018 • edited Loading

Choose a reason for hiding this comment

elliott-beach Jan 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elliott-beach Jan 12, 2018 • edited Loading

Choose a reason for hiding this comment

elliott-beach commented Jan 12, 2018

JukkaL commented Jan 12, 2018

elliott-beach commented Jan 13, 2018

gvanrossum commented Jan 13, 2018

elliott-beach commented Jan 13, 2018 via email

gvanrossum commented Jan 13, 2018

elliott-beach commented Jan 14, 2018 • edited Loading

elliott-beach commented Jan 14, 2018 • edited Loading

elliott-beach commented Jan 15, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elliott-beach commented Jan 20, 2018 • edited Loading

JukkaL left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elliott-beach Feb 8, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

emmatyping commented May 29, 2018

gvanrossum commented Sep 24, 2018

elliott-beach commented Sep 25, 2018 • edited Loading

gvanrossum commented Sep 25, 2018 via email

elliott-beach commented Jan 2, 2018 •

edited

Loading

elliott-beach commented Jan 7, 2018 •

edited

Loading

emmatyping Jan 7, 2018 •

edited

Loading

elliott-beach Jan 7, 2018 •

edited

Loading

elliott-beach Jan 7, 2018 •

edited

Loading

elliott-beach Jan 12, 2018 •

edited

Loading

elliott-beach commented Jan 14, 2018 •

edited

Loading

elliott-beach commented Jan 14, 2018 •

edited

Loading

elliott-beach commented Jan 20, 2018 •

edited

Loading

elliott-beach Feb 8, 2018 •

edited

Loading

elliott-beach commented Sep 25, 2018 •

edited

Loading