Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow subclassing NamedTuple as a nested class #4419

Closed
wants to merge 6 commits into from

Conversation

elliott-beach
Copy link
Contributor

@elliott-beach elliott-beach commented Jan 2, 2018

Fixes #4349.

This 1-line change allows subclassing NamedTuple inside a function.

The current problem is that inner NamedTuples aren't added to the symbol table.
Inner classes get added in self.setup_class_def_analysis(defn) , so we just call that method before calling analyze_namedtuple_classdef, which looks up the node in the symbol table.

As far as I can tell, setup_class_def_analysis does the work that would be done in pass 1's visit_class_defn, which, for inner classes, adds the defn to the local symbol table and the global symbol table, under a mangled name. That should also happen for NamedTuples.

I added some semaneval unit tests for this change to verify the change in behavior, but that may not be the right place. I might want to add a test to prove that local NamedTuples need to be go in the global symbol table as well.

@JelleZijlstra
Copy link
Member

Thanks! Could you also add a direct test for local namedtuples (i.e., not a semanal test)?

@elliott-beach
Copy link
Contributor Author

done!

@elliott-beach
Copy link
Contributor Author

elliott-beach commented Jan 7, 2018

Friendly ping on this.

@gvanrossum
Copy link
Member

I hope someone else can approve/merge this; I'd be lying if I said I understand the repercussions of moving self.setup_class_def_analysis(defn) up. Sorry. :-(

@@ -671,6 +671,7 @@ def analyze_class_body(self, defn: ClassDef) -> Iterator[bool]:
if self.analyze_typeddict_classdef(defn):
yield False
return
self.setup_class_def_analysis(defn)
Copy link
Collaborator

@emmatyping emmatyping Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this works, I'm not sure if this is the best change that could be made. As we can see in analyze_namedtuple_classdef:
(referencing this TODO),
the root of the issue is that the analysis and processing of namedtuples is unconditionally global. If I understand things correctly, your change adds the node early to the symbol table so it is coerced into being a local definition by setup_class_def_analysis. That works, but is rather brittle.

Copy link
Contributor Author

@elliott-beach elliott-beach Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is incomplete because we need to handle local definitions of NamedTuple in a few other places, but I would still argue that we should call setup_class_def_analysis for NamedTuples in this way. Local classes are added to the symbol table in setup_class_def_analysis, and there is no reason not to add local NamedTuples at that point as well.

This doesn't seem early to me because semantic analysis appears to me to proceed in two steps of (1) adding nodes with basic info to the symbol table and then (2) adding more complex type info. I think here is the right place to do step (1) for NamedTuples.

Copy link
Contributor Author

@elliott-beach elliott-beach Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comments by @JukkaL appear to be referring to assigning node.kind as GDEF or LDEF, which I think would be simple to fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think this is acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, Let me fix the todos.

Copy link
Contributor Author

@elliott-beach elliott-beach Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, my understanding is LDEF and GDEF for NamedTuples, TypedDicts, and Enums is a longstanding issue that is not the cause of the specific problem this PR is solving, and I am not too confident on the solution. So I won't address those.

@elliott-beach
Copy link
Contributor Author

@JukkaL Would you look at this? I think you could merge this / request changes with confidence.

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 12, 2018

I'll have a look. This could have implications for incremental and fine-grained incremental modes. In particular, it's possible for a local named tuple to leak to the surrounding class, and this might plausibly cause trouble:

class A:
    def f(self) -> None:
        class N(NamedTuple):
            ...
        self.n = N(...)

@elliott-beach
Copy link
Contributor Author

it's possible for a local named tuple to leak to the surrounding class, and this might plausibly cause trouble

I'll be looking at what you've been doing on dmypy to gain a better understanding of this.

@gvanrossum
Copy link
Member

I'll be looking at what you've been doing on dmypy to gain a better understanding of this.

I don't think dmypy has anything to do with this, does it?

@elliott-beach
Copy link
Contributor Author

elliott-beach commented Jan 13, 2018 via email

@gvanrossum
Copy link
Member

Not quite -- dmypy is a recent experimental "daemon" mode that is separate from incremental (-i, not new) and fine-grained (super experimental) modes.

The problem Jukka was referring to can occur with regular incremental mode. If there's a NamedTuple definition local to a function that's used to declare the function's return type, then (IIRC) the JSON cache file written by incremental mode will contain a reference to that type by name, which won't be resolvable when the cache file is read back in on a subsequent incremental run.

In any case, you nerd-sniped me, and I constructed the following example (with your patch applied):

  • Create file a.py:
from typing import NamedTuple
class C:
    def __init__(self) -> None:
        class A(NamedTuple):
            x: int
        self.a = A(1)
  • Run mypy a.py
  • Create file main.py:
from a import C
reveal_type(C().a)
  • Run mypy -i a.py -- crash!

Without your patch, the line self.a = A(1) fails.

Note that if instead of class A... we use A = NamedTuple('A', [('x', int)]) things have a happy ending -- that's because class definitions are processed differently than NamedTuple(...) calls.

In case you're interested, here's the traceback:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/guido/src/mypy/mypy/__main__.py", line 11, in <module>
    main(None)
  File "/Users/guido/src/mypy/mypy/main.py", line 66, in main
    res = type_check_only(sources, bin_dir, options)
  File "/Users/guido/src/mypy/mypy/main.py", line 119, in type_check_only
    options=options)
  File "/Users/guido/src/mypy/mypy/build.py", line 218, in build
    graph = dispatch(sources, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2001, in dispatch
    process_graph(graph, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2298, in process_graph
    process_fresh_scc(graph, prev_scc, manager)
  File "/Users/guido/src/mypy/mypy/build.py", line 2375, in process_fresh_scc
    graph[id].fix_cross_refs()
  File "/Users/guido/src/mypy/mypy/build.py", line 1701, in fix_cross_refs
    self.manager.options.quick_and_dirty)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 22, in fixup_module_pass_one
    node_fixer.visit_symbol_table(tree.names)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 84, in visit_symbol_table
    self.quick_and_dirty)
  File "/Users/guido/src/mypy/mypy/fixup.py", line 284, in lookup_qualified_stnode
    assert key in names, "Cannot find %s for %s" % (key, name)
AssertionError: Cannot find A for a.C.A

@elliott-beach elliott-beach changed the title Allow NamedTuple inside function Allow subclassing NamedTuple inside function Jan 14, 2018
@elliott-beach
Copy link
Contributor Author

elliott-beach commented Jan 14, 2018

I'm grateful for the very detailed explanation on this! I'm reading through the related discussion (like #2855) while thinking about a solution.

@elliott-beach
Copy link
Contributor Author

elliott-beach commented Jan 14, 2018

Sorry that this is so long, but I wanted to document my thought process as I debugged this. I'm planning on going with approach (1) below.

The problem is local namedtuples are stored under the classe's symbol table but local classes are stored under the global symbol table, and subclasses of namedtuple under my patch have inconsistent behavior that combines both.

Consider the file a.py:

from typing import NamedTuple
class C:
    def __init__(self) -> None:
        class A(NamedTuple):
            x: int
        class B:
            def __init__(self, x):
                self.x = x
        D = NamedTuple('D', [('x', int)])
        self.a = A(1)
        self.b = B(1)
        self.d = D(1)

B and D work, but A is broke for incremental.

Inner NamedTuples defined with D = NamedTuple('D', ...) expect to be stored under the containing class because the _fullname attribute of the symbol table node is a.C.D, local to the class.

Whereas inner classes expect to be stored in the global symbol table because the _fullname for classes is module.LocalClass, e.g., a.B.

This matters because if the node with fullname a.B were stored in a.C.B (a.C's symbol table), then serialization will assume that this is a cross-reference and not store the full definition of B:

mypy/mypy/nodes.py

Lines 2418 to 2421 in 6fce774

if (fullname is not None and '.' in fullname and
fullname != prefix + '.' + name):
data['cross_ref'] = fullname
return data

Likewise, under my patch, A has the fullname a.C.A but it is stored in the symbol table for a, not a.C, so the cache does not write the definition.

I see three possible approaches:

  1. Store A under the C's namespace as a.C.A. This requires the least changes and would be the path I am inclined to, but I wanted to ask about the other approaches. I feel like it may leave technical debt because the behavior is different for NamedTuples and TypeDicts than inner classes.

  2. Change the fullname for A to be a.A and store it under the global namespace. This would require modifying build_namedtuple_typeinfo and require the same change for D, as well as typedicts. However I think it makes more sense to store inner namedtuples under the class scope.

  3. Change the fullname for A and B to a.C.A and a.C.B respectively, storing inner classes and inner NamedTuple classes under the class symbol table. This makes sense to me because this is how D as well as inner TypedDicts are stored.

I believe the reason inner classes are stored under the global symbol table but inner namedtuples/typedicts are stored in the class symbol table is that the code was written by different developers. ilevkivskyi implemented caching local classes in #2855, whereas gvanrossum implemented caching inner TypedDicts and D-style NamedTuples in #2553. My insight here is not 20/20 though.

@elliott-beach elliott-beach changed the title Allow subclassing NamedTuple inside function Allow subclassing NamedTuple as a nested class Jan 14, 2018
@elliott-beach
Copy link
Contributor Author

Also related: #2931

Possible fix would be to store all "anonymous" classes in globals.

@@ -964,10 +968,14 @@ def analyze_namedtuple_classdef(self, defn: ClassDef) -> Optional[TypeInfo]:
if base_expr.fullname == 'typing.NamedTuple':
node = self.lookup(defn.name, defn)
if node is not None:
if self.type or self.is_func_scope():
name = defn.name + '@' + str(defn.line)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name has to be mangled so that, for incremental mode, it matches the name of type (but it makes sense anyway so it does not leak outside the containing scope).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from how we do name mangling for other kinds of types, such as enums. We should only do it in a function scope.

@elliott-beach
Copy link
Contributor Author

elliott-beach commented Jan 20, 2018

Bump for feedback. @JukkaL I did fix the type leaking issue by storing NamedTuples class definitions under the outer class, if it exists.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically reasonable, modulo some details. If the details can't be worked out cleanly, it might make sense to do some refactoring before fixing the underlying issue. It's not a problem with this PR specifically -- I'm a little worried that this adds extra complexity to code that is already very messy (and kind of broken even).

Also, it seems like TypedDict classes have the same issues, but this only addresses NamedTuple classes. This brings some asymmetry. At least a TODO comment about this would be helpful.

@@ -964,10 +968,14 @@ def analyze_namedtuple_classdef(self, defn: ClassDef) -> Optional[TypeInfo]:
if base_expr.fullname == 'typing.NamedTuple':
node = self.lookup(defn.name, defn)
if node is not None:
if self.type or self.is_func_scope():
name = defn.name + '@' + str(defn.line)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from how we do name mangling for other kinds of types, such as enums. We should only do it in a function scope.

@@ -1046,7 +1054,11 @@ def setup_class_def_analysis(self, defn: ClassDef) -> None:
local_name = defn.info._fullname + '@' + str(defn.line)
defn.info._fullname = self.cur_mod_id + '.' + local_name
defn.fullname = defn.info._fullname
self.globals[local_name] = node
if self.type and self.is_namedtuple_classdef(defn):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special casing of named tuples here is kind ugly. Why do we need this?

Copy link
Contributor Author

@elliott-beach elliott-beach Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that, inside a class method, local namedtuples are stored under the classes's symbol table (and are expected to be stored there in other parts of the code) but local classes are stored under the global symbol table. Storing subclasses of namedtuple in the global table breaks incremental mode.

As you suggested a little refactoring might be more effective than this haphazard fix, I would suggest storing local classes under the class symbol table. Then we wouldn't need this. IIRC local TypedDicts and other special types (???) are also stored in the class symbol table, so it be most consistent to always store local classes under the local symbol table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went a little it more into the technical details in #4419 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be delighted to make that change if you agree.

[case testNamedTupleDirectSubclass]
from typing import NamedTuple
class A(NamedTuple):
x: int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: indent should be 4 spaces.

from typing import NamedTuple
def foo():
class A(NamedTuple):
x: int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: indent should be 4 spaces.

@@ -956,6 +956,10 @@ def get_all_bases_tvars(self, defn: ClassDef, removed: List[int]) -> TypeVarList
tvars.extend(base_tvars)
return remove_dups(tvars)

def is_namedtuple_classdef(self, defn: ClassDef) -> bool:
base_exprs = defn.base_type_exprs
return any(getattr(b, 'fullname', None) == 'typing.NamedTuple' for b in base_exprs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates similar logic in analyze_namedtuple_classdef. There should be only one implementation of looking for a typing.NamedTuple base class.

Style nit: We avoid the use of getattr if there's a reasonable way around it, since it can't be type checked.

@emmatyping
Copy link
Collaborator

So is the decision here to do the refactor first?

@gvanrossum
Copy link
Member

This is now quite stale -- there was no response to the code review from February, and there's now (of course) a merge conflict. Should we just close it? (That would make me sad given the effort put into this, but sometimes it's the only alternative.)

@elliott-beach
Copy link
Contributor Author

elliott-beach commented Sep 25, 2018

I would say there was actually a response to the code review. Look! #4419 (comment)
I did snub Ethan in May.

Anyway, after getting a real job I view this whole affair as nonessential tinkering that may benefit no-one, which is probably why Jukka doesn't have time to respond. I also think expert work should be left to experts. So long, and thanks for all the fish.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 25, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants