[WIP] Namespace implementation #4277

m-novikov · 2017-11-22T06:30:04Z

Namespace implementation

My use case pretty much described here #1645 (comment)
What behavior should be if we run mypy -p namespace_pkg
Ref: #2773 #1645

emmatyping

It seems you are failing the correct test :^)
In general, I think I'd prefer not have the if self.namespaces sprinkled around the code.
Do you have particular questions about process_stale_scc?

emmatyping · 2017-11-23T07:30:14Z

mypy/build.py

@@ -859,8 +859,12 @@ def find_modules_recursive(module: str, lib_path: List[str]) -> List[BuildSource

 def verify_module(id: str, path: str) -> bool:
    """Check that all packages containing id have a __init__ file."""
+    if path.endswith(tuple(PYTHON_EXTENSIONS)):


I think you should probably rewrite the rest of this function, and update the docstring, as this change makes the docstring incorrect, and much of the rest of the function duplicative (eg path.endswith(('__init__.py', '__init__.pyi')) will never be reached...

Yes, current purpose of this code was to run it on existing codebase with namespace packages to gather some more information about corner cases.
Such functions as verify_module, find_module, find_module_recursive should change their behavior based on option namespace_packages.

emmatyping · 2017-11-23T07:32:53Z

mypy/build.py

-            self.tree = manager.parse_file(self.id, self.xpath, source,
-                                           self.ignore_all or self.options.ignore_errors)
+
+            if not self.namespace:


A small nit:

if <condition>: action else: other action

rather than

if not <condition>: other action else: action

carljm

Thanks for working on this!

carljm · 2017-11-26T06:01:39Z

mypy/build.py

@@ -539,6 +544,7 @@ def __init__(self, data_dir: str,
        self.version_id = version_id
        self.modules = {}  # type: Dict[str, MypyFile]
        self.missing_modules = set()  # type: Set[str]
+        self.module_discovery = module_discovery
        self.plugin = plugin
        self.semantic_analyzer = SemanticAnalyzerPass2(self.modules, self.missing_modules,
                                                  lib_path, self.errors, self.plugin)


I'm not sure why BuildManager should get lib_path as a separate argument at all anymore; that should be encapsulated in ModuleDiscovery. Passing in both is just an opportunity for them to be out of sync, or for some code to use lib_path improperly, bypassing the ModuleDiscovery logic.

The only thing BuildManager seems to use lib_path for now is to pass it along to SemanaticAnalyzerPass2, but the latter doesn't seem to do anything with it either that I can find; it just stashes it on self and never looks at it again. So unless I miss something, I'd eliminate it from both.

Yes, thanks, I will do that.

carljm · 2017-11-26T06:13:10Z

mypy/build.py


-    def find() -> Optional[str]:
+    def find(self, id: str) -> Optional[str]:


Given that this class now handles both regular modules and namespaces, but this method handles only module-finding internals, I think it'd be clearer to name it _find_module instead of find; find as a public method (and first in source order) strongly implies it's the primary public API, which it's not.

carljm · 2017-11-26T06:25:50Z

mypy/build.py

@@ -1823,13 +1879,20 @@ def parse_file(self) -> None:
        self.check_blockers()

    def semantic_analysis(self) -> None:
+        if self.namespace:


Scattering these all through here seems ugly. I'm not very familiar with this logic, but why isn't it possible to just treat a namespace as an empty module and not need to skip all these methods?

carljm · 2017-11-26T06:33:12Z

test-data/unit/check-namespaces.test

+mypy_path = ./tmp/dir:./tmp/other_dir
+[file dir/mod_or_ns.py]
+a = None
+[file other_dir/m/b.py]


Shouldn't m here be mod_or_ns instead, for this test to be useful?

m-novikov · 2017-11-26T09:46:59Z

So it seems there a lot of dimensions to this problem.
Right now mypy will discover the same module located in separate directories.
Given following directory structure:

dir1/mod1/__init__.py
dir2/mod1/a.py
dir2/mod1/__init__.py
dir2/mod1/b.py

Script:

from m import a, b
dir1 = a  # type: str
dir2 = b  # type: str

Produces following errors:

test.py:3: error: Incompatible types in assignment (expression has type Module, variable has type "str")
test.py:4: error: Incompatible types in assignment (expression has type Module, variable has type "str")

But it should, in my opinion at least fail to import module a or module b depending of MYPYPATH directory order.

m-novikov · 2017-11-26T09:50:04Z

Requirements for module discovery is following:

merge stubs with existing modules.
merge namespaces from different directories
not merge modules located in different directories.

m-novikov · 2017-11-26T19:28:14Z

@ethanhs could you help me with this error

import typing as t

b = ['a', 'b', 'c']  # type: t.List[str]
a = tuple(p for p in b)  # type: t.Tuple[str]

fails with:

error: Incompatible types in assignment (expression has type "Tuple[str, ...]", variable has type "Tuple[str]")

I don't understand

ilevkivskyi · 2017-11-26T20:01:33Z

@m-novikov Tuple[str] is a tuple with exactly one element of type str, Tuple[str, ...] is a tuple where all elements have type str (without length specified).

emmatyping · 2017-11-26T20:02:55Z

@m-novikov also, you may be interested in reading the docs here: https://docs.python.org/3/library/typing.html#typing.Tuple for more. The best way to fix that would be to change the type comment to be variable length as well (Tuple[str, ...]). Depending on how it is used later you may want to cast it.

ilevkivskyi · 2017-11-26T20:03:29Z

@m-novikov Also why do you need tuple(p for p in b) and not just tuple(b)?

m-novikov · 2017-11-26T20:24:04Z

Also why do you need tuple(p for p in b) and not just tuple(b)?
@ilevkivskyi it was the minimal sample that I came up with. I should have read more docs

m-novikov · 2017-11-26T20:28:22Z

Should I remove find_module_clear_caches function?

m-novikov · 2017-11-27T07:45:11Z

@ethanhs could you review?
It was a bit more changes that I have anticipated.
But the previous version of find_module was behaving differently from python interpreter.
I am still not sure what needs to be done with find_module_*_cache. Probably all of this should be pulled to ModuleDiscovery and function find_module_clear_caches removed. As it used only in tests and ModuleDiscovery already has an empty cache on creation.

m-novikov · 2017-11-27T20:49:10Z

@carljm could you try running it on your projects using namespaces to see if any problems arise?
it could be run with command line flag:

mypy --namespace-packages

Thanks.

emmatyping · 2017-11-27T21:10:30Z

I'm afraid I won't have time to review this now. I will dismiss my review instead.

Review issues resolved.

m-novikov · 2017-11-27T21:19:12Z

@ilevkivskyi can you review this pull request?

carljm · 2017-11-29T01:32:14Z

@m-novikov Yes, I will give it a try tomorrow and report back.

ilevkivskyi · 2017-11-29T20:02:19Z

@m-novikov I can take a quick look later. I will wait for @carljm review.

carljm · 2017-11-29T22:16:42Z

First issue I found testing this on our codebases: it doesn't prioritize normal packages or stubs over potential namespace packages (as Python does).

One way to reproduce this: create a file with import locale; locale.setlocale(locale.LC_ALL, '') and create an empty locale/ directory next to it. With --namespace-packages option this will generate a pair of errors, because mypy is using the empty locale/ directory as an empty package, instead of the typeshed stub for the locale module.

Another repro: create path1/foo/ as an empty directory and path2/foo/__init__.py with BAR = 1, and test.py with import foo; foo.BAR. Run MYPYPATH=path1:path2 mypy --namespace-packages test.py and you'll get Module has no attribute BAR. This is wrong; mypy should prefer path2/foo/__init__.py (even though it comes later on the path) and ignore path1/foo/.

(Haven't finished analyzing other new issues for possible problems; will post separately if I find others, and make a note of when I'm done.)

carljm · 2017-11-30T00:19:52Z

The new commit fixes the priority problem I previously identified. I've gone through all the new errors generated by this branch (with the new --namespace-packages option) on our code and confirmed that the rest are all due to code in namespace packages that was previously not found and is now found. So the branch appears to be functionally ready (at least as far as our codebase is concerned).

ilevkivskyi · 2017-12-01T10:20:48Z

@carljm I will take a look at this when you will an approving review.

carljm

Great start here, but I think there are some issues to address before it's ready.

carljm · 2017-12-01T19:12:04Z

mypy/test/testmodulediscovery.py

+from typing import List, Set
+
+from mypy.build import ModuleDiscovery, find_module_clear_caches
+from mypy.myunit import Suite, assert_equal


I had the vague impression that myunit was deprecated(ish?) and new tests should be written as normal Python unittest/pytest tests? But this could be totally wrong, I'm not sure.

@ilevkivskyi can you check me here?

Myunit usage is discouraged and we are trying to move to pytest to run our test suite.

Yes, myunit is being gradually replaced by pytest.

carljm · 2017-12-01T19:26:23Z

mypy/test/testmodulediscovery.py

+    def _is_dir(self, path: str) -> bool:
+        for item in self.files:
+            if not item.endswith('/'):
+                item += '/'


This doesn't seem right. It makes sense to append / to path before the startswith check, but I don't see why you'd append it to every file path in the filesystem. Wouldn't that mean that _is_dir would return True for any existing path, whether file or directory?

carljm · 2017-12-01T19:32:11Z

mypy/test/testmodulediscovery.py

+
+    def _is_dir(self, path: str) -> bool:
+        for item in self.files:
+            if not item.endswith('/'):


should these two lines use os.path.sep instead?

carljm · 2017-12-01T19:53:04Z

mypy/test/testmodulediscovery.py

+        self._isfile_patcher.stop()
+        self._isdir_patcher.stop()
+
+    def test_module_vs_package(self) -> None:


There's probably quite a lot of existing and not-directly-tested find-module behavior that we could potentially add tests for here; that's not directly related to this diff but does increase confidence that the changes haven't broken existing behavior. I'm wondering about adding tests for prioritization of module vs package vs stub when they are all found together in the same directory on lib_path.

carljm · 2017-12-01T19:55:34Z

mypy/test/testmodulediscovery.py

+            os.path.join('dir1', 'mod', 'a.py'),
+            os.path.join('dir2', 'mod', 'b.py'),
+        }
+        m = ModuleDiscovery(['dir1', 'dir2'], namespaces_allowed=True)


how about testing the same scenario but with namespaces_allowed=False?

carljm · 2017-12-02T03:36:29Z

mypy/build.py

+
+        return True
+
+    def _verify_namespace(self, path: str) -> bool:


We don't really need this verification here either, because _collect_paths will only try to create a namespace package if it's already exhausted the possibility of a normal package.

In general I think most of the difficulty in understanding this code comes from splitting (or duplicating) this kind of logic between ImportContext and ModuleDiscovery._collect_paths; I recommend consolidating the logic in just one place.

Indeed it would greatly simplify the case if we didn't need to keep track of multiple stubs and implementations. As I wrote earlier, it would break existing behaviors.
Following directory structure would work on master:

dir1/pkg/__init__.py dir1/pkg/a.py dir2/pkg/__init__.py dir2/pkg/b.py dir3/pkg/__init__.pyi dir3/pkg/c.pyi

And this code will not raise any errors:

from pkg import a, b, c

Obviously, multiple implementations are incorrect. But in the case of stubs, I am not sure.

ImportContext is state machine in this case when it reaches final state it discards input.
And collect path provides it with the certain order of input.

Yes, I see the reasoning. It may have helped my understanding on first reading to have clarified in comment or docstring the critical dependency on paths being provided in the proper order of priority.

Regardless I think my initial comment here is still correct?

We don't really need this verification here either, because _collect_paths will only try to create a namespace package if it's already exhausted the possibility of a normal package.

carljm · 2017-12-02T03:37:15Z

mypy/build.py

+        self.type = type
+        self.paths.append(path)
+
+    def _verify_module(self, path: str) -> bool:


We don't need this verification at all, _collect_paths won't ever try to add a path that it hasn't already verified exists.

We don't need this verification at all, _collect_paths won't ever try to add a path that it hasn't already verified exists.

That is incorrect we trying to add dynamically computed path and we don't know which of them exists at this moment yet. We are just trying all combinations in the specific order and hope that one of them fits.

carljm · 2017-12-02T03:40:56Z

mypy/build.py

+
+        srcs = []  # type: List[BuildSource]
+        for path in module_paths:
+            if is_module_path(path) or is_pkg_path(path):


again here (and below) there's a lot of re-checking of the path type that is redundant; _find_module already had this information and threw it away. Can't we have _find_module throw away less info, return an object instead of just paths, and avoid unnecessary string path checking?

carljm · 2017-12-02T03:46:49Z

mypy/build.py


+class ImportContext:


Why is this called ImportContext instead of just e.g. Module?

Also, I wonder if we really need both this and BuildSource as separate classes? There seems to be a lot of overlap, maybe just add a type (and the ability to hold multiple paths for namespace packages) to BuildSource and unify them?

carljm · 2017-12-02T03:56:04Z

mypy/build.py

+            if is_pkg_path(path):
+                path = dirname(path)
+                for submodule in self._find_submodules(module, path):
+                    srcs += self._find_modules_recursive(submodule)


by going all the way back to _find_module (which starts at the very beginning with all of lib_path) on every recursive call here, we are going to do a lot of repeated unnecessary work. It's worse with namespace packages than with the previous code. Imagine a namespace package that exists in three different paths, dir1/foo, dir2/foo, and dir3/foo. Consider that each of those has a different sub-package, dir1/foo/a/, dir2/foo/b, and dir3/foo/c. When we find dir1/foo/a, we throw away the context we already have (that we are within dir1/foo/a) and just call _find_module('foo.a'), which will end up unnecessarily looking at every single lib_path entry again to see if it has a foo/a/ within it.

It may be that caching can reduce the impact of this multiplied repeated work, but not fully. And I think we can easily do better in the algorithm. _collect_paths already has the right logic to look for the final component of a submodule within a given limited set of parent paths. Can't we use _collect_paths here instead of recursive calls to be more efficient and only look within the relevant parent?

emmatyping · 2017-12-10T23:54:40Z

@m-novikov I'm getting back to work on my PR #4278 on Wednesday. If you find time to respond to Carl's review and fix the merge conflict between now and then, that would be great. If you don't have time, that is entirely fine also, I will probably just work on my PR without namespace package support, and you may have to rebase on that. Thank you for your work on this so far!

carljm · 2017-12-18T17:49:09Z

Hi @m-novikov, any progress last week? I'm interested in seeing this land; if you won't have time for it soon I would be willing to pick it up and try to push it forward.

gvanrossum · 2018-02-21T22:19:28Z

This is sadly getting stale. @m-novikov do you want to continue to work on this? Then please resolve the conflicts. Alternatively, @carljm you can clone this PR and finish it if you want -- IIUC it's mostly ready to land? Maybe we can make the upcoming #4606 release (or else the one after that -- we plan to resume the ~3-week cadence we had towards the end of last year).

carljm · 2018-02-21T22:26:00Z

@gvanrossum Sure, I'd be happy to pick this up and try to push it through to readiness, if @m-novikov doesn't plan to finish it.

There is one question about desired/intended behavior that came up that would be helpful to get feedback from you or other core devs. It's buried in a line-comment thread on an outdated revision, but should still be linkable: #4277 (comment)

(If you can't get to the question from that link, lmk and I'll copy it again here.)

carljm · 2018-02-21T22:27:51Z

Hmm, that link doesn't seem to work if the thread isn't already expanded. Here's the issue:

Do we need to maintain the (I think accidentally implemented, but possibly useful) behavior that stub packages (with __init__.pyi) already in practice behave like a namespace package? That is, if you have this structure:

.
├── src
│   └── pkg
│       ├── a.py
│       ├── b.py
│       └── __init__.py
└── stubs
    └── pkg
        ├── b.pyi
        └── __init__.pyi

with both stubs and src on MYPYPATH (in that order), mypy will still find and use src/pkg/a.py, even though that violates Python import's usual handling of non-namespace packages?

My feeling is that people may likely be relying on this (I think we currently are at Instagram, actually), but I don't want to put a lot of work into supporting it in this PR without confirmation that it should be formally supported, rather than fixed as a bug.

Currently the same is also true even if the stubs dir contains all .py files instead of .pyi; IOW mypy currently effectively treats all packages like namespace packages (but still requires the __init__.py). This part I'm confident is just a bug that should be fixed.

gvanrossum · 2018-02-21T22:35:06Z

IIRC I used to be convinced that mypy didn't have that feature until I tried it. :-) Dropbox isn't depending on it and if Instagram can be made to live without it I think it's better not to preserve that behavior. I also think the mypy docs probably deny its existence, and PEP 484 may do the same.

carljm · 2018-02-21T22:36:37Z

Yes, we can live without it. I agree that it's better for mypy imports to consistently behave like Python imports.

carljm · 2018-02-28T05:36:36Z

This has a fair amount of overlap/conflict with #4403 and #4623, so given that those are up to date and active, I'm holding off on bringing this up to date until they are merged.

carljm · 2018-03-20T17:34:39Z

Now just waiting on #4693 to go in before I bring this up to date.

emmatyping · 2018-04-17T22:47:19Z

@carljm now that PEP 561 is in, are you interested in picking this up?

carljm · 2018-04-18T00:36:35Z

@ethanhs Yes, I'm interested and aware that the conflicting PRs have landed, just haven't had time yet. Not sure if I will before PyCon sprints.

emmatyping · 2018-04-18T01:02:44Z

ethanhs Yes, I'm interested and aware that the conflicting PRs have landed, just haven't had time yet. Not sure if I will before PyCon sprints.

Okay, no worries! I look forward to see you at the sprints.

gvanrossum · 2018-08-01T21:49:16Z

So @carljm are you still interested in this? I could understand if not (maybe pyre does what you want). In that case I propose to close this -- it's probably too far out of date to be of much help to anyone else who wants to implement this feature.

carljm · 2018-08-01T22:58:02Z

@gvanrossum For a while I was hoping to still follow through on this, just because I'd said I would, but I think I have to face reality that it's not going to make it to the top of my priority list now. Sorry for not clarifying that sooner.

I agree re closing this specific PR; I was planning to reuse the tests as a starting point, but expected to have to start over with the implementation.

gvanrossum · 2018-08-01T23:24:35Z

I'm a little sad, but understand.

m-novikov · 2018-08-02T08:08:34Z

Actually now I have time and willing to continue work on it, provided I have some consultation regarding edge cases.

ilevkivskyi · 2018-08-02T10:44:10Z

@m-novikov I can't guarantee that I will be able to help you with all questions, but I think we have enough people following here who will be able to help. So please go ahead!

gvanrossum · 2018-08-02T14:51:53Z

It's probably best to open a new PR then, once you're ready.

m-novikov force-pushed the namespaces branch 2 times, most recently from 79315e2 to 6eac2f7 Compare November 23, 2017 07:21

emmatyping previously requested changes Nov 23, 2017

View reviewed changes

emmatyping mentioned this pull request Nov 24, 2017

[WIP] Begin implementing PEP 561 checking #4278

Closed

5 tasks

carljm reviewed Nov 26, 2017

View reviewed changes

m-novikov force-pushed the namespaces branch from ef9d497 to 6b6ce1c Compare November 27, 2017 20:42

m-novikov changed the title ~~[WIP] Namespace implementation~~ [RDY] Namespace implementation Nov 27, 2017

m-novikov force-pushed the namespaces branch from a5726bf to 28c6ca7 Compare November 28, 2017 11:50

m-novikov force-pushed the namespaces branch 2 times, most recently from 5bb5579 to 7bc38ee Compare November 30, 2017 15:27

carljm suggested changes Dec 2, 2017

View reviewed changes

m-novikov added 6 commits December 13, 2017 09:03

Namespace implementation (python#1645)

0737912

Added clarifying comments

5c77c2d

Prefer modules and packages to namespaces regardless of path order

c84a63c

Fix lib_path discovery when using -p with namespaces

ea64ec0

Review fixes

d488c2f

merger

b6b0e9a

m-novikov force-pushed the namespaces branch from 7bc38ee to 2d6077e Compare December 14, 2017 08:57

carljm mentioned this pull request Dec 14, 2017

Proposed refactoring of mypy/build.py (and cache metadata) #4365

Closed

m-novikov force-pushed the namespaces branch from 2d6077e to b6b0e9a Compare December 18, 2017 20:16

m-novikov changed the title ~~[RDY] Namespace implementation~~ [WIP] Namespace implementation Dec 18, 2017

emmatyping mentioned this pull request Feb 16, 2018

Implement PEP 561 searching #4403

Closed

8 tasks

ilevkivskyi mentioned this pull request Apr 3, 2018

Release 0.590 planning #4839

Closed

gvanrossum closed this Aug 1, 2018


		def find() -> Optional[str]:
		def find(self, id: str) -> Optional[str]:

[WIP] Namespace implementation #4277

[WIP] Namespace implementation #4277

Conversation

m-novikov commented Nov 22, 2017 • edited Loading

emmatyping left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-novikov commented Nov 26, 2017 • edited Loading

m-novikov commented Nov 26, 2017 • edited Loading

m-novikov commented Nov 26, 2017

ilevkivskyi commented Nov 26, 2017

emmatyping commented Nov 26, 2017

ilevkivskyi commented Nov 26, 2017

m-novikov commented Nov 26, 2017

m-novikov commented Nov 26, 2017

m-novikov commented Nov 27, 2017

m-novikov commented Nov 27, 2017

emmatyping commented Nov 27, 2017

m-novikov commented Nov 27, 2017

carljm commented Nov 29, 2017

ilevkivskyi commented Nov 29, 2017

carljm commented Nov 29, 2017

carljm commented Nov 30, 2017 • edited Loading

ilevkivskyi commented Dec 1, 2017

carljm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

emmatyping commented Dec 10, 2017

carljm commented Dec 18, 2017

gvanrossum commented Feb 21, 2018

carljm commented Feb 21, 2018

carljm commented Feb 21, 2018 • edited Loading

gvanrossum commented Feb 21, 2018 via email

carljm commented Feb 21, 2018

carljm commented Feb 28, 2018

carljm commented Mar 20, 2018

emmatyping commented Apr 17, 2018

carljm commented Apr 18, 2018

emmatyping commented Apr 18, 2018

gvanrossum commented Aug 1, 2018

carljm commented Aug 1, 2018

gvanrossum commented Aug 1, 2018 via email

m-novikov commented Aug 2, 2018

ilevkivskyi commented Aug 2, 2018

gvanrossum commented Aug 2, 2018

m-novikov commented Nov 22, 2017 •

edited

Loading

m-novikov commented Nov 26, 2017 •

edited

Loading

m-novikov commented Nov 26, 2017 •

edited

Loading

carljm commented Nov 30, 2017 •

edited

Loading

carljm commented Feb 21, 2018 •

edited

Loading