-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Namespace implementation #4277
Conversation
79315e2
to
6eac2f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you are failing the correct test :^)
In general, I think I'd prefer not have the if self.namespace
s sprinkled around the code.
Do you have particular questions about process_stale_scc
?
mypy/build.py
Outdated
@@ -859,8 +859,12 @@ def find_modules_recursive(module: str, lib_path: List[str]) -> List[BuildSource | |||
|
|||
def verify_module(id: str, path: str) -> bool: | |||
"""Check that all packages containing id have a __init__ file.""" | |||
if path.endswith(tuple(PYTHON_EXTENSIONS)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should probably rewrite the rest of this function, and update the docstring, as this change makes the docstring incorrect, and much of the rest of the function duplicative (eg path.endswith(('__init__.py', '__init__.pyi'))
will never be reached...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, current purpose of this code was to run it on existing codebase with namespace packages to gather some more information about corner cases.
Such functions as verify_module, find_module, find_module_recursive should change their behavior based on option namespace_packages.
mypy/build.py
Outdated
self.tree = manager.parse_file(self.id, self.xpath, source, | ||
self.ignore_all or self.options.ignore_errors) | ||
|
||
if not self.namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small nit:
if <condition>:
action
else:
other action
rather than
if not <condition>:
other action
else:
action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
mypy/build.py
Outdated
@@ -539,6 +544,7 @@ def __init__(self, data_dir: str, | |||
self.version_id = version_id | |||
self.modules = {} # type: Dict[str, MypyFile] | |||
self.missing_modules = set() # type: Set[str] | |||
self.module_discovery = module_discovery | |||
self.plugin = plugin | |||
self.semantic_analyzer = SemanticAnalyzerPass2(self.modules, self.missing_modules, | |||
lib_path, self.errors, self.plugin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why BuildManager should get lib_path
as a separate argument at all anymore; that should be encapsulated in ModuleDiscovery
. Passing in both is just an opportunity for them to be out of sync, or for some code to use lib_path
improperly, bypassing the ModuleDiscovery
logic.
The only thing BuildManager
seems to use lib_path
for now is to pass it along to SemanaticAnalyzerPass2
, but the latter doesn't seem to do anything with it either that I can find; it just stashes it on self
and never looks at it again. So unless I miss something, I'd eliminate it from both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks, I will do that.
mypy/build.py
Outdated
|
||
def find() -> Optional[str]: | ||
def find(self, id: str) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this class now handles both regular modules and namespaces, but this method handles only module-finding internals, I think it'd be clearer to name it _find_module
instead of find
; find
as a public method (and first in source order) strongly implies it's the primary public API, which it's not.
mypy/build.py
Outdated
@@ -1823,13 +1879,20 @@ def parse_file(self) -> None: | |||
self.check_blockers() | |||
|
|||
def semantic_analysis(self) -> None: | |||
if self.namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scattering these all through here seems ugly. I'm not very familiar with this logic, but why isn't it possible to just treat a namespace as an empty module and not need to skip all these methods?
test-data/unit/check-namespaces.test
Outdated
mypy_path = ./tmp/dir:./tmp/other_dir | ||
[file dir/mod_or_ns.py] | ||
a = None | ||
[file other_dir/m/b.py] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't m
here be mod_or_ns
instead, for this test to be useful?
So it seems there a lot of dimensions to this problem.
Script: from m import a, b
dir1 = a # type: str
dir2 = b # type: str Produces following errors:
But it should, in my opinion at least fail to import module a or module b depending of MYPYPATH directory order. |
Requirements for module discovery is following:
|
@ethanhs could you help me with this error import typing as t
b = ['a', 'b', 'c'] # type: t.List[str]
a = tuple(p for p in b) # type: t.Tuple[str] fails with:
I don't understand |
@m-novikov |
@m-novikov also, you may be interested in reading the docs here: https://docs.python.org/3/library/typing.html#typing.Tuple for more. The best way to fix that would be to change the type comment to be variable length as well ( |
@m-novikov Also why do you need |
|
Should I remove find_module_clear_caches function? |
@ethanhs could you review? |
ef9d497
to
6b6ce1c
Compare
@carljm could you try running it on your projects using namespaces to see if any problems arise?
Thanks. |
I'm afraid I won't have time to review this now. I will dismiss my review instead. |
@ilevkivskyi can you review this pull request? |
a5726bf
to
28c6ca7
Compare
@m-novikov Yes, I will give it a try tomorrow and report back. |
@m-novikov I can take a quick look later. I will wait for @carljm review. |
First issue I found testing this on our codebases: it doesn't prioritize normal packages or stubs over potential namespace packages (as Python does). One way to reproduce this: create a file with Another repro: create (Haven't finished analyzing other new issues for possible problems; will post separately if I find others, and make a note of when I'm done.) |
The new commit fixes the priority problem I previously identified. I've gone through all the new errors generated by this branch (with the new |
5bb5579
to
7bc38ee
Compare
@carljm I will take a look at this when you will an approving review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start here, but I think there are some issues to address before it's ready.
mypy/test/testmodulediscovery.py
Outdated
from typing import List, Set | ||
|
||
from mypy.build import ModuleDiscovery, find_module_clear_caches | ||
from mypy.myunit import Suite, assert_equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the vague impression that myunit was deprecated(ish?) and new tests should be written as normal Python unittest/pytest tests? But this could be totally wrong, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilevkivskyi can you check me here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Myunit usage is discouraged and we are trying to move to pytest to run our test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, myunit is being gradually replaced by pytest.
mypy/test/testmodulediscovery.py
Outdated
def _is_dir(self, path: str) -> bool: | ||
for item in self.files: | ||
if not item.endswith('/'): | ||
item += '/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. It makes sense to append /
to path
before the startswith check, but I don't see why you'd append it to every file path in the filesystem. Wouldn't that mean that _is_dir
would return True
for any existing path, whether file or directory?
mypy/test/testmodulediscovery.py
Outdated
|
||
def _is_dir(self, path: str) -> bool: | ||
for item in self.files: | ||
if not item.endswith('/'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these two lines use os.path.sep
instead?
self._isfile_patcher.stop() | ||
self._isdir_patcher.stop() | ||
|
||
def test_module_vs_package(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably quite a lot of existing and not-directly-tested find-module behavior that we could potentially add tests for here; that's not directly related to this diff but does increase confidence that the changes haven't broken existing behavior. I'm wondering about adding tests for prioritization of module vs package vs stub when they are all found together in the same directory on lib_path.
os.path.join('dir1', 'mod', 'a.py'), | ||
os.path.join('dir2', 'mod', 'b.py'), | ||
} | ||
m = ModuleDiscovery(['dir1', 'dir2'], namespaces_allowed=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about testing the same scenario but with namespaces_allowed=False
?
mypy/build.py
Outdated
|
||
return True | ||
|
||
def _verify_namespace(self, path: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this verification here either, because _collect_paths
will only try to create a namespace package if it's already exhausted the possibility of a normal package.
In general I think most of the difficulty in understanding this code comes from splitting (or duplicating) this kind of logic between ImportContext
and ModuleDiscovery._collect_paths
; I recommend consolidating the logic in just one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it would greatly simplify the case if we didn't need to keep track of multiple stubs and implementations. As I wrote earlier, it would break existing behaviors.
Following directory structure would work on master:
dir1/pkg/__init__.py
dir1/pkg/a.py
dir2/pkg/__init__.py
dir2/pkg/b.py
dir3/pkg/__init__.pyi
dir3/pkg/c.pyi
And this code will not raise any errors:
from pkg import a, b, c
Obviously, multiple implementations are incorrect. But in the case of stubs, I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImportContext is state machine in this case when it reaches final state it discards input.
And collect path provides it with the certain order of input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see the reasoning. It may have helped my understanding on first reading to have clarified in comment or docstring the critical dependency on paths being provided in the proper order of priority.
Regardless I think my initial comment here is still correct?
We don't really need this verification here either, because _collect_paths will only try to create a namespace package if it's already exhausted the possibility of a normal package.
mypy/build.py
Outdated
self.type = type | ||
self.paths.append(path) | ||
|
||
def _verify_module(self, path: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this verification at all, _collect_paths
won't ever try to add a path that it hasn't already verified exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this verification at all, _collect_paths won't ever try to add a path that it hasn't already verified exists.
That is incorrect we trying to add dynamically computed path and we don't know which of them exists at this moment yet. We are just trying all combinations in the specific order and hope that one of them fits.
mypy/build.py
Outdated
|
||
srcs = [] # type: List[BuildSource] | ||
for path in module_paths: | ||
if is_module_path(path) or is_pkg_path(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again here (and below) there's a lot of re-checking of the path type that is redundant; _find_module
already had this information and threw it away. Can't we have _find_module
throw away less info, return an object instead of just paths, and avoid unnecessary string path checking?
mypy/build.py
Outdated
|
||
class ImportContext: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this called ImportContext
instead of just e.g. Module
?
Also, I wonder if we really need both this and BuildSource
as separate classes? There seems to be a lot of overlap, maybe just add a type
(and the ability to hold multiple paths for namespace packages) to BuildSource
and unify them?
mypy/build.py
Outdated
if is_pkg_path(path): | ||
path = dirname(path) | ||
for submodule in self._find_submodules(module, path): | ||
srcs += self._find_modules_recursive(submodule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by going all the way back to _find_module
(which starts at the very beginning with all of lib_path
) on every recursive call here, we are going to do a lot of repeated unnecessary work. It's worse with namespace packages than with the previous code. Imagine a namespace package that exists in three different paths, dir1/foo
, dir2/foo
, and dir3/foo
. Consider that each of those has a different sub-package, dir1/foo/a/
, dir2/foo/b
, and dir3/foo/c
. When we find dir1/foo/a
, we throw away the context we already have (that we are within dir1/foo/a
) and just call _find_module('foo.a')
, which will end up unnecessarily looking at every single lib_path
entry again to see if it has a foo/a/
within it.
It may be that caching can reduce the impact of this multiplied repeated work, but not fully. And I think we can easily do better in the algorithm. _collect_paths
already has the right logic to look for the final component of a submodule within a given limited set of parent paths. Can't we use _collect_paths
here instead of recursive calls to be more efficient and only look within the relevant parent?
@m-novikov I'm getting back to work on my PR #4278 on Wednesday. If you find time to respond to Carl's review and fix the merge conflict between now and then, that would be great. If you don't have time, that is entirely fine also, I will probably just work on my PR without namespace package support, and you may have to rebase on that. Thank you for your work on this so far! |
7bc38ee
to
2d6077e
Compare
Hi @m-novikov, any progress last week? I'm interested in seeing this land; if you won't have time for it soon I would be willing to pick it up and try to push it forward. |
2d6077e
to
b6b0e9a
Compare
This is sadly getting stale. @m-novikov do you want to continue to work on this? Then please resolve the conflicts. Alternatively, @carljm you can clone this PR and finish it if you want -- IIUC it's mostly ready to land? Maybe we can make the upcoming #4606 release (or else the one after that -- we plan to resume the ~3-week cadence we had towards the end of last year). |
@gvanrossum Sure, I'd be happy to pick this up and try to push it through to readiness, if @m-novikov doesn't plan to finish it. There is one question about desired/intended behavior that came up that would be helpful to get feedback from you or other core devs. It's buried in a line-comment thread on an outdated revision, but should still be linkable: #4277 (comment) (If you can't get to the question from that link, lmk and I'll copy it again here.) |
Hmm, that link doesn't seem to work if the thread isn't already expanded. Here's the issue: Do we need to maintain the (I think accidentally implemented, but possibly useful) behavior that stub packages (with
with both My feeling is that people may likely be relying on this (I think we currently are at Instagram, actually), but I don't want to put a lot of work into supporting it in this PR without confirmation that it should be formally supported, rather than fixed as a bug. Currently the same is also true even if the |
IIRC I used to be convinced that mypy didn't have that feature until I
tried it. :-) Dropbox isn't depending on it and if Instagram can be made to
live without it I think it's better not to preserve that behavior. I also
think the mypy docs probably deny its existence, and PEP 484 may do the
same.
|
Yes, we can live without it. I agree that it's better for mypy imports to consistently behave like Python imports. |
Now just waiting on #4693 to go in before I bring this up to date. |
@carljm now that PEP 561 is in, are you interested in picking this up? |
@ethanhs Yes, I'm interested and aware that the conflicting PRs have landed, just haven't had time yet. Not sure if I will before PyCon sprints. |
Okay, no worries! I look forward to see you at the sprints. |
So @carljm are you still interested in this? I could understand if not (maybe pyre does what you want). In that case I propose to close this -- it's probably too far out of date to be of much help to anyone else who wants to implement this feature. |
@gvanrossum For a while I was hoping to still follow through on this, just because I'd said I would, but I think I have to face reality that it's not going to make it to the top of my priority list now. Sorry for not clarifying that sooner. I agree re closing this specific PR; I was planning to reuse the tests as a starting point, but expected to have to start over with the implementation. |
I'm a little sad, but understand.
|
Actually now I have time and willing to continue work on it, provided I have some consultation regarding edge cases. |
@m-novikov I can't guarantee that I will be able to help you with all questions, but I think we have enough people following here who will be able to help. So please go ahead! |
It's probably best to open a new PR then, once you're ready. |
Namespace implementation
My use case pretty much described here #1645 (comment)
What behavior should be if we run
mypy -p namespace_pkg
Ref: #2773 #1645