Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Namespace implementation #4277

Closed
wants to merge 6 commits into from
Closed

Conversation

m-novikov
Copy link

@m-novikov m-novikov commented Nov 22, 2017

Namespace implementation

My use case pretty much described here #1645 (comment)
What behavior should be if we run mypy -p namespace_pkg
Ref: #2773 #1645

@m-novikov m-novikov force-pushed the namespaces branch 2 times, most recently from 79315e2 to 6eac2f7 Compare November 23, 2017 07:21
Copy link
Collaborator

@emmatyping emmatyping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are failing the correct test :^)
In general, I think I'd prefer not have the if self.namespaces sprinkled around the code.
Do you have particular questions about process_stale_scc?

mypy/build.py Outdated
@@ -859,8 +859,12 @@ def find_modules_recursive(module: str, lib_path: List[str]) -> List[BuildSource

def verify_module(id: str, path: str) -> bool:
"""Check that all packages containing id have a __init__ file."""
if path.endswith(tuple(PYTHON_EXTENSIONS)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should probably rewrite the rest of this function, and update the docstring, as this change makes the docstring incorrect, and much of the rest of the function duplicative (eg path.endswith(('__init__.py', '__init__.pyi')) will never be reached...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, current purpose of this code was to run it on existing codebase with namespace packages to gather some more information about corner cases.
Such functions as verify_module, find_module, find_module_recursive should change their behavior based on option namespace_packages.

mypy/build.py Outdated
self.tree = manager.parse_file(self.id, self.xpath, source,
self.ignore_all or self.options.ignore_errors)

if not self.namespace:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small nit:

if <condition>:
   action
else:
   other action

rather than

if not <condition>:
    other action
else:
    action

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

mypy/build.py Outdated
@@ -539,6 +544,7 @@ def __init__(self, data_dir: str,
self.version_id = version_id
self.modules = {} # type: Dict[str, MypyFile]
self.missing_modules = set() # type: Set[str]
self.module_discovery = module_discovery
self.plugin = plugin
self.semantic_analyzer = SemanticAnalyzerPass2(self.modules, self.missing_modules,
lib_path, self.errors, self.plugin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why BuildManager should get lib_path as a separate argument at all anymore; that should be encapsulated in ModuleDiscovery. Passing in both is just an opportunity for them to be out of sync, or for some code to use lib_path improperly, bypassing the ModuleDiscovery logic.

The only thing BuildManager seems to use lib_path for now is to pass it along to SemanaticAnalyzerPass2, but the latter doesn't seem to do anything with it either that I can find; it just stashes it on self and never looks at it again. So unless I miss something, I'd eliminate it from both.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, I will do that.

mypy/build.py Outdated

def find() -> Optional[str]:
def find(self, id: str) -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this class now handles both regular modules and namespaces, but this method handles only module-finding internals, I think it'd be clearer to name it _find_module instead of find; find as a public method (and first in source order) strongly implies it's the primary public API, which it's not.

mypy/build.py Outdated
@@ -1823,13 +1879,20 @@ def parse_file(self) -> None:
self.check_blockers()

def semantic_analysis(self) -> None:
if self.namespace:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scattering these all through here seems ugly. I'm not very familiar with this logic, but why isn't it possible to just treat a namespace as an empty module and not need to skip all these methods?

mypy_path = ./tmp/dir:./tmp/other_dir
[file dir/mod_or_ns.py]
a = None
[file other_dir/m/b.py]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't m here be mod_or_ns instead, for this test to be useful?

@m-novikov
Copy link
Author

m-novikov commented Nov 26, 2017

So it seems there a lot of dimensions to this problem.
Right now mypy will discover the same module located in separate directories.
Given following directory structure:

dir1/mod1/__init__.py
dir2/mod1/a.py
dir2/mod1/__init__.py
dir2/mod1/b.py

Script:

from m import a, b
dir1 = a  # type: str
dir2 = b  # type: str

Produces following errors:

test.py:3: error: Incompatible types in assignment (expression has type Module, variable has type "str")
test.py:4: error: Incompatible types in assignment (expression has type Module, variable has type "str")

But it should, in my opinion at least fail to import module a or module b depending of MYPYPATH directory order.

@m-novikov
Copy link
Author

m-novikov commented Nov 26, 2017

Requirements for module discovery is following:

  • merge stubs with existing modules.
  • merge namespaces from different directories
  • not merge modules located in different directories.

@m-novikov
Copy link
Author

@ethanhs could you help me with this error

import typing as t

b = ['a', 'b', 'c']  # type: t.List[str]
a = tuple(p for p in b)  # type: t.Tuple[str]

fails with:

error: Incompatible types in assignment (expression has type "Tuple[str, ...]", variable has type "Tuple[str]")

I don't understand

@ilevkivskyi
Copy link
Member

@m-novikov Tuple[str] is a tuple with exactly one element of type str, Tuple[str, ...] is a tuple where all elements have type str (without length specified).

@emmatyping
Copy link
Collaborator

@m-novikov also, you may be interested in reading the docs here: https://docs.python.org/3/library/typing.html#typing.Tuple for more. The best way to fix that would be to change the type comment to be variable length as well (Tuple[str, ...]). Depending on how it is used later you may want to cast it.

@ilevkivskyi
Copy link
Member

@m-novikov Also why do you need tuple(p for p in b) and not just tuple(b)?

@m-novikov
Copy link
Author

Also why do you need tuple(p for p in b) and not just tuple(b)?
@ilevkivskyi it was the minimal sample that I came up with. I should have read more docs

@m-novikov
Copy link
Author

Should I remove find_module_clear_caches function?

@m-novikov
Copy link
Author

@ethanhs could you review?
It was a bit more changes that I have anticipated.
But the previous version of find_module was behaving differently from python interpreter.
I am still not sure what needs to be done with find_module_*_cache. Probably all of this should be pulled to ModuleDiscovery and function find_module_clear_caches removed. As it used only in tests and ModuleDiscovery already has an empty cache on creation.

@m-novikov m-novikov changed the title [WIP] Namespace implementation [RDY] Namespace implementation Nov 27, 2017
@m-novikov
Copy link
Author

@carljm could you try running it on your projects using namespaces to see if any problems arise?
it could be run with command line flag:

mypy --namespace-packages

Thanks.

@emmatyping
Copy link
Collaborator

I'm afraid I won't have time to review this now. I will dismiss my review instead.

@emmatyping emmatyping dismissed their stale review November 27, 2017 21:11

Review issues resolved.

@m-novikov
Copy link
Author

@ilevkivskyi can you review this pull request?

@carljm
Copy link
Member

carljm commented Nov 29, 2017

@m-novikov Yes, I will give it a try tomorrow and report back.

@ilevkivskyi
Copy link
Member

@m-novikov I can take a quick look later. I will wait for @carljm review.

@carljm
Copy link
Member

carljm commented Nov 29, 2017

First issue I found testing this on our codebases: it doesn't prioritize normal packages or stubs over potential namespace packages (as Python does).

One way to reproduce this: create a file with import locale; locale.setlocale(locale.LC_ALL, '') and create an empty locale/ directory next to it. With --namespace-packages option this will generate a pair of errors, because mypy is using the empty locale/ directory as an empty package, instead of the typeshed stub for the locale module.

Another repro: create path1/foo/ as an empty directory and path2/foo/__init__.py with BAR = 1, and test.py with import foo; foo.BAR. Run MYPYPATH=path1:path2 mypy --namespace-packages test.py and you'll get Module has no attribute BAR. This is wrong; mypy should prefer path2/foo/__init__.py (even though it comes later on the path) and ignore path1/foo/.

(Haven't finished analyzing other new issues for possible problems; will post separately if I find others, and make a note of when I'm done.)

@carljm
Copy link
Member

carljm commented Nov 30, 2017

The new commit fixes the priority problem I previously identified. I've gone through all the new errors generated by this branch (with the new --namespace-packages option) on our code and confirmed that the rest are all due to code in namespace packages that was previously not found and is now found. So the branch appears to be functionally ready (at least as far as our codebase is concerned).

@m-novikov m-novikov force-pushed the namespaces branch 2 times, most recently from 5bb5579 to 7bc38ee Compare November 30, 2017 15:27
@ilevkivskyi
Copy link
Member

@carljm I will take a look at this when you will an approving review.

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start here, but I think there are some issues to address before it's ready.

from typing import List, Set

from mypy.build import ModuleDiscovery, find_module_clear_caches
from mypy.myunit import Suite, assert_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the vague impression that myunit was deprecated(ish?) and new tests should be written as normal Python unittest/pytest tests? But this could be totally wrong, I'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilevkivskyi can you check me here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Myunit usage is discouraged and we are trying to move to pytest to run our test suite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, myunit is being gradually replaced by pytest.

def _is_dir(self, path: str) -> bool:
for item in self.files:
if not item.endswith('/'):
item += '/'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. It makes sense to append / to path before the startswith check, but I don't see why you'd append it to every file path in the filesystem. Wouldn't that mean that _is_dir would return True for any existing path, whether file or directory?


def _is_dir(self, path: str) -> bool:
for item in self.files:
if not item.endswith('/'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these two lines use os.path.sep instead?

self._isfile_patcher.stop()
self._isdir_patcher.stop()

def test_module_vs_package(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably quite a lot of existing and not-directly-tested find-module behavior that we could potentially add tests for here; that's not directly related to this diff but does increase confidence that the changes haven't broken existing behavior. I'm wondering about adding tests for prioritization of module vs package vs stub when they are all found together in the same directory on lib_path.

os.path.join('dir1', 'mod', 'a.py'),
os.path.join('dir2', 'mod', 'b.py'),
}
m = ModuleDiscovery(['dir1', 'dir2'], namespaces_allowed=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about testing the same scenario but with namespaces_allowed=False?

mypy/build.py Outdated

return True

def _verify_namespace(self, path: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really need this verification here either, because _collect_paths will only try to create a namespace package if it's already exhausted the possibility of a normal package.

In general I think most of the difficulty in understanding this code comes from splitting (or duplicating) this kind of logic between ImportContext and ModuleDiscovery._collect_paths; I recommend consolidating the logic in just one place.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it would greatly simplify the case if we didn't need to keep track of multiple stubs and implementations. As I wrote earlier, it would break existing behaviors.
Following directory structure would work on master:

dir1/pkg/__init__.py
dir1/pkg/a.py
dir2/pkg/__init__.py
dir2/pkg/b.py
dir3/pkg/__init__.pyi
dir3/pkg/c.pyi

And this code will not raise any errors:

from pkg import a, b, c

Obviously, multiple implementations are incorrect. But in the case of stubs, I am not sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImportContext is state machine in this case when it reaches final state it discards input.
And collect path provides it with the certain order of input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see the reasoning. It may have helped my understanding on first reading to have clarified in comment or docstring the critical dependency on paths being provided in the proper order of priority.

Regardless I think my initial comment here is still correct?

We don't really need this verification here either, because _collect_paths will only try to create a namespace package if it's already exhausted the possibility of a normal package.

mypy/build.py Outdated
self.type = type
self.paths.append(path)

def _verify_module(self, path: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this verification at all, _collect_paths won't ever try to add a path that it hasn't already verified exists.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this verification at all, _collect_paths won't ever try to add a path that it hasn't already verified exists.

That is incorrect we trying to add dynamically computed path and we don't know which of them exists at this moment yet. We are just trying all combinations in the specific order and hope that one of them fits.

mypy/build.py Outdated

srcs = [] # type: List[BuildSource]
for path in module_paths:
if is_module_path(path) or is_pkg_path(path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again here (and below) there's a lot of re-checking of the path type that is redundant; _find_module already had this information and threw it away. Can't we have _find_module throw away less info, return an object instead of just paths, and avoid unnecessary string path checking?

mypy/build.py Outdated

class ImportContext:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called ImportContext instead of just e.g. Module?

Also, I wonder if we really need both this and BuildSource as separate classes? There seems to be a lot of overlap, maybe just add a type (and the ability to hold multiple paths for namespace packages) to BuildSource and unify them?

mypy/build.py Outdated
if is_pkg_path(path):
path = dirname(path)
for submodule in self._find_submodules(module, path):
srcs += self._find_modules_recursive(submodule)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by going all the way back to _find_module (which starts at the very beginning with all of lib_path) on every recursive call here, we are going to do a lot of repeated unnecessary work. It's worse with namespace packages than with the previous code. Imagine a namespace package that exists in three different paths, dir1/foo, dir2/foo, and dir3/foo. Consider that each of those has a different sub-package, dir1/foo/a/, dir2/foo/b, and dir3/foo/c. When we find dir1/foo/a, we throw away the context we already have (that we are within dir1/foo/a) and just call _find_module('foo.a'), which will end up unnecessarily looking at every single lib_path entry again to see if it has a foo/a/ within it.

It may be that caching can reduce the impact of this multiplied repeated work, but not fully. And I think we can easily do better in the algorithm. _collect_paths already has the right logic to look for the final component of a submodule within a given limited set of parent paths. Can't we use _collect_paths here instead of recursive calls to be more efficient and only look within the relevant parent?

@emmatyping
Copy link
Collaborator

@m-novikov I'm getting back to work on my PR #4278 on Wednesday. If you find time to respond to Carl's review and fix the merge conflict between now and then, that would be great. If you don't have time, that is entirely fine also, I will probably just work on my PR without namespace package support, and you may have to rebase on that. Thank you for your work on this so far!

@carljm
Copy link
Member

carljm commented Dec 18, 2017

Hi @m-novikov, any progress last week? I'm interested in seeing this land; if you won't have time for it soon I would be willing to pick it up and try to push it forward.

@m-novikov m-novikov changed the title [RDY] Namespace implementation [WIP] Namespace implementation Dec 18, 2017
@emmatyping emmatyping mentioned this pull request Feb 16, 2018
8 tasks
@gvanrossum
Copy link
Member

This is sadly getting stale. @m-novikov do you want to continue to work on this? Then please resolve the conflicts. Alternatively, @carljm you can clone this PR and finish it if you want -- IIUC it's mostly ready to land? Maybe we can make the upcoming #4606 release (or else the one after that -- we plan to resume the ~3-week cadence we had towards the end of last year).

@carljm
Copy link
Member

carljm commented Feb 21, 2018

@gvanrossum Sure, I'd be happy to pick this up and try to push it through to readiness, if @m-novikov doesn't plan to finish it.

There is one question about desired/intended behavior that came up that would be helpful to get feedback from you or other core devs. It's buried in a line-comment thread on an outdated revision, but should still be linkable: #4277 (comment)

(If you can't get to the question from that link, lmk and I'll copy it again here.)

@carljm
Copy link
Member

carljm commented Feb 21, 2018

Hmm, that link doesn't seem to work if the thread isn't already expanded. Here's the issue:

Do we need to maintain the (I think accidentally implemented, but possibly useful) behavior that stub packages (with __init__.pyi) already in practice behave like a namespace package? That is, if you have this structure:

.
├── src
│   └── pkg
│       ├── a.py
│       ├── b.py
│       └── __init__.py
└── stubs
    └── pkg
        ├── b.pyi
        └── __init__.pyi

with both stubs and src on MYPYPATH (in that order), mypy will still find and use src/pkg/a.py, even though that violates Python import's usual handling of non-namespace packages?

My feeling is that people may likely be relying on this (I think we currently are at Instagram, actually), but I don't want to put a lot of work into supporting it in this PR without confirmation that it should be formally supported, rather than fixed as a bug.

Currently the same is also true even if the stubs dir contains all .py files instead of .pyi; IOW mypy currently effectively treats all packages like namespace packages (but still requires the __init__.py). This part I'm confident is just a bug that should be fixed.

@gvanrossum
Copy link
Member

gvanrossum commented Feb 21, 2018 via email

@carljm
Copy link
Member

carljm commented Feb 21, 2018

Yes, we can live without it. I agree that it's better for mypy imports to consistently behave like Python imports.

@carljm
Copy link
Member

carljm commented Feb 28, 2018

This has a fair amount of overlap/conflict with #4403 and #4623, so given that those are up to date and active, I'm holding off on bringing this up to date until they are merged.

@carljm
Copy link
Member

carljm commented Mar 20, 2018

Now just waiting on #4693 to go in before I bring this up to date.

@emmatyping
Copy link
Collaborator

@carljm now that PEP 561 is in, are you interested in picking this up?

@carljm
Copy link
Member

carljm commented Apr 18, 2018

@ethanhs Yes, I'm interested and aware that the conflicting PRs have landed, just haven't had time yet. Not sure if I will before PyCon sprints.

@emmatyping
Copy link
Collaborator

ethanhs Yes, I'm interested and aware that the conflicting PRs have landed, just haven't had time yet. Not sure if I will before PyCon sprints.

Okay, no worries! I look forward to see you at the sprints.

@gvanrossum
Copy link
Member

So @carljm are you still interested in this? I could understand if not (maybe pyre does what you want). In that case I propose to close this -- it's probably too far out of date to be of much help to anyone else who wants to implement this feature.

@carljm
Copy link
Member

carljm commented Aug 1, 2018

@gvanrossum For a while I was hoping to still follow through on this, just because I'd said I would, but I think I have to face reality that it's not going to make it to the top of my priority list now. Sorry for not clarifying that sooner.

I agree re closing this specific PR; I was planning to reuse the tests as a starting point, but expected to have to start over with the implementation.

@gvanrossum gvanrossum closed this Aug 1, 2018
@gvanrossum
Copy link
Member

gvanrossum commented Aug 1, 2018 via email

@m-novikov
Copy link
Author

Actually now I have time and willing to continue work on it, provided I have some consultation regarding edge cases.

@ilevkivskyi
Copy link
Member

@m-novikov I can't guarantee that I will be able to help you with all questions, but I think we have enough people following here who will be able to help. So please go ahead!

@gvanrossum
Copy link
Member

It's probably best to open a new PR then, once you're ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants