-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change importlib to first try to import modules using the standard mechanism #11997
Conversation
9dbb819
to
e054ca6
Compare
b7bdb67
to
bf112c3
Compare
@flying-sheep I believe the changes here are the correct approach for the |
@flying-sheep would appreciate if you can test this PR using your original project to ensure it works as intended (if it is simple and you have the time, of course). 👍 |
bf112c3
to
6fdac9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! If I understand correctly, this is basically the more efficient way to do what I did in my PR by brute forcing. I have a thought regarding the changed wording for --import-mode=importlib
:
Indeed. The previous semantics were:
This has now changed to:
In retrospect this is what we should have implemented since the beginning, but the thought when this was initially implemented was that it would be used to import test modules only, but we missed the case of importing normal modules too (for doctesting for example). |
An alternative way of solving this would have been to only collect doctests from roots using Thank you for spending time solving this! |
I tested this with our package, and it seems to indeed work flawlessly! |
That would solve doctests, but I think the problem is more general and best solved in
Thanks! |
I haven't reviewed the changes yet, but regarding
Just to make sure, IIRC one of the reasons for adding importlib mode was to deal with the non-package-conftest problem where it's like
and then we try to load If so, doesn't "first try to import the module normally" still run into a problem here, that a module with the same name but different path is already loaded, and we'll return it? |
Assuming
Here we will try to import I will improve the tests to also cover this scenario so there is no doubt this works as intended. |
Hmm I realize now probably the wording "standalone" is confusing; I meant "not part of a package", but I can see that one can understands this as "using the module's base name only". Appreciate any suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import_module
might import (= execute) modules that were not intended. I don't think that we can assume that arbitrary names (names of test files) are safe to import.
E.g. consider I'm running pytest x.py
and I have an x
module in my site-packages, then the x
module in site-packages will get executed, this is quite unexpected and possibly dangerous. (The code as written also has a bug, where even in the case the if mod.__file__ and Path(mod.__file__) == path
condition is false, the module from the site-packages will still be returned, because of the return sys.modules[module_name]
just below).
The Python importlib stuff confuses me greatly but it seems to me that the existing importlib mode code basically has the right idea -- it first finds the module spec and only executes it afterwards. Perhaps we can just tweak/extend the existing sys.meta_path
search code to work, and keep something like the mod.__file__ and Path(mod.__file__) == path
check, but with the module spec, not the actual module?
Ouch, you are definitely right. Will have to think about this some more then. |
Note that the |
Great catch here @bluetech, I really dropped the ball on this one. 😓 Using the specs for that first import, which crucially also uses the module location to import the file and not only the bare module name as my previous attempt, nicely solves this. I added another test to ensure we are covering the situation you described, where |
8a0f1a9
to
f953bf4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review yet, only looked at the resolve_pkg_root_and_module_name
function.
src/_pytest/pathlib.py
Outdated
|
||
Passing the full path to `models.py` will yield Path("src") and "app.core.models". | ||
|
||
If consider_ns_packages is True, then we additionally check if the top-level directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this handles the case where there is a python package (directory structure with __init__.py
, like app
above, and not a single file) nested inside a namespace package (like let's say mycompany
). I have two questions/comments:
-
Why only handle python packages (
__init__.py
) and not the case whereapp.core.models
doesn't have__init__.py
's either? -
Namespace packages can have multiple levels, e.g.
com.mycompany.app.core.models
not justmycompany.app.core.models
. IIUC the current code only handles one level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Yeah that's something we can revisit later, pytest never supported packages without a
__init__
correctly (is there a name for it?), I recall we had several discussions with other package-related authors (pip maintainers, Jason Coombs, etc) on how to support that, but the discussion stalled. I would be happy to tackle that in a separate PR though, as this requires more tests in order to make sure the support is sound. -
Do they? I always thought it was a mechanism just to share the top-level, based on the use cases out there and the docs at https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages. I will double check if this is really supported, and if so update the code accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About 2, you are right, it does support multiple levels (TIL). 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant on adding multiple level support for namespace packages, let me explain my reasoning.
I can see situations (might be contrived I admit) where we identify a module as part of a namespace package incorrectly, just because the directory is in sys.path
somehow.
For example:
temp/
some/
folder/
myproject/
src/
com/
company/
myapp/
__init__.py
core.py
If we look upwards from core.py
in order to find if it is in a namespace package, we must keep looking upwards until we find a directory in sys.path
or until we reach the root of the file system and give up. Suppose for some reason temp/
is in sys.path
(possibly due to a misconfiguration), this means we could end up determining that core.py
is be part of the package temp.some.folder.myproject.src.com.company.myapp.core
, which is incorrect, depending on the order directories appear in sys.path
, if temp/
appears first: temp/
or temp/some/folder/myproject/src
.
The namespace package rules are easy to follow when you try to figure out a file path starting from a module name: to see if com.company.myapp.core
as it appears in the source code is valid , we need to through each sys.path
entry, join it with com/company/myapp/core.py
and see if the file exists: if it does, double check that there are no __init__.py
files present, then import it. This is what Python needs to do and why it works unambiguously.
But the situation is not so simple when you need to derive a module name starting from a file path, because you need to search recursively upwards and see if you can build a valid module name using that full path, but even if we do build a valid module name we don't know if that was intentional or not, and we might end up with a module named temp.some.folder.myproject.src.com.company.myapp.core
; in the previous case, going from module name to path, the user is telling us the module name, so there is no ambiguity.
I recall that was the central point when this matter was being discussed with other package-related authors in the past (I cannot find the thread, perhaps @RonnyPfannschmidt can recall at least which forum that took place?).
Hope I was able to explain the situation, sorry if the explanation is lacking.
Btw, I only introduced this change regarding namespace packages because 1 specific test in doctests that is meant to import from namespace packages, but AFAICT it was working with namespace packages by accident, due to how we were building the module name from the path, not due to any proper namespace package support -- so much that I changed it slighly the test to use a src
layout, while keeping the namespace package configuration, and then the code in main
could not find it anymore. The change implemented here made we at least recognize namespace packages with one level.
I might be overthinking this too, so would like to hear your opinions on this.
Unless my assessment is incorrect, I think a compromise would be to:
- Support just one level for now; up until now we did not support any level at all, so this is a small test to provide full support.
- Keep the status quo of "pytest does not yet support namespace packages", remove the partial support I added here, and mark the test as xfail.
- Implement full support in this PR.
I'm OK with 3), just wanted to present the full picture and see if this is the direction we want to go.
There might also a simpler/safer way to determine if a file path is part of a full namespace package that I'm not seeing, in which case much of my concerns here are moot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To aid in the discussion I went ahead and implemented it in 1232d19.
The implementation is simple, I'm a bit hesitant if this can have undesired side effects in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your analysis is correct, particularly the difference between going module path => filesystem path (what Python does) and the reverse (what we want to do).
I think the partial support is too confusing, so IMO we shouldn't do option 1 (I prefer 2 over 1).
I totally understand the concern with option 3. However, I think we can try it and see. We may end up with these long unintended module paths, but I don't think it would be actively dangerous (=> actually importing unintended stuff) because intermediate paths in namespace packages are not executing anything (there is no __init__.py
to import).
Longer term, and this has already been brought by several people, I do think we need to add some concept of "workspace roots", which are Directory
collector nodes directly below Session
, and which put a strict bound on filesystem traversals (never going beyond them). (Also, initial conftests only collected from workspace roots. The confcutdir
concept is gone. --pypkg
args become workspace roots. All nodeids are rooted in roots, can't have <rootdir>/../bla.py
or /foo/bar.py
nodeids anymore). Hopefully I can put up a proper proposal sometime soon.
Suppose for some reason temp/ is in sys.path (possibly due to a misconfiguration), this means we could end up determining that core.py is be part of the package temp.some.folder.myproject.src.com.company.myapp.core, which is incorrect, depending on the order directories appear in sys.path, if temp/ appears first: temp/ or temp/some/folder/myproject/src.
Since we're iterating pkg_path.parents
in order, I don't think there is ambiguity -- we'll always try com.company.myapp.core
in temp/some/folder/myproject/src
first, so that's the one that would be found, even if temp/
is in sys.path.
Regarding the initial implementation, in the loop for parent in pkg_path.parents:
, is there some special handling needed if one of the parents
has a __init__.py
? Can namespace packages be nested in regular packages? Or the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're iterating pkg_path.parents in order, I don't think there is ambiguity -- we'll always try com.company.myapp.core in temp/some/folder/myproject/src first, so that's the one that would be found, even if temp/ is in sys.path.
Yes, you are right.
Regarding the initial implementation, in the loop for parent in pkg_path.parents:, is there some special handling needed if one of the parents has a init.py?
Good catch, indeed if there is an __init__.py
it is no longer a namespace package. Added a specific test for this.
Went ahead with option 3 then, please take a look.
I also double checked and the pytest-doctest-import-mismatch
(the reproducer for the issue) still passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, indeed if there is an init.py it is no longer a namespace package. Added a specific test for this.
So I've just seen in the link you provided https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages a regular package nested inside of a namespace package... So I assume it is possible?
306d9e0
to
5cc0ee0
Compare
src/_pytest/pathlib.py
Outdated
|
||
Passing the full path to `models.py` will yield Path("src") and "app.core.models". | ||
|
||
If consider_ns_packages is True, then we additionally check if the top-level directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, indeed if there is an init.py it is no longer a namespace package. Added a specific test for this.
So I've just seen in the link you provided https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages a regular package nested inside of a namespace package... So I assume it is possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review the code yet, just some light comments. Will review a bit later.
(I can't seem to reply inline to your original comment) Sub-packages can have Consider this:
In the example above, Using this simple script to test it: import sys
sys.path.append(r"e:\Temp\src\dist1")
sys.path.append(r"e:\Temp\src\dist2")
import com.company.app.core.models
import com.company.calc.algo.algorithms
print(com.company.app.core.models)
print(com.company.calc.algo.algorithms) This prints:
To prove this, we create
But as I mentioned, the sub-packages can have This is why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went over the code now, it looks good to me in general! I left a few comments in addition to the previous ones.
Also for your consideration, I've split the PR into commits, I think it will help with reviewing/bisecting/reverting compared to one big commit. https://github.com/bluetech/pytest/commits/11475-importlib/
5cc0ee0
to
0ae96e5
Compare
0ae96e5
to
7dc2939
Compare
*, | ||
mode: Union[str, ImportMode] = ImportMode.prepend, | ||
root: Path, | ||
consider_namespace_packages: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use a default here?
I prefer to force everywhere that calls it to pass the new option explicitly, lest we miss passing the option in production code by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
The test seems wrong, and we haven't been able to figure out what it's trying to test (the original issue is lost in time).
Ensure the implementation isn't changed to trigger such a bug.
Will be reused.
Will be reused.
…name` This applies to `append` and `prepend` import modes; support for `importlib` mode will be added in a separate change.
…chanism As detailed in pytest-dev#11475 (comment), currently with `--import-mode=importlib` pytest will try to import every file by using a unique module name, regardless if that module could be imported using the normal import mechanism without touching `sys.path`. This has the consequence that non-test modules available in `sys.path` (via other mechanism, such as being installed into a virtualenv, PYTHONPATH, etc) would end up being imported as standalone modules, instead of imported with their expected module names. To illustrate: ``` .env/ lib/ site-packages/ anndata/ core.py ``` Given `anndata` is installed into the virtual environment, `python -c "import anndata.core"` works, but pytest with `importlib` mode would import that module as a standalone module named `".env.lib.site-packages.anndata.core"`, because importlib module was designed to import test files which are not reachable from `sys.path`, but now it is clear that normal modules should be imported using the standard mechanisms if possible. Now `imporlib` mode will first try to import the module normally, without changing `sys.path`, and if that fails it falls back to importing the module as a standalone module. This also makes `importlib` respect namespace packages. This supersedes pytest-dev#11931. Fix pytest-dev#11475 Close pytest-dev#11931
Test with namespace packages support even when it will not find namespace packages to ensure it will at least not give weird results or crashes.
7dc2939
to
d6134bc
Compare
/edit: moved to #11475 (comment) |
As detailed in #11475 (comment), currently with
--import-mode=importlib
pytest will try to import every file by using a unique module name, regardless if that module could be imported using the normal import mechanism without touchingsys.path
.This has the consequence that non-test modules available in
sys.path
(via other mechanism, such as being installed into a virtualenv, PYTHONPATH, etc) would end up being imported as standalone modules, instead of imported with their expected module names.To illustrate:
Given
anndata
is installed into the virtual environment,python -c "import anndata.core"
works, but pytest withimportlib
mode would import that module as a standalone module named".env.lib.site-packages.anndata.core"
, because importlib module was designed to import test files which are not reachable fromsys.path
, but now it is clear that normal modules should be imported using the standard mechanisms if possible.Now
imporlib
mode will first try to import the module normally, without changingsys.path
, and if that fails it falls back to importing the module as a standalone module.This supersedes #11931.
Fix #11475
Close #11931