Search sys.path for PEP-561 compliant packages #11143
I think we should require the user to pass --python-executable for using |
I am inclined to say that we should either add a new flag to enable searching sys.path or enable it if the
@superherointj, unfortunately mypy happens at the speed of volunteer time. I see this affects Nix, though it's not clear to me exactly how, could you explain? With a better idea of what the issue is I can help figure out the best path forward. |
I think that enabling this functionality by default is the best option, but there should be a way to turn this off, since this could cause problems in some configurations. Here's my reasoning:
We'd need to be prepared to tweak the implementation if users report problems with it. I expect that we can make the new behavior work reliably for (almost) all users, at least after some iteration. @ethanhs Do you foresee some specific problems that this could cause, or is it more about unknown issues? |
Yeah, I would be in favor of enabling it by default, but my concern is that it could easily lead to significant performance regressions if a user has a long sys.path. I suppose that isn't super common, so it should be reasonable to enable this by default. |
@ethanhs thanks for taking the time to answer me.
I'll be waiting for the decision. (I don't mean to pressure anyone, it's just that I'm waiting for this to happen so I can sort it out downstream.)
I can totally understand.
Nix doesn't use the usual means of package management, opting for its own system (and for good reasons, reproducibility etc). So it is common for a few things to break or need patching. Without this patch, the workaround we have is to wrap I'd prefer to wait for the decision here, so we can take the right course of action. Then I can implement/validate it downstream. As we build all packages, I can report back any issues. |
It would be good to measure the performance impact with typical A reasonable benchmark could be something as simple as |
As everything else for us here :)
Basically, Nix doesn't have a global There is an opt-in facility in Nix to create “one big directory with all the Python packages” on demand, but relying on it would require changes in all Nix packages that rely on
Are there cases where users have a long |
I did a quick check, using
vs
|
@nbraud that's likely because mypy is using the incremental cache. I'd recommend running with |
I'm guessing this PR would also resolve some issues that I experienced myself (I was unable to create a minimal repo that reproduces them, so I never reported them). It typically happened to me with applications that depended on fastapi (which in turn depends on starlette). When I ran mypy in a nix-shell I got errors about missing type information for starlette. If I ran mypy installed in a venv, everything worked correctly. For applications that did not use fastapi, mypy seemed to work correctly in both cases. |
I've run some benchmarks with the recommended flags and here's what I've found:
The tests
I've created a virtualenv with 77 packages in it by running:
λ python -m venv venv
λ venv/bin/pip install --upgrade setuptools pip wheel
λ venv/bin/pip install jupyter seaborn networkx requests types-requests -e /path/to/mypy
λ for req in $(venv/bin/pip freeze | grep -v ' '); do pkg=$(echo $req | cut -d '=' -f 1); mkdir -p venv/pkgs/$pkg/lib/python3.10/site-packages; done # The reason for this will become clear soon
I'm running in Python 3.10 on Arch Linux with an Intel i7-1165G7 at 2.80GHz and a WD_BLACK SN850 NVMe SSD.
Experiment 1
Baseline
This test is run without my changes (i.e. on commit b44d2bc).
With sys.path searching
These tests are run with my changes.
Experiment 2
I've then split the virtualenv from Experiment 1 into another with each package installed to a different directory:
λ python -m venv split_venv
λ split_venv/bin/pip install --upgrade setuptools pip wheel
λ for req in $(venv/bin/pip freeze | grep -v ' '); do pkg=$(echo $req | cut -d '=' -f 1); mkdir -p split_venv/pkgs/$pkg; split_venv/bin/pip install --no-deps --prefix split_venv/pkgs/$pkg $req; done
λ split_venv/bin/pip install --no-deps -e /path/to/mypy
I've added the following sitecustomize to both virtualenvs so that they are both doing the same amount of work to start up Python:
import os
import sys

# Add each package's private site-packages directory to sys.path.
pkgs_dir = os.path.join(sys.prefix, "pkgs")
for pkg in os.listdir(pkgs_dir):
    directory = os.path.join(pkgs_dir, pkg, "lib", "python3.10", "site-packages")
    sys.path.append(directory)
Baseline
This test is run without my changes (i.e. on commit b44d2bc).
With sys.path searching
These tests are run with my changes.
Conclusion
For a sys.path 6 entries long, there is no significant slowdown (about 0.02s). For a sys.path 82 entries long I see half a second of slowdown. I don't know how this scales with the size of sys.path.
In my opinion, if you've got a sys.path that long then you're consenting to imports being slow in a regular Python session. I think it's fair for mypy to be slower for the same reason, and therefore using this new behaviour by default is fine. Half a second of slowdown isn't that big, and the ability to turn this off does not seem worth the additional code complexity.
What do we think? |
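For anyone who wants a rough feel for how the filesystem side of this scales on their own machine, here is a minimal sketch. It only times a single directory-listing pass over the entries of sys.path; it is not what mypy does internally and it is not the benchmark used above. The helper name time_path_scan is made up for illustration.

```python
import os
import sys
import time


def time_path_scan(path_entries):
    """Return (directories scanned, entries seen, seconds) for one listing pass.

    Hypothetical helper: it only lists each directory once, which is a
    lower bound on the filesystem work a path search has to do.
    """
    start = time.perf_counter()
    scanned = 0
    seen = 0
    for entry in path_entries:
        # Skip empty entries, zip files and paths that do not exist.
        if not os.path.isdir(entry):
            continue
        scanned += 1
        with os.scandir(entry) as it:
            for _ in it:
                seen += 1
    return scanned, seen, time.perf_counter() - start


if __name__ == "__main__":
    scanned, seen, elapsed = time_path_scan(sys.path)
    print(f"{len(sys.path)} sys.path entries, {scanned} directories, "
          f"{seen} entries listed in {elapsed:.4f}s")
```

Running it with both venv/bin/python and split_venv/bin/python would show whether plain directory listing accounts for a meaningful share of the difference, though the numbers will depend heavily on the filesystem cache.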
I haven't dug through the implementation details outside of this PR, but figured I'd just ask: Should package_path=tuple(sys_path + egg_dirs + site_packages) And in a similar vein, why bother pulling out site packages at all, why not just use |
The mypy code has diverged enough that this patch no longer works. Can somebody rebase it and maybe merge it? |
Hi @takeda, rebased the patch and solved the conflict on nixpkgs. Should be resolved downstream: NixOS/nixpkgs#165019 |
@JukkaL will this ever get merged? Are there any other concerns about this patch? |
I've done a rebase and resolved the conflicts. I've also switched to searching just sys.path, and not separately searching site-packages and egg directories as well. This has given a slight speedup. |
There's now also this alternative PR, which I believe is a bit simpler, given it always uses |
This pull request is ready for review again. I've made a couple of optimisations since the initial submission and updated the test results (#11143 (comment)) to reflect the performance difference of the new implementation. |
Ok I finally got around to running the performance test that I've been keen to, which is to see what impact this had on checking large projects like Zulip, which has a lot of code and over 270 packages installed. Running mypy on Zulip using their test script and not using a cache: mypy 0.742:
after installing a local copy of mypy with this PR:
So it is basically system noise :) That is quite encouraging. I'll take a closer look at the changes soon. |
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
@ethanhs Apologies if you were waiting for another comment from me to indicate the readiness of this, but I've already made the requested changes and this is ready for review again. |
Hello! When might this be released? |
It is already on the main branch, so it will be in the next release. The average cadence seems to be close to one tagged release per month and the last tagged release was about a month ago, so soon(tm). |
After python/mypy#11143, the search path for mypy has changed, resulting in the current path being present. Explicitly adding in the base path results in an error about the directory from the search_path (cwd) being present in `MYPYPATH`. Removing the base path from `MYPYPATH` and relying on just `pyinfo` to load the current path in appears to work as expected.
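A minimal sketch of how a downstream wrapper could spot this kind of overlap before invoking mypy, assuming the conflict arises whenever a `MYPYPATH` entry resolves to a directory that is already on the interpreter's search path. The function name overlapping_mypypath_entries is hypothetical, and the exact rule mypy applies may differ.

```python
import os


def overlapping_mypypath_entries(mypypath, search_dirs):
    """Return the MYPYPATH entries that resolve to a directory in search_dirs.

    Hypothetical helper: mypypath is the raw environment value (entries
    separated by os.pathsep); search_dirs is e.g. the interpreter's sys.path.
    """
    resolved = {os.path.realpath(d) for d in search_dirs if d}
    return [
        entry
        for entry in mypypath.split(os.pathsep)
        if entry and os.path.realpath(entry) in resolved
    ]
```

An empty result would suggest the `MYPYPATH` value can be passed through unchanged; any entries returned are candidates for removal.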
Thank you! We don't need to monkey patch our mypy setup in Bazel anymore 🤩 |
Description
Closes #5701
This change means that mypy will now search the directories on sys.path for PEP-561 compliant packages. The current directory is excluded from these new default search paths, as are any directories that contain the standard library because those definitions come from typeshed.
sys.path is searched unconditionally in this change but, given that this could be quite a disruptive change, it would be possible to search sys.path only when specified with a new command line flag if preferred.
Test Plan
I've added a test that fails without this change.
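As an illustration of the behaviour described above, here is a minimal sketch and not the code in this PR. It assumes that a package counts as PEP 561 compliant if its directory contains a `py.typed` marker or its name ends in `-stubs`, and that the standard library directories can be taken from sysconfig. The helper names candidate_search_dirs and find_typed_packages are hypothetical.

```python
import os
import sys
import sysconfig


def candidate_search_dirs(path_entries, cwd, stdlib_dirs):
    """Return the directories from path_entries worth scanning.

    Skips empty entries, the current directory, anything that is not a
    directory, duplicates, and the standard library directories (whose
    definitions are assumed to come from typeshed instead).
    """
    excluded = {os.path.realpath(d) for d in stdlib_dirs}
    excluded.add(os.path.realpath(cwd))
    result = []
    for entry in path_entries:
        if not entry:
            continue
        real = os.path.realpath(entry)
        if real in excluded or not os.path.isdir(real) or real in result:
            continue
        result.append(real)
    return result


def find_typed_packages(search_dirs):
    """Map package name to directory for PEP 561 packages in search_dirs."""
    found = {}
    for directory in search_dirs:
        for name in sorted(os.listdir(directory)):
            pkg_dir = os.path.join(directory, name)
            if not os.path.isdir(pkg_dir):
                continue
            is_stub_only = name.endswith("-stubs")
            is_inline = os.path.isfile(os.path.join(pkg_dir, "py.typed"))
            if is_stub_only or is_inline:
                # Keep the first hit, mirroring import-order precedence.
                found.setdefault(name, pkg_dir)
    return found


if __name__ == "__main__":
    stdlib = [sysconfig.get_path("stdlib"), sysconfig.get_path("platstdlib")]
    dirs = candidate_search_dirs(sys.path, os.getcwd(), stdlib)
    for name, location in find_typed_packages(dirs).items():
        print(name, location)
```

The exclusion here is by exact directory match, so site-packages (which usually lives below the standard library directory) is still scanned. Zip archives and other non-directory entries on sys.path are simply skipped in this sketch.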