Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search sys.path for PEP-561 compliant packages #11143

Merged
merged 3 commits into from
May 30, 2022

Conversation

AWhetter
Copy link
Contributor

Description

Closes #5701

This change means that mypy will now search the directories on sys.path for PEP-561 compliant packages. The current directory is excluded from these new default search paths, as are any directories that contain the standard library because those definitions come from typeshed.

sys.path is searched unconditionally in this change but, given that this could be quite a disruptive change, it would be possible to search sys.path only when specified with a new command line flag if preferred.

Test Plan

I've added a test that fails without this change.

@emmatyping
Copy link
Collaborator

I think we should require the user to pass --python-executable for using sys.path to avoid confusion if they are just running mypy but don't want to use sys.path.

@superherointj
Copy link

superherointj commented Oct 6, 2021

@AWhetter Can you update/rebase your PR? (It won't apply as it is.)
@ethanhs A decision here is important because it is causing issues downstream, updates of MyPy are blocked until this issue is resolved. (Or workarounds would be necessary where mypy is used.)

@emmatyping
Copy link
Collaborator

I am inclined to say that we should either add a new flag to enable searching sys.path or enable it if the python_executable option is set.

@superherointj, unfortunately mypy happens at the speed of volunteer time. I see this affects Nix, though it's not clear to me exactly how, could you explain? With a better idea of what the issue is I can help figure out the best path forward.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 6, 2021

I think that enabling this functionality by default is the best option, but there should be a way to turn this off, since this could cause problems in some configurations.

Here's my reasoning:

  1. If this would need to be enabled with a flag, users who need this will likely still encounter problems at first, and they have to read the docs to figure out how to work around the issue. I don't expect that many users use python_executable, for example. MYPYPATH can already be used to work around this to a certain extent, so the change would only make the workaround easier, instead of solving the problem. I think it's better that basic things "just work".
  2. For most users, using sys.path will make no difference at all, since the existing implementation is basically equivalent to the proposed one for them. This only affects users who customize sys.path, which I believe to be a small subset of users, and for them things would usually get better, except for some unusual configurations that we don't know about yet, or bugs in the implementation.

We'd need to be prepared to tweak the implementation if users report problems with it. I expect that we can make the new behavior work reliably for (almost) all users, at least after some iteration.

@ethanhs Do you foresee some specific problems that this could cause, or is it more about unknown issues?

@emmatyping
Copy link
Collaborator

Yeah I would be in favor of enabling it by default, but my concern is that it could easily lead to significant performance regressions if a user has a long sys.path

I suppose that isn't super common so it should be reasonable to enable this by default.

@superherointj
Copy link

superherointj commented Oct 6, 2021

@ethanhs thanks for taking the time to answer me.

I am inclined to say that we should either add a new flag to enable searching sys.path or enable it if the python_executable option is set.

I'll be waiting for the decision. (I don't mean pressure anyone, it's just that I'm waiting for this to happen so I can sort it out downstream.)

@superherointj, unfortunately mypy happens at the speed of volunteer time.

I can totally understand.

I see this affects Nix, though it's not clear to me exactly how, could you explain? With a better idea of what the issue is I can help figure out the best path forward.

Nix doesn't use the usual means of package management, opting for it's own system (and for good reasons, reproducibility etc). So it is common to have a few things to break or need patching.

Without this patch, the workaround we have is to wrap mypy using pyEnv with type libraries to force availability. In practice, we would have to update downstream packages where MyPy is used.

I'd prefer to wait for the decision here, so we can take the right course of action. Then I can implement/validade it downstream. As we build all packages, I can report back any issues. I have (briefly) tested @AWhetter fork and it worked for downstream packages at Nixpkgs.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 6, 2021

It would be good to measure the performance impact with typical sys.path lengths (on a few platforms). Also it would be good to check the new search path items in a few different configurations to see whether there are some unexpected entries that could slow down import processing.

A reasonable benchmark could be something as simple as mypy -c "import foo" (run it a few times so that we get incremental checking) where foo is an installed PEP-561 style package.

@nbraud
Copy link

nbraud commented Oct 6, 2021

@superherointj, unfortunately mypy happens at the speed of volunteer time.

As everything else for us here :)

I see this affects Nix, though it's not clear to me exactly how, could you explain? With a better idea of what the issue is I can help figure out the best path forward.

Basically, Nix doesn't have a global site-packages directory, and instead has separate installation paths for each package (Python or otherwise) and when an environment is requested with a given set of packages it simply constructs the right PYTHONPATH without materializing a single directory holding all the packages.

There is an opt-in facility in Nix to create “one big directory with all the Python packages” on demand, but relying on it would require changes in all Nix packages that rely on mypy (or mypyc), whereas making it the default would work out of the box.

my concern is that it could easily lead to significant performance regressions if a user has a long sys.path

Are there cases where users have a long sys.path and do not need it used? In any case, the concern seems to me a bit premature without first measuring the performance impact.

@nbraud
Copy link

nbraud commented Oct 6, 2021

I did a quick check, using hyperfine and mypy -c "import requests", and there does not appear to be a significant difference between a clean environment (w/ 17 entries in $PYTHONPATH) and some arbitrary dev environment I had lying around (w/ 87 entries) :

$ # “Clean” environment
$ nix-shell -p python39.pkgs.mypy -p python39.pkgs.requests -p hyperfine
$ cut -d: --output-delimiter=$'\n' -f1-999 <<< "$PYTHONPATH" | wc -l
17

$ hyperfine -r 100 -- 'mypy -c "import requests"'
Benchmark #1: mypy -c "import requests"
  Time (mean ± σ):     320.7 ms ±  17.9 ms    [User: 291.8 ms, System: 28.1 ms]
  Range (min … max):   286.6 ms … 377.6 ms    100 runs

vs

$ # “Dev” environment
$ cut -d: --output-delimiter=$'\n' -f1-999 <<< "$PYTHONPATH" | wc -l
87

$ hyperfine -r 100 -- 'mypy -c "import requests"'
Benchmark #1: mypy -c "import requests"
  Time (mean ± σ):     382.0 ms ±  19.4 ms    [User: 344.7 ms, System: 36.3 ms]
  Range (min … max):   344.9 ms … 454.6 ms    100 runs

@emmatyping
Copy link
Collaborator

@nbraud that's likely because mypy is using the incremental cache. I'd recommend running with --cache-dir=/dev/null --no-incremental to stimulate a cold run and not write the cache out.

@takeda
Copy link

takeda commented Dec 7, 2021

I'm guessing this issue would also resolve some issues that I experienced myself (I was unable to create minimal repo that reproduce it, so I never reported it).

It typically happened to me with applications that depended on fastapi (which in turn depend on starlette). When I run mypy in a nix-shell I got errors with missing type information for starlette. If I run mypy installed in venv everything work correctly. For applications that did not use fastapi, mypy seemed to work correctly in both cases.

@AWhetter
Copy link
Contributor Author

AWhetter commented Jan 7, 2022

I've run some benchmarks with the recommended flags and here's what I've found:

The tests

I've created a virtualenv with 77 packages in by running:

λ python -m venv venv
λ venv/bin/pip install --upgrade setuptools pip wheel
λ venv/bin/pip install jupyter seaborn networkx requests types-requests -e /path/to/mypy
λ for req in $(venv/bin/pip freeze | grep -v ' '); do pkg=$(echo $req | cut -d '=' -f 1); mkdir -p venv/pkgs/$pkg/lib/python3.10/site-packages; done  # The reason for this will become clear soon

I'm running in Python 3.10 on Arch Linux with an Intel i7-1165G7 at 2.80GHz and a WD_BLACK SN850 NVMe SSD.

Experiment 1

Baseline

This test is run without my changes (ie on commit b44d2bc).

λ hyperfine --runs 30 -- 'venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"'
Benchmark 1: venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"
  Time (mean ± σ):      2.634 s ±  0.026 s    [User: 2.581 s, System: 0.045 s]
  Range (min … max):    2.578 s …  2.690 s    30 runs

With sys.path searching

These tests are run with my changes.

λ hyperfine --runs 30 -- 'venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"'
Benchmark 1: venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"
  Time (mean ± σ):      2.652 s ±  0.024 s    [User: 2.615 s, System: 0.028 s]
  Range (min … max):    2.612 s …  2.703 s    30 runs

Experiment 2

I've then split the virtualenv from Experiment 1 into another with each package installed to a different directory:

λ python -m venv split_venv
λ split_venv/bin/pip install --upgrade setuptools pip wheel
λ for req in $(venv/bin/pip freeze | grep -v ' '); do pkg=$(echo $req | cut -d '=' -f 1); mkdir -p split_venv/pkgs/$pkg; split_venv/bin/pip install --no-deps --prefix split_venv/pkgs/$pkg $req; done
λ split_venv/bin/pip install --no-deps -e /path/to/mypy

I've added the following sitecustomize to both virtualenvs so that they are both doing the same amount of work to start up Python:

import os
import sys

pkgs_dir = os.path.join(sys.prefix, "pkgs")
for pkg in os.listdir(pkgs_dir):
    directory = os.path.join(pkgs_dir, pkg, "lib", "python3.10", "site-packages")
    sys.path.append(directory)

Baseline

This test is run without my changes (ie on commit b44d2bc).

λ hyperfine --runs 30 -- 'venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"'
Benchmark 1: venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"
  Time (mean ± σ):      2.680 s ±  0.070 s    [User: 2.638 s, System: 0.031 s]
  Range (min … max):    2.607 s …  2.889 s    30 runs

With sys.path searching

These tests are run with my changes.

λ hyperfine --runs 30 -- 'venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"'
Benchmark 1: venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"
  Time (mean ± σ):      3.029 s ±  0.031 s    [User: 2.962 s, System: 0.057 s]
  Range (min … max):    2.979 s …  3.079 s    30 runs

λ hyperfine --runs 30 -- 'split_venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"'
Benchmark 1: split_venv/bin/mypy --cache-dir=/dev/null --no-incremental -c "import requests"
  Time (mean ± σ):      3.067 s ±  0.031 s    [User: 3.019 s, System: 0.039 s]
  Range (min … max):    3.015 s …  3.194 s    30 runs

Conclusion

For a sys.path 6 entries long, there is no significant slowdown (about 0.02s). For a sys.path 82 entries long I see half a second of slowdown. I don't know how this scales with the size of sys.path.

My opinion, if you've got a sys.path that long then I think you're consenting to imports being slow in a regular Python session. I think it's fair for mypy to be slower for the same reason, and therefore using this new behaviour by default is fine. A half second of slowdown isn't that big and needing the ability to turn this off does not seem worth it for the additional code complexity.

What do we think?

@llchan
Copy link
Contributor

llchan commented Feb 3, 2022

I havent dug through the implementation details outside of this PR, but figured I'd just ask:

Should sys.path paths take precedence over site packages? i.e. should we have the following instead?

package_path=tuple(sys_path + egg_dirs + site_packages)

And in a similar vein, why bother pulling out site packages at all, why not just use sys.path as much as possible (aside from pulling out stdlib)? There could conceivably be sitecustomize.py which rearrange sys.path in non-standard ways. My preference would be to leverage Python's existing logic as much as possible and avoid reimplementing in a close-but-not-quite-the-same-code-path way.

@takeda
Copy link

takeda commented Mar 16, 2022

The mypy code diverged enough that these patch no longer works. Can somebody rebase it and maybe merge it?

@fabianhjr
Copy link
Contributor

Hi @takeda, rebased the patch and solved the conflict on nixpkgs. Should be resolved downstream: NixOS/nixpkgs#165019

@takeda
Copy link

takeda commented Mar 24, 2022

@JukkaL will this ever get merged? Are there any other concerns about this patch?

@github-actions

This comment has been minimized.

@AWhetter
Copy link
Contributor Author

AWhetter commented Apr 1, 2022

I've done a rebase and resolved the conflicts. I've also switched to searching just sys.path, and not separately searching site-pacakges and egg directories as well. This has given a slight speedup.
There's still a second of difference introduced, as seen in these tests: #11143 (comment) I'm still not 100% why that is. The "_find_module_non_stub_helper" method in modulefinder.py goes from being called 1300 times to being called 51,000 times and seems to be where most of the slowness is.

@github-actions

This comment has been minimized.

@Apteryks
Copy link

Apteryks commented Apr 6, 2022

There's now also this alternative PR, which I believe is a bit simpler, given it always uses sys.pathand avoids reinventing what Python's site.py already provides. I'll give it a good test in the context of GNU Guix.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@AWhetter
Copy link
Contributor Author

This pull request is ready for review again. I've made a couple of optimisations since the initial submission and updated the test results (#11143 (comment)) to reflect the performance difference of the new implementation.

@emmatyping
Copy link
Collaborator

emmatyping commented Apr 21, 2022

Ok I finally got around to running the performance test that I've been keen to, which is to see what impact this had on checking large projects like Zulip, which has a lot of code and over 270 packages installed.

Running mypy on Zulip using their test script and not using a cache:

mypy 0.742:

Benchmark #1: ./tools/run-mypy
  Time (mean ± σ):     19.477 s ±  0.591 s    [User: 19.083 s, System: 0.391 s]
  Range (min … max):   18.905 s … 20.967 s    20 runs

after installing a local copy of mypy with this PR:

Benchmark #1: ./tools/run-mypy
  Time (mean ± σ):     19.052 s ±  0.398 s    [User: 18.689 s, System: 0.364 s]
  Range (min … max):   18.457 s … 19.894 s    20 runs

So it is basically system noise :)

That is quite encouraging. I'll take a closer look at the changes soon.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@AWhetter
Copy link
Contributor Author

@ethanhs Apologies if you were waiting for another comment from me to indicate the readiness of this, but I've already made the requested changes and this is ready for review again.

@ofek
Copy link

ofek commented Jul 4, 2022

Hello! When might this be released?

@fabianhjr
Copy link
Contributor

It is already on the main branch so on the next release, average cadence seems to be close to 1 tagged release per month and last tagged release was about a month ago so soon(tm).

DamianZaremba added a commit to DamianZaremba/pytest-mypy-plugins that referenced this pull request Jul 20, 2022
After python/mypy#11143, the search path for
mypy has changed, resulting in the current path being present.

Explicitly adding in the base path results in an error about the
directory from the search_path (cwd) being present in `MYPYPATH`.

Removing the base path from `MYPYPATH` and relying on just `pyinfo` to
load the current path in appears to work as expected.
DamianZaremba added a commit to DamianZaremba/pytest-mypy-plugins that referenced this pull request Jul 20, 2022
After python/mypy#11143, the search path for
mypy has changed, resulting in the current path being present.

Explicitly adding in the base path results in an error about the
directory from the search_path (cwd) being present in `MYPYPATH`.

Removing the base path from `MYPYPATH` and relying on just `pyinfo` to
load the current path in appears to work as expected.
DamianZaremba added a commit to DamianZaremba/pytest-mypy-plugins that referenced this pull request Jul 20, 2022
After python/mypy#11143, the search path for
mypy has changed, resulting in the current path being present.

Explicitly adding in the base path results in an error about the
directory from the search_path (cwd) being present in `MYPYPATH`.

Removing the base path from `MYPYPATH` and relying on just `pyinfo` to
load the current path in appears to work as expected.
@robin-wayve
Copy link

Thank you! We don't need to monkey patch our mypy setup in Bazel anymore 🤩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sitecustomize/full sys.path support