Avoid full path enumeration on import of setuptools or pkg_resources? #510

Closed
ghost opened this issue Mar 1, 2016 · 99 comments · Fixed by #2194
Comments

@ghost

ghost commented Mar 1, 2016

Originally reported by: konstantint (Bitbucket: konstantint, GitHub: konstantint)


At the moment on my machine, it takes about 1.28 seconds to do a bare import pkg_resources, 1.47 seconds to do a bare import setuptools, 1.36 seconds to do a bare from pkg_resources import load_entry_point and 1.25 seconds to do a bare from pkg_resources import load_entry_point.

This obviously affects all of the python scripts that are installed as console entry points, because each and every one of them starts with a line like that. In code which does not rely on entry points this may be a problem whenever I want to use resource_filename to consistently access static data.

I believe this problem is decently common, yet I did not find any issue or discussion, hence I'm creating one, hoping I'm not repeating what has been said already elsewhere unnecessarily.

I am using Anaconda Python, which comes along with a fairly large set of packages, alongside several of my own packages, which I commonly add to my path via setup.py develop; however, I do not believe this setup is anything out of the ordinary. There are 37 items on my sys.path at the moment. Profiling import pkg_resources shows that this leads to 76 calls to workingset.add_entry (timing at about a second), of which most of the time is spent in 466 calls to Distribution.from_location.
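For anyone who wants to reproduce numbers like these, here is a minimal sketch that times a first-time import from inside Python. The helper name `time_import` is made up for illustration. On Python 3.7+, running `python -X importtime -c "import pkg_resources"` gives a per-module breakdown without any extra code.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return (module, seconds) for importing module_name.

    If the module is already in sys.modules, the import is a cache hit and
    the measured time will be close to zero, so run this in a fresh process.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python time_import.py pkg_resources
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    _, seconds = time_import(name)
    print(f"import {name}: {seconds:.3f}s")
```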

Obviously, the reason for the problem lies in the two _call_aside methods at the end of pkg_resources which lead to a full scan of the python path at the moment when the package is imported, and the only way to alleviate it would be to somehow avoid or delay the need for this scan as much as possible.

I see two straightforward remedies:
a) Make the scanning lazy. After all, if all one needs is to find a particular package, the scan could stop as soon as the corresponding package is located. At the very least this would allow me to "fix" my ipython loading problem by moving it up in the path. This might break some import rules which do not respect the precedence of the path, of which I'm not aware.
b) Cache a precomputed index and update it lazily. Yes, this might require some ad-hoc rules for resolving inconsistencies, and this may lead to ugly conflicts with external tools that attempt to install multiple versions of a package, but this will basically avoid the current startup delay in 99.99% of cases and solve so many of my problems that I'd be willing to pay the price.
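To make remedy (a) concrete, here is a rough sketch of a lazy scan that walks the path entries in order and stops at the first metadata directory matching the wanted project. It is not how pkg_resources works internally. The function names are hypothetical, and it only looks at `*.dist-info` and `*.egg-info` directory names (no zipped eggs, no version checks).

```python
import os
import re
import sys


def _normalize(name):
    # Compare project names case-insensitively, treating -, _ and . alike.
    return re.sub(r"[-_.]+", "-", name).lower()


def iter_metadata_dirs(path_entries):
    """Lazily yield (project_name, full_path) for metadata dirs on the path."""
    for entry in path_entries:
        try:
            names = os.listdir(entry or ".")
        except OSError:
            continue  # missing directory, zip file, etc.
        for fname in names:
            stem, ext = os.path.splitext(fname)
            if ext in (".dist-info", ".egg-info"):
                # "name-version[...]" -> "name"
                yield stem.split("-", 1)[0], os.path.join(entry, fname)


def find_first(project, path_entries=None):
    """Return the first metadata path for project, or None.

    Because iter_metadata_dirs is a generator, later path entries are never
    listed once a match has been found.
    """
    wanted = _normalize(project)
    entries = sys.path if path_entries is None else path_entries
    for name, location in iter_metadata_dirs(entries):
        if _normalize(name) == wanted:
            return location
    return None
```

With this shape, moving a frequently used package earlier on the path directly shortens the scan, which is the "fix" described in (a).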

Although both options might seem somewhat controversial, the problem itself seems to be serious enough to deserve at least some fix eventually (for example, I've recently discovered I'm reluctant to start ipython for short calculations because of its startup delay which I've now tracked back to this same issue).

I'm contemplating making a separate utility, e.g. fast_pkg_resources, which would implement the strategy b) by simply caching calls to pkg_resources in an external file, yet I thought of raising the issue here to figure out whether someone has already addressed it, whether there are plans to do something about it in the setuptools core codebase, or perhaps I'm missing something obvious.
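A very rough sketch of what the caching idea in (b) could look like: keep a JSON file that maps each path entry to the metadata directory names found there, and rescan an entry only when its directory mtime has changed. Everything here (function names, cache layout) is an assumption for illustration, not an existing tool. Note that a directory's mtime changes when entries are added or removed, not when files deeper inside are edited, so this is exactly the kind of ad-hoc staleness rule mentioned above.

```python
import json
import os


def _scan(entry):
    """List metadata directory names in one path entry (empty on error)."""
    try:
        return sorted(
            n for n in os.listdir(entry)
            if n.endswith((".dist-info", ".egg-info"))
        )
    except OSError:
        return []


def _mtime(entry):
    try:
        return os.stat(entry).st_mtime
    except OSError:
        return None


def load_index(path_entries, cache_file):
    """Return {entry: [metadata dir names]}, rescanning only stale entries."""
    try:
        with open(cache_file) as fh:
            cache = json.load(fh)
    except (OSError, ValueError):
        cache = {}  # no cache yet, or an unreadable one: start over
    index, fresh = {}, {}
    for entry in path_entries:
        mtime = _mtime(entry)
        cached = cache.get(entry)
        if cached is not None and cached["mtime"] == mtime:
            names = cached["names"]  # cache hit: no directory listing
        else:
            names = _scan(entry)
        fresh[entry] = {"mtime": mtime, "names": names}
        index[entry] = names
    with open(cache_file, "w") as fh:
        json.dump(fresh, fh)
    return index
```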


@jbohren

jbohren commented Apr 28, 2016

@jaraco
Member

jaraco commented Apr 29, 2016

Is the performance better with the same packages installed using pip? What about those packages installed with pip install --egg?

As long as console entry points require the validation of all packages in the chain, I expect startup to be somewhat slow.

I worry that remedy (a) might only have modest benefits while imposing new, possibly conflicting instructions to the user on how to implement the remedy.

Remedy (b) promises a nicer use-case, but as you point out, caching is fraught with challenges.

It sounds like you have a decent grasp of the motivations behind the current implementation, so you're at a good place to draft an implementation.

@jbohren

jbohren commented Apr 29, 2016

Is the performance better with the same packages installed using pip? What about those packages installed with pip install --egg?

Even when installing via pip or with --egg, it's still over 300ms for my use case. As an aside, the reason we want to decrease this startup time is so that we can use the tool in interactive tab-completion.

@Carreau
Contributor

Carreau commented Jun 27, 2016

Might be of interest: https://pypi.python.org/pypi/entrypoints (https://github.com/takluyver/entrypoints), but agreed that the load time is impacting a few other projects, like everything that relies on prompt_toolkit.

@scopatz

scopatz commented Jun 27, 2016

And everyone that relies on pygments. I have some profiling available at https://github.com/xonsh/import-profiling where I have a nasty sys.modules['pkg_resources'] = None hack to prevent its import.

Importing pygments:

So just by importing pkg_resources, the slowdown is ~100x. In wall clock time, I have consistently tested the pkg_resources overhead to be at least 150-200 ms. This makes pkg_resources unusable in command line utilities that require fast start up times.

In xonsh, we have resorted to the above hacks to prevent our dependencies (pygments, prompt_toolkit) from accidentally importing it.

@olliebun

olliebun commented Jul 8, 2016

I'm seeing a consistent ~150ms wall clock time as well. I'm writing a command-line utility with autocompletion, so it's a serious challenge. It's not clear how to fix this without giving up all of setuptools' advantages.

@scopatz

scopatz commented Jul 22, 2016

Yesterday, I released the lazyasd package (pip install lazyasd) which has the ability to perform imports on a background thread. This was written specifically to mitigate the long pkg_resources import times.

Background thread docs and example here https://github.com/xonsh/lazyasd#background-imports

Feel free to use or copy the lazyasd module into your projects.
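For readers who just want the general idea, here is a stdlib-only sketch of a background import. It is not lazyasd's implementation or API; the class name `BackgroundImport` is invented for this example. The import starts on a worker thread right away and the caller blocks only when it first needs the module.

```python
import importlib
import threading


class BackgroundImport:
    """Start importing a module on a worker thread; join on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None
        self._error = None
        self._thread = threading.Thread(target=self._load, daemon=True)
        self._thread.start()

    def _load(self):
        try:
            self._module = importlib.import_module(self._name)
        except Exception as exc:  # stored and re-raised in the caller's thread
            self._error = exc

    def get(self):
        """Block until the import finishes, then return the module."""
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._module
```

Usage would look like `pending = BackgroundImport("pkg_resources")` at startup and `pending.get()` where the module is needed. This mostly pays off when the main thread would otherwise sit idle (waiting for user input, say), since import work still competes for the interpreter.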

@ninjaaron

ninjaaron commented Aug 4, 2016

I wrote a tiny module called fastentrypoints that monkey patches the mechanism behind entry_points to generate scripts that don't import pkg_resources.

https://github.com/ninjaaron/fast-entry_points

@Fak3

Fak3 commented Oct 31, 2016

@ninjaaron Thanks for fastentrypoints. I managed to fix the distribution issue by adding it to MANIFEST.in:

include fastentrypoints.py

@ninjaaron

ninjaaron commented Nov 6, 2016

@Fak3 Good idea! I took that crazy bit about downloading and exec-ing the code out of the docs for fastentrypoints and mentioned using MANIFEST.in instead. The fastep command also now appends this line to MANIFEST.in

I have to admit, I loved the way I came up with (because it's so evil), but using MANIFEST.in is waaaaay saner.

Kami added a commit to StackStorm/st2 that referenced this issue Nov 17, 2016
…ols.

eventlet depends on openssl which depends on cryptography which uses
pkg_resources from setuptools which is very slow and adds 500-1000ms to module
import time.

For details see pypa/setuptools#510
@cachedout

I certainly don't want to pile on but I did want to chime in and say that this is an enormous problem for big Python projects right now. It's severely impacted the performance of SaltStack's CLI toolset, which takes ~2.0s to load, of which 1.9s is spent purely in pkg_resources. Unfortunately, we can't just rip out any imports of pkg_resources because so many of the libs we use end up importing it anyhow. (This is generally the requests package, but could be others.)

We're exploring ways to mitigate this right now but anything we can do to help out here we'd gladly contribute to. It's a big issue for us.

(We're looking at fastentrypoints by @ninjaaron today.) I'll report back with any results. :]

@ninjaaron

@cachedout I don't think fastentrypoints can solve the problem if you are importing pkg_resources anyway. It only takes it out of the automatically-generated scripts. However, if you are or another library is importing it anyway... :(

I myself have actually moved away from using requests for trivial scripts just to avoid the "tax" of importing it. I'm sure this isn't a solution for you, but you might try (or several of us might try) working with the developer of requests to move away from pkg_resources.

Also, I know the developers of xonsh (@scopatz and co., and I think they are not the only ones) have created mechanisms for lazily importing modules only when they are actually required. This kind of lazy import strategy might be appropriate for your project.
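As one illustration of the lazy-import strategy, the standard library ships `importlib.util.LazyLoader`, which defers executing a module until one of its attributes is first accessed. The sketch below follows the recipe in the importlib documentation; it is not the mechanism xonsh uses, and `lazy_import` is just a helper name chosen here.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported: nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module; no real work yet
    return module
```

This only helps if the expensive module is not touched on the hot path; the first attribute access still pays the full import cost.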

@cachedout

@ninjaaron Thanks so much for the feedback! Yes, after looking at fastentrypoints it's not the right solution for us, unfortunately.

Yes, we're in the process of deprecating requests directly but we have plugins that use it so it would be very challenging to remove it entirely. I'll head over to the requests project and see if I can get an issue filed.

We do have a lazy plugin system that we really like but unfortunately it doesn't quite get us out of this problem because of the way it's written. There might be some room for improvement though, certainly. I'll be investigating.

One very ugly workaround that we did find (though likely won't use) is to simply fool the importer into skipping over pkg_resources. Somewhat surprisingly, this works at the top of a CLI script:

# This needs to be the first thing you do. Obviously, if `pkg_resources` is already imported you are too late!
import sys
sys.modules['pkg_resources'] = None
# ... do work here (imports that would otherwise pull in pkg_resources) ...
del sys.modules['pkg_resources']

I'm not necessarily advocating this in all cases but I'll leave it here as a possible workaround for others.
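The same trick can be wrapped in a context manager so that the sys.modules entry is always restored, even if the work in between raises. This is only a sketch of the workaround above, with a made-up name (`block_import`). Any dependency that genuinely needs the blocked module will fail with ImportError inside the block, and it does nothing useful if the module was already imported.

```python
import contextlib
import sys

_MISSING = object()


@contextlib.contextmanager
def block_import(name):
    """Make `import name` raise ImportError inside the with-block."""
    previous = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # a None entry makes the import system fail fast
    try:
        yield
    finally:
        # Put things back exactly as they were before the block.
        if previous is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous
```

Typical use would be `with block_import("pkg_resources"): import pygments`, relying on the dependency treating pkg_resources as optional.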

That said, I would still really like to hear from the setuptools folks on this if possible. Having a simple module import stat the disk almost 18,000 times as it does in my test case all but makes many python projects unusable. Would they accept a PR to move away from this behavior by default or at the least, gate it behind an environment variable?

@jaraco
Member

jaraco commented Dec 9, 2016

I don't currently have it paged into my mind why this behavior is the way it is. I can't recall if anyone has analyzed the behavior to see if there is something that can be done. Would someone search to see if there have been solutions proposed outside this ticket and link those here? It sounds like fast entry points suggests a partial solution but not a general one. If one were to analyze the code, does that lead to other ideas? I'm happy to review a PR, especially one that's not too complicated.

What about moving the implicit behavior to another module, like pkg_resources.preload. Then projects and scripts that import for the implicit behavior could try/except to import that module, and those that don't can simply import pkg_resources.

It would be a backward incompatible release, but if that clears the way for more efficient implementations, I'm for it.

@untitaker

this is an issue for me as well. This might be naive, but is there a reason why the scripts written by ScriptWriter can't import the entry point directly? (i.e. only use pkg_resources.load_entry_point at install time, not runtime).

@ninjaaron

ninjaaron commented Dec 25, 2016

@untitaker
Not a clear reason. Scripts generated with wheel do just that. fastentrypoints monkey-patches ScriptWriter for the same behavior, and it seems to work.

Apparently someone thought this was needed when they wrote it, but clearly it doesn't affect the general use-case!
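To show what "import the entry point directly" means in practice, here is a small sketch that renders such a wrapper from a `module:function` entry point string. The template is modelled on the general shape of wheel-style console scripts, not copied from setuptools or fastentrypoints; `render_script` and the `mypkg.cli:main` entry point are hypothetical.

```python
SCRIPT_TEMPLATE = """\
#!{python}
import re
import sys
from {module} import {import_name}

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\\.pyw?|\\.exe)?$', '', sys.argv[0])
    sys.exit({func}())
"""


def render_script(entry_point, python="/usr/bin/env python3"):
    """Render a wrapper script for an entry point such as 'mypkg.cli:main'."""
    module, _, func = (part.strip() for part in entry_point.partition(":"))
    if not module or not func:
        raise ValueError(f"expected 'module:function', got {entry_point!r}")
    # For 'pkg.mod:obj.method', import 'obj' and call 'obj.method()'.
    import_name = func.split(".", 1)[0]
    return SCRIPT_TEMPLATE.format(
        python=python, module=module, import_name=import_name, func=func
    )


if __name__ == "__main__":
    print(render_script("mypkg.cli:main"))
```

The generated script never touches pkg_resources, so its startup cost is just the cost of importing the target module.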

@untitaker

Is it possible that some other hook is also executed when load_entry_point is used? That would explain the indirection.

@ninjaaron

I guess it's possible, but I think the fact that wheels don't behave this way is a pretty good indication that it's unnecessary.

I have a suspicion it's a case of getting so involved in one's own API that it seems like the obvious way to do something, even when there is a much simpler solution. We've all been there...

@untitaker

I'm currently working on this and it's more complicated. Installing from eggs doesn't work with your patch.

@cjw296

cjw296 commented Apr 21, 2020

@pradyunsg - from a user perspective, having a really badly performing thing done because some library I don't know about isn't installed by a tool that should require it does end up feeling like a bug with the tool...

@pfmoore
Member

pfmoore commented Apr 21, 2020

@pfmoore can wheel be added to build process using setup_requires parameter of setuptools.setup?

@kapsh not safely, setup_requires is deprecated because it uses easy_install to install the packages, not pip. pyproject.toml is the correct, supported way. What's the use case that works with setup_requires but not with pyproject.toml (because we'd like to fix it!)?

from a user perspective, having a really badly performing thing done because some library I don't know about isn't installed by a tool that should require it does end up feeling like a bug with the tool...

@cjw296 Agreed, up to a point. While I understand that the history isn't the point here, we're in a transition period. The setuptools wrappers were historically the approved solution, and pip did setup.py install. We moved away from setup.py install to PEP 517 (pyproject.toml and building wheels) but we're still part-way through that process (pyproject.toml adoption is still in progress, but the new wrappers depend on wheel). The transition to PEP 517 is not about better wrappers itself, but they come as a consequence.

The fix pip is progressing towards is making all builds go via PEP 517. We still support setup.py install only to avoid breaking projects that haven't done anything about the transition yet and would break under the new process. Conversely, setuptools isn't interested in updating their wrappers as they are being phased out by the pip change (and the general move away from installing via setuptools directly).

So yes, it's a bug, but it's being fixed. The fix is just rather long-winded, for compatibility reasons, and we're doing our best to apply mitigations while the process is ongoing.

@kapsh

kapsh commented Apr 21, 2020

@pfmoore sorry, I am not very proficient with setuptools and can't tell you use cases where pep-517-way would fail (personally I like to abuse pip install -e ., but that's another story). I've only used setup_requires to make setuptools_scm available to the packaging process. Yet I have to notice that documentation here https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords never mentions deprecation of setup_requires keyword, maybe you would like to fix that detail. Big red warning while building sounds useful.

Thanks for your brief on current situation, this is interesting to know about.

@pfmoore
Member

pfmoore commented Apr 21, 2020

Yet I have to notice that documentation here https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords never mentions deprecation of setup_requires keyword, maybe you would like to fix that detail. Big red warning while building sounds useful.

Good point. I'm not a setuptools developer, so I'll leave it to them to pick up on that.

@tgbugs

tgbugs commented Apr 21, 2020

One context where the degraded setuptools scripts are generated is for any/every python package on gentoo that has an entry point. This is probably ultimately an issue that the gentoo python team (e.g. @mgorny) would have to tackle, but it affects all system installed python packages.

@pganssle
Member

Yet I have to notice that documentation here https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords never mentions deprecation of setup_requires keyword, maybe you would like to fix that detail. Big red warning while building sounds useful.

setup_requires is sort of semi-deprecated. It's not the preferred way to add things to the build dependencies, but it is compatible with PEP 517/518 and feeds into get_requires_for_build_wheel. We can probably open a separate issue to discuss this.

quimey added a commit to quimey/confight that referenced this issue Apr 27, 2020
Importing it is slow and might harm performance on CLI applications
that run many times for a short time.
See pypa/setuptools#510
lpsinger added a commit to lpsinger/ligo.skymap that referenced this issue Apr 29, 2020
Speed up `import ligo.skymap` by up to a second by replacing uses of
`pkg_resources` with the new Python standard library module
`importlib.resources` (or, for Python < 3.7, the backport
`importlib_resources`). The old `pkg_resources` module is known to be
slow because it does a lot of work on startup.

See, for example,
[pypa/setuptools#926](pypa/setuptools#926) and
[pypa/setuptools#510](pypa/setuptools#510).
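For reference, replacing a `pkg_resources.resource_filename`-style lookup with `importlib.resources` can be as small as the sketch below. It uses `importlib.resources.files`, which needs Python 3.9+ (older versions would use the `importlib_resources` backport or the earlier `read_text` function). The helper name and the package/resource names in the usage note are placeholders, not taken from ligo.skymap.

```python
from importlib import resources


def read_package_text(package, resource, encoding="utf-8"):
    """Read a text file that ships inside an importable package."""
    return resources.files(package).joinpath(resource).read_text(encoding=encoding)
```

For example, `read_package_text("mypkg", "data/defaults.cfg")` would return the file's contents without scanning sys.path for distributions at import time.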
@cjw296

cjw296 commented Apr 30, 2020

@pfmoore - okay, so I think I'm doing everything you said, but still getting entrypoint scripts built using pkg_resources:

$ pip freeze --all | egrep -i 'wheel|pip|setuptools'
pip==20.1
setuptools==46.1.3
wheel==0.34.2
$ pip install -e .
Obtaining file:///home/chris/energenie
...
Successfully installed energenie
$ cat `which check`
#!/home/chris/virtualenvs/energenie/bin/python3.5
# EASY-INSTALL-ENTRY-SCRIPT: 'energenie','console_scripts','check'
__requires__ = 'energenie'
import re
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('energenie', 'console_scripts', 'check')()
    )

What am I doing wrong?

@pfmoore
Member

pfmoore commented Apr 30, 2020

@cjw296 Ah, you're using editable installs (-e). They go via setup.py develop, because they are a setuptools-specific feature, not handled via wheels or any published standard. So you get setuptools wrappers in that case, and there's no avoiding it. Sorry, I forgot to mention that case.

@cjw296

cjw296 commented Apr 30, 2020

It feels like the chances of getting a non-sucky entrypoint script are really pretty small, no?
(Honestly, reading through the above feels like some magic incantation, rather than a standard way to install software in one of the most popular programming languages in the world...)

@cjw296

cjw296 commented Apr 30, 2020

More constructively, what's the current state of play on publishing a standard for editable installs? (-e seems pretty ubiquitous, I seem to remember flit having something too, not actually sure what conda does or if they care...)

@gaborbernat
Contributor

@cjw296 see https://discuss.python.org/t/third-try-on-editable-installs/3986/25

@pfmoore
Member

pfmoore commented Apr 30, 2020

There's a lot of debate on editable installs, but the latest round of discussions is here.

To cut through some of the packaging community specifics there, there's one proposal that hasn't been completely written up yet (a rough spec is here) which is waiting on someone with time to build a proof of concept implementation for some build backend (probably setuptools) and for pip. There's still a lot of debate over whether this is the best approach, but TBH, we need someone to write code at this point, not to discuss ideas (we've got plenty of people willing to do that 🙂)

Edit: @gaborbernat posted a link to some additional points that I'd not spotted since I last checked the topic, so we're a bit further forward than I suggested above.

@bluetech

bluetech commented May 3, 2020

I use Arch Linux, which installs all python packages using setup.py install (see package guidelines). So all Python executables installed through the system package manager (tox, virtualenv, meson, youtube-dl, docker-compose, borg, and many more) get the 250ms startup slowdown due to the pkg_resources import, which is unfortunate.

I reported this issue to the Arch Linux devs, and they explained that they prefer the pkg_resources method because it provides a nice informative error message if one of the dependencies is broken or missing, for example:

Traceback (most recent call last):
  File "/usr/bin/pyrsa-keygen", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3259, in <module>
    def _initialize_master_working_set():
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3242, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3271, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 584, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pyasn1>=0.1.3' distribution was not found and is required by rsa

compared to some import error, or worse, a silently broken program if the import is conditional.
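A similar "which distribution is missing" message can be produced without importing pkg_resources, for instance with `importlib.metadata` (Python 3.8+). The sketch below is only an illustration: the function names are invented, and unlike pkg_resources it checks only that a distribution is installed, not version constraints such as `>=0.1.3`.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the subset of distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def require(names, required_by):
    """Raise a readable error if any named distribution is absent."""
    missing = missing_distributions(names)
    if missing:
        raise RuntimeError(
            f"distribution(s) {', '.join(missing)} not found, "
            f"required by {required_by}"
        )
```

A wrapper script could call `require(["pyasn1"], "rsa")` before importing its entry point, looking up only the distributions it names rather than building a full working set.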

I wonder if the Python devs have any recommendations to distros on this, or if other distros do something different.

(Apologies if I missed previous discussion on this.)

@pganssle
Member

pganssle commented May 3, 2020

The recommendation to distros is definitely to not use setup.py install.

We are 100% planning on removing setup.py install, and for several years we haven't been fixing bugs that can be fixed by using pip. They don't have to use pip, but they should be using something equivalent. The sooner they migrate to something else the better.

@mgorny
Contributor

mgorny commented May 3, 2020

Could you please indicate what 'something equivalent' useful for distributions is? It's easy to remove features you don't need for your workflow. It's much harder to provide a good alternative, and a plan to update thousands of packages to work. Flit/poetry has already caused enough mess by not caring at all about what distributions need.

@pfmoore
Member

pfmoore commented May 3, 2020

The recommendation is to install using pip. If a distribution doesn't like the script wrappers pip generates, they can certainly write their own (or write a tool to generate something that works as they want). As things stand, I think you'd have to overwrite the pip-created wrappers (or put your own earlier on PATH so they get priority) but it would be a reasonable request for pip install to have a flag that omits generating script wrappers.

@mgorny
Contributor

mgorny commented May 3, 2020

What advantage does pip have over setup.py install? Besides creating even bigger circular dependency graph that makes switching to a new Python version an experience wasting hundreds of hours of our time.

@pganssle
Member

pganssle commented May 3, 2020

I think maybe we should take this to a new issue, since we're getting a bit far off the original topic of discussion.

@mgorny If you do not like the supported workflow, or the fact that setup.py install is deprecated and unsupported, would you mind opening a new issue?

Considering that this issue is closed and seems to be a lightning rod for off-topic discussions, I recommend we lock it.

@gaborbernat
Contributor

gaborbernat commented May 3, 2020

What advantage does pip have over setup.py install? Besides creating even bigger circular dependency graph that makes switching to a new Python version an experience wasting hundreds of hours of our time.

Pretty much every word of pep 518 and pep 517.

@FFY00
Member

FFY00 commented May 4, 2020

@gaborbernat you completely missed the point, PEP 517 and 518 are completely irrelevant for what is being discussed.

@pypa pypa locked as off-topic and limited conversation to collaborators May 4, 2020
@pganssle
Member

pganssle commented May 4, 2020

I've gone ahead and locked this topic for the sake of the inboxes of the people who followed this issue looking for updates on pkg_resources and who don't care about linux distributions. If other maintainers feel I've overstepped my bounds here, they are welcome to unlock it.

I would like to say to the Linux distributors, (and particularly the Arch Linux packagers; a distro I've been using and heartily recommending for years) — thank you for the work you've been doing. We definitely would like to continue working with you to find a reasonable way to take your important use case into account. You are always welcome to open an issue on setuptools, a thread on the packaging discourse, or even to e-mail me personally. Next time PyCon happens, we'll be having a packaging summit, and we'd be happy to have you involved.

@pradyunsg
Member

I would like to say to the Linux distributors, (and particularly the Arch Linux packagers; a distro I've been using and heartily recommending for years) — thank you for the work you've been doing. We definitely would like to continue working with you to find a reasonable way to take your important use case into account.

+1

You are always welcome to open an issue on setuptools, a thread on the packaging discourse, or even to e-mail me personally.

I'll extend the same offer from pip's side as well!

Alphadelta14 added a commit to Alphadelta14/setuptools that referenced this issue Sep 23, 2020
Alphadelta14 added a commit to Alphadelta14/setuptools that referenced this issue Sep 23, 2020