-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise logic for displaying packages at the end of pip install
#12791
Conversation
bcb47f2
to
f023907
Compare
Wheee! Nice find @morotti! Could you add a news fragment (as described in the section linked from the details of the failure)? Specifically, it should be (the approval above is subject to CI being happy, which it isn't yet due to the issue with the missing news fragment!) |
I will note that most of the overhead of |
f023907
to
b107afa
Compare
b107afa
to
ea5a1ee
Compare
Hello, I think I fixed everything but there is a hook left failing "- hook id: end-of-file-fixer" rejecting the news file. How is the news file supposed to be formatted? |
ea5a1ee
to
3e5dd90
Compare
From what I recall, that means you don't have a newline at the end of the file. |
I've done a bit more testing with various types and amount of packages to install. I am seeing more than 25% of the runtime of "pip install" is taken to do the final print statement when you reach a few hundreds packages to install. Obviously it's increasing It's amazing, I haven't seen a bug like this for years. That might genuinely make pip the third slowest package manager ever written, right behind easy_install that had a similar |
I've retitled this PR to something that more clearly reflects the change made, rather than the effect of the change. @morotti I'd like to ask you to read https://cbea.ms/git-commit/ and rephrase the commit message to follow the guidance provided there. (for other maintainers: when merging this, please either squash merge this with the commit message modified to follow the typical format or rewrite the commit message via a push yourself before merging this!) |
pip install
get_distribution("package_name") does a loop over all packages to find the intended package name. the code to display installed packages was accidentally O(N^2) for iterating over packages and calling get_distribution() on each. Resolves: pypa#12791
3e5dd90
to
d77af5a
Compare
I've read the blog article and adjusted the commit message. I hope that suits you. I notice the windows tests are failing, for reasons unrelated to this commit, and there is a format check still complaining about the news, I tried to adjust the formatting but I'm sorry to say I don't get it. Can I offer to any maintainer to make the commit and news as you expect and merge? I really don't mind having or not having the commit under my account. I would just like for fixes to be merged and make the world a faster place. |
As I said previously, you simply need to add a newline at the end of the news file. I'm afraid I don't know enough git to fix this PR in place, so it'll be simpler if you do it, sorry. |
get_distribution("package_name") does a loop over all packages to find the intended package name. the code to display installed packages was accidentally O(N^2) for iterating over packages and calling get_distribution() on each. Resolves: pypa#12791
d77af5a
to
0beffc7
Compare
All green now, this can be merged. |
Oh, and thank you for the PR! It's nice to see the low-hanging fruit for optimising pip be addressed 🙂 |
I'm sure you'll be amused to hear that the old logic was also quadratic and was similarly linearised ~6 years ago cbc21a5 :) |
Co-authored-by: Richard Si <[email protected]>
All adjusted as per the last comments. should be good to merge. I'm not sure if you squash-merge or if you expect me to squash in the branch? |
PERF: 5% faster pip install. algorithmic bug at the end of pip install in the code to display installed packages. O(n^2) to enumerate installed packages due to accidental loop in loop.
get_distribution(package_name)
does a loop over all installed packages.that is quite surprising and unexpected. it should not be used inside a loop.
I rewrote the loop to iterate with iter_all_distributions(). that's what was called under the hood.
by the way, this code to display installed packages is more complex than it ought to be. I think it was intentional to display the package name exactly as it was provided by the user.
on a pip install run that takes 11 seconds:
the loop takes 0.735960 seconds on main branch.
the loop takes 0.064672 seconds with this fix.