importlib: Read distribution name/version from metadata directory names, if possible #12656
ddd7dba to 2ddbd85
@uranusjr I know you contributed the importlib metadata backend, so I'd appreciate it if you could review this! Do you think wheels with invalid metadata directories are frequent enough in the wild to render this untenable? Thanks!
The reason we didn’t do this is that some people rely on the “real” name they gave to the package, not the canonical name. This is an artistic choice that they value very highly. We can probably use the version (since …
Hmm, yeah that's a good point. However, I do think the optimisation is still safe here as the finder already normalises the distribution names (it doesn't use pip/src/pip/_internal/metadata/importlib/_envs.py Lines 61 to 71 in 3f3bc60
If I remove the optimization from … I'll take a closer look at the `Version` attribute and try to find ways it would lead to undesirable behaviour sometime later this week.
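To illustrate why normalisation makes the directory-name shortcut safe for lookups, here is a minimal sketch of PEP 503 style name normalisation using only the standard library. The helper name `normalize_name` is hypothetical and is not pip's own function; the point is only that the "artistic" raw name and the escaped name found in a metadata directory collapse to the same key.

```python
import re


def normalize_name(name):
    """Hypothetical helper: PEP 503 style normalisation.

    Runs of "-", "_" and "." collapse to a single "-", and the
    result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# The raw name from METADATA and the escaped name taken from a
# directory such as "Flask_SQLAlchemy-3.0.dist-info" compare equal
# once both are normalised.
raw_name = "Flask-SQLAlchemy"
directory_name = "Flask_SQLAlchemy"
print(normalize_name(raw_name))        # flask-sqlalchemy
print(normalize_name(directory_name))  # flask-sqlalchemy
```

Anything that needs the raw, un-normalised name would still have to read `METADATA`; the sketch only covers comparisons made on the canonical form.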
I believe we use the raw name in some places, such as in …
@uranusjr sorry, I should've clarified that …
0c846cb to 137fb7e
I was profiling pip and I stumbled on a clearer profile to sell the benefit of this optimisation. Reinstalling a local wheel without dependencies spends a lot of time (barring imports) within
There are five calls to …
…e, if possible

importlib does not cache metadata in-memory, so querying even simple attributes like distribution names and versions can quickly become expensive (as each access requires reading METADATA). Fortunately, `Distribution.canonical_name` is optimized to parse the metadata directory name to query the name if possible. This commit extends this optimization to the finder implementation and version attribute.

.egg-info directory names tend to not include the version so they are not considered for optimizing version lookup.

simplewheel-2.0-1-py2.py3-none-any.whl had to be modified to rename the .dist-info directory which mistakenly included the wheel build tag (in violation of the wheel specification).

    simplewheel/__init__.py
    simplewheel-2.0-1.dist-info/DESCRIPTION.rst
    simplewheel-2.0-1.dist-info/metadata.json
    simplewheel-2.0-1.dist-info/top_level.txt
    simplewheel-2.0-1.dist-info/WHEEL
    simplewheel-2.0-1.dist-info/METADATA
    simplewheel-2.0-1.dist-info/RECORD

Otherwise, it was mistaken for part of the version and led pip to think the wheel was a post-release, breaking tests...
137fb7e to d247c1e
Co-authored-by: Tzu-ping Chung <[email protected]>
Thank you for the review @pradyunsg and @uranusjr!
importlib does not cache metadata in-memory, so querying even simple attributes like distribution names and versions can quickly become expensive (as each access requires reading `METADATA`). Fortunately, `Distribution.canonical_name` is optimized to parse the metadata directory name to query the name if possible. This commit extends this optimization to the finder implementation and version attribute.

`.egg-info` directory names tend to not include the version so they are not considered for optimizing version lookup.

`simplewheel-2.0-1-py2.py3-none-any.whl` had to be modified to rename the `.dist-info` directory which mistakenly included the wheel build tag (in violation of the wheel specification). Otherwise, it was mistaken for part of the version and led pip to think the wheel was a post-release, breaking tests...
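As a rough illustration of the idea (not the actual code in this PR), the sketch below parses a name and version out of a metadata directory name. The function `parse_info_dir_name` is a hypothetical name chosen for this example. It also shows how a build tag left in the directory name ends up inside the version string; under PEP 440, `2.0-1` is read as an implicit post-release of `2.0`.

```python
import os


def parse_info_dir_name(dirname):
    """Hypothetical helper: return (name, version) parsed from a
    metadata directory name, using None for anything that cannot
    be determined from the name alone."""
    stem, suffix = os.path.splitext(dirname)
    if suffix == ".dist-info":
        # "<name>-<version>.dist-info": split on the first "-" only.
        name, sep, version = stem.partition("-")
        if sep:
            return name, version
    elif suffix == ".egg-info":
        # .egg-info names often omit the version, so only trust the name.
        return stem.split("-", 1)[0], None
    return None, None


print(parse_info_dir_name("simplewheel-2.0.dist-info"))
# ('simplewheel', '2.0')
print(parse_info_dir_name("simplewheel-2.0-1.dist-info"))
# ('simplewheel', '2.0-1')  <- build tag leaks into the version
print(parse_info_dir_name("simplewheel.egg-info"))
# ('simplewheel', None)
```

When the helper returns `None` for either field, a caller would fall back to reading the metadata file, which keeps the shortcut purely an optimization.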
This caught my eye when I was profiling the performance of various commands. `iter_all_distributions()` was often a noticeable chunk of the profile. For example, reading `METADATA` is responsible for 15% of the total `pip check` runtime in my Python 3.11.7 environment with 66 packages installed.

[cProfile graph]

With this patch, the percentage decreases to 5% (only reading `Requires-Dist` now requires expensive file IO).

[cProfile graph]
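For anyone who wants to see the cost of metadata reads in isolation, here is a small sketch that profiles name and version lookups with `cProfile` against a throwaway directory of fake `.dist-info` entries. The package names, the count of 50, and the helper names are all made up for this example; it does not reproduce the `pip check` numbers above.

```python
import cProfile
import importlib.metadata
import pstats
import tempfile
from pathlib import Path


def make_fake_site_packages(root, count):
    """Create `count` minimal .dist-info directories under `root`."""
    for i in range(count):
        info_dir = root / f"demo_pkg{i}-1.0.dist-info"
        info_dir.mkdir()
        (info_dir / "METADATA").write_text(
            f"Metadata-Version: 2.1\nName: demo-pkg{i}\nVersion: 1.0\n",
            encoding="utf-8",
        )


def read_names_and_versions(path):
    """Query name and version for every distribution found on `path`.

    Each attribute access goes through the METADATA file, since
    importlib.metadata does not keep it cached in memory.
    """
    return [
        (dist.metadata["Name"], dist.version)
        for dist in importlib.metadata.distributions(path=[path])
    ]


with tempfile.TemporaryDirectory() as tmp:
    make_fake_site_packages(Path(tmp), 50)
    profiler = cProfile.Profile()
    profiler.enable()
    found = read_names_and_versions(tmp)
    profiler.disable()

# Show the five most expensive entries by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The file reading and message parsing calls should show up near the top of the output, which is the same kind of overhead the graphs above attribute to `METADATA` access.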