Metadata backend with importlib.metadata #10709

Merged
merged 26 commits into from
Apr 15, 2022

@uranusjr uranusjr commented Dec 5, 2021

This doesn’t take a lot of code because we’ve done most work upfront. There are definitely optimisations available since importlib.metadata has a pretty different implementation, but this should make the most common cases “work”.

Likely far from perfect, I’m only submitting this as draft to run a new workflow against it to see how things currently stand. Better now.

Currently the backend is only accessible via a private environment variable. Eventually I want to target the implementation to only 3.10 or later (because the importlib.metadata API is quite unstable before that), expose it behind a feature flag first, and slowly roll it out, on maybe 3.11 at the earliest.

@uranusjr uranusjr force-pushed the metadata-importlib-backend branch 6 times, most recently from 8c7e867 to 73b236e Compare December 5, 2021 17:01

uranusjr commented Dec 6, 2021

Some thoughts after playing with this during the weekend. The implementation we have right now actually works quite well—only two test failures remaining (discussed below) and I’d say this already works for all regular, modern usages.

The two tests are failing because importlib.metadata does not support eggs (not egg-infos; those are supported fine) and egg-links. egg-links are trivial to add since they are just symlinks, but eggs need more work. It’s still not that much of a problem (we have a reference implementation in pkg_resources, after all), but I’m wondering if it’s even worth supporting at this point. eggs are now only generated by direct setup.py install (not through pip install; pip passes flags to setuptools to build egg-info instead), or pre-shipped in ancient system environments (IIRC Debian stable still does this, not sure about Sid), so it can be argued it does not need to be re-implemented at all at this point.

Any thoughts on this? I think I can produce an implementation that passes all tests with one or two more weekends of work, although I can’t guarantee when those weekends might happen (I think I’m free next weekend but things don’t always work out).

@uranusjr uranusjr force-pushed the metadata-importlib-backend branch 6 times, most recently from 4e25f68 to ce08fc5 Compare December 12, 2021 19:55
@uranusjr
Member Author

I added egg “support” by falling back to the old backend. I still feel we likely should deprecate and remove discovering egg presence altogether (it’s not used anywhere in any mainstream environments, from what I know), but this should ease migration. The next step is probably to design feature flags for this; I’m thinking the first phase should look like this:

  • No flags: Use the old pkg_resources backend with an INFO message.
  • --use-feature=importlib-metadata: Use the new importlib.metadata backend without egg support.
  • --use-feature=importlib-metadata --use-deprecated=eggs: Use the new importlib.metadata backend with egg support, and ask the user to report the use case.
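
The three phases proposed above could be sketched roughly as follows. This is only an illustration of the proposal in this comment: the function name `choose_backend` and its return shape are hypothetical, and the flag values are the ones suggested here, not necessarily what pip ended up shipping.

```python
from typing import Iterable, Tuple


def choose_backend(
    use_feature: Iterable[str] = (),
    use_deprecated: Iterable[str] = (),
) -> Tuple[str, bool]:
    """Return (backend_name, egg_support) for the proposed first phase.

    Hypothetical helper: it only mirrors the three bullet points above.
    """
    features = set(use_feature)
    deprecated = set(use_deprecated)
    if "importlib-metadata" not in features:
        # No flags: keep the old pkg_resources backend, which handles eggs.
        return ("pkg_resources", True)
    # New backend; eggs only when the user explicitly asks for them.
    return ("importlib", "eggs" in deprecated)
```

Under this sketch, `choose_backend(["importlib-metadata"])` selects the new backend without egg support, and adding `use_deprecated=["eggs"]` turns the fallback back on.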

@uranusjr
Member Author

I think this is reviewable. How the feature flags should be designed can be discussed (or even implemented) separately.

@uranusjr uranusjr marked this pull request as ready for review December 13, 2021 07:03
Member

@pradyunsg pradyunsg left a comment

This looks great. I've mostly skimmed the meaty parts of this PR, and the overall structure looks good to me!

I do think it'd be useful to encode some of our assumptions as assertions in this code, because (a) they would get checked whenever we run the code, (b) there would be an explicit failure when these assumptions aren't valid, and (c) the assumptions would be explicitly communicated rather than being in comments or whatnot. This likely only makes sense for the "opt-in-to-use" phase of this, but I do think there's still a whole lot of value in doing this with early adopters at least.

    so we do this to avoid having to rewrite too many things. Hopefully we can
    eliminate this some day.
    """
    return getattr(d, "_path", None)
Member

Let's add a warning message here, for when this doesn't get a PathDistribution object? Keep a note to remove this, once we decide to expose this as the default, as well.

Member Author

This function does expect non-PathDistribution objects though, and simply returns None for them.
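
A small sketch of that behaviour, assuming only the standard library: `get_info_location` mirrors the one-line body quoted above (the name is chosen here for illustration), and `InMemoryDistribution` is a hypothetical distribution that does not live on the file system. `_path` is a private attribute of `PathDistribution`, so this relies on an implementation detail.

```python
import pathlib
from importlib.metadata import Distribution, PathDistribution
from typing import Any, Optional


def get_info_location(d: Distribution) -> Optional[Any]:
    # Same idea as the snippet under review: read the private attribute
    # when it exists, and fall back to None otherwise.
    return getattr(d, "_path", None)


class InMemoryDistribution(Distribution):
    """Hypothetical distribution whose metadata is not stored on disk."""

    def read_text(self, filename):
        if filename == "METADATA":
            return "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
        return None

    def locate_file(self, path):
        return pathlib.PurePosixPath(path)


# Constructing a PathDistribution does not touch the file system.
info_dir = pathlib.Path("site-packages") / "demo-1.0.dist-info"
on_disk = PathDistribution(info_dir)
in_memory = InMemoryDistribution()
```

Here `get_info_location(on_disk)` gives back `info_dir`, while `get_info_location(in_memory)` is `None`, without any warning being needed.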

Member

/cc @jaraco @warsaw @brettcannon for awareness about this hack.

Personally, I'm fine with this -- I don't imagine there being too many code changes in importlib-metadata, in the 3.11+ standard library. If there are, hopefully, we'll be able to deal with them in an expedient manner on pip's side. :)

Member

I'd prefer that pip determine why it cares about the "location" of the metadata and ask what interfaces would be appropriate for a package not on the file system to satisfy those needs. But for now, this hack is probably acceptable. Out of curiosity, what breaks if a Distribution doesn't have a _path?

Member Author

@uranusjr uranusjr Apr 9, 2022

Distributions without _path (in this implementation) won’t be able to be uninstalled (or upgraded, because upgrade in pip is just uninstall + install). That’s the only thing pip needs from the metadata location (aside from error messages in some edge cases).

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Feb 19, 2022
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Mar 11, 2022
@uranusjr
Member Author

Anyone fancy progressing this? I have a mind to start thinking about how we can start moving off LegacyVersion after this. pkg_resources is a blocker for this since we can’t control it parsing non-PEP-440 versions.

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Apr 7, 2022
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Apr 8, 2022
Member

@pradyunsg pradyunsg left a comment

This needs a rebase, and a few code changes to pacify the linters.

@pradyunsg
Member

pradyunsg commented Apr 8, 2022

This looks good to me, based on a desk review. Let's fix the linters and see what the CI says.

Comment on lines +30 to +38
@functools.lru_cache(maxsize=None)
def select_backend() -> Backend:
    if os.environ.get("_PIP_METADATA_BACKEND_IMPORTLIB"):
        from . import importlib

        return cast(Backend, importlib)
    from . import pkg_resources

    return cast(Backend, pkg_resources)
Member

As a note for the future -- this is intended to be a temporary measure. We'll swap this out for a more "sane" selection logic in the future.
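
One consequence of the `lru_cache` decorator in the snippet above is that the environment variable is only consulted on the first call. A self-contained sketch of that effect, using `SimpleNamespace` objects as stand-ins for the two backend modules (the stand-ins and `demo` are illustrative, not pip code):

```python
import functools
import os
from types import SimpleNamespace
from typing import List

# Stand-ins for the real backend modules (assumption for this sketch).
IMPORTLIB_BACKEND = SimpleNamespace(name="importlib")
PKG_RESOURCES_BACKEND = SimpleNamespace(name="pkg_resources")

ENV_VAR = "_PIP_METADATA_BACKEND_IMPORTLIB"


@functools.lru_cache(maxsize=None)
def select_backend() -> SimpleNamespace:
    if os.environ.get(ENV_VAR):
        return IMPORTLIB_BACKEND
    return PKG_RESOURCES_BACKEND


def demo() -> List[str]:
    """Show that the choice is frozen until the cache is cleared."""
    os.environ.pop(ENV_VAR, None)
    select_backend.cache_clear()
    seen = [select_backend().name]      # variable unset: old backend
    os.environ[ENV_VAR] = "1"
    seen.append(select_backend().name)  # still the cached old backend
    select_backend.cache_clear()
    seen.append(select_backend().name)  # re-evaluated: new backend
    # Leave the process environment as we found it.
    os.environ.pop(ENV_VAR, None)
    select_backend.cache_clear()
    return seen
```

In practice this means the variable has to be set before pip first asks for a backend; changing it mid-process has no effect.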

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Apr 9, 2022
@warsaw

warsaw commented Apr 11, 2022

I'm not seeing any documentation on this change. Am I missing it or is it all internal?

@uranusjr
Member Author

uranusjr commented Apr 11, 2022

It’s internal and experimental at this point. Also there should be no user-facing behavioural changes (except bugs) after we switch to this. A change log will be added when we make the switch.

This matches pkg_resources's behavior. There are various places in the
code base that assume only one entry is returned for a package name,
and not doing that would potentially cause a distribution in lower
precedence path to override a higher precedence one, if the caller is
not careful.

Eventually we probably want to make it possible to see lower precedence
installations as well (it's useful), but this is not the time.
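
The "one entry per package name, first path wins" rule described in this comment can be sketched with the standard library alone. This is not pip's implementation: `first_match_per_name`, `make_fake_install`, and `demo` are hypothetical helpers, and the name normalization is the PEP 503 rule.

```python
import importlib.metadata
import pathlib
import re
import tempfile
from typing import Dict, Iterable


def canonical_name(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_", "." collapse to one "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def first_match_per_name(
    paths: Iterable[str],
) -> Dict[str, importlib.metadata.Distribution]:
    """Keep only the first distribution found for each normalized name."""
    found: Dict[str, importlib.metadata.Distribution] = {}
    for path in paths:  # earlier entries have higher precedence
        for dist in importlib.metadata.distributions(path=[path]):
            found.setdefault(canonical_name(dist.metadata["Name"]), dist)
    return found


def make_fake_install(root: pathlib.Path, name: str, version: str) -> None:
    # Write a minimal .dist-info directory so discovery has something to find.
    info = root / f"{name}-{version}.dist-info"
    info.mkdir(parents=True)
    (info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        high = pathlib.Path(tmp, "high")
        low = pathlib.Path(tmp, "low")
        make_fake_install(high, "Demo_Pkg", "2.0")
        make_fake_install(low, "demo-pkg", "1.0")
        dists = first_match_per_name([str(high), str(low)])
        return dists["demo-pkg"].version
```

With the higher-precedence directory listed first, `demo()` reports version 2.0; the 1.0 copy in the lower-precedence directory is shadowed rather than overriding it.
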

This replaces importlib.metadata's parser and allows us to "properly"
normalize extras as we need. It is not wrong for importlib.metadata to
not normalize extras --- if extra normalization is standardized
properly, packaging.markers should instead implement 'evaluate()' to
properly normalize on comparison, instead of just doing a naive string
equality check. Unfortunately, no-one has made a concrete effort to make
that happen yet, so pip needs to do what it needs to do to keep things
working.
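
To see why a naive string comparison is not enough for extras, here is a minimal sketch. It assumes the normalization rule later standardized in PEP 685 (the same rule as for project names); the rule pip applied at the time of this PR may have differed, and both function names are made up for this example.

```python
import re


def normalize_extra(extra: str) -> str:
    # Assumed rule (PEP 685): collapse runs of "-", "_", "." and lowercase.
    return re.sub(r"[-_.]+", "-", extra).lower()


def extra_matches(requested: str, declared: str) -> bool:
    """Compare two extra names after normalizing both sides."""
    return normalize_extra(requested) == normalize_extra(declared)
```

A request for `Dev_Tools` and a declared extra `dev-tools` differ as raw strings, but `extra_matches("Dev_Tools", "dev-tools")` treats them as the same extra.
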

A distutils installation is always "flat" (not in e.g. egg form), so
if this distribution's info location is NOT a pathlib.Path (but e.g.
zipfile.Path), it can never contain any distutils scripts.
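
The type check described in this comment might look like the following sketch. `may_contain_distutils_scripts` and `demo` are hypothetical names; the only facts relied on are that `zipfile.Path` is not a subclass of `pathlib.Path` and that a zipped egg would be addressed through the former.

```python
import io
import pathlib
import zipfile
from typing import Tuple, Union

InfoLocation = Union[pathlib.Path, zipfile.Path]


def may_contain_distutils_scripts(info_location: InfoLocation) -> bool:
    # Only a "flat" on-disk installation can hold distutils scripts.
    return isinstance(info_location, pathlib.Path)


def demo() -> Tuple[bool, bool]:
    # Build a tiny in-memory zip standing in for a zipped egg.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo-1.0.egg-info/PKG-INFO", "Name: demo\n")
    zipped = zipfile.Path(zipfile.ZipFile(buffer), "demo-1.0.egg-info/")
    flat = pathlib.Path("site-packages") / "demo-1.0.egg-info"
    return (
        may_contain_distutils_scripts(flat),
        may_contain_distutils_scripts(zipped),
    )
```

The flat location passes the check and the zipped one does not, so script lookup can be skipped entirely for anything that is not a plain `pathlib.Path`.
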
@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Apr 15, 2022
@pradyunsg pradyunsg merged commit c6e274e into pypa:main Apr 15, 2022
@uranusjr uranusjr deleted the metadata-importlib-backend branch April 15, 2022 17:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 1, 2022
Labels
type: enhancement Improvements to functionality
5 participants