Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve interop with importlib.metadata #195

Merged
merged 5 commits into from
Sep 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions src/pyproject_hooks/_in_process/_in_process.py
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,18 @@ def find_spec(self, fullname, _path, _target=None):

return spec

def find_distributions(self, context=None):
# Delayed import: Python 3.7 does not contain importlib.metadata
# If this method is being called it must be because
# `importlib.metadata`/`importlib_metadata` is available.
try:
from importlib_metadata import DistributionFinder, MetadataPathFinder
except ImportError:
from importlib.metadata import DistributionFinder, MetadataPathFinder
Comment on lines +109 to +116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only define this if importlib.metadata is available?

I'm mildly concerned about importing importlib_metadata from here since pip would need to modify its vendored copy of pyproject-hooks to not import an installed module (we've had issues due to attempting to do this in the past, usually resulting in breakage of pip because the user somehow ended up with a bad copy of said package in their environment and the user needing to understand the nuances + details of how stuff works to fix their environment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pradyunsg here. Pip would need to patch this out.

Can we not simply do a version check? Or even just prefer the stdlib version always? That way, pip (which only supports Python 3.8+) would never pick up importlib_metadata, avoiding any problems. Is there any justification for using importlib_metadata in preference to the stdlib implementation, when the latter is present?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any justification for using importlib_metadata in preference to the stdlib implementation, when the latter is present?

IIRC, importlib.metadata was added to the stdlib, but then had significant API changes in Python 3.10, and importlib_metadata made those changes available to previous Python versions. So various tools using it prefer the backport over the stdlib module.

I also don't really like the idea that this will do different things depending on whether a separate package is installed or not. Maybe we should pretend that importlib.metadata was only added from Python 3.10 and do a version check?

(I remember this backporting causing issues before as well, though I don't recall which project that was in. I think for most stdlib modules doing backports like this would have been fine, but because importlib.metadata defines an interface between third party packages, the packages using it need to agree on whether they use the backport or not. With the benefit of hindsight, I think once it was in the stdlib, backports should have stopped, so everyone just used the stdlib version, even if that meant we had to wait longer to use new APIs.)

Copy link
Contributor Author

@abravalheri abravalheri May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately importlib_metadata is not 100% backward compatible with importlib.metadata.

In general, I find that the following works relatively fine:

  • using importlib_metadata.DistributionFinder, MetadataPathFinder to implement find_distrubitions, but then using importlib.metadata high-level APIs

And the following does not work:

  • using importlib.metadata.DistributionFinder, MetadataPathFinder to implement find_distrubitions, but then using importlib_metadata high-level APIs

(Of course that depends on the exact functionalities being used by all the other packages in the virtual environment)

That is the main reason why I preferred this order.
But as discussed in #195 (comment), I have no personal problem in exchanging the order in the try...except... if that is a better approach. Or even just using importlib.metadata all the time in Python 3.8+. Please let me know and I will proceed accordingly.


Can we not simply do a version check? Or even just prefer the stdlib version always? That way, pip (which only supports Python 3.8+) would never pick up importlib_metadata, avoiding any problems.

I believe that preferring the stdlib version always (or not attempting to use importlib_metadata at all) will have the same limitations.

An anecdote about that:

Think about abravalheri/setuptools#12 and the patch necessary to make it work. What happens there is that the except branch is executed and importlib.metadata is always used.
However setuptools uses its own vendored version of importlib_metadata high-level API which causes the incompatibility documented in python/importlib_metadata#486. The good side is that, if setuptools was not forced to vendor dependencies and use importlib_metadata directly, no patch would be necessary.

In fact any package using importlib_metadata is always subject to these problems as mentioned by Thomas in #195 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately importlib_metadata is not 100% backward compatible with importlib.metadata.

That's a problem for importlib_metadata/importlib.metadata IMO. But it's been discussed elsewhere and I don't want to reiterate old arguments here.

For this PR, the main point is that we do not want a situation where pip's behaviour can change as a result of the user installing importlib_metadata (whether it's a pristine copy, or some patched nightmare that a distro, in their wisdom, has created). That's something we try very hard to avoid in pip, because it has caused significant issues for us in the past - it's at the root of why our policy on debundling is basically "please don't, but if you must, it's your problem to deal with".

Pip doesn't support Python 3.7, so I'm happy with any solution that only uses the stdlib importlib.metadata for Python 3.8+. At worst, it'll be pip patching this library when we vendor it, so nothing has to change here. But I'd expect other users of this library, like build, to have similar issues, and so I think it's worth discussing a solution that isn't pip-specific. But if the conclusion is "we don't want to bother about that", then that's OK - pip will patch it.

Copy link
Contributor Author

@abravalheri abravalheri May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to go with importlib.metadata always. That was more similar to my initial implementation.

But I'd expect other users of this library, like build, to have similar issues, and so I think it's worth discussing a solution that isn't pip-specific.

I believe that build also uses the same approach of preferring importlib_metadata and falling back to importlib.metadata (for Python < 3.10.2). In this sense, the current implementation of this PR may be more compatible with build.

Copy link
Contributor Author

@abravalheri abravalheri May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open a separate PR with that approach so the maintainers can choose which one they prefer and close the other one: #199


context = DistributionFinder.Context(path=self.backend_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be in an if context is None block so that passing a context in works correctly? Or perhaps it should intersect the search path from a passed in context with its own self.backend_path?

Copy link
Contributor Author

@abravalheri abravalheri Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the passed context will ever contain the backend_path, given that the importlib.metadata does not know about it1.

My intuition is that we can simply discard the given context, and other finders will handle it.

Footnotes

  1. at least for the current state of the implementation, in which backend_path is not added to sys.path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, it defaults to sys.path if no other path is passed. And neither of the places you can pass something else - Distribution.discover() or distributions() - are mentioned in the docs. So I think you're probably right that we can get away with ignoring the context argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this looks right to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaraco is there any other optional part that needs to be implemented from importlib.metadata, importlib.resources point of view?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all that's needed for metadata. I'm unsure if resources need an explicit hook. I'm 90% sure that because the __spec__ is a PathFinder, it will have resources support.

return MetadataPathFinder.find_distributions(context=context)


def _supported_features():
"""Return the list of options features supported by the backend.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Name: _test_bootstrap
Version: 0.0.1
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
[_test_backend.importlib_metadata]
hello = world
7 changes: 7 additions & 0 deletions tests/samples/pkg_intree_metadata/backend/intree_backend.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
from importlib.metadata import distribution


def get_requires_for_build_sdist(config_settings):
dist = distribution("_test_bootstrap") # discovered in backend-path
ep = next(iter(dist.entry_points))
return [ep.group, ep.name, ep.value]
3 changes: 3 additions & 0 deletions tests/samples/pkg_intree_metadata/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[build-system]
build-backend = 'intree_backend'
backend-path = ['backend']
11 changes: 11 additions & 0 deletions tests/test_inplace_hooks.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,17 @@ def test_intree_backend_loaded_from_correct_backend_path():
assert res == ["intree_backend_called"]


def test_intree_backend_importlib_metadata_interoperation():
pytest.importorskip("importlib.metadata")

hooks = get_hooks("pkg_intree_metadata", backend="intree_backend")
assert hooks.get_requires_for_build_sdist({}) == [
"_test_backend.importlib_metadata",
"hello",
"world",
]


def install_finder_with_sitecustomize(directory, mapping):
finder = f"""
import sys
Expand Down
Loading