Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use protocols instead of importlib.abc.Loader/MetaPathFinder/PathEntryFinder #11890

Merged
merged 9 commits into from
May 12, 2024

Conversation

abravalheri
Copy link
Contributor

@abravalheri abravalheri commented May 10, 2024

As mentioned in #11882, #11541, #2468 most of the APIs related to importlib.abc.Loader/MetaPathFinder/PathEntryFinder were originally designed to handle what we call nowadays "Protocols".

Checking for actual subclasses lead to inconsistencies and false positives/negatives. The issues mentioned above bring evidence of these problems.

This PR proposes using Protocols and introduces _typeshed.importlib to house them.

The rationale behind adding _typeshed.importlib is that these protocols are implicitly defined in the standard library and documentation and there are projects out there that would benefit from sharing them so that they can also be used to when defining APIs1. I believe it is the same reasoning behind other members of _typeshed.

I have used the suffix Protocol just as a mean of distinguishing them better from the regular abcs. I usually don't like this (it is a kind of Hungarian notation variant...), but I think it is justifiable to avoid confusion.

Closes #11882
Closes #11541
Closes #2468, but not take into consideration Python 2 or deprecated methods.

Footnotes

  1. I can confirm that least setuptools/pkg_resources would benefit a lot from sharing the protocols. See discussion in https://github.com/pypa/setuptools/pull/4246/files#r1593000435.

This comment has been minimized.

def load_module(self, fullname: str, /) -> ModuleType: ...

class MetaPathFinderProtocol(Protocol):
def find_spec(self, fullname: str, path: Sequence[str] | None, target: ModuleType | None = ..., /) -> ModuleSpec | None: ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about making it positional only arguments... but I copied the definitions from stdlib/sys/__init__.pyi, stdlib/importlib/abc.pyi, stdlib/types.pyi.

Maybe it is better just to remove the /?

Copy link
Member

@JelleZijlstra JelleZijlstra May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocols should usually have positional-only parameters, because otherwise implementations need to use the exact same parameter names. Use positional-or-keyword parameters only if consumers of the protocol pass keyword arguments.

Copy link
Contributor Author

@abravalheri abravalheri May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, that makes a lot of sense! Thank you very much for the clarification.

In accordance to that, I also introduced 6cddd52 to make the arguments in PathEntryFinderProtocol positional only.

This comment has been minimized.

This comment has been minimized.

@abravalheri abravalheri marked this pull request as ready for review May 10, 2024 13:28

__all__ = ["LoaderProtocol", "MetaPathFinderProtocol", "PathEntryFinderProtocol"]

class LoaderProtocol(Protocol):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to replace types._LoaderProtocol with this as well.

Copy link
Contributor Author

@abravalheri abravalheri May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing that in types.pyi would create an import loop between types.pyi and _typeshed/importlib.pyi. Is that OK to do for type stubs?

I can do the replacement in the pkg_resources stub without problems.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loops are not a problem in stubs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification, I introduced this change in 7075e5e.

This comment has been minimized.

This comment has been minimized.

@layday
Copy link
Contributor

layday commented May 11, 2024

If these ABCs do not need to be implemented at runtime, can't we just pretend that they are protocols in the typeshed, similar to how it's done for collections.abc?

@AlexWaygood
Copy link
Member

If these ABCs do not need to be implemented at runtime, can't we just pretend that they are protocols in the typeshed, similar to how it's done for collections.abc?

We do this for multiple other simple ABCs in the stdlib that are "protocol-like" in typeshed as well, such as os.PathLike and contextlib.AbstractContextManager

@abravalheri
Copy link
Contributor Author

If these ABCs do not need to be implemented at runtime, can't we just pretend that they are protocols in the typeshed, similar to how it's done for collections.abc?

I don't think I understood this question...

The reason why I added in _typeshed is because I think they are needed by community packages to specify proper types signatures is various APIs. But I don't know if that was the question.

@layday
Copy link
Contributor

layday commented May 11, 2024

It looks like these parallel protocols are slimmed-down versions of the ABCs. IIUC, this PR is meant to address two separate issues: (1) importlib ABCs don't need to be subclassed, and (2) some of their methods are optional and the ABCs do in fact implement these.

@abravalheri
Copy link
Contributor Author

If these ABCs do not need to be implemented at runtime, can't we just pretend that they are protocols in the typeshed, similar to how it's done for collections.abc?

I don't think I understood this question...

The reason why I added in _typeshed is because I think they are needed by community packages to specify proper types signatures is various APIs. But I don't know if that was the question.


But on the other hand, if the question means something like: "why not make importlib.abc.Loader inherit from Protocol in the typeshed", I guess that is probably because protocols don't have support for optional methods?

@abravalheri
Copy link
Contributor Author

It looks like these parallel protocols are slimmed-down versions of the ABCs. IICU, this PR is meant addresses two separate issues: (1) importlib ABCs don't need to be subclassed, and (2) some of their methods are optional and the ABCs do in fact implement these.

I am not sure about this. I think that within the stdlib the optional methods are used. These ABCs are at the top of a hierarchy of many loaders/finders which use inheritance to share code.

@layday
Copy link
Contributor

layday commented May 11, 2024

The optional methods are no-ops on the ABCs in the stdlib. Either way, I don't have anything more to suggest here, but I thought clarifying what the intention is with these changes might help.

@abravalheri
Copy link
Contributor Author

abravalheri commented May 11, 2024

clarifying what the intention is with these changes might help.

The original intention of writing this PR is to solve #11882, #11541, #2468.

These issues are caused because when the stubs were written in typeshed there was a likely misinterpretation of the loader/finders PEP and documentations of importlib. The ABCs were never meant to be enforced, but rather informational.


Because the ABCs are at the core of some CPython inheritance chain, I think it is just easier to leave them alone, no?

Also, I think it is conceptually cleaner for third party packages to use _typeshed.importlib.*Protocol when they actually mean protocol.

@layday
Copy link
Contributor

layday commented May 11, 2024

These issues are caused because when the stubs were written in typeshed there was a likely misinterpretation of the loader/finders PEP and documentations of importlib. The ABCs were never meant to be enforced, but rather informational.

I'm not sure I understand what not "meant to be enforced" means - are you saying they don't need to be suclassed? Then they can be safely converted to protocols in the typeshed and you'd have the option to type check that any random object satisfies the protocol in full. We should of course keep the slimmed-down protocols since there's no way to mark a protocol member as being optional. However, they shouldn't be synonymous with their parent ABCs to avoid confusion.

Because the ABCs are at the core of some CPython inheritance chain, I think it is just easier to leave them alone, no?

I don't think it matters to typeshed users if they are subclassed inside of the stdlib, but I might be misunderstanding.

@abravalheri
Copy link
Contributor Author

abravalheri commented May 11, 2024

I'm not sure I understand what not "meant to be enforced" means - are you saying they don't need to be suclassed?

No, that is not what I was saying.

APIs that rely on finder/loaders don't need the finders and loaders to be subclasses of the ABCs, they just need them to comply with the protocol.

However, it is a fact that the ABCs are subclassed somewhere (we can verify that empirically, then it is only logical that there is a need for subclassing). Several classes in importlib.machinery, importlib.util subclass them - that is the evidence for the "need".


It is getting very hard to understand each other I am afraid 😅.

If you would like to open an alternative PR with the ideas that you are exposing, maybe that would be a good approach forward?

I am very happy to back off from this PR if anyone has a better implementation.

Copy link
Collaborator

@Avasam Avasam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally perfectly happy with these changes.

On the question of making the original ABCs into protocols in typeshed, I'll leave that up to other maintainers, as it may affect more than I realize. As @AlexWaygood mentioned, there's precedence to do so. But I'm not sure how the optional methods should be handled then, and I like the current separation between the original ABC and a type-only Protocol built after it.

@layday
Copy link
Contributor

layday commented May 11, 2024

APIs that rely on finder/loaders don't need the finders and loaders to be subclasses of the ABCs, they just need them to comply with the protocol.

We are in agreement.

However, it is a fact that the ABCs are subclassed somewhere (we can verify that empirically, then it is only logical that there is a need for subclassing). Several classes in importlib.machinery, importlib.util subclass them - that is the evidence for the "need".

At the risk of sounding presumptuous, I'll just note that protocols are themselves subclassable and doing so is often beneficial because it allows type checkers to verify their implementation. At runtime, Protocols are generalised ABCs. Why would the importlib ABCs being subclassed in the stdlib matter at all?

If you would like to open an alternative PR with the ideas that you are exposing, maybe that would be a good approach forward?

What I'm proposing is fairly straightforward: convert these 3 ABCs to protocols. If we can't agree that's a good thing, there's hardly any point in opening a PR for it.

I'm not trying to "challenge" your PR - I was confused initially, thinking the protocols were identical to the ABCs. It wasn't clear from the description that that wasn't the case; I tried to help by pointing that out and restating the problem the PR's intending to solve. I've also made two suggestions, that I don't mind if you chose to ignore.

@@ -359,7 +360,7 @@ def evaluate_marker(text: str, extra: Incomplete | None = None) -> bool: ...
class NullProvider:
egg_name: str | None
egg_info: str | None
loader: types._LoaderProtocol | None
loader: LoaderProtocol | None
Copy link
Collaborator

@Avasam Avasam May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abravalheri Sorry I just noticed. But you won't be able to use new typeshed symbols in 3rd-party stubs in the same PR. Until the next mypy & pyright release, as they'll need to first include the new stdlib changes from typeshed.

Since this is the only 3rd party stub affected, you can simply duplicate the protocol here for now.

pyright is released quite often. So realistically this just means waiting for mypy for a follow-up PR. I can open a PR to update stubs/setuptools/pkg_resources/__init__.pyi as soon as this one is merged. And keep it on hold until the next version of mypy. (anyway I'm trying to keep both the stubs and setuptools updated as I'm adding first-party annotations and fixing typeshed stubs)

Copy link
Contributor Author

@abravalheri abravalheri May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, wasn't this part the suggestion in #11890 (comment)? (or at least that is how I interpreted the comment and then I went ahead to implement the change as a way of addressing it)

If types._LoaderProtocol has to be maintained so that 3rd-party stubs can use them, then it makes almost no difference to replace types._LoaderProtocol because it is only used in 2 places: internally in types and in pkg_resources... We should probably just simply revert 7075e5e and 45b7a5c.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you won't be able to use new typeshed symbols in 3rd-party stubs in the same PR

Is this documented in the CONTRIBUTING guide? Sorry I might have missed that part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace all usages of types._LoaderProtocol in stdlib in this PR. Just not for 3rd-party stubs. Otherwise uses of the most recent types-setuptools will have a reference to _typeshed.importlib.LoaderProtocol in their stub that doesn't exist yet.

types._LoaderProtocol should still exist until mypy updates their vendored typeshed stdlib stubs (which is done every version).

When I made that comment, I didn't think about the usage in setuptools stubs. Since it's only used there and in types, it might be cleaner to just revert as you're suggesting. And to remove types_LoaderProtocol along the follow-up PR that will use _typeshed.importlib.LoaderProtocol in setuptools stubs.

Sorry for the "flip-flopping" here ^^"

Is this documented in the CONTRIBUTING guide? Sorry I might have missed that part.

I don't think so, but it probably should, now that you mention it. It's rare, but we've been bitten by it twice (that I can remember) in the past.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this comment from Avasam was addressed before the PR was merged — I think our setuptools stubs are now importing a protocol from _typeshed which, from mypy's perspective, doesn't exist yet. I think we need to change this so that the LoaderProtocol definition is temporarily duplicated in our setuptools stubs, rather than being imported from _typeshed, until a version of mypy is released that includes this protocol in its vendored stdlib stubs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexWaygood
Copy link
Member

I didn't realise the ABCs had optional methods that were omitted from the protocols being added here -- in that case, this approach makes sense. My bad for not looking more closely; apologies!

@abravalheri
Copy link
Contributor Author

abravalheri commented May 12, 2024

I'm not trying to "challenge" your PR - I was confused initially, thinking the protocols were identical to the ABCs. It wasn't clear from the description that that wasn't the case; I tried to help by pointing that out and restating the problem the PR's intending to solve. I've also made two suggestions, that I don't mind if you chose to ignore.
(in #11890 (comment))

That is OK, I never took it as a challenge. Thank you very much for taking the time to have a look on this BTW. My comment was sincerely in the spirit that if we are having problem to understand each other, maybe having another PR showcasing exactly what you mean in the code would be the best way for communicating (instead of keep discussing and asking back and forth). I just wanted to make clear that I was completely OK with that.


I tried to help by pointing that out and restating the problem the PR's intending to solve.

I am sorry, when I read #11890 (comment) that actually confused me more 😅. I don't think about the original intent of the PR as something that matches the description in the comment. My problem is that using these ABCs in type signatures of other functions doesn't seem like the right thing to do (the issues linked have evidence for that).

What I'm proposing is fairly straightforward: convert these 3 ABCs to protocols. If we can't agree that's a good thing, there's hardly any point in opening a PR for it.

Now I might have started to understand with more clarity what you are saying...

Personally, I think that it is better to keep these things separated (but I am also fine to do otherwise if the maintainers request). This opinion is biased by the fact that in the past I have been bitten by little innocent/convenient "lies" in type stubs and now I think that they all come back to bite you.

What we know so far:

  1. There are a couple of implicit protocols in importlib that have not been formally/publicly expressed in a way that allows sound type hints given the way the Python type system works.
  2. importlib.abc.* has a bunch of classes that are a mix of informational/examples/docs and code sharing by inheritance.

We probably don't need to mix them up (well, mixing them up is what got us here in the first place..).


I've also made two suggestions,

Could you please clarify what are the suggestions other than converting the ABCs into protocols? I might have lost them in the middle of the discussion...

Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants