Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What to do when the runtime lies? #11141

Closed
tungol opened this issue Dec 10, 2023 · 8 comments
Closed

What to do when the runtime lies? #11141

tungol opened this issue Dec 10, 2023 · 8 comments
Labels
project: policy Organization of the typeshed project

Comments

@tungol
Copy link
Contributor

tungol commented Dec 10, 2023

I've been cleaning up inheritance mismatches. I've found that several modules with a C implementation fudge the module name of their classes. For example:

ReferenceType is a class defined in the _weakref C implementation, and then imported from there into the weakref module. Because it's a C implementation, it's free to declare it's own qualified name, and calls itself weakref.ReferenceType: https://github.com/python/cpython/blob/main/Objects/weakrefobject.c#L368C1-L368C1

The intention, I believe, being that this class canonically lives in the weakref module, and the fact that it's being defined in _weakref is an implementation detail which is hidden at runtime. Typeshed declares the class in _weakref.pyi, which means it gets a qualified name of _weakref.ReferenceType

I can see two possible arguments:

  • Typeshed should continue to follow the layout of implementation over the declared implementation. e.g. ReferenceType is defined in _weakref originally, so that's the correct place for it.

OR

  • Typeshed should follow the declared implementation over the layout. e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

This pattern pops up in a lot of places: in _ast, _decimal, and _sqlite3 are some of the others I've noticed. I suspect the current state of typeshed is inconsistent on this issue, since not all modules that have a C implementation currently have a _foo.pyi stub file.

Is there a consensus among typeshed's maintainers here? I'm a little biased towards option 2 at the moment, since I'm trying to make as many inheritances match as I can, but I think there's a good argument for either side, and this is, in the end, a very minor issue with not much practical effect.

@srittau
Copy link
Collaborator

srittau commented Dec 11, 2023

Generally we try to follow the actual implementation, until there is a good reason not to. Typing private modules is "better", but it was often neglected, because practically speaking there isn't too much difference, and is slightly more work.

@tungol
Copy link
Contributor Author

tungol commented Dec 11, 2023

I think what I mean is that it's not obvious what "follow the implementation" means here, because different parts of the implementation are conflicting with each other.

If we have:

_weakref.pyi

class ReferenceType: ...

weakref.pyi

from _weakref import ReferenceType as ReferenceType

Then we have followed the import structure accurately, but ReferenceType has a different qualname in typeshed and at runtime.

If we have instead:

_weakref.pyi

from weakref import ReferenceType as ReferenceType

weakref.pyi

class ReferenceType: ...

Now ReferenceType has a matching qualname in typeshed and at runtime, but we've had to reverse the import structure in order to do that. I'd make the case that the second version follows the implementation a little better, since the qualname is visible to mypy/stubcheck, while (I don't think) the import structure really is. I might be wrong about that?

At any rate, I wasn't suggesting that we shouldn't make these available in the private module namespace.

@hauntsaninja
Copy link
Collaborator

This is a great question. I don't have a strong opinion, but I lean towards:

Typeshed should follow the declared implementation over the layout. e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

I have this preference for two reasons:

  • It results in better user facing diagnostics
  • It better reflects the truth in alternative Python implementations or cases where C accelerator modules are missing

This is also why in the "various inheritance" patch from #3968 I have a few cases of special casing underscore prefixed modules and classes: python/mypy@master...hauntsaninja:mypy:stubtestbaseclass#diff-55639a1f38efeacbb10988649399037c13ad11e9c82890b453d98ea2bed9caf8R267

tungol added a commit to tungol/typeshed that referenced this issue Dec 14, 2023
The classes imported from the _ast module are defined in C,
but set their __module__ to ast. This arrangement ensures that
the type stubs have the same.

related to python#3968
and the disccusion in python#11141
tungol added a commit to tungol/typeshed that referenced this issue Dec 14, 2023
The classes imported from the _ast module are defined in C,
but set their __module__ to ast. This arrangement ensures that
the type stubs have the same.

related to python#3968
and the discussion in python#11141
tungol added a commit to tungol/typeshed that referenced this issue Dec 14, 2023
This improves fidelity of naming and inheritance on 3.11+

related to python#3968 and python#11141
tungol added a commit to tungol/typeshed that referenced this issue Dec 15, 2023
This matches the name reported by the cass at runtime.

related to python#11141
tungol added a commit to tungol/typeshed that referenced this issue Dec 15, 2023
This matches the name reported by the cass at runtime.

related to python#11141
tungol added a commit to tungol/typeshed that referenced this issue Dec 16, 2023
This aligns with the implementation while giving greater fidelity
to runtime naming and inheritance

Related to python#3968 and python#11141
@tungol
Copy link
Contributor Author

tungol commented Dec 17, 2023

I made several MRs along these lines, but then I discovered that the _pydecimal module changes its __name__ attribute, which I hadn't though of as an option: https://github.com/python/cpython/blob/d91e43ed7839b601cbeadd137d89e69e2cc3efda/Lib/_pydecimal.py#L151

Which got me thinking: what if mypy could detect cases where a module level __name__ = "foo" is set in the stubs, and set the name appropriately? Then we could follow both parts of the implementation, layout and naming.

I'm willing to take a run at that MR, if people think it's a reasonable idea.

@tungol
Copy link
Contributor Author

tungol commented Dec 22, 2023

As it turned out, classes which set their own __module__ attribute were a lot easier to handle than modules which set their own __name__ attribute. I have a draft implementation for that here: python/mypy#16698

With that, typeshed could set __module__ on classes which claim to be from a different module, and stubcheck could perform more accurate comparisons with runtime.

@Akuli
Copy link
Collaborator

Akuli commented Jan 3, 2024

There are a few pull requests that add missing _foo modules (#11187 #11155 #11159 #11174). What problem does that solve? In other words, when does a typeshed user want stubs for a _weakref or _io or _ssl module?

If I understood correctly, we decided to do this (which IMO is a good idea for several reasons):

e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

And if you have everything you'll ever need in weakref, then why do you need _weakref?

Note that I'm talking about adding _foo modules for completeness. I'm not saying that we should delete existing _foo stubs. For example, _tkinter stubs contain various things that aren't exposed in tkinter, but can be useful when working with tkinter.

@tungol
Copy link
Contributor Author

tungol commented Jan 3, 2024

I created those while working on naming and inheritance - those all contain something which is a parent class of something else, so adding them brings the inheritance structure in typeshed closer to what it looks like at runtime. In other cases, it's a naming issue: the mod.Foo type calls itself _mod.Foo at runtime, and adding the _mod module lets typeshed align names that way.

It's a fair question whether that's justified in all cases or not. But that's not actually the subject of this ticket. Note that typeshed already has _weakref, and aligning the naming there would mean moving weakref.ReferenceType out of _weakref (where it's currently defined by both typeshed and runtime) and into weakref (where it will match the name given by runtime).

For another example and one more on subject: I've been investigating the issue of decoupling collections.abc from it's typing aliases (#6257). Runtime defines these classes in _collections_abc.py rather than collections/abc.py so that they can be imported without incurring the performance hit of also importing collections/__init__.py, but then sets __name__ = "collections.abc" in that file so that the various clases call themselves collections.abc.Foo instead of _collections_abc.Foo.

Typeshed has the same performance concerns; If we define them in collections.abc, that's an increase in the number of files involved in the circular dependency cycle that contains typing and builtins. I haven't tested with collections.abc, but making sure that that mypy can resolve that dependency cycle fast enough was already something I had to pay attention to while doing some tests that move these classes from typing.pyi into _collections_abc.pyi. So I think defining them in _collections_abc.pyi but adjusting their naming with either the module-level __name__ = directive or a class-level __module__ = directive is the best option.

@JelleZijlstra
Copy link
Member

I agree with @hauntsaninja above and nobody has forced disagreement, so let's go with that.

I think that also aligns with my thinking as a CPython maintainer. For example, typing.Generic conceptually lives in the typing module, even if (as an implementation detail) we get it there by way of _typing.

JelleZijlstra added a commit that referenced this issue Oct 2, 2024
This matches the name reported by the cass at runtime.

related to #11141

Co-authored-by: Jelle Zijlstra <[email protected]>
JelleZijlstra added a commit that referenced this issue Oct 2, 2024
This improves fidelity of naming and inheritance on 3.11+

related to #3968 and #11141

Co-authored-by: Jelle Zijlstra <[email protected]>
JelleZijlstra pushed a commit that referenced this issue Oct 5, 2024
This aligns with the implementation while giving greater fidelity
to runtime naming and inheritance

Related to #3968 and #11141
JelleZijlstra pushed a commit that referenced this issue Oct 5, 2024
The classes imported from the _ast module are defined in C,
but set their __module__ to ast. This arrangement ensures that
the type stubs have the same.

related to #3968
and the discussion in #11141
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project: policy Organization of the typeshed project
Projects
None yet
Development

No branches or pull requests

5 participants