Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protobuf type stubs obscure other imports from google.* #5800

Closed
nullie opened this issue Jul 23, 2021 · 29 comments
Closed

protobuf type stubs obscure other imports from google.* #5800

nullie opened this issue Jul 23, 2021 · 29 comments

Comments

@nullie
Copy link

nullie commented Jul 23, 2021

See this issue: microsoft/pyright#2113

@srittau
Copy link
Collaborator

srittau commented Jul 23, 2021

We need to mark this package as "partial" by adding a py.typed file that contains the string partial\n (PEP 561). We need to add this functionality to the stub uploader, and also add a field to METADATA.toml.

@romanofski
Copy link
Contributor

@srittau does this need to be fixed first before adding any more packages in the google namespace? I'd be happy to give it a crack. It seems easy to knock off. Would it be possible to point me to the stub uploader code? I've grepped through the source code and can't see where the magic happens :)

@JelleZijlstra
Copy link
Member

It's in https://github.com/typeshed-internal/stub_uploader. Probably https://github.com/typeshed-internal/stub_uploader/blob/main/scripts/build_wheel.py is what needs to change.

@romanofski
Copy link
Contributor

I spend some time on this today. Apart from the weird problem I ran with pyright, I've found that the way we currently package those namespace packages leads to problems when uninstalling them.

E.g. if I install types-protobuf and an additional google namespace package like types-google-cloud-ndb from #5821 they'll clobber each others contents in google-stubs. Perhaps the METADATA.toml isn't of significance to Python (I don't know), but at the point someone will uninstall at least one package, it'll wipe out one of the __init__.pyi files leading to import problems.

Now I wonder whether any google namespace stub packages will have to be packaged together to avoid such a scenario. Or they will have to share a common directory under stubs. Just spitballing here what a possible solution could look like.

@srittau
Copy link
Collaborator

srittau commented Aug 8, 2021

It seems, google-cloud-ndb depends on protobuf, so we could just make the types package depend on each other like this. This way, protobuf has the "common" files, and google-cloud-ndb doesn't.

@nullie
Copy link
Author

nullie commented Aug 8, 2021

Maybe there should be a separate namespace package on which other packages would depend

@srittau
Copy link
Collaborator

srittau commented Aug 9, 2021

Reading PEP 561 again, I'm not sure, partial will do the trick:

Many stub packages will only have part of the type interface for libraries completed, especially initially. For the benefit of type checking and code editors, packages can be "partial". This means modules not found in the stub package SHOULD be searched for in parts four and five of the module resolution order above, namely inline packages and typeshed.

Steps 4 and 5 are inline packages and typeshed, respectively. This means that partial can't be used to merge two third-party packages, if I understand correctly. Maybe @ethanhs can clarify.

@jakebailey
Copy link
Contributor

jakebailey commented Aug 9, 2021

That's my personal understanding (from our implementation in pyright/Pylance); the spec doesn't really allow for more than one partial package at a time (especially given that they must have py.typed at the root, which I don't think is possible across multiple distributions?). I believe we end up using the partial stub folder that's first in the search paths, but maybe that's not true for namespace packages.

By the same rules, a partial stub also can't override a py.typed library. Both things combined make our use of partial stubs in Pylance (working around hard-to-analyze libraries without fully typing them) a little fragile depending on the rest of the ecosystem.

@srittau
Copy link
Collaborator

srittau commented Aug 9, 2021

Maybe there should be a separate namespace package on which other packages would depend

This is also a good idea.

@emmatyping
Copy link
Contributor

If I remember correctly, technically PEP 561 talks about the order to resolve modules aka scipy.linalg not just entire packages (just scipy), so I believe you can have stubs that are both partial that implement different parts of a larger package, they just can't implement stubs for the same module.

@jakebailey
Copy link
Contributor

Can you publish two partial distributions if py.typed needs to be at the module root of both? That's the bit that I'm finding restrictive (as I can only construct a situation where this works when they are in different import roots and therefore both provide py.typed, but that isn't how site-modules works).

@emmatyping
Copy link
Contributor

I'm afraid I'm not sure I understand your scenario. Are you saying two namespace packages would interfere because they both need the py.typed at their root, and that would be an error on installation?

@jakebailey
Copy link
Contributor

Yes, that's what I'm describing. This would affect both typeshed and regular packages published to PyPI, as they all get placed in site-packages.

@romanofski
Copy link
Contributor

I've actually had a look why google.cloud.ndb depends on protobuf. It seems it's possible to deserialise an ndb.Model instance from a protobuffer.
Public interface wise there is actually nothing in common, except one single private function:

def _entity_from_protobuf(protobuf):
    """Deserialize an entity from a protobuffer.

    Args:
        protobuf (google.cloud.datastore_v1.types.Entity): An entity protobuf
            to be deserialized.

    Returns:
        .Model: The deserialized entity.
    """
    ds_entity = helpers.entity_from_protobuf(protobuf)
    return _entity_from_ds_entity(ds_entity)

So arguably the types do not depend on each other, but I think the possibility for them to be installed in the same site-packages is real.

The idea of having a common namespace package is neat, but I doubt it would scale very well if there are more packages to be typed.

@nipunn1313
Copy link
Contributor

Looking at the protobuf and google-cloud-core examples, the libraries produce modules google.protobuf and google.cloud

According to PEP-561 - it appears that when a stub package is present, type checkers should use it w/o falling back to the implementation library.

See this comment in pyright's issue

However, in this case, typeshed's stub uploader is creating a package google-stubs for protobuf, which is causing this to occur. We could add a py.typed partial to google.protobuf, but this seems broken... because google.protobuf is actually complete. It's just that there are multiple packages in the google namespace package.

What strategy should we use here? Ideally google.protobuf and google.cloud libraries which ship independently via pip can also ship independently when stubbed automatically out of typeshed.

Is suspect we need to have the stub package for protobuf declare the namespace package here for things to work - so the "complete" protobuf stubs don't make it think that google.cloud is complete. I will try to experiment with it. https://github.com/typeshed-internal/stub_uploader/blob/main/scripts/build_wheel.py#L53

@nipunn1313
Copy link
Contributor

nipunn1313 commented Sep 2, 2021

I played with this for a couple of hours.

How do we expect a stub package for a namespace package to look? Ideally, we are able to claim full types on a stub package like types-protobuf (google.protobuf) without breaking other packages within the namespace package (eg google.cloud.firestore)

The best guidance I can find is here https://www.python.org/dev/peps/pep-0561/#stub-only-packages - but it doesn't reference how namespace packages should work.

Currently typeshed generates a package that looks like this

➜  tree venv_3.8.11/lib/python3.8/site-packages/google-stubs
venv_3.8.11/lib/python3.8/site-packages/google-stubs
├── METADATA.toml
├── __init__.pyi
└── protobuf
    ├── __init__.pyi
    ├── any_pb2.pyi
    ├── api_pb2.pyi
    ├── (etc etc - rest omitted)

This seems wrong since google contains an __init__.pyi which seems expressly forbidden for a namespace package. Removing the __init__.pyi makes mypy extremely unhappy (can't find stubs any more). Pyright can still find google.protobuf, but still cannot find google.cloud

I don't think the PEP-561 is clear on how this should work. I see two possible interpretations (neither of which work with either mypy or pyright today in my testing). The typeshed stub uploader does neither of these (see above).

Interpretation 1

google-stubs
├── METADATA.toml
└── protobuf
    ├── __init__.pyi
    ├── any_pb2.pyi
    ├── api_pb2.pyi
    ├── (etc etc - rest omitted)

Interpretation 2

google
├── METADATA.toml
└── protobuf-stubs
    ├── __init__.pyi
    ├── any_pb2.pyi
    ├── api_pb2.pyi
    ├── (etc etc - rest omitted)

How should we proceed?

[EDIT] - fixed a typo in my interpretations s/google-types/google-stubs

@jakebailey
Copy link
Contributor

jakebailey commented Sep 2, 2021

Pyright can still find google.protobuf, but still cannot find google.cloud

This would strike me as a bug (or, an oversight); this is probably the first time anyone's ever tried to make namespace stub packages. Namespace packages are notoriously tricky, and adding stub package logic on top is probably extra tricky and not yet handled.

I think interpretation 1 is the one I would be expecting; IIRC we look for stub packages in the import roots, not deeper per-module; that'd be even more complicated, as any subdir of an import root could somehow modify the types of other import roots just by having a -stub in it. I.e., you use namespace packages to stub other namespace packages.

@jakebailey
Copy link
Contributor

@erictraut may have some thoughts (he's probably only following the issues on pyright/pylance).

@romanofski
Copy link
Contributor

Yeah this strikes me a bit as undefined behaviour? I actually commented on the original pyright issue from which this ticket was created, but it's very easy to get lost in confusing setups and I wasn't really helping getting to the bottom of this. I wanted to try reproduce this from scratch on the pyright side, but just haven't found the time to do so.

@nipunn1313
Copy link
Contributor

Ok! So I was able to poke with this further.

I was working on supporting Interpretation 1 in mypy.

While working on python/mypy#11141 - I found that it does work in mypy if the --namespace-packages flag is provided - interpretation 1 works.

Agree w/ @jakebailey on eliminating Interpretation 2 out of inelegance/unnecessary complexity

My Proposed Solution

Accept Interpretation 1 above. Action items

  • Update the typeshed stub uploader so that google-stubs won't have an __init__.pyi and will be a true namespace stub package.
  • Follow up with Cannot resolve google.cloud imports when types-protobuf are installed microsoft/pyright#2113 - to ensure pyright treats google-stubs/protobuf as complete, but google-stubs as incomplete (since it's a namespace package).
  • Potentially follow up with mypy to figure out what to do with --namespace-packages flag. It strikes me as unnecessary given that PEP 420 has stabilized. Perhaps eliminate it or default it to on.
  • Potentially amend PEP561 to add clarity for what to do for stub-only packages involving namespace packages (specify interpretation 1)

Alternate Solution Ideas

Mandate __init__.pyi files in namespace stub packages (eg google-stubs).

  • Would require some other way of determining that it's a namespace package (so as not to assume completeness)

Ban namespace packages from using stub-only package distributions

  • Would regress google.protobuf in the short term unfortunately

@erictraut
Copy link
Contributor

Thanks @nipunn1313, that's great progress. I like your proposal.

@emmatyping
Copy link
Contributor

Hi, sorry I haven't been keeping up with this thread. Interpretation 1 is definitely correct. Since the package is called google, the stub package should be called google-stubs and it should be handled just like a runtime namespace package (the protobuf and cloud sub-packages would live side by side in the site directory and should both be searched if both installed).

As for namespace packages in mypy, there is python/mypy#9636 to make it default but there are some issues with making it work with the daemon and cache, so it may not happen immediately.

I definitely would be happy to review a suggested wording change to PEP 561 on this, I agree it really isn't clear.

@srittau
Copy link
Collaborator

srittau commented Sep 20, 2021

Just to clarify: Would we have one package google-stubs that includes google.protobuf, google.ndb, google.auth etc. or would we have a separate packages for those? The latter seems correct to me as we have multiple upstream packages, but I can live if we had just one package for now until all issues are resolved.

@nipunn1313
Copy link
Contributor

@srittau - I'm have a terminology confusion about what a "package" is - I think the term is sometimes referring to a pip package, and sometimes to an installed directory (in site-packages).

I'll try to be clear about my terminology - and explain my sense of the ideal.

Each of these google packages have a separate pypi distribution (eg protobuf and ndb and firestore. Each one can decide independently whether to use inline types via py.typed or a stub-only package. For example, currently firestore uses inline types and protobuf uses a stub-only distribution (types-protobuf - autogenerated from typeshed)

Each stub-only distribution will install into a common namespace-package in site-packages. For example
google-stubs/protobuf
google-stubs/cloud/ndb (hypothetical at this point - see #5821 which wanted to add this).

In this example, google-stubs would be a stub-only namespace package (corresponding to from google import xyz). google-stubs.cloud is also actually a stub-only namespace package (corresponding to from google.cloud import xyz)

JelleZijlstra pushed a commit to python/mypy that referenced this issue Oct 12, 2021
This only adds tests that show the current behavior when we provide a stub-only package for a subpackage of a namespace package

Proposals for behavior changes exist in python/typeshed#5800

Real world example is the `protobuf` package which installs to `google.protobuf` and the `types-protobuf` package which installs to `google-stubs/protobuf` (where `google` is a namespace package)
@schibrikov
Copy link

Are there any workarounds for that now? I would just remove protobuf-types, but it's a transient dependency.

@nipunn1313
Copy link
Contributor

Are there any workarounds for that now? I would just remove protobuf-types, but it's a transient dependency.

I reproed like this with pyright (not with mypy)

python -m venv env
source env/bin/activate
pip install google-cloud-firestore
echo "import google.cloud.firestore >> test.py"

mypy test.py # works
pyright test.py # works

pip install types-protobuf

mypy test.py # works
pyright test.py # fails to resolve google.cloud.firestore

This is a workaround microsoft/pyright#2113 (comment)

And we hope to have an imminent fix available here #8937 (comment)

@AlexWaygood
Copy link
Member

Is this still an issue now that google/__init__.pyi has been deleted from the typeshed stubs, and types-protobuf 4.21.0.0 has been released?

@henribru
Copy link
Contributor

It appears to be fixed, thanks!

@AlexWaygood
Copy link
Member

It appears to be fixed, thanks!

Fantastic!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests