Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inferred type for type.__subclasses__() could be more specific #2236

Closed
evelynchase opened this issue Oct 10, 2016 · 18 comments
Closed

Inferred type for type.__subclasses__() could be more specific #2236

evelynchase opened this issue Oct 10, 2016 · 18 comments
Labels

Comments

@evelynchase
Copy link

Example:

from typing import Sequence, List, Type

class A(object):
  pass

class B(A):
  pass

class C(A):
  pass

subs_1 = A.__subclasses__() # type: Sequence[Type[A]]
subs_2 = [t for t in A.__subclasses__() if issubclass(t, A)] # type: List[Type[A]]

Expected behavior

The inferred types for subs_1 and subs_2 are as annotated

Actual behavior

The inferred type is List[type] for both subs_1 and subs_2.

@gvanrossum
Copy link
Member

You're right, though I have yet to think of a use case where this matters. (Did you discover this just by playing around, or do you have an actual application that needs this?)

In any case, we may be able to fix this in typeshed once "self-type" is implemented - see PR #2193 implementing the design of issue #1212.

@evelynchase
Copy link
Author

I encountered this while writing a script that parses human-readable log messages from another application. There is a BaseRecord type that contains information common to all log messages (e.g. timestamp, thread name, etc.) and then many subclasses of BaseRecord that know how to parse a particular kind of log message (e.g. turning a log line like "free memory: 224.21 MB" into a FreeMemoryRecord with a memory attribute which contains the float value 224.21). The definition of BaseRecord looks something like this:

class BaseRecord(object):
  @classmethod
  def parse(cls, message: str) -> "BaseRecord":
    for subclass in cls.__subclasses__():
      try:
        return subclass.parse(message)
      except ParseError:
        pass
    return self._parse(message)

  @classmethod
  def _parse(cls, message: str) -> "BaseRecord":
    # extract common attributes from message

And the subclasses provide their own implementation of parse. This way, the BaseRecord.parse method returns an instance of the most specific class relevant to the given log message (which may be BaseRecord itself if there's no additional information that can be extracted). Code further down in the processing pipeline will do something like this:

all_records = [BaseRecord.parse(line) for line in log_file]
mem_records = [r for r in all_records if isinstance(r, FreeMemoryRecord)]
# do stuff with mem_records

A little niche I know, and there are ways to solve it (e.g. by maintaining a list of "record parsers") but it seems like a "self-type" would solve this problem:

class type:
  def __subclasses__(self) -> Sequence[Type[SelfType]]:
    ...

@gvanrossum
Copy link
Member

Yeah, this is hardly what __subclasses__ was meant for (it's pretty sketchy to assume you have control over all subclasses, or that there's only one subclass that can parse the record you're looking at, or that there are no additional levels of subclassing).

But once we have the self-type we'll revisit this.

@elazarg
Copy link
Contributor

elazarg commented Oct 10, 2016

Yes, it only requires self-type and a very small tweak to builtins.pyi.

@gvanrossum gvanrossum added this to the 0.5 milestone Oct 13, 2016
@elazarg
Copy link
Contributor

elazarg commented Oct 27, 2016

It works now by moving __subclasses__ to be a classmethod in object in typeshed:

    @classmethod
    def __subclasses__(cls: Type[_T]) -> List[Type[_T]]: ...

Is it reasonable?

@gvanrossum
Copy link
Member

gvanrossum commented Oct 27, 2016 via email

@elazarg
Copy link
Contributor

elazarg commented Oct 27, 2016

But it won't work with type.__subclasses__(A). The coercion of types to CallableType is annoying :(

@gvanrossum
Copy link
Member

gvanrossum commented Oct 27, 2016 via email

@elazarg
Copy link
Contributor

elazarg commented Oct 27, 2016

Sorry.

I am not sure I understand your second sentence. But:
Moving __subclasses__ to be a @classmethod in object works in most cases, but it does not work on static access: type.__subclasses__ will bind type to the first argument, which is wrong:

$ mypy -c 'reveal_type(type.__subclasses__(bool))'
<string>:1: error: Revealed type is 'builtins.list[Type[builtins.type*]]'
<string>:1: error: Too many arguments for "__subclasses__" of "object"

Adding generic self to __subclasses__ in class type does not work:

$ mypy -c 'reveal_type(type.__subclasses__(bool))'
<string>:1: error: Revealed type is 'builtins.list[def (o: builtins.object =) -> builtins.bool]'

bool turned into a callable.

@gvanrossum
Copy link
Member

(Sorry, I missed the fact that this is the issue about __subclasses__. :-)

I would say it should definitely not become a classmethod on object.

I hope that the output you are seeing can be improved and that the actual inferred type is actually still a class. (Sometimes classes are represented internally as callables with a certain fallback type, but IIRC the string conversion doesn't honor that. And other times classes are represented differently internally, so it's not universally broken, only in some cases -- like this one, apparently.)

Can you try to construct an example to prove/disprove that? Maybe try to see if an item of the resulting list still has a method characteristic of the original class.

@elazarg
Copy link
Contributor

elazarg commented Oct 27, 2016

Seems like you are right - the result is treated as the type itself. So mypy treats int as an "intersection" of CallableType and TypeType? Sounds good (I take the 'annoying' back ;) ) but I guess this distinction shouldn't be leaking to the user. What should we do about this?

@gvanrossum
Copy link
Member

We should make the string conversion of such callables smarter. Maybe rendering as Type[C] is good enough?

@rwbarton
Copy link
Contributor

There are two main formats for stringifying types, the "user" format and the "internal" format (names I just made up). The internal format is mostly used for debugging mypy, so it's better if it shows the real type, which is a Callable (with a special fallback type and with is_type_obj = True; but showing everything is perhaps impractical). If you were debugging an issue that came down to the difference between Type[C] and the Callable type associated to C, it would be pretty hard if the latter were displayed as the former.

reveal_type uses the internal format, which is useful because it gives more information, but on the other hand it also requires more familiarity with mypy internals to interpret correctly.

@rwbarton
Copy link
Contributor

As for the issue itself, can it not be fixed using self types with __subclasses__ remaining an instance method of type? That seems like a much more obvious place for it to me.

@elazarg
Copy link
Contributor

elazarg commented Oct 27, 2016

@rwbarton I think I've made much ado about nothing: it simply works, and the internal format confused me. But given my confusion, I would like it to be verified by someone else :)

class type:
    ...
    def __subclasses__(self: Type[_T]) -> List[Type[_T]]: ...

@gvanrossum
Copy link
Member

I will fix this, after verifying.

@gvanrossum
Copy link
Member

Well, I can't even repro this problem any more:

$ mypy -c 'reveal_type(type.__subclasses__(bool))'
<string>:1: error: Revealed type is 'builtins.list[def (o: builtins.object =) -> builtins.bool]'

When I try that I get:

$ mypy -c 'reveal_type(type.__subclasses__(bool))'
<string>:1: error: Revealed type is 'builtins.list[Type[builtins.bool*]]'

@gvanrossum
Copy link
Member

(That is, with my change to typeshed from c3ddd3c.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants