-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark Shannon's presentation at the 2017 Language Summit #432
Comments
I agree with Mark that this is a problem and can be a blocker for adopting type annotations for some projects. We are mostly shielded from the problem at Dropbox because we use comment annotations, but once we migrate to Python 3 we'd potentially have problems with the current approach as well. Inheriting from from typing import implements, List
@implements(List[int])
class MyList(list): ... It's unclear whether this would also affect user-defined generic classes. For example, consider this code that works currently: class Box(Generic[T]): ...
class SpecialBox(Box[int]): ...
class AnotherBox(Box[T]): ...
def do_stuff(box: Box[int], box2: AnotherBox[str]) -> None: ... If I understood things right, this would be written like this based on Mark's proposal: class Box(Generic[T]): ...
@implements(Box[int])
class SpecialBox(Box): ...
@implements(Box[T])
class AnotherBox(Box): ...
def do_stuff(box: Box[int], box2: AnotherBox[str]) -> None: ... # No change This would still mean that user-defined classes may trigger metaclass conflicts due to These things should probably also work: @implements(Tuple[int, str])
class MyTuple(tuple): ...
@implements(Tuple[int, ...])
class MyUniformTuple(tuple): ...
@implements('List[int]') # Still slightly more efficient
class MyList(list): ... |
Personally I think this part of the proposal is not workable. E.g. inheriting from Sequence (an existing ABC repurposed as a type) also adds some concrete default implementations, such as
|
Maybe that could written like this: @implements(typing.Sequence[T])
class MyList(typing.Sequence):
... There would still be a duality --
This is still a usability regressions and a backward compatibility break, as defining subclasses of generic classes would become different and slightly harder. However, maybe we can live with this, since after we have protocols, using Finally, generic types such as |
Does anyone actually define their own generics, or even inherit from a type? As a data point, Zulip makes no use of inheritance to define a class and a type at the same time. In other words there is no code of the form https://lgtm.com/query/1957210066/project:140750185/lang:python/ |
I've done both things (define own generics and inheriting from something lice Dict[str, str]) in the context of writing stubs for popular libraries, and I've done the second (inheriting from a specific realization of a generic) in application code |
How often, though? I expect that it is a very rare thing to do. |
Mypy itself has three examples:
Also our internal Dropbox codebases have dozens of examples of |
As a user, I'm not too bothered with the current state of things, since it's rare that I need to subclass a generic. (I found only a single inheritance from |
@JelleZijlstra |
@JukkaL @implements(Mapping[Key, Type])
class Frame(dict): That way the type (mapping) can be kept separate from the implementation (dict). Similarly for the other two examples. |
@markshannon What would then happen to methods defined in
@gvanrossum I wonder if there would be a way to simplify the MROs even if we inherit from |
Here are my comments:
|
Yeah, migration would already be tricky. On the other hand, two years from now it would be still much harder, so if we are going change something fundamental, we should do it as soon as possible. |
First of all a bit of history. In the run up to accepting PEP 484, in an email thread with @gvanrossum and @JukkaL I specifically requested that all I have consistently been of the opinion that equating types and classes is a bad idea. @ilevkivskyi The specific proposal is that no classes representing types should inherit from As the use of type-hints spread, I expect that applications that declare types will remain a (small?) minority, but that applications that use at least one module that uses type-hints will become common. |
FWIW isinstance() was indeed removed, per your request. issubclass() remains because there were problems with removing it. There's still an issue open about removing it. |
|
The problem here is that it gets harder in practice to have a generic in a stub and non-generic in the library. For example working on the numpy stubs, I defined The examples of inheriting things like
This code is not opensource, but the usecase is essentially some kind of generic container (to represent results of API calls that are collections, but not necessarily a pythonic container) |
@dmoisset How is |
I talked about this with Mark in person, and he had another idea that would not break compatibility (at least not as much). Not sure if I understood it fully but I'm attempting to capture the idea here so that we don't completely lose track of it. @markshannon please let me know in which ways I misunderstood you :-) We could make types like class MyList(List[int]): ... However, the MRO of |
Here are some more comments:
This situation ( class C(List[int]):
...
I looked at my old profiling results and things I have found really slow are instantiation of user defined generic classes (up to 10x slower), and valid class C(Generic[T]):
...
c: C[int] = C()
isinstance(c, C) The first situation (instantiation) can be made few times faster by "inlining"
I don't understand this point. As I understand, you are still heavily using subclassing generic classes even in Python 2, and Mark claims that this is the main problem. Do you see any slow-down in Dropbox code when you make classes generic (or inherit from
Interesting. This is exactly what I first thought when this thread started. This approach however should be well thought out. For example, when the fallback to
class C(int, 1): # This fails with metaclass conflict
pass
class C(int, object()): # This fails with TypeError: bases must be types
pass
class A:
pass
class C(A()): # This fails with TypeError: object() takes no parameters
pass
class A:
def __init__(*args, **kwargs):
pass
class C(A()): # This actually works!
pass However, I think if we will figure this out, then this will fix many performance problems (including the original one, and two that I mention above) while preserving full backwards compatibility. |
Subclassing is just one of the problems that I've heard being talking about. Another potential issue is the memory use of annotations, as generic types take some space. I'm aware that identical types are shared, so it's much better now than it used to be. It can still be a problem for memory-constrained environments such as microcontrollers (MicroPython) and very large applications, perhaps. Startup overhead is another worry some users have. This should be easy to measure, though. |
This makes sense, I just tried this on my laptop, here is what I get:
I was thinking a bit more about Another question is should this be a function or just an attribute pointing to the actual class object? At the time of subscripting of a generic class we already know that it should just point to the original class. (Btw this attribute will also allow us to remove completely There is another thing that will help to speed-up things: a flag that ignores all annotations for modules, classes, and functions (like for docstrings currently). Both this flag and |
Lots of people seem interested in runtime processing annotations. E.g. https://github.com/ericvsmith/dataclasses |
Good point! Ignoring annotations flag will break |
It's still possible to use the functional forms of these even if annotations are ignored. Also, we could perhaps populate annotations dictionaries in classes but use |
A global flag would break libraries that use the non-functional notation (which is much nicer anyway). |
But what do you think about the idea of setting all annotations to |
I worry about library developers that use type annotations e.g. to construct JSON schemas. We already have problems with -O and -O. What problem are we really trying to solve, other than Mark complaining? |
I guess we are not doing things in a logical order -- first we should do benchmarking to determine if there is a significant problem right now. In addition to @ilevkivskyi 's results, these data points would be interesting and hopefully not difficult to generate (though they are still a significant amount of work):
Perhaps we could extrapolate that the relative overhead for mypy would be similar for other fully annotated, medium-sized programs. Not sure if we can extrapolate to much larger codebases, but at least the extrapolation could give a rough estimate for larger codebases as well. |
@gvanrossum
All the above apply to CPython as well as MicroPython. |
Good question!
I did many benchmarks, but I think they are all not reliable, since they are micro-benchmarks focused on the speed of specific things like generic class instantiation,
Here are some speculations: implementing |
Can we flip this discussion? Instead of demanding justification for types not being metaclasses, can someone justify why types need to be metaclasses? Making a class inherit from (the class) As to maintaining backwards compatibility, @JukkaL misinterpreted what I meant, but I prefer his approach, so lets go with that 😄 |
Why is micropython a concern? They already make plenty of other changes to support a radically smaller memory footprint, including leaving out most of the standard library by default. Since the primary use case of typing is for static checks, it would presumably be run only during development (on a more powerful machine), and NOT on the small devices. I think the default would be to not even make the annotations available on the small device; if they do choose to support some sort of typing for run-time checks, a lighter-weight version than CPython uses would be less of a compatibility issue than some of the other changes already are. (Making typing more efficient, particularly when not used, is still a good goal, but I don't think micropython in particular is an important use case.) |
It is not the only concern. There are several aspects where |
Here is PR #439 with some short term solutions I mentioned above (as I predicted this gives around 5% speed-up for real applications that use |
The idea of a Or do you want to experiment with mutating the |
@ncoghlan The idea is indeed interesting since it will not only remove the performance concerns, but will also avoid metaclass conflicts as e.g. #449 (by using
I am quite sure now that this is a reasonable idea. We have tried different variations of bases substitution in |
OK, cool. In that case, I think the most important part of my comment is the suggested method name:
|
Minor point: it's |
OK, I have a POC implementation. Here are observations:
Concerning the last point, there are two possible options: a) go with a simple solution (it will already give great speed-up) and keep b) We could modify ...
PyObject_GetItem(PyObject *o, PyObject *key)
{
...
if (PyType_Check(o)){
fallback = PyObject_GetAttrString(o, "__class_getitem__")
if (fallback == NULL){
goto error;
}
esle{
/* pack 'o' and 'key' into 'args'*/
return _PyObject_FastCall(fallback, args, 2);
}
}
...
} My idea is that people rarely subscript random things inside @gvanrossum @ncoghlan what do you think? Should we go with option (a) or (b)? |
I'm not sure I'm entirely following the problem:
|
@ncoghlan class Custom(Generic[T]): ...
Custom[int] # Should be OK
Another(Custom[T]): ...
Another[int] # Should be also OK Everything else seems to be possible without a metaclass (only with Concerning the performance there are two major slow-downs currently:
Both above problems will be fixed by |
Two additional notes:
|
Regarding the name, while Jelle's right that the implemented name is The TypeVar case is an interesting one, but it seems to me that it could potentially be addressed by:
If the isinstance check also proves to be too slow (or otherwise impractical), then I'd suggest we look at ways of optimising that before going down the |
OK
Yes, it works perfectly if we keep the
|
I'm a little lost. Ivan, if you have an implementation somewhere, can you link to it? I presume it's modifications to the C code of CPython? If we're going that route, what will happen on Python 3.5 and before? (Or on 3.6 and before if we decide this is too invasive to go into CPython 3.6.3.) I suppose you can fall back to metaclasses. If we want Another solution to the 3rd party metaclass problem (which is real) could be to just recommend people inherit their metaclass from Why is this still in the "Mark Shannon" thread? I think I missed a part of the conversation. |
Sorry, probably we went to fast. Here is a short summary:
Most probably we will need to have a separate source file for newer versions (like we now have for Python 2 and 3). I think there will be so many fallbacks so that the code will be hard to read. I expect that
This is actually another possibility that I was thinking about. Maybe it is even better (it is a more "local" change anyway).
It will probably fix vast majority of metaclass conflicts. I think we should start from a simple solution (i.e. have very minimal changes to CPython and keep |
I've gotta focus elsewhere for a while, but I'd like to note that I was
probably wrong about recommending people inherit their metaclasses from
ABCMeta -- it'll still be a metaclass conflict.
|
Yes, sorry, you are right, I was confused by the fact that this works: class C(typing.List[int], collections.abc.Sequence): ... Anyway, I am still not sure what to do. If you think we might go with |
(It probably makes sense to break this tangent out into a new issue, but I'll continue here for now) I think it makes sense to break exploration of this idea into 3 phases:
That is, allowing non-classes in a subclass bases list would be a feature of 1a. Potentially look at exposing better building blocks (e.g. a
|
The problem is to get
will fail soon in |
@ilevkivskyi Ah, you're right, I completely forgot that the initial metaclass determination step itself would fail. D'oh :( |
The original performance issue is now addressed by PEP 560. |
@ilevkivskyi @markshannon.
Mark observed that the typing module uses classes to represent types. This can be expensive, since e.g. the type List[int] really ought to be the tuple (List, int) but it's actually a class object which has a fair amount of overhead (though not as much as early versions of typing.py, since we now cache these class objects).
If we changed to tuples (or at least objects simpler than class object), we'd have a problem: The simpler object couldn't be subclassed from. But who subclasses List[int]? Then again, maybe simpler objects aren't the point?
Mark also pointed out that after
We find that
C.__mro__
has 17 items!I confirmed this. The roughly equivalent code using collections.abc
has only 7 items. And subclassing builtins.list
has only three.
This affects performance, e.g. which is faster?
One append() call is 10% faster than the other.
The text was updated successfully, but these errors were encountered: