Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetime.__sub__ overload order #10924

Closed
randolf-scholz opened this issue Oct 24, 2023 · 15 comments
Closed

datetime.__sub__ overload order #10924

randolf-scholz opened this issue Oct 24, 2023 · 15 comments

Comments

@randolf-scholz
Copy link
Contributor

Currently, the overload order is:

if sys.version_info >= (3, 8):
@overload # type: ignore[override]
def __sub__(self, __value: timedelta) -> Self: ...
@overload
def __sub__(self: _D, __value: _D) -> timedelta: ...

However, both the c-module and the fallback pydatetime first check if other is a datetime:

    def __sub__(self, other):
        "Subtract two datetimes, or a datetime and a timedelta."
        if not isinstance(other, datetime):
            if isinstance(other, timedelta):
                return self + -other
            return NotImplemented

https://github.com/python/cpython/blob/c7d68f907ad3e3aa17546df92a32bddb145a69bf/Lib/_pydatetime.py#L2235-L2257

https://github.com/python/cpython/blob/1198076447f35b19a9173866ccb9839f3bcf3f17/Modules/_datetimemodule.c#L5608

So shouldn't the overload order be vice versa?

Also, I noticed that at runtime, timedelta has __rdivmod__, __rmod__ and datetime has __rsub__, which are missing in the stub. But I guess this is because the pure python fallback doesn't have them.

@AlexWaygood
Copy link
Member

The order in which the runtime does its type checks shouldn't really be relevant to the order of the overloads in the stubs. Can you produce a minimal, reproducible example where a type checker is inferring the wrong thing as a result of the current overload order?

@randolf-scholz
Copy link
Contributor Author

Theoretically, this would produce wrong inference if someone created an intersection type of datetime and timedelta. For cpython, this shouldn't be possible (it might be possible for other implementations like pypy). It can lead to wrong inferences when a type-checker interprets a type as Never, which is possible since current type-checkers do not consider Never uninhabitable.

These are more theoretical than of any real world relevance. The real reason I bring this up is that both numpy and pandas do the overloads vice versa and this leads to issues when trying to write a Protocol that captures datetime.datetime, numpy.datetime64 and pandas.Timestamp simultaneously, since overloads are order-dependent.

https://github.com/numpy/numpy/blob/382eedf3891d474196677e65f3aae066afb32c5f/numpy/__init__.pyi#L2770-L2773

https://github.com/pandas-dev/pandas-stubs/blob/9aac8e31ba69eb4c0583e55dd2198755fb031620/pandas-stubs/_libs/tslibs/timestamps.pyi#L203-L214

@randolf-scholz
Copy link
Contributor Author

randolf-scholz commented Oct 25, 2023

We unfortunately see this quite often: overloads are only order-independent if the arguments are mutually exclusive. Since there are no intersection-types & or Not-types yet, which would allow one to write overloads with mutually exclusive arguments, library A will write overloads in a different order than library B. However, this ends up breaking duck typing, because for duck typing to work one needs to use Protocol-types that are supertypes for either case.

@AlexWaygood
Copy link
Member

AlexWaygood commented Oct 25, 2023

Theoretically, this would produce wrong inference if someone created an intersection type of datetime and timedelta. For cpython, this shouldn't be possible (it might be possible for other implementations like pypy).

Even for alternative implementations such as pypy, it wouldn't be possible to create a class that is a subclass of date(time) and a subclass of timedelta. The different __slots__ on the two bases prevent you from doing so:

Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> # this ensures that `import datetime` imports the pure-Python version:
>>> sys.modules['_datetime'] = None
>>> import datetime
>>> datetime
<module 'datetime' from 'C:\\Users\\alexw\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\datetime.py'>
>>> class Baffling(datetime.date, datetime.timedelta): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> class Baffling(datetime.timedelta, datetime.date): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> class Baffling(datetime.timedelta, datetime.datetime): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> class Baffling(datetime.datetime, datetime.timedelta): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict

@randolf-scholz
Copy link
Contributor Author

So here is the full protocol example, based on discussion here.

Full example
from datetime import datetime, timedelta
from typing import (
    Protocol,
    Self,
    SupportsFloat,
    SupportsInt,
    TypeVar,
    overload,
    runtime_checkable,
)

from numpy import datetime64, float64, int64, timedelta64
from pandas import Timedelta, Timestamp

TD_Type = TypeVar("TD_Type", bound="TimeDelta")
DT_Type = TypeVar("DT_Type", bound="DateTime[TimeDelta]")


@runtime_checkable
class TimeDelta(Protocol):
    """Time delta provides several arithmetical operations."""

    # unary operations
    def __pos__(self: TD_Type) -> TD_Type: ...
    def __neg__(self: TD_Type) -> TD_Type: ...
    def __abs__(self: TD_Type) -> TD_Type: ...
    def __bool__(self) -> bool: ...

    # comparisons
    def __le__(self: TD_Type, other: TD_Type, /) -> bool: ...
    def __lt__(self: TD_Type, other: TD_Type, /) -> bool: ...
    def __ge__(self: TD_Type, other: TD_Type, /) -> bool: ...
    def __gt__(self: TD_Type, other: TD_Type, /) -> bool: ...

    # arithmetic
    # addition +
    def __add__(self: TD_Type, other: TD_Type, /) -> TD_Type: ...
    def __radd__(self: TD_Type, other: TD_Type, /) -> TD_Type: ...

    # subtraction -
    def __sub__(self: TD_Type, other: TD_Type, /) -> TD_Type: ...
    def __rsub__(self: TD_Type, other: TD_Type, /) -> TD_Type: ...

    # multiplication *
    def __mul__(self: TD_Type, other: int, /) -> TD_Type: ...
    def __rmul__(self: TD_Type, other: int, /) -> TD_Type: ...

    # division /
    def __truediv__(self: TD_Type, other: TD_Type, /) -> SupportsFloat: ...

    # @overload
    # def __truediv__(self, other: Self, /) -> float: ...
    # @overload
    # def __truediv__(self, other: float, /) -> Self: ...

    # floor division //
    def __floordiv__(self: TD_Type, other: TD_Type, /) -> SupportsInt: ...

    # @overload
    # def __floordiv__(self, other: Self, /) -> int: ...
    # @overload
    # def __floordiv__(self, other: int, /) -> Self: ...

    # modulo %
    def __mod__(self: TD_Type, other: TD_Type, /) -> TD_Type: ...

    # NOTE: __rmod__ missing on fallback pydatetime
    # def __rmod__(self, other: Self, /) -> Self: ...

    # divmod
    def __divmod__(self: TD_Type, other: TD_Type, /) -> tuple[SupportsInt, TD_Type]: ...

    # NOTE: __rdivmod__ missing on fallback pydatetime
    # def __rdivmod__(self, other: Self, /) -> tuple[SupportsInt, Self]: ...


@runtime_checkable
class DateTime(Protocol[TD_Type]):  # bind appropriate TimeDelta type
    """Datetime can be compared and subtracted."""

    def __le__(self: DT_Type, other: DT_Type, /) -> bool: ...
    def __lt__(self: DT_Type, other: DT_Type, /) -> bool: ...
    def __ge__(self: DT_Type, other: DT_Type, /) -> bool: ...
    def __gt__(self: DT_Type, other: DT_Type, /) -> bool: ...

    def __add__(self: DT_Type, other: TD_Type, /) -> DT_Type: ...
    def __radd__(self: DT_Type, other: TD_Type, /) -> DT_Type: ...

    # Fallback: no overloads
    # def __sub__(self, other: Self, /) -> TD_Type: ...

    # order A
    @overload
    def __sub__(self, other: Self, /) -> TD_Type: ...
    @overload
    def __sub__(self, other: TD_Type, /) -> Self: ...

    # order B
    # @overload
    # def __sub__(self, other: TD_Type, /) -> Self: ...
    # @overload
    # def __sub__(self, other: Self, /) -> TD_Type: ...


# fmt: off
python_dt: DateTime[timedelta] = datetime.fromisoformat("2021-01-01") # incompatible with A
numpy_dt: DateTime[timedelta64] = datetime64("2021-01-01")  # incompatible with B
pandas_dt: DateTime[Timedelta] = Timestamp("2021-01-01") # incompatible with B
# fmt: on

The pickle is this: if I choose the over load order A

# order A
@overload
def __sub__(self, other: Self, /) -> TD_Type: ...
@overload
def __sub__(self, other: TD_Type, /) -> Self: ...

for DateTime.__sub__, then it is compatible with numpy.datetime64 and pandas.Timestamp, but incompatible with datatime.datetime. If we use the reverse order B:

# order B
@overload
def __sub__(self, other: TD_Type, /) -> Self: ...
@overload
def __sub__(self, other: Self, /) -> TD_Type: ...

It is compatible with datatime.datetime but incompatible with numpy.datetime64 and pandas.Timestamp.

Thus, the different overload orders make it impossible to have a nice joint protocol. Based on the fact that cpython's source code is aligned with overload order A, I think it is reasonable to request to use the corresponding overload order in the stubs.

@randolf-scholz
Copy link
Contributor Author

Btw. pyright sees no error in either overload order, not sure why that is.

@AlexWaygood
Copy link
Member

Btw. pyright sees no error in either overload order, not sure why that is.

I think I've seen a few instances where pyright is slightly less fussy when it comes to protocol matching than mypy is. (I make no judgement as to which behaviour is "correct", in this case or in others.)

@AlexWaygood
Copy link
Member

In terms of the principle of the thing, I feel like typeshed is sorta upstream to numpy or pandas when it comes to typing, so I kinda feel like they should be following our lead when it comes to the order of the overloads, rather than vice versa. But, having said that... if this is causing issues, and we're the "odd one out", I'd be okay with making this change -- I can't think of any ways that this could cause regressions for other users.

The current order does feel more logical to me currently -- to me, it follows a standard pattern where the more specific overloads are listed first, and the more general "catch-all" overloads follow later. In this specific case, I don't think the order matters hugely, but it would break with that standard pattern to switch the order.

@AlexWaygood
Copy link
Member

Full example

Off-topic, but I wouldn't recommend using @runtime_checkable on a protocol with that many members. isinstance() checks against runtime-checkable protocols are slow generally, but they get really slow (compared to normal isinstance() checks) if you're doing them against protocols with that many members. Personally I'd be hesitant adding the decorator to protocols with more than four, maybe five, members.

@randolf-scholz
Copy link
Contributor Author

The current order does feel more logical to me currently -- to me, it follows a standard pattern where the more specific overloads are listed first, and the more general "catch-all" overloads follow later.

To be honest the self: _D, __value: _D confuses me, why is it not simply self, __value: Self ? Since _D is bound to date, the only possible difference I could see if someone literally tried to do datetime.__sub__(date1, date2), which seems really awkward.

And secondly, in what sense do you mean is _D/Self less specific here than timedelta? I see that this is the case when you have something like str vs Sequence or Mapping vs MutableMapping, where one type is a subtype of another one, or one type could admit a subclass that is a subtype of the other.

@randolf-scholz
Copy link
Contributor Author

Off-topic, but I wouldn't recommend using @runtime_checkable on a protocol with that many members. isinstance() checks against runtime-checkable protocols are slow generally, but they get really slow (compared to normal isinstance() checks) if you're doing them against protocols with that many members.

I took a look at this, and it seems huge amounts of time are spent on calling _get_protocol_attrs every time an isinstance-check is done. It seems to me that if a protocol is decorated with @runtime-checkable, then at class creation time the decorator could compile a frozenset of attributes once and _ProtocolMeta could use that cached set. This might break if someone changed the Protocol dynamically, but that seems like unintended usage of Protocols to begin with.

@AlexWaygood
Copy link
Member

AlexWaygood commented Nov 16, 2023

I took a look at this, and it seems huge amounts of time are spent on calling _get_protocol_attrs every time an isinstance-check is done.

It sounds like you're measuring using Python 3.11 or lower; you'll find the performance profile is very different on py312+. We've already implemented the optimisations you suggest, but this does not negate any of the points I've made.

@AlexWaygood
Copy link
Member

FYI, we've also backported the performance improvements to typing_extensions.Protocol, if you want them on py311 or lower.

But even with the improvements in py312+, isinstance() checks are still much slower against runtime-checkable protocols than against other classes, especially if the runtime-checkable protocol in question has lots of members

@AlexWaygood
Copy link
Member

Fixed in #10999

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants