TYP: interval.pyi #44922
Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-02-21 04:04:50 UTC
@@ -200,6 +201,9 @@ class IntervalArray(IntervalMixin, ExtensionArray):
     ndim = 1
     can_hold_na = True
     _na_value = _fill_value = np.nan
+    _left: np.ndarray
+    _right: np.ndarray
I'm not sure about the type of _left and _right, see mypy error for _from_sequence in this file.
@@ -83,7 +83,7 @@
 PythonScalar = Union[str, int, float, bool]
 DatetimeLikeScalar = Union["Period", "Timestamp", "Timedelta"]
 PandasScalar = Union["Period", "Timestamp", "Timedelta", "Interval"]
-Scalar = Union[PythonScalar, PandasScalar]
+Scalar = Union[PythonScalar, PandasScalar, np.datetime64, np.timedelta64]
Before this PR, Interval resolved to Any. Some functions that are supposed to return a Scalar can return np.datetime64 / np.timedelta64, which was previously unnoticed.
I like _ScalarT better as a name for the TypeVar (compared to _S in timestamps.pyi).
pandas/_libs/interval.pyi
Outdated
    @property
    def mid(self) -> float: ...
    @property
    def length(self) -> float: ...
could be timedelta?
It is now generic but IntervalMixin[datetime].length is annotated to return a datetime. I don't think this exception can be achieved with overloads (overlapping overloads with different return type).
pandas/core/reshape/pivot.py
Outdated
@@ -482,11 +482,20 @@ def pivot(
    if columns is None:
        raise TypeError("pivot() missing 1 required argument: 'columns'")

    columns_listlike = com.convert_to_list_like(columns)
    # error: Argument 1 to "convert_to_list_like" has incompatible type "Hashable";
does this suggest that the 'columns' annotation may be too loose?
The annotation of convert_to_list_like seems to be too strict: the function literally accepts anything.
I just realized that typing interval.pyi is much messier than I expected:
I think the only way to address the above is to define a Protocol (needs to support +, -, <, ...) and then use this protocol as a bound for a TypeVar, which is then used by IntervalMixin & co. The datetime/timedelta mechanics would then need to be addressed by overloads. Marking this PR as a draft until I have a working version with a protocol.
Version 2: more aligned with the documentation (any "orderable scalar" is accepted), which also breaks the Scalar-Interval dependency.
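A minimal, self-contained sketch of the Protocol-bound TypeVar idea described above. The names `_Orderable` and `SimpleInterval`, the choice of protocol methods, and the right-closed `contains` are assumptions made for illustration; this is not the code from the PR or the pandas stubs.

```python
from __future__ import annotations

from typing import Any, Generic, Protocol, TypeVar


class _Orderable(Protocol):
    """Hypothetical protocol: anything that supports <, <=, + and -."""

    def __lt__(self, other: Any) -> bool: ...
    def __le__(self, other: Any) -> bool: ...
    def __add__(self, other: Any) -> Any: ...
    def __sub__(self, other: Any) -> Any: ...


# The protocol is used as the bound of the TypeVar that parametrizes the interval.
_OrderableT = TypeVar("_OrderableT", bound=_Orderable)


class SimpleInterval(Generic[_OrderableT]):
    """Illustrative stand-in for an interval class, closed on the right."""

    def __init__(self, left: _OrderableT, right: _OrderableT) -> None:
        if right < left:
            raise ValueError("left side of interval must be <= right side")
        self.left = left
        self.right = right

    def contains(self, key: _OrderableT) -> bool:
        # Both endpoints and the key are tied to the same type parameter.
        return self.left < key <= self.right
```

With this shape, the generic parameter carries the endpoint type, while special cases such as datetime/timedelta would still need overloads on top.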
@@ -517,7 +516,7 @@ def f(x):


 def convert_to_list_like(
-    values: Scalar | Iterable | AnyArrayLike,
+    values: Hashable | Iterable | AnyArrayLike,
All scalars should also be hashable.
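To illustrate why `Hashable | Iterable` is a reasonable parameter annotation here, the following is a simplified sketch. `to_list_like` is a hypothetical stand-in, not the pandas implementation of `convert_to_list_like` (which, among other things, handles arrays and pandas containers).

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable
from typing import Any


def to_list_like(values: Hashable | Iterable[Any]) -> list[Any]:
    """Hypothetical, simplified stand-in for a list-like converter."""
    # Strings and bytes are iterable but are treated as scalars here.
    if isinstance(values, Iterable) and not isinstance(values, (str, bytes)):
        return list(values)
    # Everything else (ints, strings, tuples used as labels, ...) is wrapped.
    return [values]
```

Scalars such as `3` or `"a"` satisfy `Hashable`, so a caller that passes a `Hashable` column label type-checks against this signature without needing the narrower `Scalar` alias.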
@twoertwein if you can merge master
cc @simonjayhawkins if any comments
@@ -893,7 +893,7 @@ def _clear_buffer(self) -> None:

     def _get_index_name(
         self, columns: list[Hashable]
-    ) -> tuple[list[Hashable] | None, list[Hashable], list[Hashable]]:
+    ) -> tuple[Sequence[Hashable] | None, list[Hashable], list[Hashable]]:
Without this, we would get:
pandas/io/parsers/python_parser.py:949: error: Incompatible return value type (got "Tuple[List[Union[Union[str, int, float, bool], Union[Period, Timestamp, Timedelta, Interval[Any]], datetime64, timedelta64]], List[Hashable], List[Hashable]]", expected "Tuple[Optional[List[Hashable]], List[Hashable], List[Hashable]]") [return-value]
I think that this is based on stuff I've put in the Microsoft stubs. I recently did a PR there that addresses a few issues. See https://github.com/microsoft/python-type-stubs/pull/167/files One thing that I had to test was getting the type right for intervals based on I added the following test in that PR in the MS stubs:
def test_interval_length() -> None:
    i1 = pd.Interval(pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-02"), closed="both")
    reveal_type(i1.length, expected_string="Timedelta")
    i1.length.total_seconds()
    i2 = pd.Interval(10, 20)
    reveal_type(i2.length, expected_type=int)
    i3 = pd.Interval(13.2, 19.5)
    reveal_type(i3.length, expected_type=float)
Thanks @Dr-Irv! It definitely makes sense to add exceptions for Pandas/important stdlib types. I think I found a way to encode this (using your workaround for overload+decorators):
# note: mypy doesn't support overloading properties
# based on github.com/microsoft/python-type-stubs/pull/167
class _LengthDescriptor:
    @overload
    def __get__(self, instance: IntervalMixin[Timestamp], owner: Any) -> Timedelta: ...
    @overload
    def __get__(self, instance: IntervalMixin[datetime], owner: Any) -> timedelta: ...
    @overload
    def __get__(
        self, instance: IntervalMixin[_OrderableT], owner: Any
    ) -> _OrderableT: ...

class IntervalMixin(Generic[_OrderableT]):
    ...
    length: _LengthDescriptor
Unfortunately, there doesn't seem to be a generic way to handle other exceptions (Interval can be used with more classes than just int/float/Timestamp). Still need to debug this locally: I'm getting some "has-type/misc" mypy errors, and pyright doesn't pick up on it (probably an issue with a TypeVar'ed Protocol in Generic classes; I will create an issue on pyright).
Not sure about that. See below:
So I think we don't need to worry about
In that case, 1) your approach might be much simpler and 2) the documentation needs to be updated.
    def __str__(self) -> str: ...
    # TODO: could return Interval with different type
    def __add__(
        self, y: numbers.Number | np.timedelta64 | timedelta
I think you need to make the operators type specific. For Interval[Timestamp], you can only add and subtract Timedelta. For the numeric ones, you can only add/subtract/multiply/divide float or int. See the latest in microsoft/python-type-stubs#167
    def __sub__(
        self, y: numbers.Number | np.timedelta64 | timedelta
    ) -> Interval[_OrderableT]: ...
    def __mul__(self, y: numbers.Number) -> Interval[_OrderableT]: ...
multiply and divide don't apply for the Timestamp intervals
    def __truediv__(self, y: numbers.Number) -> Interval[_OrderableT]: ...
    def __floordiv__(self, y: numbers.Number) -> Interval[_OrderableT]: ...
    def __hash__(self) -> int: ...
    def __contains__(self: Interval[_OrderableT], key: _OrderableT) -> bool: ...
I think you have to be explicit here about the 4 different types. You can't have __contains__() support testing an integer inside a Timestamp interval. See PR microsoft/python-type-stubs#167
The two _OrderableT bind to the same type simultaneously.
i2 = pd.Interval(10, 20)
i2.__contains__(4) # ok
i2.__contains__(4.0) # error: Unsupported operand types for in ("float" and "Interval[int]") [operator]
i3 = pd.Interval(13.2, 19.5)
i3.__contains__(4) # ok
i3.__contains__(4.0) # ok
i3.__contains__(pd.Timestamp(0)) # error: Unsupported operand types for in ("Timestamp" and "Interval[float]") [operator]
Thank you for your feedback! I will integrate it over the next days. If you prefer, I can also leave the operators as-is and leave it to you - doesn't feel great copy&pasting your code ;)
I think that i2.__contains__(4.0) should be allowed. If you have an interval based on integers, you can test whether a float is inside.
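One possible way to express this suggestion is to overload `__contains__` on the type of `self`, so that integer intervals accept float keys while datetime intervals only accept datetime keys. This is a sketch under assumptions: `NumericOrDateInterval` is a hypothetical class, stdlib `datetime` stands in for `Timestamp`, and the behaviour of a particular type checker on these overloads has not been verified.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Generic, TypeVar, overload

_T = TypeVar("_T", int, float, datetime)


class NumericOrDateInterval(Generic[_T]):
    """Illustrative right-closed interval; not the pandas class."""

    def __init__(self, left: _T, right: _T) -> None:
        self.left = left
        self.right = right

    # Integer and float intervals both accept float keys (ints are accepted
    # wherever float is annotated); datetime intervals accept datetimes only.
    @overload
    def __contains__(self: NumericOrDateInterval[int], key: float) -> bool: ...
    @overload
    def __contains__(self: NumericOrDateInterval[float], key: float) -> bool: ...
    @overload
    def __contains__(self: NumericOrDateInterval[datetime], key: datetime) -> bool: ...
    def __contains__(self, key: Any) -> bool:
        return self.left < key <= self.right
```

At runtime, `15.5 in NumericOrDateInterval(10, 20)` is a valid membership test on an integer-based interval, which is the case the comment above argues the annotation should permit.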
@@ -663,7 +672,9 @@ def _get_indexer(
            # homogeneous scalar index: use IntervalTree
            # we should always have self._should_partial_index(target) here
            target = self._maybe_convert_i8(target)
            indexer = self._engine.get_indexer(target.values)
            # error: Argument 1 to "get_indexer" of "IntervalTree" has incompatible type
why not add typing to _maybe_convert_i8?
yep. lgtm. always happy to have PRs that add more types merged if mypy is green. @Dr-Irv comments and pre-commit failure outstanding.
Closing in favor of #46098
Currently rebased on top of #44339. This is the second-to-last part of @erictraut's #43744 (offsets.pyi is still missing).