TYP: how to annotate DataFrame.__getitem__ #46616
Comments
I just pushed a PR for this and other mypy related issues in the MS stubs, and this is what I am using that works for the tests that are there:
@overload
def __getitem__(self, idx: Scalar) -> Series: ...
@overload
def __getitem__(self, rows: slice) -> DataFrame: ...
@overload
def __getitem__(
    self,
    idx: Union[
        Tuple,
        Series[_bool],
        DataFrame,
        List[_str],
        List[Hashable],
        Index,
        np_ndarray_str,
        Sequence[Tuple[Scalar, ...]],
    ],
) -> DataFrame: ...
|
I think this is incorrect. If the idx is duplicated in the DataFrame's columns, this returns a DataFrame instead of a Series. |
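The duplicated-label behaviour described above can be reproduced with a small example. This is a minimal sketch: the two frames and their column names are made up for illustration and do not come from the thread.

```python
import pandas as pd

# Illustrative frames (assumed data, not from the issue).
unique = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
duplicated = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

# Same scalar key, different return types depending on the column labels.
unique_result = unique["a"]
duplicated_result = duplicated["a"]

print(type(unique_result).__name__)      # Series
print(type(duplicated_result).__name__)  # DataFrame
```

The return type depends on the contents of the column index, which a static annotation keyed only on the type of `idx` cannot see.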
Thanks @phofl for finding another issue! I'm not sure what the best solution for I would be inclined to a solution similar to @Dr-Irv's (but to somewhat accept Hashable) but we would then probably need
|
Stepping back a bit, it was my understanding from the dev meetings a couple of months back that we were moving the MS stubs across as-is in the first instance. Has this changed? |
I'm looking forward to that (especially if that is done sooner than later)! Maybe the question of how to annotate |
So this is an issue because the type of the result is based on the "insides" of the calling With respect to typing, there are two ways to look at this:
As a user, I would vote for (2), which is why we've used those annotations in the MS stubs. |
I've spent a fair bit of time improving the annotations in the MS stubs, and developing out the test framework there. In particular, I just created a huge PR to make things work right with We'd have to figure out a way to manage the transition from MS publishing the MS stubs in |
I was about to create a PR but I realized that annotating DataFrame.__getitem__ might be impossible without making some simplifications.
There are two problems with __getitem__:
1) A Hashable key: one would expect that this always returns a Series, but slice (which is Hashable) returns a DataFrame.
2) df["a"] can return a DataFrame.
The MS stubs seem to make two assumptions: 1) columns can only be of type str (and maybe a few more types, but not Hashable) and 2) multiindex doesn't exist. In practice, this will cover almost all cases.
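Both problems can be observed at runtime. The following is a minimal sketch with made-up data; the MultiIndex frame is an assumption chosen to illustrate the case that the stubs ignore.

```python
import pandas as pd

# Problem 1: a slice key selects rows and returns a DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
sliced = df[0:2]
print(type(sliced).__name__)  # DataFrame

# Problem 2: with MultiIndex columns, a plain string key selects a whole
# sub-frame, while a full tuple key selects a single column.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
multi = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)
top_level = multi["a"]
full_key = multi[("a", "x")]
print(type(top_level).__name__)  # DataFrame
print(type(full_key).__name__)   # Series
```

In the MultiIndex case the key `"a"` has the same static type as in a flat frame, so an overload that maps `str` to `Series` is wrong for it.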
I don't think there is a solution for the multiindex issue. Even if we make DataFrame generic to carry the type of the column index, there is no Not[Multiindex] type, so we will always end up with incompatible & overlapping overloads.
The Hashable issue can partly be addressed:
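One way to picture a partial fix is to list a `slice` overload ahead of a broader `Hashable` overload, so the more specific signature is matched first. The sketch below is an assumption for illustration, not the snippet from this issue and not pandas' actual stubs: `Frame` and `Column` are hypothetical stand-ins for `DataFrame` and `Series`. Whether a given type checker accepts this ordering without an overlap warning depends on how it treats `slice` against `Hashable`, which is not verified here.

```python
from __future__ import annotations

from typing import Hashable, Union, overload


class Column:
    """Hypothetical stand-in for a Series."""

    def __init__(self, values: list[int]) -> None:
        self.values = values


class Frame:
    """Hypothetical stand-in for a DataFrame (not pandas' real stubs)."""

    def __init__(self, data: dict[Hashable, list[int]]) -> None:
        self.data = data

    # The narrower overload comes first; overloads are matched in order.
    @overload
    def __getitem__(self, key: slice) -> Frame: ...
    @overload
    def __getitem__(self, key: Hashable) -> Column: ...

    def __getitem__(self, key: Union[slice, Hashable]) -> Union[Frame, Column]:
        if isinstance(key, slice):
            # Row selection: apply the slice to every column.
            return Frame({name: values[key] for name, values in self.data.items()})
        # Column selection by label.
        return Column(self.data[key])


frame = Frame({"a": [1, 2, 3], "b": [4, 5, 6]})
rows = frame[0:2]
column = frame["a"]
```

This only covers the `slice` versus `Hashable` overlap; it does nothing for the multiindex case, where the key type alone does not determine the result type.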
Do you see a way to cover all cases of __getitem__ and if not which assumptions are you willing to make? @simonjayhawkins @Dr-Irv