-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: Add annotation for df.pivot #32197
TYP: Add annotation for df.pivot #32197
Conversation
pandas/core/reshape/pivot.py
Outdated
def pivot( | ||
data: "DataFrame", | ||
index: Optional[Union[Label, Collection[Label]]] = None, | ||
columns: Union[Label, List[Label]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be Optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i am not very sure, got some mypy errors below to complain that columns
is not Optional, probably because we will raise error if None
.
pandas/core/reshape/pivot.py
Outdated
def pivot(data: "DataFrame", index=None, columns=None, values=None) -> "DataFrame": | ||
def pivot( | ||
data: "DataFrame", | ||
index: Optional[Union[Label, List[Optional[Label]]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index: Optional[Union[Label, List[Optional[Label]]]] = None, | |
index: Optional[Union[Label, Sequence[Optional[Label]]]] = None, |
For API items we want to be as generic as possible, so Sequence
instead of List
and Mapping
instead of Dict
(unless the interface really isn't generic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on next two lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For API items we want to be as generic as possible, so
Sequence
instead ofList
andMapping
instead ofDict
(unless the interface really isn't generic)
thanks for the very nice tip!!! 👍
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-18 19:53:56 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of all of the type ignores
pandas/core/reshape/pivot.py
Outdated
def pivot(data: "DataFrame", index=None, columns=None, values=None) -> "DataFrame": | ||
def pivot( | ||
data: "DataFrame", | ||
index: Optional[Union[Label, Sequence[Optional[Label]]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i t hink would be ok to add these to _typing, maybe Labels? so this becomes Optional[Labels]
pandas/core/reshape/pivot.py
Outdated
if index is None: | ||
pass | ||
elif is_list_like(index): | ||
cols = list(index) | ||
# Remove type ignore once mypy-5206 is implemented, same for below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an assert here to make mypy happy?
we have the choice of ignores, ignores with error codes, casts and asserts. my preference here would be casts (only after is_list_like calls) xref #32785 diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py
index e49dd4c9b..855189c43 100644
--- a/pandas/core/reshape/pivot.py
+++ b/pandas/core/reshape/pivot.py
@@ -1,4 +1,14 @@
-from typing import TYPE_CHECKING, Callable, Dict, List, Optional, Sequence, Tuple, Union
+from typing import (
+ TYPE_CHECKING,
+ Callable,
+ Dict,
+ List,
+ Optional,
+ Sequence,
+ Tuple,
+ Union,
+ cast,
+)
import numpy as np
@@ -427,24 +437,28 @@ def _convert_by(by):
@Appender(_shared_docs["pivot"], indents=1)
def pivot(
data: "DataFrame",
- index: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
- columns: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
- values: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
+ index: Optional[Union[Label, Sequence[Label]]] = None,
+ columns: Optional[Union[Label, Sequence[Label]]] = None,
+ values: Optional[Union[Label, Sequence[Label]]] = None,
) -> "DataFrame":
if columns is None:
raise TypeError("pivot() missing 1 required argument: 'columns'")
- columns = columns if is_list_like(columns) else [columns]
+ if is_list_like(columns):
+ columns = cast(Sequence[Label], columns)
+ columns = columns
+ else:
+ columns = [columns]
if values is None:
- cols: List[Optional[Sequence]] = []
+ cols: List[Label] = []
if index is None:
pass
elif is_list_like(index):
- # Remove type ignore once mypy-5206 is implemented, same for below
- cols = list(index) # type: ignore
+ index = cast(Sequence[Label], index)
+ cols = list(index)
else:
- cols = [index] # type: ignore
- cols.extend(columns) # type: ignore
+ cols = [index]
+ cols.extend(columns)
append = index is None
indexed = data.set_index(cols, append=append)
@@ -452,18 +466,20 @@ def pivot(
if index is None:
idx_list = [Series(data.index, name=data.index.name)]
elif is_list_like(index):
- idx_list = [data[idx] for idx in index] # type: ignore
+ index = cast(Sequence[Label], index)
+ idx_list = [data[idx] for idx in index]
else:
idx_list = [data[index]]
- data_columns = [data[col] for col in columns] # type: ignore
+ data_columns = [data[col] for col in columns]
idx_list.extend(data_columns)
mi_index = MultiIndex.from_arrays(idx_list)
if is_list_like(values) and not isinstance(values, tuple):
# Exclude tuple because it is seen as a single column name
+ values = cast(Sequence[Label], values)
indexed = data._constructor(
- data[values]._values, index=mi_index, columns=values # type: ignore
+ data[values]._values, index=mi_index, columns=values
)
else:
indexed = data._constructor_sliced(data[values]._values, index=mi_index) it looks like the use of type: ignores here masked an incorrect type definition of each cast used here could instead be replaced with a runtime assert until resolution of #32785 This diff uses 4 casts corresponding to 4 is_list_like calls compared with 6 ignores for which the reasons are not immediately obvious. |
@charlesdong1991 whats the status here? |
I have updated and it seems Thanks very much! @simonjayhawkins |
@jreback i use |
many thanks for the help on annotations @simonjayhawkins i think it's good to go now! |
Thanks @charlesdong1991 .lgtm. I think if you type the return type of MultiIndex.from_arrays you could revert the variable name changes idx_list and mi_index and reduce the diff.
|
ahh, yeah, indeed seems pass mypy locally. very nice!! @simonjayhawkins thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesdong1991 Thanks. lgtm pending green. @jreback @WillAyd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
thanks @charlesdong1991 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff