Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: Add annotation for df.pivot #32197

Merged
merged 41 commits into from
May 18, 2020

Conversation

charlesdong1991
Copy link
Member

def pivot(
data: "DataFrame",
index: Optional[Union[Label, Collection[Label]]] = None,
columns: Union[Label, List[Label]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be Optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i am not very sure, got some mypy errors below to complain that columns is not Optional, probably because we will raise error if None.

def pivot(data: "DataFrame", index=None, columns=None, values=None) -> "DataFrame":
def pivot(
data: "DataFrame",
index: Optional[Union[Label, List[Optional[Label]]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
index: Optional[Union[Label, List[Optional[Label]]]] = None,
index: Optional[Union[Label, Sequence[Optional[Label]]]] = None,

For API items we want to be as generic as possible, so Sequence instead of List and Mapping instead of Dict (unless the interface really isn't generic)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on next two lines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For API items we want to be as generic as possible, so Sequence instead of List and Mapping instead of Dict (unless the interface really isn't generic)

thanks for the very nice tip!!! 👍

@WillAyd WillAyd added the Typing type annotations, mypy/pyright type checking label Feb 24, 2020
@pep8speaks
Copy link

pep8speaks commented Feb 24, 2020

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-18 19:53:56 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of all of the type ignores

cc @simonjayhawkins @WillAyd

def pivot(data: "DataFrame", index=None, columns=None, values=None) -> "DataFrame":
def pivot(
data: "DataFrame",
index: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i t hink would be ok to add these to _typing, maybe Labels? so this becomes Optional[Labels]

if index is None:
pass
elif is_list_like(index):
cols = list(index)
# Remove type ignore once mypy-5206 is implemented, same for below
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an assert here to make mypy happy?

@simonjayhawkins
Copy link
Member

not a fan of all of the type ignores

we have the choice of ignores, ignores with error codes, casts and asserts.

my preference here would be casts (only after is_list_like calls) xref #32785

diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py
index e49dd4c9b..855189c43 100644
--- a/pandas/core/reshape/pivot.py
+++ b/pandas/core/reshape/pivot.py
@@ -1,4 +1,14 @@
-from typing import TYPE_CHECKING, Callable, Dict, List, Optional, Sequence, Tuple, Union
+from typing import (
+    TYPE_CHECKING,
+    Callable,
+    Dict,
+    List,
+    Optional,
+    Sequence,
+    Tuple,
+    Union,
+    cast,
+)
 
 import numpy as np
 
@@ -427,24 +437,28 @@ def _convert_by(by):
 @Appender(_shared_docs["pivot"], indents=1)
 def pivot(
     data: "DataFrame",
-    index: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
-    columns: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
-    values: Optional[Union[Label, Sequence[Optional[Label]]]] = None,
+    index: Optional[Union[Label, Sequence[Label]]] = None,
+    columns: Optional[Union[Label, Sequence[Label]]] = None,
+    values: Optional[Union[Label, Sequence[Label]]] = None,
 ) -> "DataFrame":
     if columns is None:
         raise TypeError("pivot() missing 1 required argument: 'columns'")
-    columns = columns if is_list_like(columns) else [columns]
+    if is_list_like(columns):
+        columns = cast(Sequence[Label], columns)
+        columns = columns
+    else:
+        columns = [columns]
 
     if values is None:
-        cols: List[Optional[Sequence]] = []
+        cols: List[Label] = []
         if index is None:
             pass
         elif is_list_like(index):
-            # Remove type ignore once mypy-5206 is implemented, same for below
-            cols = list(index)  # type: ignore
+            index = cast(Sequence[Label], index)
+            cols = list(index)
         else:
-            cols = [index]  # type: ignore
-        cols.extend(columns)  # type: ignore
+            cols = [index]
+        cols.extend(columns)
 
         append = index is None
         indexed = data.set_index(cols, append=append)
@@ -452,18 +466,20 @@ def pivot(
         if index is None:
             idx_list = [Series(data.index, name=data.index.name)]
         elif is_list_like(index):
-            idx_list = [data[idx] for idx in index]  # type: ignore
+            index = cast(Sequence[Label], index)
+            idx_list = [data[idx] for idx in index]
         else:
             idx_list = [data[index]]
 
-        data_columns = [data[col] for col in columns]  # type: ignore
+        data_columns = [data[col] for col in columns]
         idx_list.extend(data_columns)
         mi_index = MultiIndex.from_arrays(idx_list)
 
         if is_list_like(values) and not isinstance(values, tuple):
             # Exclude tuple because it is seen as a single column name
+            values = cast(Sequence[Label], values)
             indexed = data._constructor(
-                data[values]._values, index=mi_index, columns=values  # type: ignore
+                data[values]._values, index=mi_index, columns=values
             )
         else:
             indexed = data._constructor_sliced(data[values]._values, index=mi_index)

it looks like the use of type: ignores here masked an incorrect type definition of cols

each cast used here could instead be replaced with a runtime assert until resolution of #32785

This diff uses 4 casts corresponding to 4 is_list_like calls compared with 6 ignores for which the reasons are not immediately obvious.

@simonjayhawkins
Copy link
Member

@charlesdong1991 whats the status here?

@charlesdong1991
Copy link
Member Author

I have updated and it seems cast indeed work here.

Thanks very much! @simonjayhawkins

@jreback jreback added this to the 1.1 milestone May 17, 2020
pandas/core/reshape/pivot.py Outdated Show resolved Hide resolved
pandas/core/reshape/pivot.py Outdated Show resolved Hide resolved
@charlesdong1991
Copy link
Member Author

@jreback i use convert_to_list_like for the first one, keep your second comment as is, and pls let me know if you think it's okay

@charlesdong1991 charlesdong1991 requested a review from jreback May 18, 2020 11:16
@charlesdong1991
Copy link
Member Author

many thanks for the help on annotations @simonjayhawkins i think it's good to go now!

@simonjayhawkins
Copy link
Member

Thanks @charlesdong1991 .lgtm. I think if you type the return type of MultiIndex.from_arrays you could revert the variable name changes idx_list and mi_index and reduce the diff.

    def from_arrays(cls, arrays, sortorder=None, names=lib.no_default) -> "MultiIndex":

@charlesdong1991
Copy link
Member Author

ahh, yeah, indeed seems pass mypy locally.

very nice!! @simonjayhawkins thanks!

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Thanks. lgtm pending green. @jreback @WillAyd

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@jreback jreback merged commit 1f48d3d into pandas-dev:master May 18, 2020
@jreback
Copy link
Contributor

jreback commented May 18, 2020

thanks @charlesdong1991

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants