
WIP for MyPy CI Integration #25622

Closed
wants to merge 13 commits into from
1 change: 1 addition & 0 deletions .gitignore
@@ -61,6 +61,7 @@ dist
.coverage
coverage.xml
coverage_html_report
*.mypy_cache
*.pytest_cache
# hypothesis test database
.hypothesis/
6 changes: 6 additions & 0 deletions mypy.ini
@@ -0,0 +1,6 @@
[mypy]
ignore_missing_imports=True
follow_imports=silent

[mypy-pandas.conftest,pandas.tests.*]
ignore_errors=True
9 changes: 9 additions & 0 deletions mypy_whitelist.txt
@@ -0,0 +1,9 @@
pandas/core/dtypes/base.py
Member Author:

These are all of the files that currently have some sort of type hints in them. The motivation for this comes from this part of the documentation:

https://mypy.readthedocs.io/en/latest/running_mypy.html#reading-a-list-of-files-from-a-file

So the thinking is that for initial CI runs we can whitelist the modules we want to run against, though this is ideally just a temporary file.

Member Author:

To be clear, when running this from the project root you would run mypy @mypy_whitelist.txt
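For context, the @-prefixed argument tells mypy to read one path per line from the named file. A minimal sketch of that expansion (illustrative only, not mypy's actual implementation):

```python
def read_whitelist(text):
    # One path per line; blank lines are skipped. This mirrors what
    # `mypy @mypy_whitelist.txt` does when expanding the @-file.
    return [line.strip() for line in text.splitlines() if line.strip()]

sample = "pandas/core/dtypes/base.py\n\npandas/core/common.py\n"
print(read_whitelist(sample))
# → ['pandas/core/dtypes/base.py', 'pandas/core/common.py']
```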

Member Author:

@Naddiseo any chance you have experience with this? Reading the docs, I think the suggested approach would be to whitelist particular modules at the outset and slowly open things up as more are added.

I'd like to avoid having two files to control configuration here, but I don't see an easy way in the .ini file to control which modules actually get checked.

Contributor:

@WillAyd, not with the whitelist approach. I went with the blacklist approach instead, and had a bunch of modules and packages with ignore_errors set.

Member Author:

Thanks for the feedback. Was that done per package / module in the config file?

Contributor:

Currently I'm just ignoring packages. But I think at one point I was doing both, and I seem to remember a mixture where I'd ignore everything in a package except a specific file. I think it looked like:

[mypy-package.subpackage]
ignore_errors=True
[mypy-package.subpackage.module]
ignore_errors=False

However, I don't remember if it worked or not.
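For what it's worth, mypy does resolve overlapping sections that way: the most specific [mypy-...] section matching a module wins. A toy model of that precedence rule (a simplification, not mypy's real matcher, using the hypothetical package names from the snippet above):

```python
def ignore_errors_for(module, sections):
    # Pick the flag from the most specific (longest) matching section,
    # where a section matches the module itself or any submodule of it.
    best_flag, best_len = None, -1
    for prefix, flag in sections.items():
        if module == prefix or module.startswith(prefix + "."):
            if len(prefix) > best_len:
                best_flag, best_len = flag, len(prefix)
    return best_flag

sections = {
    "package.subpackage": True,          # blacklist the whole subpackage...
    "package.subpackage.module": False,  # ...but re-enable one module
}
print(ignore_errors_for("package.subpackage.module", sections))  # → False
print(ignore_errors_for("package.subpackage.other", sections))   # → True
```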

pandas/core/groupby/groupby.py
pandas/core/internals/managers.py
pandas/core/common.py
Contributor:

why is this not possible in the setup file?

Member Author:

Yeah, I was hoping that mypy_path in the ini file would work, but didn't have any luck. I also didn't see a very good way to make this readable there, since the ini file would expect this to be a comma-separated list.

With that said, this may just be temporary anyway - would ideally like to run on the entire codebase. Modules without existing type hints would be ignored by default anyway.
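Incidentally, the `# type:` comments touched throughout this diff are PEP 484 comment-style annotations: pandas still supported Python 2 at the time, so inline `def f(x: int) -> int` syntax wasn't available and mypy reads the comments instead. A self-contained illustration (not pandas code):

```python
from typing import List, Optional

def head(values, n=5):
    # type: (List[int], int) -> List[int]
    # mypy treats the comment above exactly like an inline signature.
    return values[:n]

_cache = None  # type: Optional[List[int]]  # variable annotation, comment form

print(head([1, 2, 3, 4, 5], 3))
# → [1, 2, 3]
```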

pandas/core/arrays/timedeltas.py
pandas/core/arrays/datetimes.py
pandas/core/arrays/base.py
pandas/core/frame.py
pandas/core/indexes/base.py
7 changes: 6 additions & 1 deletion pandas/core/arrays/array_.py
@@ -1,8 +1,13 @@
from typing import Optional, Sequence, Union

import numpy as np

from pandas._libs import lib, tslibs

from pandas.core.arrays.base import ExtensionArray
from pandas.core.dtypes.common import (
is_datetime64_ns_dtype, is_extension_array_dtype, is_timedelta64_ns_dtype)
from pandas.core.dtypes.dtypes import registry
from pandas.core.dtypes.dtypes import ExtensionDtype, registry

from pandas import compat

Expand Down
8 changes: 5 additions & 3 deletions pandas/core/arrays/base.py
@@ -6,6 +6,7 @@
without warning.
"""
import operator
from typing import Any, Callable, Optional, Sequence, Tuple, Union

import numpy as np

@@ -15,6 +16,7 @@
from pandas.util._decorators import Appender, Substitution

from pandas.core.dtypes.common import is_list_like
from pandas.core.dtypes.dtypes import ExtensionDtype
from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna

@@ -365,7 +367,7 @@ def isna(self):
raise AbstractMethodError(self)

def _values_for_argsort(self):
# type: () -> ndarray
# type: () -> np.ndarray
"""
Return values for sorting.

@@ -597,7 +599,7 @@ def searchsorted(self, value, side="left", sorter=None):
return arr.searchsorted(value, side=side, sorter=sorter)

def _values_for_factorize(self):
# type: () -> Tuple[ndarray, Any]
# type: () -> Tuple[np.ndarray, Any]
"""
Return an array and missing value suitable for factorization.

@@ -622,7 +624,7 @@ def _values_for_factorize(self):
return self.astype(object), np.nan

def factorize(self, na_sentinel=-1):
# type: (int) -> Tuple[ndarray, ExtensionArray]
# type: (int) -> Tuple[np.ndarray, ExtensionArray]
"""
Encode the extension array as an enumerated type.

6 changes: 4 additions & 2 deletions pandas/core/arrays/datetimelike.py
@@ -2,15 +2,17 @@
from datetime import datetime, timedelta
import operator
import warnings
from typing import Union, Sequence, Tuple

import numpy as np

from pandas._libs import NaT, algos, iNaT, lib
from pandas._libs.tslibs.nattype import NaTType
from pandas._libs.tslibs.period import (
DIFFERENT_FREQ, IncompatibleFrequency, Period)
from pandas._libs.tslibs.timedeltas import Timedelta, delta_to_nanoseconds
from pandas._libs.tslibs.timestamps import (
RoundTo, maybe_integer_op_deprecated, round_nsint64)
RoundTo, maybe_integer_op_deprecated, round_nsint64, Timestamp)
import pandas.compat as compat
from pandas.compat.numpy import function as nv
from pandas.errors import (
@@ -350,7 +352,7 @@ def __iter__(self):

@property
def asi8(self):
# type: () -> ndarray
# type: () -> np.ndarray
"""
Integer representation of the values.

1 change: 1 addition & 0 deletions pandas/core/arrays/datetimes.py
@@ -2,6 +2,7 @@
from datetime import datetime, time, timedelta
import textwrap
import warnings
from typing import Union

import numpy as np
from pytz import utc
2 changes: 1 addition & 1 deletion pandas/core/arrays/integer.py
@@ -510,7 +510,7 @@ def value_counts(self, dropna=True):
return Series(array, index=index)

def _values_for_argsort(self):
# type: () -> ndarray
# type: () -> np.ndarray
"""Return values for sorting.

Returns
6 changes: 5 additions & 1 deletion pandas/core/arrays/period.py
@@ -1,17 +1,21 @@
# -*- coding: utf-8 -*-
from datetime import timedelta
import operator
from typing import Any, Callable, List, Optional, Sequence, Union

import numpy as np

from pandas._libs.tslibs import (
NaT, frequencies as libfrequencies, iNaT, period as libperiod)
from pandas._libs.tslibs.nattype import NaTType
from pandas._libs.tslibs.fields import isleapyear_arr
from pandas._libs.tslibs.period import (
DIFFERENT_FREQ, IncompatibleFrequency, Period, get_period_field_arr,
period_asfreq_arr)
from pandas._libs.tslibs.timedeltas import Timedelta, delta_to_nanoseconds
import pandas.compat as compat
from pandas.core.arrays.base import ExtensionArray
from pandas.core.indexes.base import Index
from pandas.util._decorators import Appender, cache_readonly

from pandas.core.dtypes.common import (
@@ -132,7 +136,7 @@ class PeriodArray(dtl.DatetimeLikeArrayMixin, dtl.DatelikeOps):
_scalar_type = Period

# Names others delegate to us
_other_ops = []
_other_ops = [] # type: List[str]
_bool_ops = ['is_leap_year']
_object_ops = ['start_time', 'end_time', 'freq']
_field_ops = ['year', 'month', 'day', 'hour', 'minute', 'second',
7 changes: 4 additions & 3 deletions pandas/core/arrays/sparse.py
@@ -7,12 +7,13 @@
import operator
import re
import warnings
from typing import Any, Callable, Type, Union

import numpy as np

from pandas._libs import index as libindex, lib
import pandas._libs.sparse as splib
from pandas._libs.sparse import BlockIndex, IntIndex
from pandas._libs.sparse import BlockIndex, IntIndex, SparseIndex
from pandas._libs.tslibs import NaT
import pandas.compat as compat
from pandas.compat.numpy import function as nv
@@ -79,7 +80,7 @@ class SparseDtype(ExtensionDtype):
_metadata = ('_dtype', '_fill_value', '_is_na_fill_value')

def __init__(self, dtype=np.float64, fill_value=None):
# type: (Union[str, np.dtype, 'ExtensionDtype', type], Any) -> None
# type: (Union[str, np.dtype, 'ExtensionDtype', Type], Any) -> None
from pandas.core.dtypes.missing import na_value_for_dtype
from pandas.core.dtypes.common import (
pandas_dtype, is_string_dtype, is_scalar
@@ -372,7 +373,7 @@ def _subtype_with_str(self):


def _get_fill(arr):
# type: (SparseArray) -> ndarray
# type: (SparseArray) -> np.ndarray
"""
Create a 0-dim ndarray containing the fill value

5 changes: 3 additions & 2 deletions pandas/core/arrays/timedeltas.py
@@ -4,6 +4,7 @@
from datetime import timedelta
import textwrap
import warnings
from typing import List

import numpy as np

@@ -134,8 +135,8 @@ class TimedeltaArray(dtl.DatetimeLikeArrayMixin, dtl.TimelikeOps):
_scalar_type = Timedelta
__array_priority__ = 1000
# define my properties & methods for delegation
_other_ops = []
_bool_ops = []
_other_ops = [] # type: List[str]
_bool_ops = [] # type: List[str]
_object_ops = ['freq']
_field_ops = ['days', 'seconds', 'microseconds', 'nanoseconds']
_datetimelike_ops = _field_ops + _object_ops + _bool_ops
1 change: 1 addition & 0 deletions pandas/core/base.py
@@ -15,6 +15,7 @@
from pandas.util._decorators import Appender, Substitution, cache_readonly
from pandas.util._validators import validate_bool_kwarg

from pandas.core.arrays.base import ExtensionArray
from pandas.core.dtypes.common import (
is_datetime64_ns_dtype, is_datetime64tz_dtype, is_datetimelike,
is_extension_array_dtype, is_extension_type, is_list_like, is_object_dtype,
1 change: 1 addition & 0 deletions pandas/core/common.py
@@ -9,6 +9,7 @@
from datetime import datetime, timedelta
from functools import partial
import inspect
from typing import Any

import numpy as np

4 changes: 3 additions & 1 deletion pandas/core/dtypes/base.py
@@ -1,4 +1,6 @@
"""Extend pandas with custom array types"""
from typing import List, Optional, Type

import numpy as np

from pandas.errors import AbstractMethodError
@@ -211,7 +213,7 @@ def __str__(self):

@property
def type(self):
# type: () -> type
# type: () -> Type
"""
The scalar type for the array, e.g. ``int``

3 changes: 2 additions & 1 deletion pandas/core/frame.py
@@ -19,6 +19,7 @@
import sys
import warnings
from textwrap import dedent
from typing import List, Set, Union

import numpy as np
import numpy.ma as ma
@@ -365,7 +366,7 @@ def _constructor(self):
_constructor_sliced = Series
_deprecations = NDFrame._deprecations | frozenset(
['get_value', 'set_value', 'from_csv', 'from_items'])
_accessors = set()
_accessors = set() # type: Set[str]

@property
def _constructor_expanddim(self):
15 changes: 8 additions & 7 deletions pandas/core/groupby/groupby.py
@@ -13,10 +13,11 @@ class providing the base-class of operations.
from functools import partial, wraps
import types
import warnings
from typing import FrozenSet, Optional, Type

import numpy as np

from pandas._libs import Timestamp, groupby as libgroupby
from pandas._libs import Timestamp, groupby as libgroupby # type: ignore
Member Author:

See note on Cython imports
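For readers unfamiliar with the pragma: `# type: ignore` on an import silences mypy's errors for that line, which is the usual workaround for compiled Cython extension modules that ship no stubs. At runtime the comment is inert, as this small sketch (using a stdlib module purely for illustration) shows:

```python
import json  # type: ignore  # the pragma is a no-op at runtime; only mypy reads it

# The import behaves exactly as it would without the comment.
print(json.dumps({"checked": False}))
# → {"checked": false}
```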

import pandas.compat as compat
from pandas.compat import range, set_function_name, zip
from pandas.compat.numpy import function as nv
@@ -325,7 +326,7 @@ def _group_selection_context(groupby):

class _GroupBy(PandasObject, SelectionMixin):
_group_selection = None
_apply_whitelist = frozenset()
_apply_whitelist = frozenset() # type: FrozenSet[str]

def __init__(self, obj, keys=None, axis=0, level=None,
grouper=None, exclusions=None, selection=None, as_index=True,
@@ -1041,7 +1042,7 @@ def _bool_agg(self, val_test, skipna):
"""

def objs_to_bool(vals):
# type: (np.ndarray) -> (np.ndarray, typing.Type)
# type: (np.ndarray) -> (np.ndarray, Type)
if is_object_dtype(vals):
vals = np.array([bool(x) for x in vals])
else:
@@ -1050,7 +1051,7 @@ def objs_to_bool(vals):
return vals.view(np.uint8), np.bool

def result_to_bool(result, inference):
# type: (np.ndarray, typing.Type) -> np.ndarray
# type: (np.ndarray, Type) -> np.ndarray
return result.astype(inference, copy=False)

return self._get_cythonized_result('group_any_all', self.grouper,
@@ -1743,7 +1744,7 @@ def quantile(self, q=0.5, interpolation='linear'):
"""

def pre_processor(vals):
# type: (np.ndarray) -> (np.ndarray, Optional[typing.Type])
# type: (np.ndarray) -> (np.ndarray, Optional[Type])
if is_object_dtype(vals):
raise TypeError("'quantile' cannot be performed against "
"'object' dtypes!")
@@ -1758,7 +1759,7 @@ def pre_processor(vals):
return vals, inference

def post_processor(vals, inference):
# type: (np.ndarray, Optional[typing.Type]) -> np.ndarray
# type: (np.ndarray, Optional[Type]) -> np.ndarray
if inference:
# Check for edge case
if not (is_integer_dtype(inference) and
@@ -2021,7 +2022,7 @@ def _get_cythonized_result(self, how, grouper, aggregate=False,
Function to be applied to result of Cython function. Should accept
an array of values as the first argument and type inferences as its
second argument, i.e. the signature should be
(ndarray, typing.Type).
(ndarray, Type).
**kwargs : dict
Extra arguments to be passed back to Cython funcs

3 changes: 2 additions & 1 deletion pandas/core/indexes/base.py
@@ -2,10 +2,11 @@
import operator
from textwrap import dedent
import warnings
from typing import Union

import numpy as np

from pandas._libs import (
from pandas._libs import ( # type: ignore
algos as libalgos, index as libindex, join as libjoin, lib)
from pandas._libs.lib import is_datetime_array
from pandas._libs.tslibs import OutOfBoundsDatetime, Timedelta, Timestamp
5 changes: 3 additions & 2 deletions pandas/core/indexes/datetimelike.py
@@ -4,6 +4,7 @@
"""
import operator
import warnings
from typing import Set

import numpy as np

@@ -698,9 +699,9 @@ class DatetimelikeDelegateMixin(PandasDelegate):
boxed in an index, after being returned from the array
"""
# raw_methods : dispatch methods that shouldn't be boxed in an Index
_raw_methods = set()
_raw_methods = set() # type: Set[str]
# raw_properties : dispatch properties that shouldn't be boxed in an Index
_raw_properties = set()
_raw_properties = set() # type: Set[str]
name = None
_data = None

12 changes: 9 additions & 3 deletions pandas/core/internals/blocks.py
@@ -4,10 +4,12 @@
import inspect
import re
import warnings
from typing import Any, List, Optional

import numpy as np

from pandas._libs import internals as libinternals, lib, tslib, tslibs
from pandas._libs import (internals as libinternals, # type: ignore
lib, tslib, tslibs)
from pandas._libs.tslibs import Timedelta, conversion, is_null_datetimelike
import pandas.compat as compat
from pandas.compat import range, zip
@@ -1826,8 +1828,12 @@ def interpolate(self, method='pad', axis=0, inplace=False, limit=None,
limit=limit),
placement=self.mgr_locs)

def shift(self, periods, axis=0, fill_value=None):
# type: (int, Optional[BlockPlacement], Any) -> List[ExtensionBlock]
def shift(self,
periods, # type: int
axis=0, # type: Optional[libinternals.BlockPlacement]
fill_value=None # type: Any
):
# type: (...) -> List[ExtensionBlock]
"""
Shift the block by `periods`.
