Initial pandas.typing Module #25884

Merged
merged 19 commits on Mar 30, 2019
Changes from 17 commits
12 changes: 12 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -18,6 +18,18 @@ What's New in 0.25.0 (April XX, 2019)
These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog
including other versions of pandas.

Enhancements
~~~~~~~~~~~~

.. _whatsnew_0250.enhancements.typing:

Type Hints and ``pandas.typing``
Contributor


I would remove this entirely. If it's private it is not to be relied upon.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In accordance with :pep:`484`, pandas has introduced type hints into the code base, along with a new ``pandas._typing`` module containing aliases for idiomatic pandas types. We will be continually adding annotations to the code base to improve readability, reduce maintenance costs and proactively identify bugs. Note that ``pandas._typing`` is currently private as it is developmental and subject to change, though it will eventually be exposed as ``pandas.typing`` for third-party libraries to leverage when type checking their own code that interfaces with pandas.

`MyPy <http://mypy-lang.org>`__ has been configured as part of our CI to perform static type checking.
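As a rough, hypothetical illustration (not part of this diff), a function annotated with the new alias might look like the following; ``describe_source`` is an invented name, and the alias is reproduced locally so the snippet stands alone.

    from pathlib import Path
    from typing import IO, AnyStr, Union

    # Mirrors the alias added in pandas/_typing.py by this PR.
    FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

    def describe_source(source: FilePathOrBuffer) -> str:
        # Accepts a str path, a pathlib.Path, or an open file-like object.
        if hasattr(source, "read"):
            return "buffer"
        return "path: {}".format(source)

Running mypy over code annotated this way would flag, for instance, passing an ``int`` where a path or buffer is expected.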


.. _whatsnew_0250.enhancements.other:

4 changes: 4 additions & 0 deletions pandas/_typing.py
@@ -0,0 +1,4 @@
from pathlib import Path
Contributor


can you add a test that asserts this in pandas/tests/types/test_api.py

Member Author


Yea no problem. So the existing test(s) exclude any privately named modules:

result = sorted(f for f in dir(namespace) if not f.startswith('_'))

Are you asking to revisit that logic or simply add a test with this as an exception to make sure it lives there?

Contributor


Oh I see, never mind then. Though I think we should actually check and lock down all modules (so add back the private ones) in a new PR / issue.

from typing import IO, AnyStr, Union

FilePathOrBuffer = Union[str, Path, IO[AnyStr]]
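A test along the lines discussed in the thread above might look roughly like the following sketch; the test name is hypothetical and this is not part of the PR.

    def test_private_typing_module():
        # Hypothetical check: the private module is importable and exposes
        # the FilePathOrBuffer alias introduced by this PR.
        from pandas import _typing
        assert hasattr(_typing, "FilePathOrBuffer")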
3 changes: 2 additions & 1 deletion pandas/io/gcs.py
@@ -12,5 +12,6 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
mode = 'rb'

fs = gcsfs.GCSFileSystem()
filepath_or_buffer = fs.open(filepath_or_buffer, mode)
filepath_or_buffer = fs.open(
filepath_or_buffer, mode) # type: gcsfs.GCSFile
return filepath_or_buffer, None, compression, True
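The trailing "# type:" comments used here (and in the s3.py change below) follow the PEP 484 comment syntax for annotating assignments, which remains usable on Python versions that lack PEP 526 variable annotations. A generic, hypothetical illustration of that syntax:

    from typing import IO

    def open_binary(path):
        # type: (str) -> IO[bytes]
        handle = open(path, "rb")  # type: IO[bytes]
        return handle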
23 changes: 16 additions & 7 deletions pandas/io/parsers.py
@@ -30,6 +30,7 @@
from pandas.core.dtypes.dtypes import CategoricalDtype
from pandas.core.dtypes.missing import isna

from pandas._typing import FilePathOrBuffer
from pandas.core import algorithms
from pandas.core.arrays import Categorical
from pandas.core.frame import DataFrame
@@ -400,7 +401,7 @@ def _validate_names(names):
return names


def _read(filepath_or_buffer, kwds):
def _read(filepath_or_buffer: FilePathOrBuffer, kwds):
"""Generic reader of line files."""
encoding = kwds.get('encoding', None)
if encoding is not None:
@@ -409,7 +410,12 @@ def _read(filepath_or_buffer, kwds):

compression = kwds.get('compression', 'infer')
compression = _infer_compression(filepath_or_buffer, compression)
filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer(

# TODO: get_filepath_or_buffer could return
# Union[FilePathOrBuffer, s3fs.S3File, gcsfs.GCSFile]
# though mypy handling of conditional imports is difficult.
# See https://github.com/python/mypy/issues/1297
fp_or_buf, _, compression, should_close = get_filepath_or_buffer(
Member Author


It's mentioned in the comments but I changed the variable name here from filepath_or_buffer to fp_or_buf to intentionally NOT shadow the parameter from the signature.

As mentioned in the comment, this local variable could potentially introduce new types for S3 and GCP, and I don't think there is a great way with typing to statically analyze conditional imports like those just yet, so it's a clearer delimitation IMO to assign the return of this function to a separate variable.

filepath_or_buffer, encoding, compression)
kwds['compression'] = compression
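A stripped-down, hypothetical sketch of the non-shadowing pattern described in the author's comment above (not part of the diff; _open_maybe_remote is an invented stand-in for get_filepath_or_buffer):

    from pathlib import Path
    from typing import IO, AnyStr, Union

    FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

    def _open_maybe_remote(source):
        # Stand-in helper: may return a different object type (e.g. an
        # s3fs or gcsfs file handle) than the value it was given.
        return open(source, "rb") if isinstance(source, (str, Path)) else source

    def _read_bytes(filepath_or_buffer: FilePathOrBuffer) -> bytes:
        # Bind the result to a new name instead of re-binding the annotated
        # parameter, so the parameter's declared type stays accurate for mypy.
        fp_or_buf = _open_maybe_remote(filepath_or_buffer)
        return fp_or_buf.read()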

@@ -426,7 +432,7 @@ def _read(filepath_or_buffer, kwds):
_validate_names(kwds.get("names", None))

# Create the parser.
parser = TextFileReader(filepath_or_buffer, **kwds)
parser = TextFileReader(fp_or_buf, **kwds)

if chunksize or iterator:
return parser
@@ -438,7 +444,7 @@ def _read(filepath_or_buffer, kwds):

if should_close:
try:
filepath_or_buffer.close()
fp_or_buf.close()
except ValueError:
pass

@@ -533,7 +539,7 @@ def _make_parser_function(name, default_sep=','):
else:
sep = default_sep

def parser_f(filepath_or_buffer,
def parser_f(filepath_or_buffer: FilePathOrBuffer,
sep=sep,
delimiter=None,

@@ -725,8 +731,11 @@ def parser_f(filepath_or_buffer,
)(read_table)


def read_fwf(filepath_or_buffer, colspecs='infer', widths=None,
infer_nrows=100, **kwds):
def read_fwf(filepath_or_buffer: FilePathOrBuffer,
colspecs='infer',
widths=None,
infer_nrows=100,
**kwds):

r"""
Read a table of fixed-width formatted lines into DataFrame.
3 changes: 2 additions & 1 deletion pandas/io/s3.py
@@ -31,5 +31,6 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
# A NoCredentialsError is raised if you don't have creds
# for that bucket.
fs = s3fs.S3FileSystem(anon=True)
filepath_or_buffer = fs.open(_strip_schema(filepath_or_buffer), mode)
filepath_or_buffer = fs.open(
_strip_schema(filepath_or_buffer), mode) # type: s3fs.S3File
return filepath_or_buffer, None, compression, True