Add DataTypes documentation #536

Merged 23 commits on Jul 2, 2021
Changes from 16 commits
7 changes: 7 additions & 0 deletions .github/workflows/ci-tests.yml
@@ -209,6 +209,13 @@ jobs:
     - name: Upload coverage to Codecov
       uses: "codecov/codecov-action@v1"

+    - name: Check Docstrings
+      run: >
+        nox
+        -db conda -r -v
+        --non-interactive
+        --session "doctests-${{ matrix.python-version }}"
+
     - name: Check Docs
       run: >
         nox
2 changes: 1 addition & 1 deletion .gitignore
@@ -113,7 +113,7 @@ venv.bak/
 /asv_bench/results/

 # Docs
-docs/source/generated
+docs/source/reference/generated

 # Nox
 .nox
167 changes: 0 additions & 167 deletions docs/source/API_reference.rst

This file was deleted.

41 changes: 41 additions & 0 deletions docs/source/_templates/dtype.rst
@@ -0,0 +1,41 @@
+{{ fullname | escape | underline}}
+
+.. currentmodule:: {{ module }}
+
+.. autoclass:: {{ objname }}
+
+   {% block attributes %}
+   {% if attributes %}
+   .. rubric:: Attributes
+
+   .. autosummary::
+      :nosignatures:
+
+   {% for item in attributes %}
+      ~{{ name }}.{{ item }}
+   {%- endfor %}
+
+   {% endif %}
+   {% endblock %}
+
+   {% block methods %}
+   {% if methods %}
+   .. rubric:: Methods
+
+   .. autosummary::
+      :nosignatures:
+      :toctree: methods
+
+      {# Ignore the DateTime alias to avoid `WARNING: document isn't included in any toctree`#}
+      {% if objname != "DateTime" %}
+      {% for item in methods %}
+      ~{{ name }}.{{ item }}
+      {%- endfor %}
+
+      {%- if members and '__call__' in members %}
+      ~{{ name }}.__call__
+      {%- endif %}
+      {%- endif %}
+
+   {%- endif %}
+   {% endblock %}
7 changes: 6 additions & 1 deletion docs/source/conf.py
@@ -162,7 +162,7 @@
 .. role:: green
 """

-autosummary_generate = ["API_reference.rst"]
+autosummary_generate = True
 autosummary_filename_map = {
     "pandera.Check": "pandera.Check",
     "pandera.check": "pandera.check_decorator",
@@ -174,6 +174,11 @@
     "pandas": ("http://pandas.pydata.org/pandas-docs/stable/", None),
 }

+# strip prompts
+copybutton_prompt_text = (
+    r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
+)
+copybutton_prompt_is_regexp = True

 # this is a workaround to filter out forward reference issue in
 # sphinx_autodoc_typehints
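
Note on the ``copybutton_prompt_text`` pattern added in the hunk above: the sketch below illustrates which prompts it matches, using only Python's ``re`` module. The ``strip_prompts`` helper and the sample snippet are hypothetical and exist only for this illustration; sphinx-copybutton applies the pattern in the browser, and its exact behavior may differ from this line-by-line approximation.

```python
import re

# Pattern copied verbatim from the conf.py hunk.
copybutton_prompt_text = (
    r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
)

# Assumption: a prompt is only stripped when it starts the line.
PROMPT = re.compile(rf"^(?:{copybutton_prompt_text})")


def strip_prompts(snippet: str) -> str:
    """Remove one leading prompt from each line of a snippet (hypothetical helper)."""
    return "\n".join(PROMPT.sub("", line) for line in snippet.splitlines())


example = "\n".join(
    [
        ">>> import pandera as pa",  # Python REPL prompt
        ">>> for i in range(2):",
        "...     print(i)",          # REPL continuation prompt
        "$ nox -s docs",             # shell prompt
        "In [1]: 1 + 1",             # IPython prompt
    ]
)

cleaned = strip_prompts(example)
print(cleaned)
```

Because ``copybutton_prompt_is_regexp`` is set to ``True``, the string is treated as a regular expression rather than a literal prompt, which is what allows a single setting to cover several prompt styles.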
30 changes: 22 additions & 8 deletions docs/source/dataframe_schemas.rst
@@ -10,7 +10,7 @@ DataFrame Schemas
 The :class:`~pandera.schemas.DataFrameSchema` class enables the specification of a schema
 that verifies the columns and index of a pandas ``DataFrame`` object.

-The ``DataFrameSchema`` object consists of |column|_\s and an |index|_.
+The :class:`~pandera.schemas.DataFrameSchema` object consists of |column|_\s and an |index|_.

 .. |column| replace:: ``Column``
 .. |index| replace:: ``Index``
@@ -44,12 +44,25 @@ The ``DataFrameSchema`` object consists of |column|_\s and an |index|_.
 Column Validation
 -----------------

-A :class:`~pandera.schema_components.Column` must specify the properties of a column in a dataframe
-object. It can be optionally verified for its data type, `null values`_ or
+A :class:`~pandera.schema_components.Column` must specify the properties of a
+column in a dataframe object. It can be optionally verified for its data type,
+`null values`_ or
 duplicate values. The column can be coerced_ into the specified type, and the
 required_ parameter allows control over whether or not the column is allowed to
 be missing.

+Similarly to pandas, the data type can be specified as:
+
+* a string alias, as long as it is recognized by pandas.
+* a Python type: ``int``, ``float``, ``bool``, ``str``.
+* a `numpy data type <https://numpy.org/doc/stable/user/basics.types.html>`_.
+* a `pandas extension type <https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes>`_:
+  it can be an instance (e.g. ``pd.CategoricalDtype(["a", "b"])``) or a
+  class (e.g. ``pandas.CategoricalDtype``) if it can be initialized with default
+  values.
+* a pandera :class:`~pandera.dtypes.DataType`: it can also be an instance or a
+  class.
+
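Note on the list above: because these forms mirror what pandas itself accepts, the first four can be illustrated with pandas alone. This is a minimal sketch that does not import pandera; it uses ``pandas.api.types.pandas_dtype`` purely to show how each spelling resolves to a concrete dtype, and the ``specs`` and ``resolved`` names are made up for the example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype

# One entry per way of spelling a data type (illustrative names only).
specs = {
    "string alias": "int64",
    "python type": float,
    "numpy data type": np.bool_,
    "pandas extension type": pd.CategoricalDtype(["a", "b"]),
}

# pandas_dtype normalizes each spelling to a dtype object.
resolved = {label: pandas_dtype(spec) for label, spec in specs.items()}

for label, dtype in resolved.items():
    print(f"{label}: {dtype!r}")
```

The pandera :class:`~pandera.dtypes.DataType` form is left out of the sketch because it is specific to pandera rather than pandas.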
:ref:`Column checks<checks>` allow for the DataFrame's values to be
checked against a user-provided function. ``Check`` objects also support
:ref:`grouping<grouping>` by a different column so that the user can make
@@ -270,7 +283,7 @@ objects can also be used to validate columns in a dataframe on its own:
     validated_df = df.pipe(column1_schema).pipe(column2_schema)


-For multi-column use cases, the ``DataFrameSchema`` is still recommended, but
+For multi-column use cases, the :class:`~pandera.schemas.DataFrameSchema` is still recommended, but
 if you have one or a small number of columns to verify, using ``Column``
 objects by themselves is appropriate.

@@ -594,12 +607,13 @@ indexes by composing a list of ``pandera.Index`` objects.
 foo 2 3


-Get Pandas Datatypes
---------------------
+Get Pandas Data Types
+---------------------

 Pandas provides a `dtype` parameter for casting a dataframe to a specific dtype
-schema. ``DataFrameSchema`` provides a `dtype` property which returns a pandas
-style dict. The keys of the dict are column names and values are the dtype.
+schema. :class:`~pandera.schemas.DataFrameSchema` provides
+a :attr:`~pandera.schemas.DataFrameSchema.dtypes` property which returns a
+dictionary whose keys are column names and values are :class:`~pandera.dtypes.DataType`.

 Some examples of where this can be provided to pandas are:
