decouple pandera and pandas dtypes #559

Merged

cosmicBboy merged 19 commits into release/0.7.0 from dtypes on Jul 15, 2021


cosmicBboy (Collaborator)

fixes #369

@jeffzi finally getting this into the 0.7.0 release branch! Thanks for all your hard work on this; it's a huge step that will make pandera more accessible in the DS/ML ecosystem!

Jean-Francois Zinque and others added 19 commits July 2, 2021 10:20
…taType hierarchy (#504)

* fix pandas_engine.Interval

* fix Timedelta64 registration with pandas_engine.Engine

* add DataType helpers

* add DataType.continuous attribute

* add dtypes.is_numeric

* refactor schema_statistics based on DataType hierarchy

* refactor schema_inference based on DataType hierarchy

* fix numpy_engine.Timedelta64.type

* add is_subdtype helper

* add Engine.get_registered_dtypes

* fix Engine error when registering a base DataType

* fix pandas_engine DateTime string alias

* clean up test_dtypes

* fix test_extensions

* refactor strategies based on DataType hierarchy

* refactor io based on DataType hierarchy

* replace dtypes module by new DataType hierarchy

* fix black

* delete dtypes_.py

* drop legacy pandas and python 3.6 from CI

* fix mypy errors

* fix ci-docs

* fix conda dependencies

* fix lint, update noxfile

* simplify nox tests, fix test_io

* update ci build

* update nox

* pin nox, handle windows data types

* fix windows platform

* fix pandas_engine on windows platform

* fix test_dtypes on windows platform

* force pip on docs CI

* test out windows dtype stuff

* more messing around with windows

* more debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* revert ci

* increase cache

* testing

Co-authored-by: cosmicBboy <[email protected]>
* delete print statements

* pin furo

* fix generated docs not removed by nox

* re-organize API section

* replace aliased pandas_engine data types with their aliases

* drop warning when calling Engine.register_dtype without arguments

* add data types to api reference doc

* add document for DataType refactor

* unpin sphinx and drop sphinx_rtd_theme

* add xdoctest

* ignore prompt when copying example from doc

* add doctest builder when running sphinx-build locally

* fix dtypes doc examples

* fix pandas_engine.DataType.check

* fix pylint

* remove whitespaces in dtypes doc

* Update docs/source/dtypes.rst

* Update dtypes.rst

* update docs structure

* update nox file

* force pip on doctests

* update test_schemas

* fix docs session not overriding html with doctest output

Co-authored-by: Niels Bantilan <[email protected]>
* remove auto-generated docs

* add deprecation warnings, support pandas>=1.3.0

* add deprecation warnings for PandasDtype enum

* fix sphinx

* fix windows

* fix windows
* add support for pyarrow backed string data type

* fix regression for pandas < 1.3.0

* add verbosity to test run

* loosen strategies unit tests deadline, exclude windows ci

* loosen test_strategies.py tests

* use "dev" hypothesis profile for python 3.7

* add pandas==1.2.5 test

* fix ci

* ci typo

* don't install environment.yml on unit tests

* install nox in ci

* remove environment.yml

* update environment in ci

Co-authored-by: cosmicBboy <[email protected]>
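The commits above restructure pandera's dtypes into a `DataType` class hierarchy plus per-backend engines that register backend-specific types. As a rough, self-contained illustration of that idea only, the sketch below mimics it in plain Python. Every name in it (`DataType`, `Number`, `Engine`, `register_dtype`, `dtype`, `get_registered_dtypes`, `is_subdtype`, `is_numeric`, `PandasInt64`, ...) is a hypothetical stand-in borrowing vocabulary from the commit messages; it is not pandera's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Type


@dataclass(frozen=True)
class DataType:
    """Hypothetical backend-agnostic base data type."""

    continuous: bool = False  # True for types whose values vary continuously

    def check(self, other: "DataType") -> bool:
        # Frozen dataclasses compare by exact class and field values.
        return self == other


@dataclass(frozen=True)
class Number(DataType):
    """Parent of the numeric types."""


@dataclass(frozen=True)
class Int(Number):
    bit_width: int = 64


@dataclass(frozen=True)
class Float(Number):
    continuous: bool = True
    bit_width: int = 64


@dataclass(frozen=True)
class String(DataType):
    """Text data."""


class Engine:
    """Hypothetical registry mapping backend-specific aliases to DataTypes."""

    def __init__(self) -> None:
        self._registry: Dict[str, DataType] = {}

    def register_dtype(self, *aliases: str) -> Callable[[Type[DataType]], Type[DataType]]:
        def decorator(cls: Type[DataType]) -> Type[DataType]:
            instance = cls()  # every sketch type has default field values
            for alias in aliases:
                self._registry[alias] = instance
            return cls

        return decorator

    def dtype(self, alias: str) -> DataType:
        try:
            return self._registry[alias]
        except KeyError:
            raise TypeError(f"Data type '{alias}' not understood") from None

    def get_registered_dtypes(self) -> List[Type[DataType]]:
        return sorted({type(d) for d in self._registry.values()}, key=lambda c: c.__name__)


def is_subdtype(arg: DataType, parent: Type[DataType]) -> bool:
    return isinstance(arg, parent)


def is_numeric(dtype: DataType) -> bool:
    return is_subdtype(dtype, Number)


# A "pandas" engine registering backend-specific subclasses under pandas-style aliases.
pandas_engine = Engine()


@pandas_engine.register_dtype("int64")
@dataclass(frozen=True)
class PandasInt64(Int):
    pass


@pandas_engine.register_dtype("float64")
@dataclass(frozen=True)
class PandasFloat64(Float):
    pass


@pandas_engine.register_dtype("str", "object")
@dataclass(frozen=True)
class PandasString(String):
    pass
```

With this sketch, `pandas_engine.dtype("int64")` returns a `PandasInt64` instance and `is_numeric` is `True` for it, while `is_numeric(pandas_engine.dtype("str"))` is `False`. The design point being illustrated is that generic code can reason about `Number` or `continuous` without knowing which backend registered the concrete type.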
codecov bot commented Jul 13, 2021

Codecov Report

Merging #559 (518a53b) into release/0.7.0 (d8ae89c) will decrease coverage by 1.88%.
The diff coverage is 93.75%.


@@                Coverage Diff                @@
##           release/0.7.0     #559      +/-   ##
=================================================
- Coverage          99.47%   97.58%   -1.89%     
=================================================
  Files                 21       25       +4     
  Lines               2661     3194     +533     
=================================================
+ Hits                2647     3117     +470     
- Misses                14       77      +63     
Impacted Files Coverage Δ
pandera/checks.py 98.54% <ø> (ø)
pandera/schema_inference.py 100.00% <ø> (ø)
pandera/engines/numpy_engine.py 87.50% <87.50%> (ø)
pandera/dtypes.py 91.37% <91.32%> (-8.63%) ⬇️
pandera/engines/engine.py 93.40% <93.40%> (ø)
pandera/engines/pandas_engine.py 95.83% <95.83%> (ø)
pandera/__init__.py 100.00% <100.00%> (ø)
pandera/deprecations.py 100.00% <100.00%> (ø)
pandera/io.py 100.00% <100.00%> (ø)
pandera/model.py 99.12% <100.00%> (-0.88%) ⬇️
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8ae89c...518a53b.
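The summary line reports a 1.88% decrease while the coverage diff shows -1.89%. Assuming Codecov truncates percentages to two decimals (an assumption, but one that reproduces every figure in the report), both numbers follow from the Hits/Lines counts: the summary truncates the raw difference, while the diff subtracts the already-truncated totals. The helper names `raw_pct` and `truncate` are illustrative only.

```python
from decimal import Decimal, ROUND_DOWN

TWO_PLACES = Decimal("0.01")


def raw_pct(hits: int, lines: int) -> Decimal:
    # Percentage of lines hit, at Decimal's default 28-digit precision.
    return Decimal(hits) * 100 / Decimal(lines)


def truncate(value: Decimal) -> Decimal:
    # ROUND_DOWN truncates toward zero, e.g. -1.8846 -> -1.88.
    return value.quantize(TWO_PLACES, rounding=ROUND_DOWN)


base = raw_pct(2647, 2661)  # release/0.7.0 (d8ae89c)
head = raw_pct(3117, 3194)  # #559 (518a53b)

print(truncate(base), truncate(head))   # 99.47 97.58
print(truncate(head - base))            # -1.88 (summary figure)
print(truncate(head) - truncate(base))  # -1.89 (coverage diff figure)
```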

jeffzi (Collaborator) commented Jul 15, 2021

Exciting news! For 0.7.1, I'll take a look at open issues that can be fixed using the new internals.

"Thanks for all your hard work on this"

My pleasure 🎉

cosmicBboy merged commit bc555b9 into release/0.7.0 on Jul 15, 2021
cosmicBboy added a commit that referenced this pull request Jul 22, 2021
* refactor PandasDtype into class hierarchy supported by engines

* refactor DataFrameSchema based on DataType hierarchy

* refactor SchemaModel based on DataType hierarchy

* revert fix coerce=True and dtype=None should be a noop

* apply code style

* fix running tests/core with nox

* consolidate dtype names

* consolidate engine internal naming

* disable inherited __init__ with immutable(init=False)

* delete duplicated immutable

* disambiguate dtype variables

* add warning on base pandas_engine, numpy_engine.DataType init

* fix pylint, mypy errors

* fix DataFrameSchema.dtypes return type

* enable CI on dtypes branch

* Refactor inference, schema_statistics, strategies and io using the DataType hierarchy (#504)

* fix pandas_engine.Interval

* fix Timedelta64 registration with pandas_engine.Engine

* add DataType helpers

* add DataType.continuous attribute

* add dtypes.is_numeric

* refactor schema_statistics based on DataType hierarchy

* refactor schema_inference based on DataType hierarchy

* fix numpy_engine.Timedelta64.type

* add is_subdtype helper

* add Engine.get_registered_dtypes

* fix Engine error when registering a base DataType

* fix pandas_engine DateTime string alias

* clean up test_dtypes

* fix test_extensions

* refactor strategies based on DataType hierarchy

* refactor io based on DataType hierarchy

* replace dtypes module by new DataType hierarchy

* fix black

* delete dtypes_.py

* drop legacy pandas and python 3.6 from CI

* fix mypy errors

* fix ci-docs

* fix conda dependencies

* fix lint, update noxfile

* simplify nox tests, fix test_io

* update ci build

* update nox

* pin nox, handle windows data types

* fix windows platform

* fix pandas_engine on windows platform

* fix test_dtypes on windows platform

* force pip on docs CI

* test out windows dtype stuff

* more messing around with windows

* more debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* revert ci

* increase cache

* testing

Co-authored-by: cosmicBboy <[email protected]>

* Add DataTypes documentation (#536)

* delete print statements

* pin furo

* fix generated docs not removed by nox

* re-organize API section

* replace aliased pandas_engine data types with their aliases

* drop warning when calling Engine.register_dtype without arguments

* add data types to api reference doc

* add document for DataType refactor

* unpin sphinx and drop sphinx_rtd_theme

* add xdoctest

* ignore prompt when copying example from doc

* add doctest builder when running sphinx-build locally

* fix dtypes doc examples

* fix pandas_engine.DataType.check

* fix pylint

* remove whitespaces in dtypes doc

* Update docs/source/dtypes.rst

* Update dtypes.rst

* update docs structure

* update nox file

* force pip on doctests

* update test_schemas

* fix docs session not overriding html with doctest output

Co-authored-by: Niels Bantilan <[email protected]>

* add deprecation warnings for pandas_dtype and PandasDtype enum (#547)

* remove auto-generated docs

* add deprecation warnings, support pandas>=1.3.0

* add deprecation warnings for PandasDtype enum

* fix sphinx

* fix windows

* fix windows

* add support for pyarrow backed string data type (#548)

* add support for pyarrow backed string data type

* fix regression for pandas < 1.3.0

* add verbosity to test run

* loosen strategies unit tests deadline, exclude windows ci

* loosen test_strategies.py tests

* use "dev" hypothesis profile for python 3.7

* add pandas==1.2.5 test

* fix ci

* ci typo

* don't install environment.yml on unit tests

* install nox in ci

* remove environment.yml

* update environment in ci

Co-authored-by: cosmicBboy <[email protected]>

Co-authored-by: Jean-Francois Zinque <[email protected]>
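Commit #547 above adds deprecation warnings for the `pandas_dtype` argument and the `PandasDtype` enum. The snippet below is a minimal sketch of the general pattern for deprecating a keyword while keeping it working, assuming the replacement keyword is called `dtype`; the `Column` class here is a hypothetical stand-in, not pandera's `Column` signature.

```python
import warnings
from typing import Any, Optional


class Column:
    """Hypothetical column spec whose old `pandas_dtype` keyword is deprecated."""

    def __init__(self, dtype: Optional[Any] = None, *, pandas_dtype: Optional[Any] = None) -> None:
        if pandas_dtype is not None:
            if dtype is not None:
                raise ValueError("Pass either `dtype` or `pandas_dtype`, not both.")
            # Keep the old keyword working, but tell callers to migrate.
            warnings.warn(
                "`pandas_dtype` is deprecated; use `dtype` instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            dtype = pandas_dtype
        self.dtype = dtype


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = Column(pandas_dtype="int64")  # emits a DeprecationWarning
    new_style = Column(dtype="int64")         # no warning

print(old_style.dtype, new_style.dtype, [w.category.__name__ for w in caught])
# int64 int64 ['DeprecationWarning']
```

`DeprecationWarning` is hidden by default outside `__main__` and test runners, which is why the usage example forces `simplefilter("always")` to make the warning observable.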
cosmicBboy added a commit that referenced this pull request Jul 24, 2021
* Feature/420 (#454)

* parse frictionless schema

- using frictionless-py for some of the heavy lifting
- accept yaml/json/frictionless schema files/objects directly
- frictionless becomes a new requirement for io
- apply pre-commit formatting updates to other code in pandera.io
- add test to validate schema parsing, from yaml and json sources

* improve documentation

* update docstrings per code review

Co-authored-by: Niels Bantilan <[email protected]>

* add type hints

* standardise class properties for easier re-use in future

* simplify key check

* add missing alternative type

* update docstring

* align name with Column arg

* fix NaN check

* fix type assertion

* create empty dict if constraints not provided

Co-authored-by: Niels Bantilan <[email protected]>

* decouple pandera and pandas dtypes (#559)


* improve coverage

* fix docs

* add pandas accessor tests

* pin sphinx

* fix lint

Co-authored-by: Tom Collingwood <[email protected]>
Co-authored-by: Jean-Francois Zinque <[email protected]>
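Feature/420 (#454) above parses frictionless schemas, whose fields follow the Table Schema layout of `name`, `type`, and an optional `constraints` object (`required`, `unique`, `minimum`, `maximum`, `enum`, `pattern`, ...). The sketch below shows one way such a field could be turned into a plain column spec, including the "create empty dict if constraints not provided" fallback. `TYPE_MAP`, `parse_field`, the output keys, and the check labels are all hypothetical and are not the structure pandera.io actually produces.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from Table Schema field types to pandas dtype strings.
TYPE_MAP = {
    "string": "str",
    "integer": "int64",
    "number": "float64",
    "boolean": "bool",
    "datetime": "datetime64[ns]",
}


def parse_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one Table Schema field into a plain, hypothetical column spec."""
    # A missing (or null) "constraints" entry falls back to an empty dict.
    constraints = field.get("constraints") or {}
    checks: List[Tuple[str, Any]] = []
    if "minimum" in constraints:
        checks.append(("greater_than_or_equal_to", constraints["minimum"]))
    if "maximum" in constraints:
        checks.append(("less_than_or_equal_to", constraints["maximum"]))
    if "enum" in constraints:
        checks.append(("isin", list(constraints["enum"])))
    if "pattern" in constraints:
        checks.append(("str_matches", constraints["pattern"]))
    return {
        "name": field["name"],
        # Unknown or missing types fall back to "object" in this sketch.
        "dtype": TYPE_MAP.get(field.get("type"), "object"),
        "nullable": not constraints.get("required", False),
        "unique": constraints.get("unique", False),
        "checks": checks,
    }


age = parse_field(
    {"name": "age", "type": "integer", "constraints": {"required": True, "minimum": 0}}
)
print(age["dtype"], age["nullable"], age["checks"])
# int64 False [('greater_than_or_equal_to', 0)]
```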
cosmicBboy deleted the dtypes branch March 26, 2022 17:22
2 participants