Pandera: A flexible and expressive pandas data validation library. #12

cosmicBboy · 2019-08-14T22:21:25Z

Submitting Author: Niels Bantilan (@cosmicBboy)
All current maintainers: (@cosmicBboy)
Package Name: pandera
One-Line Description of Package: validate the types, properties, and statistics of pandas data structures
Repository Link: https://github.com/unionai-oss/pandera
Version submitted: 0.1.5
Editor: @lwasser
Reviewer 1: @mbjoseph
Reviewer 2: @xmnlab
Archive: https://github.com/pandera-dev/pandera/releases/tag/v0.2.3
Version accepted: v0.2.3
Date Accepted: 10/10/2019

Description

pandas data structures can hide a lot of information, and explicitly
validating them at runtime in production-critical or reproducible research
settings is a good idea for building reliable data transformation pipelines.
pandera enables users to:

- Check the types and properties of columns in a DataFrame or values in
  a Series.
- Perform descriptive and inferential statistical validation, e.g. two-sample
  t-tests.
- Seamlessly integrate with existing data analysis/processing pipelines
  via function decorators.

pandera provides a flexible and expressive API for performing data validation
on tidy (long-form) and wide data to make data processing pipelines more
readable and robust.

Scope

Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data munging
- Data deposition
- Reproducibility
- Geospatial
- Education
- Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Data munging: the package makes ETL, data analysis, and data processing
pipelines more robust and reliable by providing users with tools to validate
assumptions about the schema and statistical properties of datasets.
This package supports validation on long (tidy) data and wide data.

Reproducibility: This package enables users to validate DataFrame or Series
objects at runtime or as unit/integration tests, and can easily be integrated
to existing pipelines using the check_input and check_output decorators.
It also supports collaboration and reproducible research by programmatically
enforcing assertions made about the statistical properties of a dataset in
addition to making it easier to review pandas code in production-critical
contexts.
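
The decorator-based integration described above can be sketched in plain Python. This is a minimal illustration of the pattern only, not pandera's implementation: `validate_input`, `validate_output`, and `require_positive_ints` are hypothetical names, and a dict of lists stands in for a DataFrame.

```python
import functools


def validate_input(validator):
    """Decorator factory: run `validator` on the first argument before the call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(data, *args, **kwargs):
            validator(data)  # raises if the input violates the assumptions
            return fn(data, *args, **kwargs)
        return wrapper
    return decorator


def validate_output(validator):
    """Decorator factory: run `validator` on the return value after the call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            validator(result)  # raises if the output violates the assumptions
            return result
        return wrapper
    return decorator


def require_positive_ints(column):
    """Build a validator asserting that `column` holds only positive integers."""
    def validator(table):
        if not all(isinstance(v, int) and v > 0 for v in table[column]):
            raise ValueError(f"column {column!r} must contain positive integers")
    return validator


@validate_input(require_positive_ints("count"))
@validate_output(require_positive_ints("doubled"))
def double_counts(table):
    """Existing pipeline step; the validation is attached without editing its body."""
    return {**table, "doubled": [2 * v for v in table["count"]]}
```

The pipeline function itself stays unchanged, which is why this style slots into existing code: the assumptions live in the decorators and fail loudly at runtime.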

Who is the target audience and what are scientific applications of this package?

The target audience of pandera consists of data scientists, data engineers,
machine learning engineers, and machine learning scientists who use pandas in
their data processing pipelines for various purposes, e.g., transforming data
for reporting, analytics, model training, and data visualization. This tool is
built on top of pandas and scipy to provide a user-friendly interface for
explicitly specifying the set of properties that a DataFrame or Series must
fulfill in order to be considered valid. Since pandera makes no assumptions
about the domain of study or contents of these pandas data structures, it
could be used in a wide variety of quantitative fields that involve the
analysis of tabular data.

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

There are a few alternatives to pandera in the Python ecosystem. Here
is how they compare:

https://github.com/alecthomas/voluptuous
- not specific to pandas, applies to JSON/YAML etc.
- very flexible and reasonably simple
- no decorators, hypothesis or sophisticated checks
https://github.com/keleshev/schema
- similar to voluptuous
- validation of generic python data structures
https://github.com/TMiguelT/PandasSchema
- has a wider range of 'built-in' validator types
- limited type support (only has a conversion/coercion check)
- no decorators
- implementation has less flexibility than pandera's
- has generic 'check'-like validators
https://github.com/danielvdende/opulent-pandas
- similar to voluptuous, and conceptually similar to pandera, but lacking
  functionality
https://github.com/c-data/pandas-validator
- not maintained, inflexible syntax
https://github.com/xguse/table_enforcer
- not maintained
- the Enforcer and Column objects are very similar to pandera, but it's a
  little difficult to follow

Key differentiators of pandera:

- Column data types, nullability, and uniqueness are first-class concepts.
- check_input and check_output decorators enable seamless integration with
  existing code.
- Checks provide flexibility and performance by providing access to pandas
  API by design.
- Hypothesis class provides a tidy-first interface for statistical hypothesis
  testing.
- Checks and Hypothesis objects support both tidy and wide data validation.
- Comprehensive documentation on key functionality.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

https://pyopensci.discourse.group/t/candidate-package-pandera-a-flexible-pandas-data-structure-validation-package/92

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has an OSI approved license
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

Previous Repo: https://github.com/cosmicBboy/pandera

lwasser · 2019-08-19T19:26:03Z

Thank you @cosmicBboy !! we will get back to you with the editor / review process next steps !!

lwasser · 2019-08-23T22:19:53Z

Editor checks:

Fit: The package meets criteria for fit and overlap.
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service. Might add better dev setup instructions for contributing... but i see a dev envt txt
License: The package has an OSI accepted license MIT License
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Reviewers: @mbjoseph @xmnlab
Due date: @mbjoseph we agreed to do reviews one at a time. Given that, is a 2 week deadline (which would be September 6) ok for your schedule? if that is ok then @xmnlab i will ping you once Max's review is in and you can begin your review!! @cosmicBboy has agreed to issues and PR's if you want to create a review using that approach rather than all text in this issue (links to the issue and/or PR may be preferred). Thank you all for your time!!

mbjoseph · 2019-08-27T13:58:07Z

@lwasser yes! A 2 week deadline works for me. I'll have my review in by Sep 6.

lwasser · 2019-08-28T17:21:56Z

@mbjoseph thank you!! and thank you for being willing to help @xmnlab out as well by submitting the first review. Ivan, we can totally support your first review for pyopensci!! so psyched to have you on board with us.

cosmicBboy · 2019-08-28T17:51:51Z

thanks everyone for participating in this review! Just FYI, the pandera issues page has a couple of tickets that may be of interest for reviewers.

We're planning on a 0.2.0 release in the next week or so.

xmnlab · 2019-08-28T20:24:54Z

@lwasser thank you so much! I am excited to contribute to pyopensci project! <3

mbjoseph · 2019-09-03T20:17:31Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all user-facing functions
Examples for all user-facing functions
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer.

Readme requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
Installation instructions
Any additional setup required (authentication tokens, etc)
Brief demonstration usage
Direction to more detailed documentation (e.g. your documentation files or website).
If applicable, how the package compares to other similar packages and/or how it relates to other packages
Citation information

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6

Review Comments

Overall, this is a great package with a clear scope, good docs, and good testing infrastructure. Clearly, a lot of effort has been put into its development, and as somebody who works with raw data, something like this would be immediately useful. With this in mind, most of my comments are fairly minor.

Bigger points:

These relate to the top-level boxes for the pyOpenSci review process that I could not check.

API documentation is in pretty good shape, but there are some things without a description in the API docs (e.g., https://pandera.readthedocs.io/en/stable/API.html#pandera.Check.error_message).
I am not checking the box for "Examples for all user-facing functions". Taken literally, there are user-facing functions that do not have examples (e.g., generic_error_message), though I believe the examples cover the most common use cases. It might be a good idea to prefix some of these methods that users aren't expected to use with an underscore, or if it makes more sense to add examples (e.g., via doctest in the API docs), that could also be worth considering.
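
As a rough illustration of the underscore suggestion: a leading underscore marks a helper as non-public by convention, and documentation tools such as Sphinx autodoc skip such members unless asked otherwise. The `Check` class below is a hypothetical stand-in, not pandera's actual code, and `public_api` is a helper invented for this sketch.

```python
class Check:
    """Hypothetical stand-in for a validator class; not pandera's actual code."""

    def __init__(self, fn):
        self.fn = fn

    def _generic_error_message(self, value):
        # Internal helper: the leading underscore signals "not part of the API",
        # so it would not need a user-facing example.
        return f"check failed for value {value!r}"

    def validate(self, value):
        """Public entry point: return `value` or raise ValueError if the check fails."""
        if not self.fn(value):
            raise ValueError(self._generic_error_message(value))
        return value


def public_api(obj):
    """List the attribute names a user is expected to rely on."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_api(Check))
```

With the helper renamed, only `validate` remains in the listing of user-facing names, which narrows the set of functions that need examples.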

Minor notes

These are a smattering of questions I ran into, and notes that might help improve the package.

Test coverage is pretty high - any particular reason why the remaining lines are not tested?
There are some deprecation warnings that arise in running the tests: https://travis-ci.org/pandera-dev/pandera/jobs/579197344#L2287
Citation info is missing from the README, and could be added if you wanted to make it easy for others to cite the package.
CI testing on OSX and Windows might be nice too.
"Column Hypothesis test support testing different column so that assertions can be made about the relationships..." -- would "tests" work better? https://pandera.readthedocs.io/en/stable/dataframe_schemas.html
Backtick usage is somewhat inconsistent in the docs, e.g., `Column` vs. Column
SeriesSchema docs seem to have an unfinished sentence on Series Validation: https://pandera.readthedocs.io/en/stable/series_schemas.html#series-validation
pd.series should be pd.Series or pandas.Series (used below): https://pandera.readthedocs.io/en/stable/checks.html#checking-values-within-a-column
In describing how the function signature of Check changes, there may be a typo: "This changes the function signature of the Check function so that its input is a dict where keys are the group names and keys are subsets of the Column series." (https://pandera.readthedocs.io/en/stable/checks.html#column-check-groups). Should this be keys and values instead of keys and keys?
There are a few places where a significance threshold/alpha value of 0.5 is used in the Hypothesis docs (https://pandera.readthedocs.io/en/stable/hypothesis.html#hypothesis-testing). Should this be 0.05, which seems like a more commonly used threshold than 0.5?
Some URLs still point to cosmicBboy/pandera: https://github.com/pandera-dev/pandera/search?q=cosmicbboy%2Fpandera&unscoped_q=cosmicbboy%2Fpandera
Why not conda-forge instead of the cosmicbboy conda channel?
Installation instructions look great for released versions, but you could also add installation instructions for the dev version (e.g., pip install -e .).
There is inconsistent capitalization of dataframe (also DataFrame): https://pandera.readthedocs.io/en/stable/dataframe_schemas.html
pylint points out some places where the code could be streamlined a bit (e.g., unnecessary else statements, and some cases where object is explicitly declared as a parent class), but none of the output is indicative of major problems. Feel free to address or ignore any of these checks:

>>> pylint pandera
************* Module pandera
pandera/__init__.py:1:0: C0111: Missing module docstring (missing-docstring)
************* Module pandera.dtypes
pandera/dtypes.py:6:0: C0111: Missing class docstring (missing-docstring)
pandera/dtypes.py:17:0: C0103: Constant name "Bool" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:18:0: C0103: Constant name "DateTime" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:19:0: C0103: Constant name "Category" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:20:0: C0103: Constant name "Float" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:21:0: C0103: Constant name "Int" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:22:0: C0103: Constant name "Object" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:23:0: C0103: Constant name "String" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:24:0: C0103: Constant name "Timedelta" doesn't conform to UPPER_CASE naming style (invalid-name)
************* Module pandera.constants
pandera/constants.py:1:0: C0111: Missing module docstring (missing-docstring)
************* Module pandera.errors
pandera/errors.py:4:0: C0111: Missing class docstring (missing-docstring)
pandera/errors.py:8:0: C0111: Missing class docstring (missing-docstring)
pandera/errors.py:12:0: C0111: Missing class docstring (missing-docstring)
************* Module pandera.schemas
pandera/schemas.py:252:0: C0330: Wrong hanging indentation (add 1 space).
                            constants.N_FAILURE_CASES).to_dict()))
                            ^| (bad-continuation)
pandera/schemas.py:258:0: C0330: Wrong hanging indentation (add 1 space).
                            constants.N_FAILURE_CASES).to_dict()))
                            ^| (bad-continuation)
pandera/schemas.py:268:0: C0330: Wrong hanging indentation (add 1 space).
                        constants.N_FAILURE_CASES).to_dict()))
                        ^| (bad-continuation)
pandera/schemas.py:11:0: R0205: Class 'DataFrameSchema' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/schemas.py:14:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/schemas.py:56:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:79:25: W0212: Access to a protected member _checks of a client class (protected-access)
pandera/schemas.py:105:28: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
pandera/schemas.py:118:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:172:0: R0205: Class 'SeriesSchemaBase' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/schemas.py:175:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:246:16: R1720: Unnecessary "else" after "raise" (no-else-raise)
pandera/schemas.py:219:4: R0912: Too many branches (13/12) (too-many-branches)
pandera/schemas.py:172:0: R0903: Too few public methods (1/2) (too-few-public-methods)
pandera/schemas.py:285:0: C0111: Missing class docstring (missing-docstring)
pandera/schemas.py:287:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:287:4: W0235: Useless super delegation in method '__init__' (useless-super-delegation)
pandera/schemas.py:285:0: R0903: Too few public methods (1/2) (too-few-public-methods)
pandera/schemas.py:5:0: C0411: standard import "from typing import Optional" should be placed before "import pandas as pd" (wrong-import-order)
************* Module pandera.checks
pandera/checks.py:98:0: C0330: Wrong hanging indentation (remove 4 spaces).
                "%s failed element-wise validator %d:\n"
            |   ^ (bad-continuation)
pandera/checks.py:100:0: C0330: Wrong hanging indentation (remove 4 spaces).
                (parent_schema, check_index,
            |   ^ (bad-continuation)
pandera/checks.py:59:8: C0103: Attribute name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/checks.py:12:0: C0111: Missing class docstring (missing-docstring)
pandera/checks.py:12:0: R0205: Class 'Check' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/checks.py:14:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/checks.py:77:4: C0111: Missing method docstring (missing-docstring)
pandera/checks.py:163:4: R0201: Method could be a function (no-self-use)
pandera/checks.py:194:8: R1705: Unnecessary "elif" after "return" (no-else-return)
pandera/checks.py:212:8: R1705: Unnecessary "else" after "return" (no-else-return)
pandera/checks.py:238:12: R1705: Unnecessary "elif" after "return" (no-else-return)
pandera/checks.py:261:8: R1720: Unnecessary "elif" after "raise" (no-else-raise)
pandera/checks.py:160:8: W0201: Attribute 'failure_cases' defined outside __init__ (attribute-defined-outside-init)
pandera/checks.py:5:0: C0411: standard import "from functools import partial" should be placed before "import pandas as pd" (wrong-import-order)
pandera/checks.py:6:0: C0411: standard import "from typing import Union, Optional, List, Dict" should be placed before "import pandas as pd" (wrong-import-order)
************* Module pandera.decorators
pandera/decorators.py:64:0: C0330: Wrong hanging indentation (remove 4 spaces).
                        "error in check_input decorator of function '%s': the "
                    |   ^ (bad-continuation)
pandera/decorators.py:68:0: C0330: Wrong hanging indentation (remove 4 spaces).
                        (fn.__name__,
                    |   ^ (bad-continuation)
pandera/decorators.py:74:0: C0330: Wrong hanging indentation.
                        )
                |   |   ^ (bad-continuation)
pandera/decorators.py:13:0: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:22:0: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/decorators.py:57:4: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:62:12: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:88:12: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:57:21: W0613: Unused argument 'instance' (unused-argument)
pandera/decorators.py:100:0: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/decorators.py:135:4: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:153:8: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:135:21: W0613: Unused argument 'instance' (unused-argument)
************* Module pandera.schema_components
pandera/schema_components.py:9:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:11:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/schema_components.py:70:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
pandera/schema_components.py:85:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:87:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schema_components.py:87:4: W0235: Useless super delegation in method '__init__' (useless-super-delegation)
pandera/schema_components.py:101:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
pandera/schema_components.py:110:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:115:21: W0212: Access to a protected member _name of a client class (protected-access)
pandera/schema_components.py:115:46: W0212: Access to a protected member _name of a client class (protected-access)
pandera/schema_components.py:116:20: W0212: Access to a protected member _pandas_dtype of a client class (protected-access)
pandera/schema_components.py:117:27: W0212: Access to a protected member _checks of a client class (protected-access)
pandera/schema_components.py:118:29: W0212: Access to a protected member _nullable of a client class (protected-access)
pandera/schema_components.py:119:37: W0212: Access to a protected member _allow_duplicates of a client class (protected-access)
pandera/schema_components.py:127:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
************* Module pandera.hypotheses
pandera/hypotheses.py:237:0: C0301: Line too long (103/100) (line-too-long)
pandera/hypotheses.py:30:4: R0913: Too many arguments (8/5) (too-many-arguments)
pandera/hypotheses.py:148:12: R1720: Unnecessary "else" after "raise" (no-else-raise)
pandera/hypotheses.py:168:8: R1705: Unnecessary "else" after "return" (no-else-return)
pandera/hypotheses.py:177:4: R0913: Too many arguments (8/5) (too-many-arguments)
pandera/hypotheses.py:5:0: C0411: standard import "from functools import partial" should be placed before "import pandas as pd" (wrong-import-order)
pandera/hypotheses.py:8:0: C0411: standard import "from typing import Union, Optional, List, Dict" should be placed before "import pandas as pd" (wrong-import-order)
pandera/hypotheses.py:1:0: R0801: Similar lines in 3 files
==pandera.schema_components:86
==pandera.schemas:174
==pandera.schemas:286
    def __init__(
            self,
            pandas_dtype,
            checks: callable = None,
            nullable: bool = False,
            allow_duplicates: bool = True,
            name: str = None): (duplicate-code)
pandera/hypotheses.py:1:0: R0801: Similar lines in 3 files
==pandera.schema_components:10
==pandera.schemas:174
==pandera.schemas:286
    def __init__(
            self,
            pandas_dtype,
            checks: callable = None,
            nullable: bool = False,
            allow_duplicates: bool = True, (duplicate-code)

------------------------------------------------------------------
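
On the significance-threshold note above, a small simulation may help show why 0.5 is unusual. When the null hypothesis is true, a test at level alpha rejects in roughly a fraction alpha of cases, so alpha = 0.5 would flag about half of all perfectly valid datasets. The sketch below is an assumption-laden stand-in: it uses a two-sample z-test with known variance from the standard library rather than the t-tests the pandera docs describe, and the function names are made up for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for equal means, assuming a known common sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(sample_a) - fmean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha, trials=2000, n=30, seed=0):
    """Fraction of trials that reject a true null hypothesis at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same distribution, so every rejection
        # is a false positive.
        sample_a = [rng.gauss(0, 1) for _ in range(n)]
        sample_b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_z_pvalue(sample_a, sample_b) < alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.5):
    print(alpha, false_positive_rate(alpha))
```

The printed rates should land close to the respective alpha values, which is the practical argument for the conventional 0.05 in documentation examples.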

lwasser · 2019-09-10T14:55:42Z

thank you @mbjoseph for this extremely thorough review. gosh i'm not sure why i didn't see this in my github notifications. my apologies. @xmnlab you can have a look at the review above. Do you want to give the second review a go after seeing what max has pointed out above? If you need any guidance, please say the word!!

xmnlab · 2019-09-10T15:23:25Z

@lwasser sure thing! I am planning to start to work on that today :) thanks!

lwasser · 2019-09-10T16:12:07Z

awesome @xmnlab please reach out if you have any questions !! we are all here to support. @cosmicBboy just a note that the second reviewer is starting the process. You could have a look at @mbjoseph review if you'd like in the meantime!! thank you all!! :)

cosmicBboy · 2019-09-11T02:30:18Z

thanks @lwasser!

@mbjoseph your review is much appreciated! I've released v0.2.1, where I addressed many of the points that you raised, check out the release notes. @xmnlab FYI I've taken a crack at some of @mbjoseph's comments.

Most notable changes:

add citation information
add dev installation instructions
improve formatting and wording of sphinx documentation (this addresses several of the points you made about formatting and wording in the documentation)
make SchemaError message formatting functions private (generic_error_message and these other methods should have been private all along)
add docstrings to error classes

Minor points:

Test coverage is pretty high - any particular reason why the remaining lines are not tested?

I haven't really had too much time to prioritize covering the rest, though I'd like to prioritize the biggest holes and cover those.

There are some deprecation warnings that arise in running the tests: https://travis-ci.org/pandera-dev/pandera/jobs/579197344#L2287

Planning to do this as part of unionai-oss/pandera#110

CI testing on OSX and Windows might be nice too.

Made an issue for this: unionai-oss/pandera#109

Why not conda-forge instead of the cosmicbboy conda channel?

Yes, would love to get a conda-forge recipe going: unionai-oss/pandera#90

pylint points out some places where the code could be streamlined a bit (e.g., unnecessary else statements, and some cases where object is explicitly declared as a parent class), but none of the output is indicative of major problems. Feel free to address or ignore any of these checks:

Cool, made an issue to add pylint to CI: unionai-oss/pandera#108
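
For reference, several of the pylint messages in the report above (useless-object-inheritance, len-as-condition, no-else-return, no-else-raise) correspond to small mechanical rewrites. The sketch below uses a hypothetical `Schema` class to show the streamlined forms; it is not taken from pandera's source.

```python
class Schema:  # useless-object-inheritance: `class Schema(object)` is redundant in Python 3
    """Hypothetical class used only to illustrate the pylint messages."""

    def __init__(self, checks=None):
        self.checks = list(checks or [])

    def has_checks(self):
        # len-as-condition (C1801): rely on the truthiness of the sequence
        # instead of comparing `len(self.checks)` to zero.
        return bool(self.checks)

    def describe(self):
        # no-else-return (R1705): once a branch returns, the `else` is redundant.
        if not self.checks:
            return "no checks"
        return f"{len(self.checks)} check(s)"

    def validate(self, value):
        # no-else-raise (R1720): once a branch raises, continue without `else`.
        for index, check in enumerate(self.checks):
            if not check(value):
                raise ValueError(f"check {index} failed for {value!r}")
        return value
```

None of these change behavior; they only flatten the control flow, which is why the review treats them as optional.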

xmnlab · 2019-09-11T20:46:10Z

just one question. the version submitted for review is 0.1.5
but it seems pandera has 2 more versions after that.

should I review just 0.1.5? the same applies to documentation on readthedocs?

mbjoseph · 2019-09-11T21:23:27Z

IMO @xmnlab you should focus on the most recent version, but @lwasser may also have a preference!

lwasser · 2019-09-11T22:18:47Z

@mbjoseph i think that is a reasonable suggestion!! may i assume you reviewed the most recent version as well? if that is the case then the reviews will be consistent. thank you both!!

mbjoseph · 2019-09-11T22:57:45Z

That's right @lwasser -- my review was for the most recent version at the time, but the package has been updated since (including updates that address my review). So, probably better to work on the most recent version for review 2.

cosmicBboy · 2019-09-12T17:30:45Z

sorry for throwing a wrench in the review process! I probably should have waited on review 2 before updating the package

xmnlab · 2019-09-13T13:56:55Z

thanks for the feedback @mbjoseph and @lwasser ! I am doing the review on the latest version. thanks

xmnlab · 2019-09-14T15:58:58Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all user-facing functions
Examples for all user-facing functions
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer.

Readme requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
Installation instructions
Any additional setup required (authentication tokens, etc)
Brief demonstration usage
Direction to more detailed documentation (e.g. your documentation files or website).
If applicable, how the package compares to other similar packages and/or how it relates to other packages
Citation information

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4:30

Review Comments

The package looks very good: the package structure, documentation, tests, and CI are all in very good shape. Some points reported by @mbjoseph were already fixed or already added as GitHub issues.

I am adding just 2 more comments. The 1st relates to an issue that was already partially fixed (installation for development) but could perhaps be improved further.

Installation instructions: the documentation should probably recommend python setup.py develop or pip install -e . for installation in development mode (as @mbjoseph suggested).
Examples: consider adding example sections to the docstrings. The project seems to use the Sphinx style for docstrings. I didn't find official documentation for that, but this reference may help: http://queirozf.com/entries/python-docstrings-reference-examples
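As a rough illustration of the docstring suggestion, here is a minimal sketch of a Sphinx-style docstring with an example section that doctest can execute. The function `clip_to_range` is hypothetical and not part of pandera, and the `:example:` field label is just one common convention, not an official Sphinx requirement.

```python
import doctest


def clip_to_range(values, lower, upper):
    """Clip each value to the closed interval [lower, upper].

    Hypothetical helper used only to illustrate the docstring layout.

    :param values: iterable of numbers to clip.
    :param lower: smallest allowed value.
    :param upper: largest allowed value.
    :returns: list of clipped values.

    :example:

    >>> clip_to_range([-1, 5, 12], lower=0, upper=10)
    [0, 5, 10]
    """
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    # Execute every ">>>" example found in this module's docstrings.
    doctest.testmod()
```

Because the example is written in doctest format, it can serve as both documentation and a lightweight test, for instance by running the module directly or through a test runner that collects doctests.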

lwasser · 2019-09-16T17:21:05Z

awesome. thanks @xmnlab and great job on your first review !!! @cosmicBboy please note the new round of review comments. Ping me when changes have been implemented / you have questions etc!! Thank you all for a really smooth review process!!

cosmicBboy · 2019-09-29T23:12:31Z

thanks @lwasser @xmnlab @mbjoseph!

I've cut a new pandera release 0.2.2 that adds example docstrings to all public-facing classes and methods. The commit also:

reflects the docstring examples in the docs
changes README with updated development installation instructions.
adds more test coverage in schema.py
fixes unit test pandas FutureDeprecation warnings

Please let me know if you have any questions.

lwasser · 2019-09-30T19:07:01Z

thank you @cosmicBboy !! @mbjoseph @xmnlab will you please have a look at the latest release? let me know if the changes are acceptable given your review! if so, you can check the "the author has responded to my review" box at the bottom of your review submission. If you see anything that wasn't addressed to your satisfaction please let me know!!

thank you all for such a smooth review process!

mbjoseph · 2019-09-30T23:41:35Z

@cosmicBboy thanks for addressing my suggestions - v0.2.2 looks good to me!

lwasser · 2019-10-01T15:08:38Z

@xmnlab can you kindly have a look at the above and, if you are happy with the edits, check the box in your review that states that the author has addressed everything to your satisfaction.

lwasser · 2019-11-13T22:58:48Z

given this has been APPROVED, i will close this issue. If there is any reason to reopen it, please say the word!!!

lwasser · 2021-07-16T20:22:27Z

reopening to keep tabs on JOSS submission!

astrojuanlu · 2021-08-31T13:37:55Z

I tried to locate the pandera paper on JOSS, without success. Am I missing anything?

lwasser · 2021-08-31T21:59:05Z

hey there @astrojuanlu i believe that @cosmicBboy hasn't yet submitted to JOSS. I briefly chatted over twitter i think or maybe at scipy and it wasn't submitted yet. it may not be under review yet. @cosmicBboy can you confirm? i can also remove that tag if you don't plan on submitting there but it sounded like you were interested in doing that at some point. the submission process is fast with JOSS once it goes through our review.

cosmicBboy · 2021-08-31T23:10:33Z

Hi @lwasser @astrojuanlu yes I do intend on submitting a paper to JOSS, I'm still working on a draft and plan on submitting within the next 2-3 weeks.

lwasser · 2021-12-16T21:55:55Z

hey there @cosmicBboy did this ever go through JOSS? i just didn't see the issue referenced here. I am going to close this for the time being but if it does go into JOSS please reference this issue and we can update it accordingly! thank you!

cosmicBboy · 2021-12-16T22:46:27Z

thanks @lwasser will do! Just got swamped with other things, but am committed to submitting through JOSS in the new year

lwasser · 2022-09-15T18:59:22Z

hey 👋 @cosmicBboy @mbjoseph @xmnlab ! I hope that you are all well. I am reaching out here to all reviewers and maintainers about pyOpenSci now that i am working full time on the project (read more here). We have a survey that we'd like for you to fill out so we can:

🔗 HERE IS THE SURVEY LINK 🔗

invite you to our slack channel to participate in our community (if you wish to join - no worries if that is not how you prefer to communicate / participate).
Collect information from you about how we can improve our review process and also better serve maintainers.
The survey should take about 10 minutes to complete depending upon how much you decide to write. This information will help us greatly as we make decisions about how pyOpenSci grows and serves the community. Thank you so much in advance for filling it out.

NOTE: this is different from the form designed for reviewers to sign up to review.
If there are other maintainers for this project, please ping them here and ask them to fill out the survey as well. It is important that we ensure packages are supported long term or sunsetted with sufficient communication to users. Thus we will check in with maintainers annually about maintenance.

Thank you in advance for doing this and supporting pyOpenSci.

lwasser · 2022-09-28T15:45:35Z

hey there @cosmicBboy @mbjoseph 👋 Just a friendly reminder to take 5-10 minutes to fill out our survey. We really appreciate it. Thank you in advance for helping us by filling out the survey!! 🙌 Niels, it's really important for us to collect information from our maintainers so that we can both stay in touch with you regarding package maintenance and also support you through time. We really appreciate your time in filling this out. Also, are you the sole maintainer of this package? if not, please have your co-maintainers also fill it out and please list them here as well. Many thanks in advance!

✨ Ivan you only need to do this once :) ping me on slack with any questions!! 🙌

🔗 HERE IS THE SURVEY LINK 🔗

lwasser · 2022-10-19T20:31:07Z

hi again @cosmicBboy and @mbjoseph i'd be super appreciative if you'd fill out our survey

🔗 HERE IS THE SURVEY LINK 🔗!

I know you are busy, and Niels, I know you have a super exciting job transition happening now. But i'd appreciate your time. We'd like to check in with maintainers once a year to ensure all is well with package maintenance. Also your input on the survey helps us improve and show funders we are doing good things! Many thanks for your time!

cosmicBboy · 2022-10-20T17:45:30Z

just filled it out!

lwasser · 2022-10-24T12:02:26Z

You rock!! thanks Niels!

NickleDave · 2023-04-24T19:07:59Z

Hi @cosmicBboy we are updating our metadata to be consistent.

When you have a second, can you please confirm for me that at the time of this review you were the only core maintainer? I have added that in the "all current maintainers" field above (as in #109)

cosmicBboy · 2023-07-10T18:44:46Z

Hi @NickleDave sorry for the late response 😅

> can you please confirm for me that at the time of this review you were the only core maintainer?

Yes, confirmed

cosmicBboy added 0/pre-review-checks New Submission! labels Aug 14, 2019

lwasser added 2/seeking-reviewers 3/reviewers-assigned labels Aug 23, 2019

lwasser closed this as completed Sep 30, 2019

lwasser reopened this Oct 1, 2019

lwasser added the 4/reviews-in-awaiting-changes label Oct 1, 2019

lwasser closed this as completed Nov 13, 2019

lwasser mentioned this issue Jan 31, 2020

ObsPy: Software Submission for Review #16

Closed

22 tasks

NickleDave mentioned this issue Jul 15, 2020

pystiche: A Framework for Neural Style Transfer #25

Closed

22 tasks

lwasser reopened this Jul 16, 2021

lwasser added 7/under-joss-review and removed 0/pre-review-checks 2/seeking-reviewers 3/reviewers-assigned 4/reviews-in-awaiting-changes 5/awaiting-reviewer-response New Submission! labels Jul 16, 2021

lwasser closed this as completed Dec 16, 2021

lwasser mentioned this issue Apr 24, 2023

Update issue metadata for all reviews #109

Closed

17 tasks

lwasser removed the 7/under-joss-review label Jul 5, 2023

lwasser added this to peer-review-status Jul 11, 2023

lwasser moved this to pyos-accepted in peer-review-status Jul 11, 2023
