Pandera: A flexible and expressive pandas data validation library. #12
Comments
Thank you @cosmicBboy !! we will get back to you with the editor / review process next steps !! |
Editor checks:
Editor comments
Reviewers: @mbjoseph @xmnlab |
@lwasser yes! A 2 week deadline works for me. I'll have my review in by Sep 6. |
thanks everyone for participating in this review! Just FYI, the pandera issues page has a couple of tickets that may be of interest for reviewers. We're planning on a 0.2.0 release in the next week or so. |
@lwasser thank you so much! I am excited to contribute to pyopensci project! <3 |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Readme requirements
The README should include, from top to bottom:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 6
Review Comments
Overall, this is a great package with a clear scope, good docs, and good testing infrastructure. Clearly, a lot of effort has been put into its development, and as somebody who works with raw data, something like this would be immediately useful. With this in mind, most of my comments are fairly minor.
Bigger points:
These relate to the top-level boxes for the pyOpenSci review process that I could not check.
Minor notes
These are a smattering of questions I ran into, and notes that might help improve the package.
|
thank you @mbjoseph for this extremely thorough review. gosh i'm not sure why i didn't see this in my github notifications. my apologies. @xmnlab you can have a look at the review above. Do you want to give the second review a go after seeing what max has pointed out above? If you need any guidance, please say the word!! |
@lwasser sure thing! I am planning to start to work on that today :) thanks! |
awesome @xmnlab please reach out if you have any questions !! we are all here to support. @cosmicBboy just a note that the second reviewer is starting the process. You could have a look at @mbjoseph's review if you'd like in the meantime!! thank you all!! :) |
thanks @lwasser! @mbjoseph your review is much appreciated! I've released v0.2.1, where I addressed many of the points that you raised, check out the release notes. @xmnlab FYI I've taken a crack at some of @mbjoseph's comments. Most notable changes:
Minor points:
I haven't really had too much time to prioritize covering the rest, though I'd like to prioritize the biggest holes and cover those.
Planning to do this as part of unionai-oss/pandera#110
Made an issue for this: unionai-oss/pandera#109
Yes, would love to get a conda-forge recipe going: unionai-oss/pandera#90
Cool, made an issue to add pylint to CI: unionai-oss/pandera#108 |
just one question. the version submitted for review is should I review just |
@mbjoseph i think that is a reasonable suggestion!! may i assume you reviewed the most recent version as well? if that is the case then the reviews will be consistent. thank you both!! |
That's right @lwasser -- my review was for the most recent version at the time, but the package has been updated since (including updates that address my review). So, probably better to work on the most recent version for review 2. |
sorry for throwing a wrench in the review process! I probably should have waited on review 2 before updating the package |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Readme requirements
The README should include, from top to bottom:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 4:30
Review Comments
The package looks very good: package structure, documentation, tests and CI all look to be in very good shape. Some points reported by @mbjoseph were already fixed or already added as a GitHub issue. I am adding just 2 more comments. Actually the 1st is just a comment related to an issue that was already partially fixed (installation for development) but maybe it could be improved.
|
awesome. thanks @xmnlab and great job on your first review !!! @cosmicBboy please note the new round of review comments. Ping me when changes have been implemented / you have questions etc!! Thank you all for a really smooth review process!! |
thanks @lwasser @xmnlab @mbjoseph! I've cut a new pandera release 0.2.2 that adds example docstrings to all public-facing classes and methods. The commit also:
Please let me know if you have any questions. |
thank you @cosmicBboy !! @mbjoseph @xmnlab will you please have a look at the latest release? let me know if the changes are acceptable given your review! if so, you can check the "the author has responded to my review" box at the bottom of your review submission. If you see anything that wasn't addressed to your satisfaction please let me know!! thank you all for such a smooth review process! |
@cosmicBboy thanks for addressing my suggestions - v0.2.2 looks good to me! |
@xmnlab can you kindly have a look at the above and if you are happy with the edits, check the box in your review that states that the author has addressed everything to your satisfaction . |
given this has been APPROVED, i will close this issue. If there is any reason to reopen it, please say the word!!! |
reopening to keep tabs on JOSS submission! |
I tried to locate the pandera paper on JOSS, without success. Am I missing anything? |
hey there @astrojuanlu i believe that @cosmicBboy hasn't yet submitted to JOSS. I briefly chatted over twitter i think or maybe at scipy and it wasn't submitted yet. it may not be under review yet. @cosmicBboy can you confirm? i can also remove that tag if you don't plan on submitting there but it sounded like you were interested in doing that at some point. the submission process is fast with JOSS once it goes through our review. |
Hi @lwasser @astrojuanlu yes I do intend on submitting a paper to JOSS, I'm still working on a draft and plan on submitting within the next 2-3 weeks. |
hey there @cosmicBboy did this ever go through JOSS? i just didn't see the issue referenced here. I am going to close this for the time being but if it does go into JOSS please reference this issue and we can update it accordingly! thank you! |
thanks @lwasser will do! Just got swamped with other things, but am committed to submitting through JOSS in the new year |
hey 👋 @cosmicBboy @mbjoseph @xmnlab ! I hope that you are all well. I am reaching out here to all reviewers and maintainers about pyOpenSci now that i am working full time on the project (read more here). We have a survey that we'd like for you to fill out so we can:
NOTE: this is different from the form designed for reviewers to sign up to review. Thank you in advance for doing this and supporting pyOpenSci. |
hey there @cosmicBboy @mbjoseph 👋 Just a friendly reminder to take 5-10 minutes to fill out our survey . We really appreciate it. Thank you in advance for helping us by filling out the survey!! 🙌 Niels, it's really important for us to collect information from our maintainers so that we can both stay in touch with you regarding package maintenance and also support you through time. We really appreciate your time in filling this out. Also are you the sole maintainer of this package? if not, please have your co-maintainers also fill it out and please list them here as well. Many thanks in advance! ✨ Ivan you only need to do this once :) ping me on slack with any questions!! 🙌 |
hi again @cosmicBboy and @mbjoseph i'd be super appreciative if you could fill out our survey. I know you are busy and Niels I know you have a super exciting job transition happening now. But i'd appreciate your time. We'd like to check in with maintainers once a year to ensure all is well with package maintenance. Also your input on the survey helps us improve and show funders we are doing good things! Many thanks for your time! |
just filled it out! |
You rock!! thanks Niels! |
Hi @cosmicBboy we are updating our metadata to be consistent. When you have a second, can you please confirm for me that at the time of this review you were the only core maintainer? I have added that in the "all current maintainers" field above (as in #109) |
Hi @NickleDave sorry for the late response 😅
|
Submitting Author: Niels Bantilan (@cosmicBboy)
All current maintainers: (@cosmicBboy)
Package Name: pandera
One-Line Description of Package: validate the types, properties, and statistics of pandas data structures
Repository Link: https://github.com/unionai-oss/pandera
Version submitted: 0.1.5
Editor: @lwasser
Reviewer 1: @mbjoseph
Reviewer 2: @xmnlab
Archive: https://github.com/pandera-dev/pandera/releases/tag/v0.2.3
Version accepted: v0.2.3
Date Accepted: 10/10/2019
Description
pandas data structures can hide a lot of information, and explicitly validating them at runtime in production-critical or reproducible research settings is a good idea for building reliable data transformation pipelines. pandera enables users to:
[…] DataFrame or values in a Series.
[…] t-tests.
[…] via function decorators.
pandera provides a flexible and expressive API for performing data validation on tidy (long-form) and wide data to make data processing pipelines more readable and robust.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
Data munging: the package makes ETL, data analysis, and data processing
pipelines more robust and reliable by providing users with tools to validate
assumptions about the schema and statistical properties of datasets.
This package supports validation on long (tidy) data and wide data.
Reproducibility: This package enables users to validate DataFrame or Series objects at runtime or as unit/integration tests, and can easily be integrated into existing pipelines using the check_input and check_output decorators. It also supports collaboration and reproducible research by programmatically enforcing assertions made about the statistical properties of a dataset, in addition to making it easier to review pandas code in production-critical contexts.
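The decorator-based integration described above can be illustrated without pandera itself. What follows is a minimal, dependency-free sketch of the pattern, assuming rows are plain dicts rather than DataFrames. The names `require_columns`, `check_input` and `check_output` below are hypothetical stand-ins written for this example; they are not pandera's implementation and do not share its signatures.

```python
import functools


def require_columns(columns):
    """Build a validator asserting every row has the given columns and types.

    Hypothetical helper for this sketch; rows are plain dicts, not DataFrames.
    """
    def validate(rows):
        for i, row in enumerate(rows):
            for name, expected_type in columns.items():
                if name not in row:
                    raise ValueError(f"row {i}: missing column {name!r}")
                if not isinstance(row[name], expected_type):
                    raise TypeError(
                        f"row {i}: column {name!r} is not {expected_type.__name__}"
                    )
        return rows
    return validate


def check_input(validate):
    """Validate the first positional argument before the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows, *args, **kwargs):
            return func(validate(rows), *args, **kwargs)
        return wrapper
    return decorator


def check_output(validate):
    """Validate the return value after the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return validate(func(*args, **kwargs))
        return wrapper
    return decorator


# Existing pipeline code stays unchanged; validation is attached from outside.
@check_input(require_columns({"height_cm": float}))
@check_output(require_columns({"height_cm": float, "height_m": float}))
def add_height_in_metres(rows):
    return [{**row, "height_m": row["height_cm"] / 100} for row in rows]
```

The design point is that the body of `add_height_in_metres` contains no validation logic at all, so assumptions about its input and output can be reviewed separately from the transformation.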
The target audience of pandera consists of data scientists, data engineers, machine learning engineers, and machine learning scientists who use pandas in their data processing pipelines for various purposes, e.g., transforming data for reporting, analytics, model training, and data visualization. This tool is built on top of pandas and scipy to provide a user-friendly interface for explicitly specifying the set of properties that a DataFrame or Series must fulfill in order to be considered valid. Since pandera makes no assumptions about the domain of study or contents of these pandas data structures, it could be used in a wide variety of quantitative fields that involve the analysis of tabular data.
There are a few alternatives to pandera in the Python ecosystem and here is how they compare:
[…] functionality
[…] Enforcer and Column objects are very similar to pandera, but it's a little difficult to follow
Key differentiators of pandera:
column data types, nullability, and uniqueness are first-class concepts.
check_input and check_output decorators enable seamless integration with existing code.
Checks provide flexibility and performance by providing access to pandas API by design.
Hypothesis class provides a tidy-first interface for statistical hypothesis testing.
Checks and Hypothesis objects support both tidy and wide data validation.
Comprehensive documentation on key functionality.
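To make the first and third differentiators concrete, here is a small sketch of what treating data type, nullability, and uniqueness as first-class column properties can look like, with checks that receive a whole column at once. It uses only the standard library; `ColumnSpec` and `validate_column` are hypothetical names invented for this illustration and should not be read as pandera's Column or Check classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ColumnSpec:
    """Hypothetical column description; not pandera's Column class."""
    dtype: type
    nullable: bool = False
    unique: bool = False
    # Each check receives the full list of non-null values and returns
    # True or False, mirroring the idea of a whole-column check.
    checks: List[Callable[[List[Any]], bool]] = field(default_factory=list)


def validate_column(name, values, spec):
    """Return a list of human-readable failures for one column."""
    failures = []
    present = [v for v in values if v is not None]
    if not spec.nullable and len(present) != len(values):
        failures.append(f"{name}: null values are not allowed")
    type_ok = all(isinstance(v, spec.dtype) for v in present)
    if not type_ok:
        failures.append(f"{name}: expected {spec.dtype.__name__}")
    if spec.unique and len(set(present)) != len(present):
        failures.append(f"{name}: values must be unique")
    if type_ok:
        # Custom checks only run on well-typed data so they cannot crash
        # on values of an unexpected type.
        for i, check in enumerate(spec.checks):
            if not check(present):
                failures.append(f"{name}: check {i} failed")
    return failures
```

For example, `ColumnSpec(dtype=int, unique=True, checks=[lambda vs: all(v >= 0 for v in vs)])` declares a non-nullable, unique, non-negative integer column in one place, and `validate_column("id", values, spec)` reports every property that the data violates.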
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted: https://pyopensci.discourse.group/t/candidate-package-pandera-a-flexible-pandas-data-structure-validation-package/92
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
Note: Do not submit your package separately to JOSS
Are you OK with Reviewers Submitting Issues to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here
Previous Repo: https://github.com/cosmicBboy/pandera