Presubmission: harmonize-wq #132

jbousquin · 2023-09-01T18:03:57Z

Submitting Author: Justin Bousquin (@jbousquin)
Package Name: harmonize-wq
One-Line Description of Package: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
Repository Link (if existing): https://github.com/USEPA/harmonize-wq

Code of Conduct & Commitment to Maintain Package

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.

Description

Include a brief paragraph describing what your package does:
The US EPA's Water Quality Portal (WQP) is a data warehouse that facilitates access to data stored in large water quality databases in a common format. Tools exist to facilitate both publishing data to and retrieving data from WQP; harmonize-wq focuses on retrieved data: (1) cleaning it to ensure it meets the required quality standards, and (2) wrangling it into a more analytic-ready format. Although there are many examples where this has been done, standardized tools for the task could make it less time-intensive, more consistent, and more reproducible.

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Pangeo
- My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook

Scope

Please indicate which category or categories this package falls under:

Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo
- [ ] Unsure/Other (explain below)

Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
The package has some limited geospatial functionality (it leverages geopandas) to handle where samples were taken and to build retrieval queries from an area-of-interest geometry. Likewise, it includes some data validation (e.g., checking metadata consistency) and visualization, but these are intentionally limited.
Who is the target audience and what are the scientific applications of this package?
Water quality domain experts trying to synthesize available data in a stream, bay, estuary, etc. More standardized data cleansing and wrangling allows outputs to be integrated into other tools in the water quality data pipeline, e.g., for integration into dashboards for visualization (Beck et al., 2021) or decision support tools (Booth et al., 2011).
Are there other Python packages that accomplish similar things? If so, how does yours differ?
None to my knowledge; there is an R package, USEPA/TADA.
Any other questions or issues we should be aware of:
I would like to leverage the relationship with JOSS; paper.md and the documentation need to clear internal review before this can be submitted. The package leverages USGS's dataretrieval to retrieve the data and pint for unit handling. The current focus of development (open branches) is on demonstrations in Jupyter notebooks, examples as part of docstrings, and expanding handling of sample fraction in combination with similar characteristicName (e.g., dissolved vs filtered nitrogen). Very open to ideas for making it more maintainable!

P.S. Have feedback/comments about our review process? Leave a comment here

NickleDave · 2023-09-05T21:05:21Z

Welcome to pyOpenSci @jbousquin and thank you for opening this detailed presubmission inquiry for harmonize-wq.

The package is definitely in scope, because of its main use for data processing / munging, as you point out.

Please do proceed with a full submission when you are ready and everything has cleared internal review.
Be sure to reference this issue by number when you do, and I will close this one at that time.

> Current focus of development (open branches) is on demonstrations in jupyter notebooks, examples as part of doc strings, and expanding handling of sample fraction in combination with similar characteristicName (e.g., dissolved vs filtered nitrogen). Very open to ideas for making it more maintainable!

Two high level things I noticed during my initial pass:

- Having examples in notebooks is great. We heavily emphasize including these kinds of examples in the web docs as well. One way you could achieve both is to render the notebooks as part of your docs build using e.g. nbsphinx or myst-nb.
- The landing page of the docs is currently the default sphinx page which doesn't include a lot of info. One way to get people the info they need ASAP from the docs is to recycle your README as the landing page, e.g. with a literal include, so that way a potential user can see at one glance from the docs how to install, what code snippets look like, etc. For example landing pages like this see pandera, pyGMT, and pyafscgap.
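
As a rough illustration of the first suggestion, a Sphinx `conf.py` along the following lines would render notebooks with myst-nb. This is a minimal sketch, not harmonize-wq's actual configuration: the project name comes from this issue, while the execution setting and the exclude patterns are assumptions chosen for the example.

```python
# docs/conf.py (illustrative sketch; the file location and every value
# below are assumptions, not the package's real settings)
project = "harmonize-wq"

extensions = [
    "myst_nb",  # renders Jupyter notebooks as documentation pages
]

# myst-nb option: "off" reuses the outputs already stored in each notebook
# instead of re-executing it during the docs build
nb_execution_mode = "off"

# Keep build output and notebook checkpoints out of the source scan
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

Notebooks listed in a toctree would then be built as pages alongside the rest of the docs. Listing `nbsphinx` in `extensions` instead is the other route mentioned above; its execution options are named differently.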

Hope that's helpful. Happy to answer any questions specific to the review here, or for more discussion, please feel free to start a topic in our forum.

We're looking forward to your submission!

NickleDave · 2023-09-13T20:42:52Z

Hi @jbousquin just checking back--if there's anything we can clear up, please let us know.

For now, I will assume it's clear that we have confirmed to you that harmonize-wq is in scope. We look forward to your full submission and thank you again for opening this detailed presubmission inquiry. I will close for now.

jbousquin added the presubmission label Sep 1, 2023

lwasser added this to peer-review-status Sep 13, 2023

lwasser moved this to pre-submission in peer-review-status Sep 13, 2023

NickleDave closed this as completed Sep 13, 2023

lwasser added the Submission Requested label Sep 27, 2023

NickleDave self-assigned this Oct 17, 2023

jbousquin mentioned this issue Feb 8, 2024 in harmonize-wq #157 (Open, 30 tasks)

lwasser added this to presubmission-inquiries Apr 6, 2024

lwasser moved this to Done in presubmission-inquiries Apr 6, 2024

lwasser moved this from Done to Closed in presubmission-inquiries Apr 6, 2024
