Presubmission: harmonize-wq #132
Comments
Welcome to pyOpenSci @jbousquin and thank you for opening this detailed presubmission inquiry for harmonize-wq. The package is definitely in scope, because of its main use for data processing / munging, as you point out. Please do proceed with a full submission when you are ready and everything has cleared internal review.
Two high-level things I noticed during my initial pass:
Hope that's helpful. Happy to answer any questions specific to the review here, or for more discussion, please feel free to start a topic in our forum. We're looking forward to your submission!
Hi @jbousquin, just checking back. If there's anything we can clear up, please let us know. For now, I will assume it's clear that we have confirmed to you that harmonize-wq is in scope. We look forward to your full submission and thank you again for opening this detailed presubmission inquiry. I will close for now.
Submitting Author: Justin Bousquin (@jbousquin)
Package Name: harmonize-wq
One-Line Description of Package: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
Repository Link (if existing): https://github.com/USEPA/harmonize-wq
Code of Conduct & Commitment to Maintain Package
Description
The US EPA's Water Quality Portal (WQP) is a data warehouse that facilitates access to data stored in large water quality databases in a common format. There are tools to facilitate both publishing data to and retrieving data from WQP; harmonize-wq focuses on retrieved data: (1) cleaning it to ensure it meets the required quality standards, and (2) wrangling it into a more analytic-ready format. Although there are many examples where this has been done, standardized tools for the task could make it less time-intensive, more consistent, and more reproducible.
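As a rough illustration of the kind of cleaning and wrangling described above, here is a minimal standard-library sketch that converts water temperature results reported in mixed units to a single unit. It is not harmonize-wq's API: the function `harmonize_temperature`, the `TO_CELSIUS` table, the unit strings, and the sample values are all assumptions made up for this example. Only the column names are meant to mirror WQP result columns.

```python
import math

# Conversions to degrees Celsius, keyed by unit strings assumed to appear
# in the data (illustrative only, not an exhaustive list).
TO_CELSIUS = {
    "deg C": lambda v: v,
    "deg F": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}


def harmonize_temperature(rows):
    """Return copies of rows with temperature values expressed in deg C.

    Rows with a non-numeric value or an unrecognized unit get NaN, so they
    can be flagged for review rather than silently dropped.
    """
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are left unchanged
        unit = row.get("ResultMeasure/MeasureUnitCode")
        try:
            value = float(row.get("ResultMeasureValue"))
            new["ResultMeasureValue"] = TO_CELSIUS[unit](value)
        except (TypeError, ValueError, KeyError):
            new["ResultMeasureValue"] = math.nan
        new["ResultMeasure/MeasureUnitCode"] = "deg C"
        out.append(new)
    return out


# Made-up sample rows, including one non-numeric result ("ND").
rows = [
    {"CharacteristicName": "Temperature, water",
     "ResultMeasureValue": "68", "ResultMeasure/MeasureUnitCode": "deg F"},
    {"CharacteristicName": "Temperature, water",
     "ResultMeasureValue": "293.15", "ResultMeasure/MeasureUnitCode": "K"},
    {"CharacteristicName": "Temperature, water",
     "ResultMeasureValue": "ND", "ResultMeasure/MeasureUnitCode": "deg C"},
]
harmonized = harmonize_temperature(rows)
```

A hand-written conversion table like this is only for illustration; a units library covers far more cases and handles dimensional checks.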
Community Partnerships
We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
The package has some limited geospatial functionality (it leverages geopandas) to handle where samples were taken and to build retrieval queries from an area-of-interest geometry. Likewise, it includes some data validation (e.g., checking metadata consistency) and visualization, but these are intentionally limited.
Who is the target audience and what are the scientific applications of this package?
Water quality domain experts trying to synthesize available data in a stream, bay, estuary, etc. More standardized data cleansing and wrangling allows outputs to be integrated into other tools in the water quality data pipeline, e.g., dashboards for visualization (Beck et al., 2021) or decision support tools (Booth et al., 2011).
Are there other Python packages that accomplish similar things? If so, how does yours differ?
No Python packages to my knowledge; in R there is USEPA/TADA.
Any other questions or issues we should be aware of:
We would like to leverage the relationship with JOSS; paper.md and the documentation need to clear internal review before this can be submitted. The package leverages USGS's dataretrieval to retrieve the data and pint for unit handling. The current focus of development (open branches) is on demonstrations in Jupyter notebooks, examples as part of docstrings, and expanding the handling of sample fraction in combination with similar characteristicName values (e.g., dissolved vs filtered nitrogen). Very open to ideas for making it more maintainable!