Presubmission: harmonize-wq #132

Closed
3 of 14 tasks
jbousquin opened this issue Sep 1, 2023 · 2 comments

@jbousquin

Submitting Author: Justin Bousquin (@jbousquin)
Package Name: harmonize-wq
One-Line Description of Package: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
Repository Link (if existing): https://github.com/USEPA/harmonize-wq


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:
    The US EPA's Water Quality Portal (WQP) is a data warehouse that provides access, in a common format, to data stored in large water quality databases. Tools exist both for publishing data to WQP and for retrieving data from it; harmonize-wq focuses on the retrieved data, (1) cleaning it to ensure it meets the required quality standards and (2) wrangling it into a more analytic-ready format. Although there are many examples where this has been done, standardized tools for this task could make it less time-intensive, more consistent, and more reproducible.
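To make the "wrangling" step concrete, here is a standard-library sketch that pivots long-format WQP results (one row per site, date, and characteristic) into one wide row per site and date. The dictionary keys follow WQP result-profile column names, but `to_wide` and the sample records are hypothetical and are not harmonize-wq's API.

```python
from collections import defaultdict


def to_wide(rows):
    """Pivot long WQP-style records into one dict per (site, date).

    Hypothetical helper for illustration; not part of harmonize-wq.
    Each input row is a dict keyed by WQP result-profile column names.
    """
    wide = defaultdict(dict)
    for row in rows:
        key = (row["MonitoringLocationIdentifier"], row["ActivityStartDate"])
        # Column label combines the characteristic with its reported unit
        label = f'{row["CharacteristicName"]} ({row["ResultMeasure/MeasureUnitCode"]})'
        try:
            # A repeated characteristic for the same site/date overwrites the earlier value
            wide[key][label] = float(row["ResultMeasureValue"])
        except (TypeError, ValueError):
            # Non-numeric results (e.g. "*Not Reported") are kept as None for review
            wide[key][label] = None
    return [
        {"site": site, "date": date, **values}
        for (site, date), values in sorted(wide.items())
    ]


records = [
    {"MonitoringLocationIdentifier": "SITE-1", "ActivityStartDate": "2023-06-01",
     "CharacteristicName": "Temperature, water", "ResultMeasureValue": "21.5",
     "ResultMeasure/MeasureUnitCode": "deg C"},
    {"MonitoringLocationIdentifier": "SITE-1", "ActivityStartDate": "2023-06-01",
     "CharacteristicName": "Phosphorus", "ResultMeasureValue": "0.04",
     "ResultMeasure/MeasureUnitCode": "mg/l"},
    {"MonitoringLocationIdentifier": "SITE-1", "ActivityStartDate": "2023-06-01",
     "CharacteristicName": "Salinity", "ResultMeasureValue": "*Not Reported",
     "ResultMeasure/MeasureUnitCode": "PSU"},
]

print(to_wide(records))
```

Non-numeric results come back as None rather than being silently dropped, which keeps the cleaning decision visible; a real pipeline would also need an explicit rule for duplicate measurements instead of keeping the last one.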

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo
- [ ] Unsure/Other (explain below)
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
    The package has some limited geospatial functionality (it leverages geopandas) to handle where samples were taken and to build retrieval queries from an area-of-interest geometry. Likewise, it includes some data validation (e.g., checking metadata consistency) and visualization, but these are intentionally limited.

  • Who is the target audience and what are the scientific applications of this package?
    Water quality domain experts trying to synthesize available data for a stream, bay, estuary, etc. More standardized data cleaning and wrangling allows outputs to be integrated into other tools in the water quality data pipeline, e.g., into dashboards for visualization (Beck et al., 2021) or decision support tools (Booth et al., 2011).

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?
    No Python packages to my knowledge; there is one in R: USEPA/TADA.

  • Any other questions or issues we should be aware of:
    I would like to leverage the relationship with JOSS; the paper.md and documentation need to clear internal review before this can be submitted. The package leverages USGS's dataretrieval to retrieve the data and pint for unit handling. The current focus of development (open branches) is on demonstrations in Jupyter notebooks, examples in docstrings, and expanding the handling of sample fraction in combination with similar characteristicName values (e.g., dissolved vs. filtered nitrogen). Very open to ideas for making it more maintainable!
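The geospatial point above (building retrieval queries from an area-of-interest geometry) can be illustrated with a standard-library sketch that reduces a polygon to the west,south,east,north bounding box used by the WQP web services' `bBox` query parameter. The helper `bbox_param` and the coordinates are hypothetical; harmonize-wq does this with geopandas, and its actual functions may differ.

```python
def bbox_param(polygon):
    """Return a WQP-style bBox string "west,south,east,north" for a polygon.

    Hypothetical helper; `polygon` is a sequence of (lon, lat) pairs in
    decimal degrees. Polygons crossing the antimeridian are not handled.
    """
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    west, south, east, north = min(lons), min(lats), max(lons), max(lats)
    return ",".join(f"{v:.4f}" for v in (west, south, east, north))


# Made-up area of interest: a rough quadrilateral over a coastal bay
aoi = [(-87.35, 30.30), (-87.05, 30.28), (-87.00, 30.55), (-87.30, 30.60)]

# Query parameters that could be sent to the WQP service
query = {"bBox": bbox_param(aoi), "characteristicName": "Phosphorus"}
print(query)
```

A bounding box over-selects, so returned sites would still need clipping to the true polygon; that step is where a geometry library such as geopandas is worth the dependency.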
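The unit and sample-fraction handling mentioned above can be sketched as follows. harmonize-wq uses pint for units; this stand-in uses a small hand-written conversion table instead so that it runs with the standard library alone. The function `harmonize`, the table `TO_MG_PER_L`, and the example values are hypothetical; only the dictionary keys mirror WQP result-profile column names.

```python
# Factors converting each (lower-cased) unit code to mg/l.
# Hand-written stand-in for pint; covers only a few mass/volume units.
TO_MG_PER_L = {"g/l": 1e3, "mg/l": 1.0, "ug/l": 1e-3, "ng/l": 1e-6}


def harmonize(rows):
    """Group concentrations in mg/l by (characteristic, sample fraction).

    Hypothetical helper for illustration; not part of harmonize-wq.
    Rows with unknown units or non-numeric values are returned separately.
    """
    grouped, problems = {}, []
    for row in rows:
        unit = (row.get("ResultMeasure/MeasureUnitCode") or "").strip().lower()
        factor = TO_MG_PER_L.get(unit)
        try:
            value = float(row["ResultMeasureValue"])
        except (TypeError, ValueError):
            value = None
        if factor is None or value is None:
            problems.append(row)  # set aside for manual review rather than guessing
            continue
        # Keep fractions separate so dissolved and total results never mix
        fraction = row.get("ResultSampleFractionText") or "Unspecified"
        key = (row["CharacteristicName"], fraction)
        grouped.setdefault(key, []).append(value * factor)
    return grouped, problems


rows = [
    {"CharacteristicName": "Nitrogen", "ResultSampleFractionText": "Dissolved",
     "ResultMeasureValue": "250", "ResultMeasure/MeasureUnitCode": "ug/L"},
    {"CharacteristicName": "Nitrogen", "ResultSampleFractionText": "Total",
     "ResultMeasureValue": "0.9", "ResultMeasure/MeasureUnitCode": "mg/l"},
    {"CharacteristicName": "Nitrogen", "ResultSampleFractionText": "Dissolved",
     "ResultMeasureValue": "3", "ResultMeasure/MeasureUnitCode": "mg/kg"},
]

grouped, problems = harmonize(rows)
print(grouped)
print(len(problems), "row(s) need review")
```

The sediment-style unit (mg/kg) is set aside instead of being forced into mg/l; pint adds dimensional checks that a hand-written table like this cannot.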

P.S. Have feedback/comments about our review process? Leave a comment here

@NickleDave
Contributor

Welcome to pyOpenSci @jbousquin and thank you for opening this detailed presubmission inquiry for harmonize-wq.

The package is definitely in scope, because of its main use for data processing / munging, as you point out.

Please do proceed with a full submission when you are ready and everything has cleared internal review.
Be sure to reference this issue by number when you do, and I will close this one at that time.

> Current focus of development (open branches) is on demonstrations in jupyter notebooks, examples as part of doc strings, and expanding handling of sample fraction in combination with similar characteristicName (e.g., dissolved vs filtered nitrogen). Very open to ideas for making it more maintainable!

Two high level things I noticed during my initial pass:

  • Having examples in notebooks is great. We heavily emphasize including these kinds of examples in the web docs as well. One way you could achieve both is to render the notebooks as part of your docs build using, e.g., nbsphinx or myst-nb.
  • The landing page of the docs is currently the default Sphinx page, which doesn't include much information. One way to get people the information they need from the docs as quickly as possible is to reuse your README as the landing page, e.g., with a literal include, so that a potential user can see at a glance how to install the package, what code snippets look like, etc. For examples of landing pages like this, see pandera, PyGMT, and pyafscgap.
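Both suggestions can be combined in a single Sphinx configuration. Below is a minimal `conf.py` sketch, assuming myst-nb is chosen over nbsphinx and the docs live in a `docs/` folder next to the README; the paths and option values are assumptions, not harmonize-wq's actual setup.

```python
# docs/conf.py: minimal sketch; the layout (docs/ next to README.md) is assumed.
project = "harmonize-wq"

extensions = [
    "sphinx.ext.autodoc",   # API pages built from the package's docstrings
    "sphinx.ext.napoleon",  # understands NumPy/Google-style docstring sections
    "myst_nb",              # renders .ipynb notebooks and MyST Markdown pages
]

# Show the outputs already saved in the notebooks instead of re-running them
# during the docs build (nbsphinx's rough equivalent is nbsphinx_execute = "never").
nb_execution_mode = "off"

# With myst_nb enabled, a Markdown landing page (docs/index.md) can reuse the
# README through MyST's include directive:
#
#     ```{include} ../README.md
#     ```
```

Turning execution off keeps doc builds fast and independent of live WQP queries, at the cost of possibly stale outputs; relative links and image paths in the README may also need checking once it is rendered from `docs/`.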

Hope that's helpful. Happy to answer any questions specific to the review here; for broader discussion, please feel free to start a topic in our forum.

We're looking forward to your submission!

lwasser moved this to pre-submission in peer-review-status Sep 13, 2023
@NickleDave
Contributor

Hi @jbousquin, just checking back: if there's anything we can clear up, please let us know.

For now, I will assume it's clear that we have confirmed to you that harmonize-wq is in scope. We look forward to your full submission and thank you again for opening this detailed presubmission inquiry. I will close for now.

Projects
Archived in project
Development

No branches or pull requests

3 participants