Py open sci review (#56)
* Update README.md

Fix typo (readme.md): l7 on package name

* Update README.md

Fix typo (readme.md): double spaces

* Update README.md

Edit (general): new line at each full stop in a markdown paragraph.

* Update contributing.rst

fix typo (contributing.rst): double spaces

* Update contributing.rst

Edit (general): adding a new line at each full stop in a rst paragraph.

* Update example workflow.rst

Edit (general): add a new line at each full stop in a markdown or rst paragraph.

* Update index.rst

Edit (general): add a new line at each full stop in a markdown or rst paragraph.

* wet_dry_drop() is now outside the normal workflow; what was being done in the try/except was a bit proactive and becomes problematic when keeping the functions in a linear workflow.

* from harmonize -> clean: df_checks(). add_qa_flag()

* convert_unit_series() moved harmonize -> convert

* Import specific functions instead of module

* Fix docs examples

* Module needs to be imported for example

* 'Filter/sieve residue' & 'Yield' are now included in this domain. This is meant to get the updated list, but we will need to keep an eye out for additions for docs examples like this, especially right now as there may be updates with WQX 2.0 -> 3.0.
jbousquin authored Apr 10, 2024
1 parent a5f048b commit 81448a9
Showing 13 changed files with 355 additions and 336 deletions.
13 changes: 10 additions & 3 deletions README.md
@@ -4,7 +4,10 @@
# harmonize-wq
Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats

US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval). Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonixe_wq package is intended to be a flexible water quality specific framework to help:
US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval).
Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format.
Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:

- Identify differences in data units (including speciation and basis)
- Identify differences in sampling or analytic methods
- Resolve data errors using transparent assumptions
@@ -73,7 +76,8 @@ df_cleaned
```

### Transform results from long to wide format
There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic.
To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.

```python
from harmonize_wq import wrangle
@@ -108,8 +112,11 @@ QA_Temperature | QA | NA | harmonization processing quality issues

## Issue Tracker
harmonize_wq is under development. Please report any bugs and enhancement ideas using the issue track:

https://github.com/USEPA/harmonize-wq/issues


## Disclaimer
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use.
EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
21 changes: 11 additions & 10 deletions contributing.rst
@@ -3,13 +3,16 @@
Contributing to harmonize_wq
============================

We’re so glad you’re thinking about contributing to an EPA open source project! If you’re unsure about anything, just ask — or submit your issue or pull request anyway. The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.
We’re so glad you’re thinking about contributing to an EPA open source project!
If you’re unsure about anything, just ask — or submit your issue or pull request anyway.
The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.

We encourage you to read this project’s CONTRIBUTING policy (you are here), its
`LICENSE <https://github.com/USEPA/harmonize-wq/blob/81b172afc3b72bec0a9f5624bade59eb2527510f/LICENSE>`_,
and its `README <https://github.com/USEPA/harmonize-wq/blob/main/README.md>`_.

All contributions to this project will be released under the MIT dedication. By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.
All contributions to this project will be released under the MIT dedication.
By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.

harmonize_wq uses:

@@ -34,20 +37,18 @@ To contribute fixes, code, tests, or documentation, fork harmonize_wq in GitHub_
and submit the changes using a pull request against the **main** branch.

- If you are submitting new code, add tests (see below) and documentation.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the
`GitHub docs`_.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the `GitHub docs`_.
- Check tests and resolve any issues.

In any case, feel free to use the `issue tracker`_ to discuss ideas for new features or improvements.

Notice that we will not merge a PR if tests are failing. In certain cases tests pass in your
machine but not in GitHub actions. There might be multiple reasons for this but these are some of
the most common:
Notice that we will not merge a PR if tests are failing.
In certain cases tests pass in your machine but not in GitHub actions.
There might be multiple reasons for this but these are some of the most common:

- Your new code does not work for other operating systems or Python versions.
- The documentation is not being built properly or the examples in the docs are
not working.
- The documentation is not being built properly or the examples in the docs are not working.


.. _`issue tracker`: https://github.com/USEPA/harmonize-wq/issues
.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
5 changes: 3 additions & 2 deletions docs/source/example workflow.rst
@@ -55,7 +55,8 @@ Clean results
Transform results from long to wide format
******************************************
There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is they have different values for the same sample depending on the characteristic.
To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.

.. code-block:: python3
@@ -105,4 +106,4 @@ The number of columns in the resulting table is greatly reduced:
|QA_Temperature | QA |NA |Harmonization quality issues |
+----------------------------+-------------+----------------------------------------+-------------------------------+

For more complete tutorial information, see: `demos <https://github.com/USEPA/harmonize-wq/tree/main/demos>`_
For more complete tutorial information, see: `demos <https://github.com/USEPA/harmonize-wq/tree/main/demos>`_
9 changes: 7 additions & 2 deletions docs/source/index.rst
@@ -16,7 +16,9 @@ Standardize, clean, and wrangle Water Quality Portal data into more analytic-rea
Overview
========

US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_. Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_.
Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format.
Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:

* Identify differences in data units (including speciation and basis)
* Identify differences in sampling or analytic methods
@@ -70,4 +72,7 @@ Indices and tables

Disclaimer
==========
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use.
EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information.
Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
5 changes: 2 additions & 3 deletions harmonize_wq/basis.py
@@ -2,8 +2,7 @@
"""Functions to process characteristic basis or return basis dictionary."""
from warnings import warn
from numpy import nan
from harmonize_wq import harmonize

from harmonize_wq.clean import add_qa_flag

def unit_basis_dict(out_col):
    """Characteristic specific basis dictionary to define basis from units.
@@ -169,7 +168,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
if old_basis != base:
    qa_mask = mask & (df[basis_col] == old_basis)
    warn(f'Mismatched {flag}', UserWarning)
    df = harmonize.add_qa_flag(df, qa_mask, flag)
    df = add_qa_flag(df, qa_mask, flag)
# Add/update basis from unit
df = set_basis(df, mask, base, basis_col)
df[unit_col] = [new_unit if x == old_unit else x
Expand Down
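
The `basis_from_unit` hunk above flags rows whose existing basis disagrees with the basis implied by the units, then calls `add_qa_flag` with a boolean mask. The following is a minimal sketch of that mask-then-flag pattern in pandas. It is not the `harmonize_wq.clean.add_qa_flag` implementation, which this diff does not show: the `flag_rows` helper, the `QA_flag` column name, the `"; "` separator, and the flag wording are all assumptions made for illustration.

```python
import pandas as pd


def flag_rows(df, mask, flag, qa_col="QA_flag"):
    """Hypothetical helper: append `flag` to `qa_col` where `mask` is True."""
    df_out = df.copy()  # leave the caller's frame untouched
    if qa_col not in df_out.columns:
        df_out[qa_col] = None  # object column, no flags yet

    def _append(existing):
        # Keep earlier flags and separate entries with "; " (assumed format)
        return flag if pd.isna(existing) else f"{existing}; {flag}"

    df_out.loc[mask, qa_col] = df_out.loc[mask, qa_col].map(_append)
    return df_out


# Toy data; column names follow the defaults shown in the hunk above
df = pd.DataFrame({
    "Units": ["mg/l as N", "mg/l", "mg/l as N"],
    "Speciation": ["as NO3", None, "as N"],
})
base, old_basis = "as N", "as NO3"
mask = df["Units"] == "mg/l as N"                  # rows with the old unit
qa_mask = mask & (df["Speciation"] == old_basis)   # rows with a mismatched basis
flag = f"Speciation: {old_basis} replaced by {base} from units"  # assumed wording
flagged = flag_rows(df, qa_mask, flag)
```

Only the first row is flagged here, because it is the only row where the unit matches and the recorded basis differs from the one implied by the unit.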