Py open sci review (#56)

* Update README.md: Fix typo (readme.md): l7 on package name
* Update README.md: Fix typo (readme.md): double spaces
* Update README.md: Edit (general): new line at each full stop in a markdown paragraph.
* Update contributing.rst: Fix typo (contributing.rst): double spaces
* Update contributing.rst: Edit (general): adding a new line at each full stop in a rst paragraph.
* Update example workflow.rst: Edit (general): add a new line at each full stop in a markdown or rst paragraph.
* Update index.rst: Edit (general): add a new line at each full stop in a markdown or rst paragraph.
* wet_dry_drop() is now outside the normal workflow; what was being done in the try/except was a bit proactive and gets problematic with keeping the functions in a linear workflow.
* Moved from harmonize -> clean: df_checks(), add_qa_flag()
* convert_unit_series() moved harmonize -> convert
* Import specific functions instead of module
* Fix docs examples
* Module needs to be imported for example
* 'Filter/sieve residue' & 'Yield' now included in this domain. This is meant to get the updated list, but will need to keep an eye out for additions for the example in docs like this, especially right now as there may be updates with WQX 2.0 -> 3.0.
USEPA · Apr 10, 2024 · 81448a9
1 parent a5f048b
Showing 13 changed files with 355 additions and 336 deletions.
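
The "new line at each full stop" edits listed in the commit message were presumably made by hand. As a rough illustration of the same convention, here is a minimal sketch; `one_sentence_per_line` is a hypothetical helper and not part of harmonize_wq. It assumes a sentence ends with `.`, `!` or `?` followed by whitespace and an uppercase letter, so it will split wrongly after abbreviations such as "U.S. EPA" and should not be run over code blocks or tables.

```python
import re

# Assumed sentence boundary: ., ! or ? followed by whitespace and a capital.
# Hypothetical helper for illustration only, not part of harmonize_wq.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def one_sentence_per_line(paragraph: str) -> str:
    """Return the paragraph with each sentence on its own line."""
    # Collapse runs of whitespace first, which also removes double spaces.
    collapsed = " ".join(paragraph.split())
    return "\n".join(_SENTENCE_END.split(collapsed))


if __name__ == "__main__":
    text = ("Notice that we will not merge a PR if tests are failing.  "
            "In certain cases tests pass in your machine but not in GitHub actions.")
    print(one_sentence_per_line(text))
```

Keeping one sentence per line does not change the rendered Markdown or reStructuredText paragraph, but it makes later diffs show only the sentences that actually changed.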
diff --git a/README.md b/README.md
@@ -4,7 +4,10 @@
 # harmonize-wq
 Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats
 
-US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval). Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format.  Recognizing the definition of analysis-ready varies depending on the analysis, the harmonixe_wq package is intended to be a flexible water quality specific framework to help:
+US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval).
+Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format.
+Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
+
 - Identify differences in data units (including speciation and basis)
 - Identify differences in sampling or analytic methods
 - Resolve data errors using transparent assumptions
@@ -73,7 +76,8 @@ df_cleaned
 ```
 
 ### Transform results from long to wide format
-There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
+There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic.
+To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
 
 ```python
 from harmonize_wq import wrangle
@@ -108,8 +112,11 @@ QA_Temperature | QA | NA | harmonization processing quality issues
 
 ## Issue Tracker
 harmonize_wq is under development. Please report any bugs and enhancement ideas using the issue track:
+
 https://github.com/USEPA/harmonize-wq/issues
 
 
 ## Disclaimer
-The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use.  EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information.  Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.  The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
+The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use.
+EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
+The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
diff --git a/contributing.rst b/contributing.rst
@@ -3,13 +3,16 @@
 Contributing to harmonize_wq
 ============================
 
-We’re so glad you’re thinking about contributing to an EPA open source project! If you’re unsure about anything, just ask — or submit your issue or pull request anyway. The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.
+We’re so glad you’re thinking about contributing to an EPA open source project!
+If you’re unsure about anything, just ask — or submit your issue or pull request anyway.
+The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.
 
 We encourage you to read this project’s CONTRIBUTING policy (you are here), its
 `LICENSE <https://github.com/USEPA/harmonize-wq/blob/81b172afc3b72bec0a9f5624bade59eb2527510f/LICENSE>`_,
 and its `README <https://github.com/USEPA/harmonize-wq/blob/main/README.md>`_.
 
-All contributions to this project will be released under the MIT dedication. By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.
+All contributions to this project will be released under the MIT dedication.
+By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.
 
 harmonize_wq uses:
 
@@ -34,20 +37,18 @@ To contribute fixes, code, tests, or documentation, fork harmonize_wq in GitHub_
 and submit the changes using a pull request against the **main** branch.
 
 - If you are submitting new code, add tests (see below) and documentation.
-- Write "Closes #<bug number>" in the PR description or a comment, as described in the
-  `GitHub docs`_.
+- Write "Closes #<bug number>" in the PR description or a comment, as described in the `GitHub docs`_.
 - Check tests and resolve any issues.
 
 In any case, feel free to use the `issue tracker`_ to discuss ideas for new features or improvements.
 
-Notice that we will not merge a PR if tests are failing. In certain cases tests pass in your
-machine but not in GitHub actions. There might be multiple reasons for this but these are some of
-the most common:
+Notice that we will not merge a PR if tests are failing.
+In certain cases tests pass in your machine but not in GitHub actions.
+There might be multiple reasons for this but these are some of the most common:
 
 - Your new code does not work for other operating systems or Python versions.
-- The documentation is not being built properly or the examples in the docs are
-  not working.
+- The documentation is not being built properly or the examples in the docs are not working.
 
 
 .. _`issue tracker`: https://github.com/USEPA/harmonize-wq/issues
-.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
+.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
diff --git a/docs/source/example workflow.rst b/docs/source/example workflow.rst
@@ -55,7 +55,8 @@ Clean results
 
 Transform results from long to wide format
 ******************************************
-There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
+There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is they have different values for the same sample depending on the characteristic.
+To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
 
 .. code-block:: python3
 
@@ -105,4 +106,4 @@ The number of columns in the resulting table is greatly reduced:
 |QA_Temperature              | QA          |NA                                      |Harmonization quality issues   |
 +----------------------------+-------------+----------------------------------------+-------------------------------+
 
-For more complete tutorial information, see: `demos <https://github.com/USEPA/harmonize-wq/tree/main/demos>`_
+For more complete tutorial information, see: `demos <https://github.com/USEPA/harmonize-wq/tree/main/demos>`_
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -16,7 +16,9 @@ Standardize, clean, and wrangle Water Quality Portal data into more analytic-rea
 Overview
 ========
 
-US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_. Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format.  Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
+US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_.
+Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format.
+Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
 
 * Identify differences in data units (including speciation and basis)
 * Identify differences in sampling or analytic methods
@@ -70,4 +72,7 @@ Indices and tables
 
 Disclaimer
 ==========
-The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
+The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use.
+EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information.
+Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
+The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
diff --git a/harmonize_wq/basis.py b/harmonize_wq/basis.py
@@ -2,8 +2,7 @@
 """Functions to process characteristic basis or return basis dictionary."""
 from warnings import warn
 from numpy import nan
-from harmonize_wq import harmonize
-
+from harmonize_wq.clean import add_qa_flag
 
 def unit_basis_dict(out_col):
     """Characteristic specific basis dictionary to define basis from units.
@@ -169,7 +168,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
                         if old_basis != base:
                             qa_mask = mask & (df[basis_col] == old_basis)
                             warn(f'Mismatched {flag}', UserWarning)
-                            df = harmonize.add_qa_flag(df, qa_mask, flag)
+                            df = add_qa_flag(df, qa_mask, flag)
                 # Add/update basis from unit
                 df = set_basis(df, mask, base, basis_col)
                 df[unit_col] = [new_unit if x == old_unit else x