Add nwm_client documentation and minor subpackage level import changes #179

Merged
merged 11 commits into from
Mar 3, 2022
59 changes: 59 additions & 0 deletions python/nwm_client/README.md
@@ -18,13 +18,69 @@ $ python3 -m pip install --upgrade pip wheel

# Install nwm_client
$ python3 -m pip install hydrotools.nwm_client

# Install nwm_client with gcp capabilities
$ python3 -m pip install "hydrotools.nwm_client[gcp]"
```

## Usage

The following example demonstrates how one might use `hydrotools.nwm_client` to retrieve NWM streamflow forecasts.

### Code

<details><summary><b>Retrieving data from Google Cloud</b></summary>

```python
# Import the nwm Client
from hydrotools.nwm_client import gcp as nwm
import pandas as pd

# Instantiate model data service
model_data_service = nwm.NWMDataService()

# Retrieve forecast data
# By default, only retrieves data at USGS gaging sites in
# CONUS that are used for model assimilation
forecast_data = model_data_service.get(
    configuration = "short_range",
    reference_time = "20210101T01Z"
)

# Look at the data
print(forecast_data.info(memory_usage='deep'))
print(forecast_data[['value_time', 'value']].head())
```
### Example output
```console
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 137628 entries, 0 to 137627
Data columns (total 8 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   reference_time    137628 non-null  datetime64[ns]
 1   value_time        137628 non-null  datetime64[ns]
 2   nwm_feature_id    137628 non-null  int64
 3   value             137628 non-null  float32
 4   usgs_site_code    137628 non-null  category
 5   configuration     137628 non-null  category
 6   measurement_unit  137628 non-null  category
 7   variable_name     137628 non-null  category
dtypes: category(4), datetime64[ns](2), float32(1), int64(1)
memory usage: 5.1 MB
None
           value_time  value
0 2021-01-01 02:00:00   5.29
1 2021-01-01 03:00:00   5.25
2 2021-01-01 04:00:00   5.20
3 2021-01-01 05:00:00   5.12
4 2021-01-01 06:00:00   5.03
```

</details>
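
The relationship between `reference_time` and `value_time` in the example output above can be made explicit by computing a lead time. The following is a minimal sketch that does not call `nwm_client`: it rebuilds the five printed rows by hand, and the `lead_hours` column name is an assumption introduced here for illustration.

```python
from datetime import datetime

import pandas as pd

# Same reference time string that is passed to `get` in the example above.
reference_time = datetime.strptime("20210101T01Z", "%Y%m%dT%HZ")

# The five rows printed in the example output, copied by hand.
forecast_head = pd.DataFrame({
    "value_time": pd.to_datetime([
        "2021-01-01 02:00:00",
        "2021-01-01 03:00:00",
        "2021-01-01 04:00:00",
        "2021-01-01 05:00:00",
        "2021-01-01 06:00:00",
    ]),
    "value": [5.29, 5.25, 5.20, 5.12, 5.03],
})

# Lead time in whole hours between the model cycle and each forecast value.
# `lead_hours` is a hypothetical column name, not one returned by nwm_client.
elapsed = forecast_head["value_time"] - pd.Timestamp(reference_time)
forecast_head["lead_hours"] = (elapsed / pd.Timedelta(hours=1)).astype(int)

print(forecast_head)
```

For these rows the first forecast value falls one hour after the 01Z reference time.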

<details><summary><b>Retrieving data from NOMADS</b></summary>

```python
# Import the nwm Client
from hydrotools.nwm_client import http as nwm
@@ -77,6 +133,9 @@ None
3 2021-01-01 05:00:00 5.12
4 2021-01-01 06:00:00 5.03
```

</details>

### System Requirements
We employ several methods to make sure the `pandas.DataFrame` objects produced by `nwm_client` are as efficient and manageable as possible. Nonetheless, this package can potentially use a large amount of memory.
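
Judging from the dtypes in the example output (`category` and `float32`), compact column types are presumably among those methods; that is an inference, not something this README states. A minimal sketch of the effect, using synthetic values rather than real NWM output:

```python
import pandas as pd

n_rows = 10_000

# Synthetic stand-in for a repeated label column such as `configuration`.
as_object = pd.Series(["short_range"] * n_rows, dtype=object)
as_category = as_object.astype("category")

# deep=True counts the Python string objects, not just the pointers.
object_bytes = as_object.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)

# Synthetic numeric column stored at two precisions.
values64 = pd.Series(range(n_rows), dtype="float64")
values32 = values64.astype("float32")

print("object vs category bytes:", object_bytes, category_bytes)
print("float64 vs float32 bytes:",
      values64.memory_usage(index=False),
      values32.memory_usage(index=False))
```

The categorical column stores one small integer code per row plus a single copy of each distinct label, and `float32` halves the per-value storage at the cost of precision.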

11 changes: 11 additions & 0 deletions python/nwm_client/src/hydrotools/nwm_client/__init__.py
@@ -1,2 +1,13 @@
# removing __version__ import will cause build to fail. see: https://github.com/pypa/setuptools/issues/1724#issuecomment-627241822
from ._version import __version__

from . import http
Collaborator

Is this a good idea?

Collaborator

By which I mean, we're not "polluting the namespace", right?

Member Author

Right, it's a matter of opinion and style. Personally, given our audience, I would like to make the experience of using hydrotools as intuitive as possible. Here, I am assuming it is more intuitive for a user to import nwm_client and then tab-complete to see the available submodules. For example:

from hydrotools import nwm_client

# tab to see exposed subpackage level entities
nwm_client. # tab tab

nwm_client.http, nwm_client.gcp

To be fair, I am making quite a few assumptions about how someone might use hydrotools. What do you think about my assumptions? To your point regarding clouding the namespace, I'm not sure I'm ready to answer that. I need to think about it more and consider what implications my assumptions might have.

Collaborator

@jarq6c Feb 25, 2022

Assuming you've mulled over the implications, let's see how it works out. No need for the PR to go stale because of this minor concern.


try:
    from . import gcp
except ImportError as e:
    # google-cloud-storage not installed
    if "google-cloud-storage" in e.msg:
        pass
    else:
        raise e
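
The effect of these subpackage-level imports on what a user sees after `import` can be checked without installing anything. The sketch below writes a throwaway package to a temporary directory; the package name `demo_client` and the stand-in `gcp.py` that always fails are assumptions made for the demonstration, not part of hydrotools.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this demonstration.
PACKAGE = "demo_client"

# Mirrors the import pattern added to __init__.py in this diff.
INIT_SOURCE = '''
from . import http

try:
    from . import gcp
except ImportError as e:
    # optional dependency not installed: skip the submodule
    if "google-cloud-storage" in e.msg:
        pass
    else:
        raise e
'''

# Stand-in for gcp.py on a system without the optional dependency.
GCP_SOURCE = 'raise ImportError("Unable to import google-cloud-storage.")\n'


def build_and_import():
    """Write the demo package to a temporary directory and import it."""
    root = Path(tempfile.mkdtemp())
    package_dir = root / PACKAGE
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(INIT_SOURCE)
    (package_dir / "http.py").write_text("SOURCE = 'http'\n")
    (package_dir / "gcp.py").write_text(GCP_SOURCE)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module(PACKAGE)


demo_client = build_and_import()

# Public names a user would see when tab-completing on the package.
exposed = [name for name in dir(demo_client) if not name.startswith("_")]
print(exposed)
```

In this sketch the package still imports when the optional submodule fails with the expected message, and only the submodules that loaded successfully are bound as attributes. Any other `ImportError` is re-raised.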
2 changes: 1 addition & 1 deletion python/nwm_client/src/hydrotools/nwm_client/_version.py
@@ -1 +1 @@
-__version__ = "5.0.1"
+__version__ = "5.0.2"
39 changes: 30 additions & 9 deletions python/nwm_client/src/hydrotools/nwm_client/gcp.py
@@ -17,7 +17,14 @@
from pandas.core.indexing import convert_from_missing_indexer_tuple
from hydrotools.caches.hdf import HDFCache

-from google.cloud import storage
+try:
+    from google.cloud import storage
+except ImportError as e:
+    error_message = ("Unable to import google-cloud-storage. Reinstall `hydrotools.nwm_client` using the extra-requirement, `gcp`."
+                     "\n"
+                     "`pip install 'hydrotools.nwm_client[gcp]'")
+    raise ImportError(error_message) from e

from io import BytesIO
import xarray as xr
import warnings
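
The import guard in this hunk replaces a bare failure with a message that names the missing extra while keeping the original exception attached through `raise ... from e`. A generic sketch of that pattern follows; the helper `import_optional` and the module name `module_that_is_not_installed_xyz` are hypothetical and exist only to trigger the guard.

```python
import importlib


def import_optional(module_name: str, package: str, extra: str):
    """Import `module_name`, or raise an ImportError naming the pip extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        message = (
            f"Unable to import {module_name}. Reinstall `{package}` "
            f"using the extra-requirement, `{extra}`.\n"
            f"`pip install '{package}[{extra}]'`"
        )
        # `from e` keeps the original error available as __cause__.
        raise ImportError(message) from e


caught = None
try:
    # Hypothetical module name, assumed not to be installed anywhere.
    import_optional("module_that_is_not_installed_xyz",
                    "hydrotools.nwm_client", "gcp")
except ImportError as err:
    caught = err

print(caught)
print("original error preserved:", isinstance(caught.__cause__, ImportError))
```

Chaining the exception means the traceback still shows the underlying import failure, which helps when the real cause is something other than a missing package.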
@@ -56,6 +63,10 @@ def __init__(
):
"""Instantiate NWM Data Service.

Note: By default, only NWM site codes with an associated USGS site are returned by
`NWMDataService.get`. See `NWMDataService`'s `location_metadata_mapping` parameter to change
this behavior.

Parameters
----------
bucket_name : str, required, default 'national-water-model'
@@ -80,8 +91,14 @@ def __init__(

Examples
--------
->>> from hydrotools.gcp_client import gcp
->>> model_data_service = gcp.NWMDataService()
+>>> from hydrotools.nwm_client import gcp as nwm
+>>> model_data_service = nwm.NWMDataService()
+>>> # get nwm short range forecast data as a dataframe
+>>> # for nwm sites with associated USGS gage
+>>> forecast_data = model_data_service.get(
+... configuration = "short_range",
+... reference_time = "20210101T01Z"
+... )

"""
# Set bucket name
@@ -139,8 +156,8 @@ def list_blobs(

Examples
--------
->>> from hydrotools.gcp_client import gcp
->>> model_data_service = gcp.NWMDataService()
+>>> from hydrotools.nwm_client import gcp as nwm
+>>> model_data_service = nwm.NWMDataService()
>>> blob_list = model_data_service.list_blobs(
... configuration = "short_range",
... reference_time = "20210101T01Z"
@@ -319,8 +336,8 @@ def get_cycle(

Examples
--------
->>> from hydrotools.gcp_client import gcp
->>> model_data_service = gcp.NWMDataService()
+>>> from hydrotools.nwm_client import gcp as nwm
+>>> model_data_service = nwm.NWMDataService()
>>> forecast_data = model_data_service.get(
... configuration = "short_range",
... reference_time = "20210101T01Z"
@@ -390,6 +407,10 @@ def get(
) -> pd.DataFrame:
"""Return streamflow data for a single model cycle in a pandas DataFrame.

Note: By default, only NWM site codes with an associated USGS site are returned by
`NWMDataService.get`. See `NWMDataService`'s `location_metadata_mapping` parameter to change
this behavior.

Parameters
----------
configuration : str, required
@@ -409,8 +430,8 @@ def get(

Examples
--------
->>> from hydrotools.gcp_client import gcp
->>> model_data_service = gcp.NWMDataService()
+>>> from hydrotools.nwm_client import gcp as nwm
+>>> model_data_service = nwm.NWMDataService()
>>> forecast_data = model_data_service.get(
... configuration = "short_range",
... reference_time = "20210101T01Z"
8 changes: 8 additions & 0 deletions python/nwm_client/src/hydrotools/nwm_client/http.py
@@ -57,6 +57,10 @@ def __init__(
):
"""Instantiate NWM Data Service.

Note: By default, only NWM site codes with an associated USGS site are returned by
`NWMDataService.get`. See `NWMDataService`'s `location_metadata_mapping` parameter to change
this behavior.

Parameters
----------
server : str, required, default 'national-water-model'
@@ -412,6 +416,10 @@ def get(
) -> pd.DataFrame:
"""Return streamflow data for a single model cycle in a pandas DataFrame.

Note: By default, only NWM site codes with an associated USGS site are returned by
`NWMDataService.get`. See `NWMDataService`'s `location_metadata_mapping` parameter to change
this behavior.

Parameters
----------
configuration : str, required