Add Social Vulnerability Index (SVI) subpackage #169
Merged
Commits (all by aaraney; changes shown from 43 of 50 commits):

ede9170  setup svi subpackage directory structure
ec51b56  fix dependency typo
d877b49  initial foundation for svi_client. todos: add exception handling and …
c96e498  strip down readme. sections will be filled in, in the future
6be7ae3  add urls to ESRI svi feature servers hosted by the cdc
79d0919  add field name map wrapper type
c14c215  change LOCATIONS AliasGroup to alias location abbreviations. in the f…
47dcbe0  method for building urls for cdc esri features servers
3294efd  update svi_client get method to use cdc esri feature server. this wil…
34be2e8  remove debug print statement
85d08d9  add comment to clarify why LIKE is used over equivalence test
1952504  remove unnecessary comment
67bbd5f  tract should have been census_tract in factory map
8fad687  change field names from theme_# to their literal names for clarity
d40aa26  return long dataframe format
317cfdc  static method overridden by subclasses to create missing fields
c2454ed  create missin county_fips field from some cdc data sources
d2f9ffe  create missing fields in get
ea75dae  reorder columns in get
d86f635  cdc esri sources with county fips present are only 3 digits long. ref…
9b537e3  forgot to exclude svi_edition when building urls.
0f29d08  add geographic context type to distinguish between SVI calc'd at the …
8ceb293  add get docstring summary and update example
1e80e03  update setup.cfg classifiers per @jarq6c's suggestions
b99d93a  raise value error for years without county geographic scale
9e2bb36  raise value errors when validating utility types
4978820  comment explaining why it is unnecessary to guard for response code i…
d3a6b90  remove unnecessary import
e590d85  use relative local imports in url_builder mod
05b81d8  add deps geopandas and pydanticc
ff13676  add cache_filename parameter to SVIClient constructor
cd37769  preform dataframe quality control before returning
7f990b2  update svi_client.get example now that string fields are lowered
4e021cd  use 1=1 where clause in US case.
b686baa  add SVIClient.get integration tests
f039bd6  add svi_client to github actions
6687c7d  import __future__.annotations to allow using MappingProxyType as type…
3f4cb4f  svi_client requires >= python 3.8 for typing.Literal support.
8532242  use typing_extension.Literal to provide python 3.7 support. add typin…
f1e430c  change min version back to python 3.7 for svi_client
bb95330  add results offset, results record count, and count only parameters t…
4563ad4  make concurrent requests in svi_client.get
bd0b7e2  fix typo
ef90e8e  fix bug when melting dataframes. also cast str cols to categories
f8dab85  dont sort collection of str and ints
e3b2866  add unit tests for utility fns
f916660  remove --use-feature=in-tree-build pip flag when installting package …
000bfec  rename test_utilities.py in svi client to make pytest's resolver happy
229548d  guard typing.get_args import. import typing_extensions.get_args to su…
b633153  fill out svi readme
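Commits 8532242 and 229548d above describe guarding `typing.Literal` / `typing.get_args` imports so the package can still run on Python 3.7 via the `typing_extensions` backport. A minimal sketch of that pattern (the actual `utilities.validate_year` in this PR may differ; `validate_year` here is illustrative):

```python
import sys

# On Python >= 3.8 Literal and get_args live in typing; the PR falls back to
# the typing_extensions backport on 3.7.
if sys.version_info >= (3, 8):
    from typing import Literal, get_args
else:  # pragma: no cover
    from typing_extensions import Literal, get_args

# SVI editions supported by the client
Year = Literal["2000", "2010", "2014", "2016", "2018"]


def validate_year(year) -> str:
    """Coerce to str and raise ValueError for unsupported years (a sketch)."""
    year = str(year)
    if year not in get_args(Year):
        raise ValueError(f"year must be one of {get_args(Year)}, got {year!r}")
    return year
```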
@@ -0,0 +1 @@
../../CONTRIBUTING.md
@@ -0,0 +1 @@
../../LICENSE
@@ -0,0 +1,2 @@
include LICENSE
include src/hydrotools/nwm_client/data/*
@@ -0,0 +1,33 @@
# OWPHydroTools :: SVI Client

## Installation

## Usage

### Code
```python
```
### Example output
```console
```
### System Requirements

## Development

```bash
$ python3 -m venv env
$ source env/bin/activate
$ python3 -m pip install -U pip
$ python3 -m pip install -U setuptools
$ python3 -m pip install -e ".[develop]"
```

To generate a source distribution:
```bash
$ python3 -m pip install -U wheel build
$ python3 -m build
```

The packages generated in `dist/` can be installed directly with `pip` or uploaded to PyPI using `twine`.
@@ -0,0 +1 @@
../../SECURITY.md
@@ -0,0 +1 @@
../../TERMS.md
@@ -0,0 +1,6 @@
[build-system]
build-backend = "setuptools.build_meta"
requires = [
    "setuptools>=42",
    "wheel",
]
@@ -0,0 +1,4 @@
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
@@ -0,0 +1,49 @@
[metadata]
name = hydrotools.svi_client
version = attr: hydrotools.svi_client._version.__version__
author = Austin Raney
author_email = [email protected]
description = Retrieve Social Vulnerability Index data from The Center for Disease Control / The Agency for Toxic Substances and Disease Registry.
long_description = file: README.md
long_description_content_type = text/markdown
charset = UTF-8
license = USDOC
license_files =
    LICENSE
url = https://github.com/NOAA-OWP/hydrotools
project_urls =
    Documentation = https://noaa-owp.github.io/hydrotools/hydrotools.svi_client.html
    Source = https://github.com/NOAA-OWP/hydrotools/tree/main/python/svi_client
    Tracker = https://github.com/NOAA-OWP/hydrotools/issues
classifiers =
    Development Status :: 3 - Alpha
    Intended Audience :: Education
    Intended Audience :: Science/Research
    License :: Free To Use But Restricted
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Topic :: Scientific/Engineering
    Topic :: Sociology
    Intended Audience :: Science/Research
    Operating System :: OS Independent

[options]
packages = find_namespace:
package_dir =
    =src
install_requires =
    hydrotools._restclient
    numpy >=1.20.0
    pandas
    geopandas
    pydantic
    typing_extensions
python_requires = >=3.7

[options.packages.find]
where = src

[options.extras_require]
develop =
    pytest
@@ -0,0 +1,4 @@
# removing the __version__ import will cause the build to fail. see: https://github.com/pypa/setuptools/issues/1724#issuecomment-627241822
from ._version import __version__

from .clients import SVIClient
@@ -0,0 +1 @@
__version__ = "0.0.1"
@@ -0,0 +1,220 @@
from hydrotools._restclient import RestClient
import pandas as pd
import geopandas as gpd

# local imports
from . import url_builders
from .types import GeographicScale, GeographicContext, Year, utilities, field_name_map

# typing imports
from typing import Union
from pathlib import Path


class SVIClient:
    def __init__(
        self,
        enable_cache: bool = True,
        cache_filename: Union[str, Path] = "svi_client_cache",
    ) -> None:
        self._rest_client = RestClient(
            cache_filename=cache_filename,
            enable_cache=enable_cache,
        )

    def get(
        self,
        location: str,
        geographic_scale: GeographicScale,
        year: Year,
        geographic_context: GeographicContext = "national",
    ) -> gpd.GeoDataFrame:
        """Retrieve Social Vulnerability Index thematic rankings and values for a given state or
        the U.S.

        SVI values are available for the following years: 2000, 2010, 2014, 2016, and 2018. The CDC
        calculates the SVI at the census tract or county geographic scale. Likewise, the CDC
        calculates SVI rankings in two geographic contexts: (1) relative to a given state's SVI
        values or (2) relative to the U.S. (1) permits intrastate comparison and (2) permits
        national comparison.

        Note: `state` geographic_context is not supported at this time.

        Parameters
        ----------
        location : str
            state / national name or abbreviation (e.g. "AL", "US", "Wyoming", "new york")
        geographic_scale : GeographicScale, "census_tract" or "county"
            geographic scale at which theme values were calculated
        year : Year
            2000, 2010, 2014, 2016, or 2018
        geographic_context : GeographicContext, "national" or "state", optional
            SVI rankings calculated at the national or state level. Use "state" for intrastate
            comparisons, by default "national".
            Note: `state` is not supported at this time and will raise a NotImplementedError.

        Returns
        -------
        gpd.GeoDataFrame
            GeoDataFrame of Social Vulnerability Index values at the census tract or county scale

            column names:
                state_name: str
                state_abbreviation: str
                county_name: str
                state_fips: str
                county_fips: str
                fips: str
                theme: str
                rank: float
                value: float
                svi_edition: str
                geometry: gpd.array.GeometryDtype

        Examples
        --------
        >>> client = SVIClient()
        ... df = client.get("AL", "census_tract", "2018")
              state_name state_abbreviation ... svi_edition geometry
        0        alabama                 al ...        2018 POLYGON ((-87.21230 32.83583, -87.20970 32.835...
        1        alabama                 al ...        2018 POLYGON ((-86.45640 31.65556, -86.44864 31.655...
        ...          ...                ... ...         ... ...
        29498    alabama                 al ...        2018 POLYGON ((-85.99487 31.84424, -85.99381 31.844...
        29499    alabama                 al ...        2018 POLYGON ((-86.19941 31.80787, -86.19809 31.808...
        """
        url_path = url_builders.build_feature_server_url(
            location=location,
            geographic_scale=geographic_scale,
            year=year,
            geographic_context=geographic_context,
            count_only=True,
        )

        # RestClient only allows a 200 response code; otherwise an
        # aiohttp.client_exceptions.ClientConnectorError is raised.
        # number of features
        count_request = self._rest_client.get(url_path)

        deserialized_count = count_request.json()
        count = deserialized_count["properties"]["count"]

        # maximum number of features returned by a single request
        OFFSET = 1000
        n_gets = (count // OFFSET) + 1

        urls = [
            url_builders.build_feature_server_url(
                location=location,
                geographic_scale=geographic_scale,
                year=year,
                geographic_context=geographic_context,
                result_offset=i * OFFSET,
                result_record_count=OFFSET,
            )
            for i in range(n_gets)
        ]

        results = self._rest_client.mget(urls)

        # create geodataframe from geojson responses
        df = pd.concat(
            [gpd.GeoDataFrame.from_features(r.json()) for r in results],
            ignore_index=True,
        )

        assert len(df) == count

        fnm = field_name_map.CdcEsriFieldNameMapFactory(geographic_scale, year)

        # map of dataset field names to canonical field names
        field_names = {
            v: k
            for k, v in fnm.dict(exclude_unset=True, exclude={"svi_edition"}).items()
        }

        df = df.rename(columns=field_names)

        # create missing fields if required
        df = fnm.create_missing_fields(df)

        df["svi_edition"] = fnm.svi_edition

        # wide to long format
        rank_col_names = df.columns.str.contains("rank$")

        df = df.melt(
            id_vars=df.columns[~rank_col_names],
            value_vars=df.columns[rank_col_names],
            var_name="rank_theme",
            value_name="rank",
        )

        value_col_names = df.columns.str.contains("value$")
        # some data sources do not include summed theme values
        if value_col_names.any():
            df = df.melt(
                id_vars=df.columns[~value_col_names],
                value_vars=df.columns[value_col_names],
                var_name="value_theme",
                value_name="value",
            )
        # create theme column by removing rank_theme's "_rank" suffix.
        # note: str.rstrip("_rank") strips a *character set*, not a suffix, and can
        # over-strip (e.g. a trailing "n"), so an end-anchored regex is used instead
        df["theme"] = df["rank_theme"].str.replace(r"_rank$", "", regex=True)

        # drop unnecessary cols
        # value_theme column might not exist, so ignore errors when trying to drop
        df = df.drop(columns=["rank_theme", "value_theme"], errors="ignore")

        # lowercase and strip all leading and trailing white space from str columns for
        # consistent output and quality control
        df_dtypes = df.dtypes
        str_cols = df_dtypes[df_dtypes == "object"].index
        df[str_cols] = df[str_cols].apply(lambda d: d.str.strip().str.lower())

        df.sort_values("state_name", inplace=True, ignore_index=True)

        output_column_order = [
            "state_name",
            "state_abbreviation",
            "county_name",
            "state_fips",
            "county_fips",
            "fips",
            "theme",
            "rank",
            "value",
            "svi_edition",
            "geometry",
        ]

        # reorder dataframe columns
        # note: during reindex, columns not present in the dataframe are created
        # with NaN row values
        df = df.reindex(columns=output_column_order)

        return df

    @staticmethod
    def svi_documentation_url(year: Year) -> str:
        year = utilities.validate_year(year)

        urls = {
            "2000": "https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2000Documentation-H.pdf",
            "2010": "https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI-2010-Documentation-H.pdf",
            "2014": "https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2014Documentation_01192022.pdf",
            "2016": "https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2016Documentation_01192022.pdf",
            "2018": "https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2018Documentation_01192022_1.pdf",
        }

        url = urls.get(year, None)

        # raise an error if a valid year is not in urls.
        # when new SVI releases are added, this will purposefully break.
        if url is None:
            error_message = (
                f"documentation for year: {year} has not been added to SVIClient."
            )
            raise ValueError(error_message)

        return url
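A note on the paging arithmetic in `get` above: `(count // OFFSET) + 1` always issues at least one request, but it also issues one trailing empty request whenever `count` is an exact multiple of `OFFSET`. Ceiling division avoids that extra request; the helper below is a sketch (`page_offsets` is not part of the PR):

```python
OFFSET = 1000  # features returned per request (matches the PR's constant)


def page_offsets(count: int, page_size: int = OFFSET) -> list:
    """Result offsets needed to fetch `count` features, `page_size` at a time.

    Uses ceiling division, so no empty trailing request is issued when
    `count` is an exact multiple of `page_size`.
    """
    n_gets = -(-count // page_size)  # ceiling division without math.ceil
    return [i * page_size for i in range(n_gets)]


# 2421 features -> three requests, at offsets 0, 1000, and 2000
offsets = page_offsets(2421)
```

The PR's `+ 1` form is still correct (the extra page simply returns zero features and the `assert len(df) == count` check passes); the ceiling form just saves one round trip in the exact-multiple case.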
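The wide-to-long reshape in `get` can be illustrated on a toy frame (assuming pandas is available; the column names and values here are hypothetical, not real SVI data). The end-anchored regex also shows why suffix removal is safer than `str.rstrip("_rank")`, which strips a character set:

```python
import pandas as pd

# a tiny wide-format frame mimicking the SVI rank fields
wide = pd.DataFrame({
    "fips": ["01001", "01003"],
    "socioeconomic_rank": [0.52, 0.31],
    "household_composition_rank": [0.47, 0.68],
})

# select the *_rank columns and melt them into one (theme, rank) pair per row
rank_cols = wide.columns.str.contains("rank$")
long = wide.melt(
    id_vars=wide.columns[~rank_cols],
    value_vars=wide.columns[rank_cols],
    var_name="rank_theme",
    value_name="rank",
)

# remove the "_rank" suffix with an end-anchored regex; note that
# "household_composition_rank".rstrip("_rank") would yield
# "household_compositio" because rstrip treats "_rank" as a character set
long["theme"] = long["rank_theme"].str.replace(r"_rank$", "", regex=True)
```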
Review comment (on the string-column selection, `df_dtypes[df_dtypes == "object"]`, above): This might be a dumb question, but does this always select only string columns?
Reply: I don't think it was a dumb question. I don't think it will always return a string column; I think datetime columns will also be included. Because of this comment, I changed this check to use `pd.DataFrame.select_dtypes` as you showed above.
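The point of this exchange can be sketched (assuming pandas; the frame below is illustrative): the boolean-mask check and `select_dtypes(include="object")` select the same columns, and neither guarantees pure strings, because any Python-object column (e.g. one holding lists) is also `object` dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Autauga County", "Baldwin County"],  # strings -> object dtype
    "rank": [0.52, 0.31],                          # float64
    "tags": [["a"], ["b", "c"]],                   # lists are *also* object dtype
})

# boolean-mask approach originally used in the PR
mask_cols = df.dtypes[df.dtypes == "object"].index.tolist()

# the more explicit spelling the review settled on
select_cols = df.select_dtypes(include="object").columns.tolist()
```

Both pick up `name` and `tags` but not `rank`, so `select_dtypes` improves readability rather than strictness; truly guaranteeing strings would require checking values (or a nullable `string` dtype).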