Final data processing #16

Open
wants to merge 30 commits into
base: main
0166a09
updates git ignore and zoning map
samgdotson Jul 17, 2024
be79c90
adds snakefile and the steps required for pulling residential data
samgdotson Jul 17, 2024
5d48685
adds rule to build the dag file
samgdotson Jul 18, 2024
bc17302
starts adding utility access feature
samgdotson Jul 23, 2024
d3383d3
adds steps to get relevant utility rates (no processing)
samgdotson Jul 23, 2024
52cce01
updates the residential load retrieval
samgdotson Jul 23, 2024
9ec5bfb
adds rule to download project sunroof data
samgdotson Jul 24, 2024
0c51a31
propagates update to affected rules
samgdotson Jul 24, 2024
9a962ff
commits local updates
samgdotson Jul 29, 2024
b772eb6
pep8 fixes for census data retrieval
samgdotson Sep 3, 2024
30b63f8
pep8 fixes
samgdotson Sep 3, 2024
63a3aea
merge conf
samgdotson Sep 11, 2024
3420aa6
adds info about rules to README
samgdotson Sep 11, 2024
a468aef
downloads weather data along with load data
samgdotson Sep 11, 2024
a06a687
downloads total building data
samgdotson Sep 11, 2024
92fb838
adds rule to retrieve lead data
samgdotson Sep 12, 2024
966fe2c
adds rule to calculate the weighted average energy expenses by buildi…
samgdotson Sep 12, 2024
d22b6d2
adds model options to config
samgdotson Sep 13, 2024
c469860
adds hplib to environment file
samgdotson Sep 13, 2024
08ce780
adds rule to download several files from wykck gis database
samgdotson Sep 16, 2024
7e1d1e2
generalizes the community 'name'
samgdotson Sep 16, 2024
88845c6
propagates name change
samgdotson Sep 18, 2024
49d43f4
adds retail prices to config
samgdotson Sep 18, 2024
c5fc2e2
adds rescaling rule to snakefile
samgdotson Sep 18, 2024
e1e730d
adds rule to rescale load from resstock based on LEAD data
samgdotson Sep 18, 2024
73e8161
adds rule to retrieve nrel costs
samgdotson Sep 19, 2024
34a134a
adds costs to targets
samgdotson Sep 19, 2024
7aba5cb
updates readme and env template
samgdotson Sep 19, 2024
404a022
merge conf
samgdotson Sep 19, 2024
99674af
updates readme
samgdotson Sep 20, 2024
18 changes: 1 addition & 17 deletions .env.template
@@ -1,24 +1,8 @@
# This template comes from https://www.github.com/kmax12/gridstatus

# Register at https://www.eia.gov/opendata/register.php for an API key
EIA_API_KEY=

# Register at https://apiportal.pjm.com/ for an API key
PJM_API_KEY=

# Register at https://apiexplorer.ercot.com/ for username/password
# and follow instructions at
# https://developer.ercot.com/applications/pubapi/ERCOT%20Public%20API%20Registration%20and%20Authentication/
# to get the subscription key
ERCOT_API_USERNAME=
ERCOT_API_PASSWORD=
ERCOT_API_SUBSCRIPTION_KEY=

# Request access at https://www.ncdc.noaa.gov/cdo-web/token
NOAA_API_KEY=

# Request access at https://api.census.gov/data/key_signup.html
CENCUS_API_KEY=
CENSUS_API_KEY=

# Request access at https://www.epa.gov/power-sector/cam-api-portal#/api-key-signup
CEMS_API_KEY=
6 changes: 5 additions & 1 deletion .gitignore
@@ -1,5 +1,9 @@
cjest-data

.snakemake/
data
01-energy-utility.ipynb
02-census.ipynb
puma_maps.ipynb
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
94 changes: 93 additions & 1 deletion README.md
@@ -1,2 +1,94 @@
# 2024 Kansas City Analysis
This repository holds analysis for the energy system in Kansas City, Kansas. Located in Wyandotte County, Kansas.
This repository holds analysis of the energy system in Kansas City, Kansas, which is located in Wyandotte County.

# Installation

## Requirements

* `git` - version control software
    * [Windows Installation instructions](https://git-scm.com/download/win)
    * [MacOS Installation instructions](https://git-scm.com/download/mac)
    * [Linux Installation instructions](https://git-scm.com/download/linux)
* Python installed with either `conda` or `mamba` (recommended)
    * Download the `mamba` installer [here](https://github.com/conda-forge/miniforge).
    * `anaconda` (`conda`) installation instructions are [here](https://docs.anaconda.com/anaconda/install/windows/).

> [!NOTE]
> Make sure you add Python to PATH during installation.

## Installation Steps
0. Open a command prompt or terminal window, then copy and paste the following commands.

1. Clone the repository

```bash
git clone https://github.com/ucsusa/2024-kansas-city-analysis.git
```

2. Set up the environment

```bash
cd 2024-kansas-city-analysis
mamba env create # mamba and conda may be used interchangeably, here
mamba activate kansas-city
```

3. Create the `.env` file

Users should copy the `.env.template` file into a new file simply called `.env`.
This file contains "secret" information, such as API keys, emails, and other data
that should remain local. In order to run the current model, users must have API keys
from the following organizations:

* [U.S. Census API](https://api.census.gov/data/key_signup.html)

These keys may be added directly to the `.env` file.
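
Before running the workflow, it can help to confirm that the `.env` file actually contains the required key. The following is a minimal sketch and is not part of the repository: it uses only the standard library to read simple `KEY=value` lines, and the helper names `parse_env` and `missing_keys` are hypothetical. The workflow itself loads the file with `load_dotenv` from `python-dotenv`, whose parsing rules are richer than this.

```python
from pathlib import Path

# Key name taken from .env.template
REQUIRED_KEYS = ["CENSUS_API_KEY"]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or left empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing keys:", missing_keys(text))
```

An empty list means every required key has a value; it does not check that the key is valid.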

## Running the model

This project uses the workflow management tool, `snakemake`, to create a reproducible data pipeline.
Running the command

```bash
snakemake --cores=1
```

will run the workflow illustrated in the directed acyclic graph (DAG) shown below.

# Workflow

The flow of data through the modeling process is shown in the graph below.

![DAG](dag.png)

There are a few categories of steps:
* **Retrieve**: In a `retrieve` step, data are primarily downloaded and lightly processed (e.g., ensuring good formatting and data types).
* **Calculate**: In a `calculate` step, data are transformed through some calculation.
* *place holder for future additions*
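
To make the ordering implied by the DAG concrete, here is a small sketch that is not part of the repository. It hand-copies the dependencies of a few rules from the `Snakefile` (which rule's outputs feed which rule's inputs) into a dictionary and asks Python's standard `graphlib` module for a valid execution order. `snakemake` derives this graph itself from file names; the names `RULE_DEPENDENCIES` and `execution_order` are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each rule maps to the set of rules whose outputs it consumes.
# Copied by hand from the Snakefile; a subset, for illustration.
RULE_DEPENDENCIES = {
    "retrieve_electric_utility": {"retrieve_community_shape"},
    "retrieve_usrdb": {"retrieve_electric_utility"},
    "retrieve_res_load": {"retrieve_spatial_lut"},
    "retrieve_lead_data": {"retrieve_community_shape", "retrieve_census_data"},
    "calculate_historical_expenses": {"retrieve_lead_data"},
    "calculate_res_structures": {"retrieve_census_data", "retrieve_community_shape"},
    "calculate_rescaled_load": {
        "calculate_historical_expenses",
        "retrieve_res_load",
        "calculate_res_structures",
    },
}


def execution_order(dependencies):
    """Return one valid order in which every rule follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, rule in enumerate(execution_order(RULE_DEPENDENCIES), start=1):
        print(step, rule)
```

Several orders can be valid for the same graph, so the printed sequence is only one possibility; what is guaranteed is that each `retrieve` step appears before any `calculate` step that depends on it.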


## Steps

### `retrieve_census_data`
In this step, data from the U.S. Census Bureau are queried. The datasets gathered here are:
* total population, and
* the number and types of residential building units.

### `retrieve_community_shape`
In this step, the "shape" of the community of interest is retrieved. This shape can be used as a cut-out
to subset other geospatial data later.

> [!NOTE]
> These data are specific to the community of Armourdale in Kansas City, Kansas. If you
> wish to model a different community, you should omit this step or replace it with a different shape,
> for example by specifying a few census tracts.
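
As a rough illustration of what using a shape as a cut-out means, the sketch below keeps only the points that fall inside a polygon, using the standard ray-casting test. It is not taken from the repository's scripts: the function names, the square polygon, and the sample sites are all made up, and a real workflow would more likely rely on a GIS library for this.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_points(points, polygon):
    """Keep only the named points that fall inside the cut-out polygon."""
    return {name: xy for name, xy in points.items()
            if point_in_polygon(xy[0], xy[1], polygon)}


# Made-up cut-out (a square) and made-up sites, for illustration only
CUTOUT = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
SITES = {"site_a": (1.0, 1.0), "site_b": (5.0, 2.0), "site_c": (3.5, 0.5)}

if __name__ == "__main__":
    print(subset_points(SITES, CUTOUT))
```

Points lying exactly on the boundary are not handled specially here, which is one reason to prefer a dedicated geometry library for real data.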

### `retrieve_spatial_lut`
This step downloads the spatial lookup table (LUT) for NREL's ResStock datasets. The spatial LUT
cross-references census tracts, counties, and states with public use microdata areas (PUMAs). It
also records how the data are stored within NREL's models.
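
To show the kind of cross-referencing a lookup table supports, here is a minimal sketch using only the standard library. The column names (`state`, `county`, `tract`, `puma`) and every value in `SAMPLE_LUT` are hypothetical placeholders; the real ResStock table may use different headers and identifiers, and `pumas_for_county` is not a function from this repository.

```python
import csv
import io

# Hypothetical column names and values, for illustration only
SAMPLE_LUT = """state,county,tract,puma
KS,Example County,T001,P01
KS,Example County,T002,P01
KS,Other County,T003,P02
"""


def pumas_for_county(lut_file, state, county):
    """Return the sorted, unique PUMA codes listed for one state and county."""
    reader = csv.DictReader(lut_file)
    return sorted({row["puma"] for row in reader
                   if row["state"] == state and row["county"] == county})


if __name__ == "__main__":
    print(pumas_for_county(io.StringIO(SAMPLE_LUT), "KS", "Example County"))
```

The same function accepts an open file handle in place of the `io.StringIO` object, provided the headers are adjusted to match the downloaded table.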

### `retrieve_res_load`
Simulated building load data are collected from NREL's ResStock database in this step. Currently,
the data collected are aggregated building data for the building types defined in the `config.yml` file.
Future versions may include an option to specify individual buildings.
137 changes: 137 additions & 0 deletions Snakefile
@@ -0,0 +1,137 @@
configfile: "config.yml"

from us import states
from pathlib import Path
from dotenv import load_dotenv

state = config['state']
state_abbr = states.lookup(state).abbr

community_name = config['community_name']

env_file = Path("./.env").resolve()
load_dotenv(str(env_file))

rule targets:
    input:
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg",
        census_data = "data/spatial_data/county_census_data.gpkg",
        state_blockgroups = f"data/spatial_data/{state.lower()}_blockgroups.gpkg",
        county_blockgroups = f"data/spatial_data/{config['county'].lower()}_blockgroups.gpkg",
        elec_load = "data/timeseries/residential_elec_load.csv",
        heat_load = "data/timeseries/residential_heat_load.csv",
        weather = "data/timeseries/weather_year.csv",
        res_structures = "data/residential_buildings.csv",
        rates = "data/usrdb_rates.csv",
        project_sunroof = "data/spatial_data/project-sunroof-census_tract.csv",
        utility="data/spatial_data/electric_utility.gpkg",
        lead_data = f"data/spatial_data/{state_abbr}-2018-LEAD-data/{state_abbr} AMI Census Tracts 2018.csv",
        res_energy_expenses = f"data/{community_name.lower()}_energy_expenses.csv",
        zoning_data = f"data/spatial_data/{community_name.lower()}/zoning.gpkg",
        rescaled_elec_load = "data/timeseries/residential_elec_load_rescaled.csv",
        costs = "data/technology_costs.csv",
        dag = "dag.png"

rule retrieve_spatial_lut:
    output:
        spatial_lut = "data/spatial_data/spatial_lut.csv"
    script: "scripts/retrieve_lut.py"

rule retrieve_census_data:
    output:
        census_data = "data/spatial_data/county_census_data.gpkg",
        state_blockgroups = f"data/spatial_data/{state.lower()}_blockgroups.gpkg",
        county_blockgroups = f"data/spatial_data/{config['county'].lower()}_blockgroups.gpkg"
    script: "scripts/retrieve_census_data.py"

rule retrieve_project_sunroof:
    input:
        blockgroups = f"data/spatial_data/{state.lower()}_blockgroups.gpkg",
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg"
    output:
        project_sunroof = "data/spatial_data/project-sunroof-census_tract.csv",
        local_potential = f"data/spatial_data/{community_name.lower()}_rooftop_potential.gpkg"
    script: "scripts/retrieve_project_sunroof.py"

# a bespoke step to make this analysis specific to community
rule retrieve_community_shape:
    output:
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg"
    script: "scripts/retrieve_community_cutout.py"

rule retrieve_electric_utility:
    input:
        cutout=f"data/spatial_data/{community_name.lower()}_shape.gpkg"
    output:
        utility="data/spatial_data/electric_utility.gpkg"
    script: "scripts/retrieve_electric_utility.py"

rule retrieve_usrdb:
    input:
        utility="data/spatial_data/electric_utility.gpkg"
    output:
        rates="data/usrdb_rates.csv"
    script: "scripts/retrieve_usrdb.py"

rule calculate_res_structures:
    input:
        census_data = "data/spatial_data/county_census_data.gpkg",
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg"
    output:
        res_structures = "data/residential_buildings.csv"
    script: "scripts/calculate_res_structures.py"

rule retrieve_res_load:
    input:
        spatial_lut = "data/spatial_data/spatial_lut.csv"
    output:
        elec_load = "data/timeseries/residential_elec_load.csv",
        heat_load = "data/timeseries/residential_heat_load.csv",
        weather = "data/timeseries/weather_year.csv"
    script: "scripts/retrieve_res_load.py"

rule retrieve_lead_data:
    input:
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg",
        county_blockgroups = f"data/spatial_data/{config['county'].lower()}_blockgroups.gpkg"
    output:
        lead_data = f"data/spatial_data/{state_abbr}-2018-LEAD-data/{state_abbr} AMI Census Tracts 2018.csv",
        lead_community = f"data/spatial_data/{community_name.lower()}_lead.csv"
    script: "scripts/retrieve_lead_data.py"

rule retrieve_nrel_costs:
    output:
        costs = "data/technology_costs.csv"
    script: "scripts/retrieve_nrel_costs.py"

rule calculate_historical_expenses:
    input:
        lead_community = f"data/spatial_data/{community_name.lower()}_lead.csv"
    output:
        res_energy_expenses = f"data/{community_name.lower()}_energy_expenses.csv"
    script: "scripts/calculate_historical_expenses.py"

rule retrieve_community_spatial_data:
    input:
        community = f"data/spatial_data/{community_name.lower()}_shape.gpkg"
    output:
        zoning_data = f"data/spatial_data/{community_name.lower()}/zoning.gpkg"
    script: "scripts/retrieve_shapefiles.py"

rule calculate_rescaled_load:
    input:
        res_energy_expenses = f"data/{community_name.lower()}_energy_expenses.csv",
        elec_load = "data/timeseries/residential_elec_load.csv",
        heat_load = "data/timeseries/residential_heat_load.csv",
        res_structures = "data/residential_buildings.csv"
    output:
        rescaled_elec_load = "data/timeseries/residential_elec_load_rescaled.csv",
        rescaled_heat_load = "data/timeseries/residential_heat_load_rescaled.csv"
    script: "scripts/calculate_residential_load.py"

rule build_dag:
    input: "Snakefile"
    output:
        "dag.png"
    shell:
        "snakemake --dag | dot -Tpng > {output}"
52 changes: 52 additions & 0 deletions config.yml
@@ -0,0 +1,52 @@
# geographic data
state: 'Kansas'
county: 'Wyandotte'
community_name: 'Armourdale'

# historical data
census_year: 2020
census_level: 'tract'
usrdb_start_date: "2024-07-23" # today?
usrdb_future_date: "2099-01-01" # some date in the future, replaces NaT values

# price data
retail_price_elec: 0.1129 # from google, $/kWh
# https://www.kansasgasservice.com//media/KGS/Tariffs/20-RSS.pdf
retail_price_gas: 2.3485 # $/Mcf, 0.0080126123 $/kWh

# ATB cost options
atb_params:
  atb_year: 2023 # the ATB publication year // DO NOT CHANGE
  case: 'Market' # 'R&D'
  scenario: 'Moderate' # 'Conservative', 'Advanced'
  scale: 'Residential' # 'Utility', 'Commercial'
  maturity: 'Y' # 'N'
  crp: 30 # '20'
  cost_year: 2025 # Any year 2020-2050

# model options
topology: "sectoral" # or building type // NOT IMPLEMENTED

# building data options
building_data_options:
  resstock_year: 2021 # DO NOT CHANGE
  comstock_year: 2021 # DO NOT CHANGE
  weather_version: "tmy3" # or "amy2018"
  release_version: 1
  building_types:
    residential:
      - multi-family_with_2_-_4_units
      - multi-family_with_5plus_units
      - single-family_attached
      - single-family_detached
      - mobile_home
    # commercial: # pending implementation
    # -

energy_sectors:
- residential
# - commercial # pending implementation

# geographic options
geographic_crs: 4326 # for using lat/lon; EPSG code
projected_crs: 5070 # for doing calculations; EPSG code
Binary file added dag.png
5 changes: 3 additions & 2 deletions environment.yml
@@ -4,7 +4,7 @@ channels:
- bioconda
dependencies:
# Requirements for core model functionality
- python==3.10.13
- python>=3.9
- pip
- ipython
- matplotlib
@@ -25,7 +25,7 @@ dependencies:
- momepy
- pysal
- osmnx
- spyder-kernels=2.5
- spyder-kernels>=2.5
- unyt
- cartopy
- descartes
@@ -48,3 +48,4 @@ dependencies:
- census
- streamlit
- vresutils
- git+https://github.com/FZJ-IEK3-VSA/hplib.git