-
Notifications
You must be signed in to change notification settings - Fork 0
/
texas_blackout.Rmd
445 lines (289 loc) · 19.2 KB
/
texas_blackout.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
---
title: '2021 Texas Blackout: Analyzing affected areas & socioeconomic factors that influenced recovery'
author: "Flora Hamilton"
date: "2022-10-26"
output:
html_document:
print_df: paged
toc: yes
toc_depth: 4
toc_float: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Analyzing the Effects of the Texas February 2021 Storm on the Houston metropolitan Area
## Context
"In February 2021, Texas faced a significant power crisis resulting from three severe winter storms that swept across the United States on February 10--11, 13--17, and 15--20."[^1] For additional insights, explore these [engineering](https://www.youtube.com/watch?v=08mwXICY4JM&ab_channel=PracticalEngineering) and [political](https://www.youtube.com/watch?v=Zcrsgdl_hP0&ab_channel=Vox) perspectives.
[^1]: Wikipedia. 2021. "2021 Texas power crisis." Last modified October 2, 2021. <https://en.wikipedia.org/wiki/2021_Texas_power_crisis>.
## The objectives and methods:
- Estimate the number of households in Houston that experienced power loss due to the initial two storms.
- To determine the count of homes affected by power outages, we will spatially join these areas with [OpenStreetMap](https://www.openstreetmap.org/#map=4/38.01/-95.84) data on buildings and roads.
- Investigate whether socioeconomic factors serve as predictors of community recovery from a power outage.
- For the exploration of potential socioeconomic factors influencing recovery, the analysis will be linked with data from the US Census Bureau.
This analysis will rely on remotely-sensed night lights data acquired from the [Visible Infrared Imaging Radiometer Suite (VIIRS)](https://en.wikipedia.org/wiki/Visible_Infrared_Imaging_Radiometer_Suite) aboard the Suomi satellite. Specifically, we will use the VNP46A1 to identify variations in night lights before and after the storms, pinpointing areas that lost electrical power.
##### Spatial Skills:
- load vector/raster data\
- simple raster operations\
- simple vector operations\
- spatial joins
## Data
#### Night lights
We use NASA's Worldview to explore the data around the day of the storm. There are several days with too much cloud cover to be useful, but 2021-02-07 and 2021-02-16 provide two clear, contrasting images to visualize the extent of the power outage in Texas.
VIIRS data is distributed through NASA's [Level-1 and Atmospheric Archive & Distribution System Distributed Active Archive Center (LAADS DAAC)](https://ladsweb.modaps.eosdis.nasa.gov/). Many NASA Earth data products are distributed in 10x10 degree tiles in sinusoidal equal-area projection. Tiles are identified by their horizontal and vertical position in the grid. Houston lies on the border of tiles h08v05 and h08v06. We therefore need to download two tiles per date.
The following files had been sub-setted & stored in the `VNP46A1` folder.\
- `VNP46A1.A2021038.h08v05.001.2021039064328.tif`: tile h08v05, collected on 2021-02-07\
- `VNP46A1.A2021038.h08v06.001.2021039064329.tif`: tile h08v06, collected on 2021-02-07\
- `VNP46A1.A2021047.h08v05.001.2021048091106.tif`: tile h08v05, collected on 2021-02-16\
- `VNP46A1.A2021047.h08v06.001.2021048091105.tif`: tile h08v06, collected on 2021-02-16
#### Roads
Typically highways account for a large portion of the night lights observable from space (see Google's [Earth at Night](https://earth.google.com/web/@27.44405464,-84.7693044,206.63660162a,8916361.52264659d,35y,0h,0t,0r/data=CiQSIhIgMGY3ZTJkYzdlOGExMTFlNjk5MGQ2ZjgxOGQ2OWE2ZTc)). To minimize falsely identifying areas with reduced traffic as areas without power, we will ignore areas near highways.
[OpenStreetMap (OSM)](https://planet.openstreetmap.org/) is a collaborative project which creates publicly available geographic data of the world. Ingesting this data into a database where it can be subsetted and processed is a large undertaking. Fortunately, third party companies redistribute OSM data. We used [Geofabrik's download sites](https://download.geofabrik.de/) to retrieve a shapefile of all highways in Texas and prepared a Geopackage (`.gpkg` file) containing just the subset of roads that intersect the Houston metropolitan area.
- `gis_osm_roads_free_1.gpkg`
#### Houses
We also obtain building data from OpenStreetMap. We again downloaded from Geofabrick and prepared a GeoPackage containing only houses in the Houston metropolitan area.\
- `gis_osm_buildings_a_free_1.gpkg`
#### Socioeconomic
We cannot readily get socioeconomic information for every home, so instead we obtained data from the [U.S. Census Bureau's American Community Survey](https://www.census.gov/programs-surveys/acs) for census tracts in 2019. The *folder* `ACS_2019_5YR_TRACT_48.gdb` is an ArcGIS ["file geodatabase"](https://desktop.arcgis.com/en/arcmap/latest/manage-data/administer-file-gdbs/file-geodatabases.htm), a multi-file proprietary format that's roughly analogous to a GeoPackage file.\
We can use `st_layers()` to explore the contents of the geodatabase. Each layer contains a subset of the fields documents in the [ACS metadata](https://www2.census.gov/geo/docs/maps-data/data/tiger/prejoined/ACSMetadata2011.txt).\
The geodatabase contains a layer holding the geometry information, separate from the layers holding the ACS attributes. You have to combine the geometry with the attributes to get a feature layer that `sf` can use.
## Project Roadmap
Below is an outline of the steps taken to achieve the objectives.
## Find locations of blackouts
For improved computational efficiency and easier inter-operability with `sf`, we use the `stars` package for raster handling.\
##### combine the data
- read in night lights tiles\
- combine tiles into a single `stars` object for each date (2021-02-07 and 2021-02-16) using `st_mosaic`
```{r, warning=FALSE, message=FALSE}
library(sf)
library(stars)
library(tmap)
library(raster)
library(terra)
library(dplyr)
library(ggplot2)
```
```{r, warning=FALSE,message=FALSE}
## Read in all four night lights tiles, in folder VNP46A1, using the package "stars"
tile1_2021_02_07 = read_stars("~/dev/eds223/assignment-3-floraham/data/VNP46A1/VNP46A1.A2021038.h08v05.001.2021039064328.tif", package = "stars")
tile2_2021_02_07 = read_stars("~/dev/eds223/assignment-3-floraham/data/VNP46A1/VNP46A1.A2021038.h08v06.001.2021039064329.tif", package = "stars")
tile3_2021_02_16 = read_stars("~/dev/eds223/assignment-3-floraham/data/VNP46A1/VNP46A1.A2021047.h08v05.001.2021048091106.tif", package = "stars")
tile4_2021_02_16 = read_stars("~/dev/eds223/assignment-3-floraham/data/VNP46A1/VNP46A1.A2021047.h08v06.001.2021048091105.tif", package = "stars")
# combine tiles into single stars object for each date (2021-02-07 and 2021-02-16), using st_mosaic
combined_02_07 = st_mosaic(tile1_2021_02_07, tile2_2021_02_07)
combined_02_16 = st_mosaic(tile3_2021_02_16, tile4_2021_02_16)
```
```{r results= FALSE, warning=FALSE, message=FALSE}
#check that everything is loaded in and in the right formats
tile1_2021_02_07
tile2_2021_02_07
tile3_2021_02_16
tile4_2021_02_16
combined_02_07
combined_02_16
```
##### create a blackout mask
- find the change in night lights intensity (presumably) caused by the storm\
- reclassify the difference raster, assuming that any location that experienced a drop of more than 200 nW cm^-2^sr^-1^ experienced a blackout\
- assign `NA` to all locations that experienced a drop of *less* than 200 nW cm^-2^sr^-1^\
```{r}
#find the change in night lights intensity (presumably) caused by the storm by calculating the light difference between the two dates
difference <- combined_02_07 - combined_02_16
#check out difference in night lights intensity
plot(difference)
#reclassify the difference raster, assuming that any location that experienced a drop of more than 200 nW cm-2sr-1 experienced a blackout by assigning NA to locations that experienced drop of less than 200 nW cm-2sr-1
#assigning all the houses that experienced a drop of 200 and more to a blackout mask.
mask_blackout <- difference > 200
mask_blackout[mask_blackout == FALSE] <- NA
```
##### vectorize the mask
- use `st_as_sf()` to vectorize the blackout mask\
- fix any invalid geometries using `st_make_valid`
```{r results = FALSE}
#vectorize the blackout mask with st_as_sf()
#use st_make_valid to fix any invalid geometries
blackout_mask_vector <- st_as_sf(mask_blackout) %>% st_make_valid()
##Self-check, check out what the blackout mask looks like
plot(blackout_mask_vector)
```
##### crop the vectorized map to our region of interest
- define the Houston metropolitan area with the following coordinates\
- (-96.5, 29), (-96.5, 30.5), (-94.5, 30.5), (-94.5, 29)\
- turn these coordinates into a polygon using `st_polygon`\
- convert the polygon into a simple feature collection using `st_sfc()` and assign a CRS\
- hint: because we are using this polygon to crop the night lights data it needs the same CRS\
- crop (spatially subset) the blackout mask to our region of interest
- re-project the cropped blackout dataset to EPSG:3083 (NAD83 / Texas Centric Albers Equal Area)\
```{r include=TRUE}
#crop the vectorized map to our region of interest using polygon of Houston's coordinates
houston_polygon <- st_polygon(list(rbind(c(-96.5,29), c(-96.5,30.5), c(-94.5, 30.5), c(-94.5,29), c(-96.5,29))))
#convert the polygon into a simple feature collection using st_sfc() and assign the CRS 4326, which is the same as the night lights data
houston_border_sf <- st_sfc(houston_polygon, crs = st_crs(mask_blackout))
##inspect the polygon crs to make sure the crs is 4326, nightlights dataset
st_crs(houston_border_sf) == st_crs(mask_blackout)
# spatially subset the blackout mask to the Houston Area
houston_blackout <- blackout_mask_vector[houston_border_sf, ,]
#re-project the cropped blackout dataset to EPSG:3083 (NAD83 / Texas Centric Albers Equal Area)
houston_blackout <- st_transform(houston_blackout, crs = 'EPSG:3083')
## transform to sf object
final_houston_blackout <- st_as_sf(houston_blackout)
#verify that it's an sf object
class(final_houston_blackout)
##verify that the final, transformed sf houston blackout dataset is 3083
st_crs(final_houston_blackout)
```
##### exclude highways from blackout mask
The roads geopackage includes data on roads other than highways. However, we can avoid reading in data we don't need by taking advantage of `st_read`'s ability to subset using a SQL query.\
- define SQL query\
- load just highway data from geopackage using `st_read`\
- reproject data to EPSG:3083\
- identify areas within 200m of all highways using `st_buffer. st_buffer` produces undissolved buffers, use `st_union` to dissolve them\
- find areas that experienced blackouts that are further than 200m from a highway
`query <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"`\
`highways <- st_read("data/gis_osm_roads_free_1.gpkg", query = query)`
```{r results = FALSE, warning=FALSE,message=FALSE}
## Read in highway data from roads using SQL query ##
query <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"
highways <- st_read("~/dev/eds223/assignment-3-floraham/data/gis_osm_roads_free_1.gpkg", query = query)
```
```{r include=TRUE, warning=FALSE,message=FALSE}
# reproject data to EPSG:3083
highways_3038 <- st_transform(highways, crs="EPSG:3083")
##verify it's the right crs
st_crs(highways_3038)
# identify areas within 200m of all highways using st_buffer. Use st_union to dissolve overlapping polygons.
a_near_hwys <- st_buffer(highways_3038, dist = 200) %>% st_union()
# find areas that experienced blackouts that are further than 200m from a highway by spacially subsetting houston blackout areas to ones near highways, and taking st_difference of that.
blackouts_far_hwy_houston <- final_houston_blackout[a_near_hwys, , op = st_difference]
#plot to see what it looks like
plot(blackouts_far_hwy_houston)
```
## Find homes impacted by blackouts
##### load buildings data
- load buildings dataset using `st_read` and the following SQL query to select only residential buildings. We first reproject data to EPSG:3083
`SELECT *` `FROM gis_osm_buildings_a_free_1`\
`WHERE (type IS NULL AND name IS NULL)`\
`OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')`\
```{r results=FALSE, include=TRUE}
#load buildings dataset using st_read and the following SQL query to select only residential buildings
query_buildings <- "SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')"
buildings <- st_read("~/dev/eds223/assignment-3-floraham/data/gis_osm_buildings_a_free_1.gpkg", query = query_buildings)
```
##### find homes in blackout areas
- filter to homes within blackout areas\
- count number of impacted **homes**\
```{r include=TRUE}
##check CRS's are the same
st_crs(buildings) == st_crs(blackouts_far_hwy_houston)
##transform buildings crs
buildings <- st_transform(buildings, crs = st_crs(blackouts_far_hwy_houston))
# filtering the houses data with the blackout mask
buildings_blackout <- buildings[blackouts_far_hwy_houston, drop = FALSE]
## View the data frame for inspection
View(buildings_blackout)
#count number of impacted homes
print(paste0(nrow(buildings_blackout), " Houston homes experienced power outage due to the Texas storms in Feburary, 2021"))
```
## Investigate socioeconomic factors
##### load ACS data
- use `st_read()` to load the geodatabase layers\
- geometries are stored in the `ACS_2019_5YR_TRACT_48_TEXAS` layer\
- income data is stored in the `X19_INCOME` layer\
- select the median income field `B19013e1`\
- hint: reproject data to EPSG:3083\
```{r, warning=FALSE,message=FALSE}
census <- st_read("~/dev/eds223/assignment-3-floraham/data/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer = "ACS_2019_5YR_TRACT_48_TEXAS")
income <- st_read("~/dev/eds223/assignment-3-floraham/data/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer = "X19_INCOME")
#select median income field B19013e1
median_income <- income %>% dplyr::select(GEOID, B19013e1)
```
##### determine which census tracts experienced blackouts
- join the income data to the census tract geometries\
- hint: make sure to join by geometry ID\
- spatially join census tract data with buildings determined to be impacted by blackouts\
- find which census tracts had blackouts\
```{r include=TRUE}
#determine which census tracts experienced blackouts
#Join the income data to the census tract geometries. Join by geometry ID
median_income_df <- median_income %>% rename(GEOID_Data = GEOID,
median_income = B19013e1)
##check types to varify that they are the same class
class(median_income_df$GEOID_Data) == class(census$GEOID_Data)
joined_income_census <- left_join(census,
median_income_df,
by = "GEOID_Data")
## filter census data by buildings determined to be impacted by blackouts.
# transforming both objects to the correct crs
joined_income_census <- st_transform(joined_income_census, crs = st_crs(census))
buildings_blackout <- st_transform(buildings_blackout, crs = st_crs(census))
#check that they are the same crs
st_crs(joined_income_census) == st_crs(buildings_blackout)
## filtering and mutating to add a blackout column for affected tracts
census_blackout <- joined_income_census[buildings_blackout,]%>% mutate(blackout = 'Y')
print(paste0(nrow(census_blackout), " census tracts had blackouts"))
## Since we have blackout status of Y indicated, we will want to indicate "N" for no-blackout areas as well, for full dataset.
#census_income_cropped combined with census_blackout
census_Y_N <- full_join(as.data.frame(joined_income_census), as.data.frame(census_blackout))
census_Y_N$blackout[is.na(census_Y_N$blackout)] = "N"
#check the class
class(census_Y_N)
## make sure it's an sf object
census_Y_N <- st_as_sf(census_Y_N)
#check census_Y_N class, that it's a dataframe.
class(census_Y_N)
```
##### compare incomes of impacted tracts to unimpacted tracts
- create a map of median income by census tract, designating which tracts had blackouts
- plot the distribution of income in impacted and unimpacted tracts
```{r, warning=FALSE,message=FALSE}
#join income data to the census tract geometries and crop to region of interest
census_income_geom <- left_join(census, median_income_df,
by = "GEOID_Data") %>% st_transform(3083) #this is texas#
houston_border_sf <- houston_border_sf %>% st_transform(3083)
#spatial crop joined census data to houston border
census_income_geom_cropped <- census_income_geom[houston_border_sf, op = st_intersects]
## use tmap to plot income distribution and affected areas. Make graph aesthetics.
tmap_mode("plot")
tm_shape(census_income_geom_cropped) +
tm_graticules() +
tm_polygons("median_income", palette= "-viridis", title = "Median income") +
tm_shape(census_blackout) + tm_dots(size = 0.4, alpha = 0.5) +
tm_title( "Distribution of Med. Income & blackout status in Houston") +
tm_scalebar(position = c("RIGHT", "BOTTOM")) +
tm_compass(position = c("LEFT", "BOTTOM")) +
tm_xlab("Longitude", size = 0.5) +
tm_ylab("Latitude", rotation = 90, size = 0.5)
```
```{r include=FALSE}
## this is for leaflet viewing in the future ##
tm_shape(census_income_geom_cropped) + tm_polygons("median_income", palette= "viridis") + tm_shape(census_blackout) + tm_dots(size = 0.4, alpha = 0.5)
```
```{r}
#plot histogram of the distribution of median incomes of homes by blackout status
#Plot from one dataset so that ggplot can plot both datasets on one axis & autogenerate legend
census_Y_N_houston <- census_Y_N %>% st_transform(3083)
houston_border_sf <- houston_border_sf %>% st_transform(3083)
census_Y_N_houston <- census_Y_N_houston[houston_border_sf, op = st_intersects]
ggplot(census_Y_N_houston, aes(median_income)) + geom_histogram(aes(color = blackout, fill = blackout), bins = 50) + ggtitle("Distribution of median income of homes in Houston by blackout status")
```
```{r}
## Doing other statistical explorations
# plotting the comparison data via geom jitter plot
ggplot(census_Y_N_houston, aes(x = median_income, y = blackout)) +
geom_boxplot(aes(color = blackout), alpha = 0.8) +
labs(title = "Median Income of Houston Homes based on Blackout Status",
y = "Blackout Status",
x = "Median Income") +
theme_minimal()
```
```{r, warning=FALSE, message=FALSE}
### Summarizing statistics
census_Y_N_houston %>%
group_by(blackout) %>%
summarize(mean = mean(median_income, na.rm = TRUE),
median = median(median_income, na.rm = TRUE),
sd = sd(median_income, na.rm = TRUE))
```
## Discussion
We found that 778 census tracts had been affected by Texas's 2021 energy crisis, amounting to 164,867 total homes experiencing power outage between the studied dates (Feb 07, 2021 to Feb 16, 2021).
The average median income for homes that experienced a blackout was \$70,939, slightly higher than the average median income for homes that didn't experience a blackout, which was \$67,859. It is important to note that 1) our factor of interest was median income within census tracts and that other socioeconomic factors could reveal unexplored spatial patterns, and that 2) weather conditions were not normalized to conduct this exercise, variable that could cause a different outcome.