-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathindex.Rmd
492 lines (354 loc) · 20.5 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
---
pagetitle: "Raster data handling with Python"
author: "Jan Verbesselt, Jorge Mendes de Jesus, Aldo Bergsma, Dainius Masiliunas, David Swinkels, Judith Verstegen, Corné Vreugdenhil"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
rmdformats::html_clean:
highlight: zenburn
---
```{css, echo=FALSE}
@import url("https://netdna.bootstrapcdn.com/bootswatch/3.0.0/simplex/bootstrap.min.css");
.main-container {max-width: none;}
div.figcaption {display: none;}
pre {color: inherit; background-color: inherit;}
code[class^="sourceCode"]::before {
content: attr(class);
display: block;
text-align: right;
font-size: 70%;
}
code[class^="sourceCode r"]::before { content: "R Source";}
code[class^="sourceCode python"]::before { content: "Python Source"; }
code[class^="sourceCode bash"]::before { content: "Bash Source"; }
```
<font size="6">[WUR Geoscripting](https://geoscripting-wur.github.io/)</font> <img src="https://www.wur.nl/upload/854757ab-168f-46d7-b415-f8b501eebaa5_WUR_RGB_standard_2021-site.svg" alt="WUR logo" style="height: 35px; margin:inherit;"/>
# Raster data handling with Python
## Introduction
Today we will work with Python packages for spatial raster analysis. Python has some dedicated packages to handle rasters:
* [OWSLib](https://geopython.github.io/OWSLib/) allows us to download geospatial raster data from Web Coverage Services
* [GDAL](https://gdal.org/api/python_bindings.html) is powerful library for reading, writing and warping raster datasets
* [Rasterio](https://Rasterio.readthedocs.io/en/latest/) reads and writes geospatial raster data
* [rasterstats](https://pythonhosted.org/rasterstats/) summarizes geospatial raster datasets based on vector geometries
* [NumPy](http://www.numpy.org/) is fundamental package for scientific computing, such as array (thus raster) calculations
## Learning objectives
- Be able to read spatial raster formats from web services and files
- Be able to write spatial raster formats to disk
- Know how to apply basic operations on raster data, such as arithmetics
- Be able to plot spatial raster data with Matplotlib
## Setting up the Python Environment
Make a directory structure for this tutorial:
```{r, eval=FALSE,engine='bash'}
cd ~/Documents/
mkdir PythonRaster #or give the directory a name to your liking
cd ./PythonRaster
mkdir data
mkdir output
```
Like in the previous tutorials, we will create a conda environment with a `.yaml` file:
```
name: raster
dependencies:
- python
- numpy
- matplotlib
- spyder
- owslib
- gdal
- geopandas
- rasterio
- rasterstats
- affine
- osmnx
```
After creation, activate the environment, open Spyder, create a script in the root directory, and start coding.
# Reading raster data and accessing metadata
## From a Web Coverage Service
A Web Coverage Service (WCS) loads raster data in a similar way as Web Feature Services (WFS) load vector data. [Web Coverage Services](https://www.ogc.org/standards/wcs) are a standard by the Open Geospatial Consortium and allow the downloading of geospatial raster data with multiple types of format encoding: GeoTIFF, netCDF, JPEG2000 etc. A [Web Map Service](https://www.ogc.org/standards/wms) [WMS] also exists for rasters; it allows downloading of images but without the data values.
Today we will work with elevation rasters. More specifically, we will have a look at the WCS of the AHN dataset. AHN stands for "Actueel Hoogtebestand Nederland" and is a Digital Elevation Model [DEM] that covers the Netherlands. Access the web coverage service to have a look at the contents:
```{python, eval=FALSE}
from owslib.wcs import WebCoverageService
# Access the WCS by proving the url and optional arguments
wcs = WebCoverageService('https://service.pdok.nl/rws/ahn/wcs/v1_0?SERVICE=WCS&request=GetCapabilities', version='1.0.0')
# Print to check the contents of the WCS
print(list(wcs.contents))
```
Running the last line of code shows that the Web Coverage Service of the AHN3 contains two rasters: 0.5m DSM, and 0.5m DTM. Raster data of AHN has the projected coordinate system `RD_New (EPSG: 28992)`.
We can also check what types of operations are available for this WCS:
```{python, eval=FALSE}
# Get all operations and print the name of each of them
print([op.name for op in wcs.operations])
```
You will see that the Web Coverage Service allows accessing the data (GetCoverage), the metadata (DescribeCoverage), and the capabilities (GetCapabilities). These are all standard protocols defined by the OGC.
Several functions are available to access specific metadata of each individual raster, for example:
```{python, eval=FALSE}
# Take the 0.5m DSM as an example
cvg = wcs.contents['dsm_05m']
# Print supported reference systems, the bounding box defined in WGS 84 coordinates, and supported file formats
print(cvg.supportedCRS)
print(cvg.boundingBoxWGS84)
print(cvg.supportedFormats)
```
Let us have a look at the data itself. As we do not want to overload the web service, we download once and store the data locally.
Download the Digital Surface Model [DSM], which is the the 'dsm_05m' version, and Digital Terrain Model [DTM], which is the 'dtm_05m' version, to a local file. The difference between a DEM, DSM and DTM is explained on the [GIS StackExchange](https://gis.stackexchange.com/questions/5701/what-is-the-difference-between-dem-dsm-and-dtm/5704).
```{python, eval=FALSE}
import os
# Define a bounding box in the available crs (see before) by picking a point and drawing a 1x1 km box around it
x, y = 174100, 444100
bbox = (x - 500, y - 500, x + 500, y + 500)
# Request the DSM data from the WCS
response = wcs.getCoverage(identifier='dsm_05m', bbox=bbox, format='GEOTIFF',
crs='urn:ogc:def:crs:EPSG::28992', resx=0.5, resy=0.5)
# Write the data to a local file in the 'data' directory
with open('data/AHN3_05m_DSM.tif', 'wb') as file:
file.write(response.read())
# Do the same for the DTM
response = wcs.getCoverage(identifier='dtm_05m', bbox=bbox, format='GEOTIFF',
crs='urn:ogc:def:crs:EPSG::28992', resx=0.5, resy=0.5)
with open('data/AHN3_05m_DTM.tif', 'wb') as file:
file.write(response.read())
```
Before continuing, please check if this step was successful (*Hint: you can check if files have been written into corresponding directory*).
## From a file with GDAL
[GDAL](https://gdal.org/) handles raster and vector geospatial data formats with Python, Java, R and C APIs. When opening a raster file in gdal, the object has a hierarchical structure starting at the Dataset level. A Dataset has a Geotransform (metadata) and can contain one or more Bands. Each Band has a Data array and potentially Overviews.
<figure>
<center>
<img src="./images/gdal.png" alt="gdal structure" width = "90%">
<figcaption>GDAL class structure, adapted from Garrard, 2016, Geoprocessing with Python.</figcaption>
</center>
</figure>
Let us open the file we just saved. You will see you first get the dataset, and need to access the band (even though there is only one), before the data array can be accessed.
```{python, eval=FALSE}
from osgeo import gdal
# Open dataset, gdal automatically selects the correct driver
ds = gdal.Open("data/AHN3_05m_DSM.tif" )
# Get the band (band number 1)
band = ds.GetRasterBand(1)
# Get the data array
data = band.ReadAsArray()
print(data)
# Delete objects to close the file
ds = None
```
```{block, type="alert alert-success"}
> **Question 1**: Why do we set ds to None at the end of your script? What may happen if you do not do that?
<details>
<summary>*Click for answer*</summary>
Keeping files open may leave you vulnerable to losing data, (Geo)Pandas manage resources under the hood so you don't explicitly need to close files, but for the case of GDAL, and as you will later see, Rasterio, it's important to close your files or open them with a context manager `with open ...`
</details>
```
The GDAL Python API is not the best documented Python module. Therefore, Rasterio is explained as an alternative raster data handling module.
## From a file with Rasterio
[Rasterio](https://Rasterio.readthedocs.io/en/latest/intro.html) reads and writes multiple raster formats based on GDAL, provides raster processing functions based on NumPy arrays and GeoJSON, and integrates Matplotlib in the module `rasterio.plot` for visualization.
The rest of the tutorial below is a complete route of handling a raster dataset. We will use the DEMs from a the WCS for our study area, handle it with Rasterio, calculate new information (CHM), overlay it with vector data representing buildings and visualize it.
Let us read in the raster data we just stored from the WCS with Rasterio and plot it with `rasterio.plot`:
```{python, eval=FALSE}
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
# Open the two rasters
dsm = rasterio.open("data/AHN3_05m_DSM.tif", driver="GTiff")
dtm = rasterio.open("data/AHN3_05m_DTM.tif", driver="GTiff")
# Metadata functions from Rasterio
print(dsm.meta)
print(dtm.meta)
# Plot with rasterio.plot, which provides Matplotlib functionality
plt.figure(figsize=(5, 5), dpi=300) # adjust size and resolution
show(dsm, title='Digital Surface Model', cmap='gist_ncar')
```
<img src="./images/dsm_campus.png" alt="Water pixels of the Netherlands" width = "60%">
```{block, type="alert alert-success"}
> **Question 2**: Adjust the code above to take a look at the DTM. Note the gaps that appear. What are these gaps?
<details>
<summary>*Click for answer*</summary>
These are buildings.
</details>
```
The metadata shows the driver (GDAL's way of knowing how to function with a specific file format), datatype, nodata value, width of raster in number of cells, height of raster in number of cells, number of raster bands in the dataset, coordinate reference system, and transformation values.
In the back-end, raster layers in Rasterio are stored as NumPy arrays, which appear when the data are read with the method `.read()`:
```{python, eval=FALSE}
# Rasterio object
print(type(dsm))
# Read, show object type and data
dsm_data = dsm.read(1)
print(type(dsm_data))
print(dsm_data)
```
# Processing raster data
## Creating a Canopy Height Model
A Canopy Height Model (CHM) gives an indication of the height of trees and/or buildings. It can be created by subtracting a Digital Terrain Model from a Digital Surface Model. In the resulting raster, each cell value represents the height above the underlying surface topography.
```{python, eval=FALSE}
import numpy as np
# Access the data from the two rasters
dsm_data = dsm.read()
dtm_data = dtm.read()
# Set our nodata to np.nan (this is important for later)
dsm_data[dsm_data == dsm.nodata] = np.nan
dtm_data[dtm_data == dtm.nodata] = np.nan
```
Earlier, we noticed that the DTM included gaps. Let's first fill these gaps using the `fillnodata()` function from `rasterio.fill`. For more information, see the [documentation](https://rasterio.readthedocs.io/en/latest/api/rasterio.fill.html).
```{python, eval=FALSE}
from rasterio.fill import fillnodata
# Create a mask to specify which pixels to fill (0=fill, 1=do not fill)
dtm_mask = dtm_data.copy()
dtm_mask[~np.isnan(dtm_data)] = 1
dtm_mask[np.isnan(dtm_data)] = 0
# Fill missing values
dtm_data = fillnodata(dtm_data, mask=dtm_mask)
```
Now, let's can create our CHM:
```{python, eval=FALSE}
# Subtract the NumPy arrays
chm = dsm_data - dtm_data
# Check the resulting array
print(chm)
# Copy metadata of one of the rasters (does not matter which one)
kwargs = dsm.meta
# Save the chm as a raster
with rasterio.open('data/AHN3_05m_CHM.tif', 'w', **kwargs) as file:
file.write(chm.astype(rasterio.float32))
```
```{block, type="alert alert-success"}
> **Question 3**: Where is the CHM the highest in the study area? Is it what you expected?
<details>
<summary>*Click for answer*</summary>
Think about where you have the most forests on campus.
</details>
```
We have now applied the basic concepts of creating a Canopy Height Model!
## Computing heights of buildings
Using our CHM, let's determine the average heights of the buildings in our study area. The first step is to download building data from the BAG Web Feature Service that we also used in the vector tutorial. Note that we make use of the `bbox` from an earlier codeblock for this.
```{python, eval=FALSE}
import geopandas as gpd
import json
from owslib.wfs import WebFeatureService
# Get the WFS of the BAG
wfsUrl = 'https://service.pdok.nl/lv/bag/wfs/v2_0'
wfs = WebFeatureService(url=wfsUrl, version='2.0.0')
layer = list(wfs.contents)[0]
# Get the features for the study area
# notice that we now get them as json, in contrast to before
response = wfs.getfeature(typename=layer, bbox=bbox, outputFormat='json')
data = json.loads(response.read())
# Create GeoDataFrame, without saving first
buildings_gdf = gpd.GeoDataFrame.from_features(data['features'])
# Set crs to RD New
buildings_gdf.crs = 28992
```
<!-- Used for the previous code but there were some issues downloading data via requests so the code block was replaced
```{block, type="alert alert-success"}
> **Question 4**: What happens in the line "bb = ','.join(map(str, bbox))"? Look up how `map()` [works in Python](https://www.geeksforgeeks.org/python-map-function/) if you do not know.
```
-->
The next step is to perform zonal statistics to get the average height value per building polygon. We will do this with the module Rasterstats, which can use a `GeoDataFrame` and a `.tif` file for this task. Here, we make it output a [GeoJSON](http://geojson.org/).
```{python, eval=FALSE}
import rasterstats as rs
# Apply the zonal statistics function with gdf and tif as input
chm_buildings = rs.zonal_stats(buildings_gdf, "data/AHN3_05m_CHM.tif", prefix='CHM_', geojson_out=True)
# Convert GeoJSON to GeoDataFrame
buildings_gdf = gpd.GeoDataFrame.from_features(chm_buildings)
# Check the added attributes with a prefix 'CHM_'
print(buildings_gdf['CHM_mean'])
```
A quick visualization shows us the heights derived from the raster data on the map:
```{python, eval=FALSE}
# Create one plot with figure size 10 by 10
fig, ax = plt.subplots(1, figsize=(10, 10))
# Customize figure with title, legend, and facecolour
ax.set_title('Heights above ground (m) of buildings on the WUR campus')
buildings_gdf.plot(ax=ax, column='CHM_mean', k=6,
cmap=plt.cm.viridis, linewidth=1, edgecolor='black', legend=True)
ax.set_facecolor("lightgray")
# Make sure to get an equal scale in the x and y direction
plt.axis('equal')
# Visualize figure
plt.show()
```
<img src="./images/buildingsGDF.png" alt="Buildings on Wageningen Campus and their height" width = "100%">
```{block, type="alert alert-success"}
> **Question 4**: Why do we want an equal scale in the x and y direction for this figure?
<details>
<summary>*Click for answer*</summary>
To visualize the buildings properly, otherwise their geometries will be skewed.
</details>
```
## Other functionality
Note that this tutorial only scratches the surface of the possibilities of Rasterio. It can do most if not all things you did in `R` in the Vector - Raster tutorial. Rasterio for example also allows you to do [masking](https://rasterio.readthedocs.io/en/latest/topics/masking-by-shapefile.html), [reprojecting](https://rasterio.readthedocs.io/en/latest/topics/reproject.html), and [resampling](https://rasterio.readthedocs.io/en/latest/topics/resampling.html).
# More on writing raster data to a file
As you've seen before, to store the NumPy array as a raster file, Rasterio needs the accompanying metadata. It is possible to use the metadata of an existing raster (which we did before), but it is also possible to create it from scratch.
To create metadata from scratch, the CRS can be defined with a function from Rasterio and the transformation can be defined using Affine. Affine is a Python module that facilitates [affine transformations](https://www.quora.com/In-an-intuitive-explanation-what-is-an-affine-transformation-of-image), i.e. scaling, rotating, mirroring or skewing of images/rasters/arrays.
Rasterio can write most [raster formats from GDAL](https://www.gdal.org/formats_list.html). [The developers recommend using GeoTiff driver](https://github.com/mapbox/rasterio/issues/731) for writing as it is the best-tested and best-supported format.
```{python, eval=FALSE}
import affine
# Specify the components of the crs (we know them from the DSM)
kwargs = {'driver': 'GTiff',
'dtype': 'float32',
'nodata': np.nan,
'width': 2000,
'height': 2000,
'count': 1,
'crs': rasterio.crs.CRS({'init': 'epsg:28992'}),
'transform': affine.Affine(0.5, 0.0, 173600.0, 0.0, -0.5, 444600.0)}
# Write the raster file
with rasterio.open('data/AHN3_05m_CHM_affine.tif', 'w', **kwargs) as file:
file.write(chm.astype(rasterio.float32))
```
# More on raster data visualization
Raster data can be visualized by passing NumPy arrays to Matplotlib directly or by making use of a method in Rasterio that accesses Matplotlib for you. Using Matplotlib directly allows more flexibility, such as tweaking the legend, axis and labels, and is more suitable for professional purposes. The visualization using Rasterio requires less code and can give a quick idea of your raster data. We show both approaches below. Let's first make a visualization of the DSM using Matplotlib:
```{python, eval=FALSE}
# Create one plot with figure size 10 by 10
fig, ax = plt.subplots(figsize=(10, 10), dpi=200)
# imshow() is the main raster plotting method in Matplotlib
# Again, ensure an equal scale in the x and y direction
dsmplot = ax.imshow(dsm_data[0], cmap='Oranges', extent=bbox, aspect='equal')
# Title (do not do this for a scientific report, use a caption instead)
ax.set_title("Digital Surface Model - WUR Campus", fontsize=14)
# Add a legend (colourbar) with label
cbar = fig.colorbar(dsmplot, fraction=0.035, pad=0.01)
cbar.ax.get_yaxis().labelpad = 15
cbar.ax.set_ylabel('Height (m)', rotation=90)
# Hide the axes
ax.set_axis_off()
plt.show()
```
<img src="./images/DSMgood.png" alt="Digital Surface Model of WUR Campus" width = "100%">
If you do not like the orange colourmap of Matplotlib, it is also possible to pick [another colourmap](https://Matplotlib.org/examples/color/colormaps_reference.html).
The second approach with Rasterio only requires one line of code to make a plot. By creating subplots, the figures can be combined (this can be done with Matplotlib directly as well).
```{python, eval=FALSE}
# Figure with three subplots, unpack directly
fig, (axdsm, axdtm, axchm) = plt.subplots(1, 3, figsize=(15, 7), dpi=200)
# Populate the three subplots with raster data
show(dsm_data, ax=axdsm, title='DSM')
show(dtm_data, ax=axdtm, title='Filled DTM')
show(chm, ax=axchm, title='CHM')
plt.show()
```
<img src="./images/DSMDTMCHMImage.png" alt="Canopy Height, Digital Surface and Terrain Model of WUR Campus" width = "100%">
Rasterio can also create simple histograms by calling functions of Matplotlib.
```{python, eval=FALSE}
from rasterio.plot import show_hist
# Figure with three subplots, unpack directly
fig, (axdsm, axdtm, axchm) = plt.subplots(1, 3, figsize=(15, 7), dpi=200)
# Populate the three subplots with histograms
show_hist(dsm_data, ax=axdsm, bins=100, lw=0.0, stacked=False, alpha=0.3, title="Histogram DSM")
show_hist(dtm_data, ax=axdtm, bins=100, lw=0.0, stacked=False, alpha=0.3, title="Histogram filled DTM")
show_hist(chm, ax=axchm, bins=100, lw=0.0, stacked=False, alpha=0.3, title="Histogram CHM")
# Build legends and show
axdsm.legend(['DSM'])
axdtm.legend(['Filled DTM'])
axchm.legend(['CHM'])
plt.show()
```
<img src="./images/DSMDTMCHMHistogram.png" alt="Histograms of Canopy Height, Digital Surface and Terrain Model of WUR Campus" width = "100%">
```{block, type="alert alert-success"}
> **Question 5**: What is represented on the x and y axis? The default axis labels are DN (x) and Frequency (y); if you were to change them, what labels would you pick to better reflect the content of the plots?
<details>
<summary>*Click for answer*</summary>
The y axis represents the count of pixels. Meanwhile the x axis represents the pixel's DN (digital value), in this tutorial since we are looking at elevation this value is actually meters. For example, in the first plot (DSM) you can see that most pixel values are in the 10 to 15 meter range
</details>
```
# More info
* [Tutorial working with rasters in Python with Rasterio](https://geohackweek.github.io/raster/04-workingwithrasters/)
* [Tutorial working with raster in Python with GDAL (for Python 2)](https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html)
* [Landsat satellite images](https://earthexplorer.usgs.gov/)
* [Resampled landsat satellite images](http://espa.cr.usgs.gov/index/)
* [Sentinel satellite images](https://scihub.copernicus.eu/dhus/#/home)