forked from geocompx/geocompr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
05-transform.Rmd
864 lines (691 loc) · 39.8 KB
/
05-transform.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
# Geometric operations {#transform}
## Prerequisites {-}
- This chapter requires the following packages:^[**lwgeom** is also needed for a couple of more advanced transformations.]
```{r, message=FALSE}
library(sf)
library(raster)
library(tidyverse)
```
- It also relies on **spData** and **spDataLarge**, which load `cycle_hire_osm` dataset and provide external files:
```{r, message=FALSE}
library(spData)
library(spDataLarge)
```
The previous three chapters have demonstrated how geographic datasets are structured in R (Chapter \@ref(spatial-class)) and how to manipulate them based on their non-geographic attributes (\@ref(attr)) and spatial properties (\@ref(spatial-operations)).
This chapter goes a step further, by showing how to modify the *geometry* underlying spatial datasets.
Section \@ref(geo-vec) covers transforming vector geometries.
This includes simplifying, buffering, clipping and even shifting/scaling/rotating geometries using 'affine transformations'.
Geometry unions, which underlie spatial data aggregation, are covered in section \@ref(geometry-unions).
Advanced transformations of vector geometries include type transformations (e.g. from few multipolygons to many polygons) and 'rasterization', which sets-up the next section.
Section \@ref(geo-ras) covers geometric transformations on raster objects.
This involves changing the size and number of the underlying pixels, and assigning them new values.
It teaches how to change the resolution (also called raster aggregation and disaggregation), the extent and the origin of a raster.
These operations are especially useful if one would like to align raster datasets from diverse sources (see section \@ref(raster-alignment)).
Only making sure that these rasters share the same header information allows the jointly usage of map algebra operations (see section \@ref(map-algebra)).
A vital type of geometry transformation is *reprojecting* from one coordinate reference system (CRS) to another.
Because of the importance of reprojection, introduced in Chapter \@ref(spatial-class) (see figure \@ref(fig:vectorplots) and section \@ref(crs-intro)), and the fact that it applies to raster and vector geometries alike, it is the topic of the first section in this chapter.
## Reprojecting geographic data
Section \@ref(crs-intro) demonstrated the importance of understanding CRSs for geocomputation.
Many spatial operations assume that you are using a *projected* CRS (on a Euclidean grid with units of meters rather than a geographic 'lat/lon' grid with units of degrees).
The GEOS engine underlying most spatial operations in **sf**, for example, assumes your data is in a projected CRS.
For this reason **sf** contains a function for checking if geometries have a geographic or projected CRS.
This is illustrated below using the example of *London*:
```{r}
london = st_sf(geometry = st_sfc(st_point(c(-0.1, 51.5))))
st_is_longlat(london)
```
The results show that when geographic data is created from scratch, or is loaded from a source that has no CRS metadata, the CRS is unspecified by default.
The CRS can be set with `st_set_crs()`:^[The CRS can also be added when creating `sf` objects with the `crs` argument (e.g. `st_sf(geometry = st_sfc(st_point(c(-0.1, 51.5))), crs = 4326)`).
The same argument can also be used to set the CRS when creating raster datasets (e.g. `raster(crs = "+proj=longlat")`).]
```{r}
london = st_set_crs(london, 4326)
st_is_longlat(london)
```
Spatial operations on objects without a CRS run on the implicit assumption that they are projected, even when in reality they are not.
This can lead to problems, as illustrated by the following code chunk, which creates a buffer of one degree around `london`:
```{r}
london_buff = st_buffer(london, dist = 1)
```
Note the warning that informs us that the result has limited use because distances in geographic CRSs are in degrees, rather than meters or some other suitable measure of distance.
The consequences of a failure to work on projected data are illustrated in Figure \@ref(fig:crs-buf):
note how the buffer is elongated in the north-south direction.
This because lines of longitude are converge towards the Earth's poles making them closer together (lines of latitude by contrast have constant distance from each other).
```{r crs-buf, fig.cap="Buffer on data with geographic CRS.", fig.asp=1}
plot(london_buff, graticule = st_crs(4326), axes = TRUE)
plot(london, add = TRUE)
```
Do not interpret the warning about the geographic (`longitude/latitude`) CRS as "the CRS should not be set": it almost always should be!
It is better understood as a suggestion to *reproject* the data onto a projected CRS.
This suggestion does not always need to be heeded: performing spatial and geometric operations makes little or no difference some cases (e.g. spatial subsetting).
But for operations involving distances such as buffering, the only way to ensure a good result is to create a projected copy of the data and run the operation on that.
This is done in the command below, using the **sf** function `st_transform()`:
```{r}
london_proj = st_transform(london, crs = 27700)
```
The result is a new object that is identical to `london`, but reprojected onto a suitable CRS (the British National Grid, which has an EPSG code of 27700 in this case) that has units of meters.
We can verify that the CRS has changed using `st_crs()` as follows (only the most important components of the output are shown, other details have been replaced by `...`):
```{r, eval=FALSE}
st_crs(london_proj)
#> Coordinate Reference System:
#> EPSG: 27700
#> proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 ... +units=m +no_defs"
```
Notable components of this CRS description include the EPSG code (`EPSG: 27700`), the origin (`+lat_0=49 +lon_0=-2`) and units (`+units=m`).
The fact that the units of the CRS are meters (rather than degrees) tells us that geometry operations on on `london_proj` will make sense (and not emit a warning).
This can be verified by checking that it is no longer in a geographic CRS, that `st_is_longlat(london_proj)` returns `FALSE`.
Now we can repeat the buffer operation with a meaningful measure of distance.
Moving 1 degree at the equator means moving more than 100 km (111,320 meters), the new buffer distance:
```{r}
london_proj_buff = st_buffer(london_proj, 111320)
```
The result in Figure \@ref(fig:crs-buf-proj) shows that buffers based on a projected CRS are not distorted:
it is the same distance from London to every part of the buffer's border.
```{r crs-buf-proj, fig.cap="Buffer on data with projected CRS.", fig.asp=1}
plot(london_proj_buff, graticule = st_crs(27700), axes = TRUE)
plot(london_proj, add = TRUE)
```
The importance of understanding an object's CRS (primarily whether it's projected or geographic), and reprojecting when appropriate, is illustrated above using a simple geographic vector object representing London.
The subsequent two sections go into more depth on reprojections, exploring which CRS to use and providing details about the process of reprojecting vector and raster objects.
### Which CRS to use?
While CRSs can be set manually --- as illustrated in the previous section with `st_set_crs(london, 4326)` --- it is more common in real world applications for CRSs to be set automatically when data is read-in.
The main task will be to *transform* objects provided in one CRS into another.
But when should data be transformed? And into which CRS?
Although there are no clear-cut answers to these questions, the contents of this section should help you decide on answers, on a case-by-case basis.
The question of **when to transform** is simpler to answer.
Transformation from geographic to projected CRSs is vital in some cases:
when distance measurements or area calculations are needed, for example.
Another case when transformation is vital is when exploring the spatial relationships between two objects with different CRSs.
As illustrated in the below code chunk, which attempts to find the distance between the projected and unprojected versions of the `london` objects, this results in an error telling you that either dataset must be transformed:
```{r, eval=FALSE}
st_distance(london, london_proj)
# > Error: st_crs(x) == st_crs(y) is not TRUE
```
### Reprojecting vector geometries
Vector data on the most basic level is represented by individual points, and points create more complex objects, such as lines and polygons.
Spatial reprojection of vectors is a mathematical transformation of coordinates of these point.
Depending on projections used, reprojection could be either lossy or lossless.
For example, loss of spatial information could occur when the new CRS is only adequate for smaller area than input vector.
The precision could be also lost when transformation is between coordinate systems that have different datum - in those situations approximations are used.
However, in most cases CRS vector transformation is lossless.
The dataset `cycle_hire_osm` represents all cycle hire locations across London, taken from OpenStreetMap (OSM).
It is automatically loaded by the **spData** package, meaning we do not have to load it, and its CRS can be queried as follows:
```{r}
st_crs(cycle_hire_osm)
```
CRS in R can be described as an `epsg` code or a `proj4string` definition, as described in section \@ref(crs-in-r).
Let's create a new version of `cycle_hire_osm` in a projected CRS, using the `epsg` number of 27700:
```{r}
cycle_hire_osm_projected = st_transform(cycle_hire_osm, 27700)
st_crs(cycle_hire_osm_projected)
```
Note that the result shows that the `epsg` has been updated and that `proj4string` element of the CRS now contains, among other things `+proj=tmerc` (meaning it is a projected CRS using the [tranverse Mercator](https://en.wikipedia.org/wiki/Transverse_Mercator_projection) projection) and `+units=m` (meaning the units of the coordinates are meters).
Another function, from the **rgdal** library, provides a note containing the name of the CRS:
```{r}
crs_codes = rgdal::make_EPSG()[1:2]
dplyr::filter(crs_codes, code == 27700)
```
The result shows that the EPSG code 27700 represents the British National Grid, a result that could have been found by searching online for "[CRS 27700](https://www.google.com/search?q=CRS+27700)".
The formula that converts a geographic point into a point on the surface of the Earth is provided by the `proj4string` element of the `crs` (see [proj4.org](http://proj4.org/) for further details):
```{r}
st_crs(27700)$proj4string
```
```{block2 type='rmdnote'}
The EPSG code can be found inside the `crs` attribute of the object's geometry.
It is hidden from view for most of the time except when the object is printed but can be can identified and set using the `st_crs` function, for example `st_crs(cycle_hire_osm)$epsg`.
```
Existing CRS are well suited for most purposes.
<!-- examples -->
In the same time, `proj4string` definitions are highly modifiable and allow for CRS customization.
<!-- as we mentioned in section \@ref(crs-in-r). -->
We can present that using selected world projections.
The Mollweide projection is recommended when it is important to preserve areas [@jenny_guide_2017] (Figure \@ref(fig:mollproj)).
To use this projection, we need to specify it using the `proj4string` element, `"+proj=moll"`, in the `st_transform` function:
```{r}
world_mollweide = st_transform(world, crs = "+proj=moll")
```
<!-- plot(world_mollweide$geom) -->
<!-- plot(world_mollweide$geom, graticule = TRUE) -->
```{r mollproj, echo=FALSE, fig.cap="Mollweide projection of the world", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_mollweide$geom, graticule = TRUE, main = "the Mollweide projection")
par(par_old)
```
On the other hand, the goal for many visualization purposes is to have a map with minimized area, direction, and distance distortions.
One of the most popular projection to achieve that is Winkel tripel (Figure \@ref(fig:wintriproj)).^[This projection is used, among others, by the National Geographic Society.]
`st_transform_proj()` from the **lwgeom** package allows for coordinates transformations to a wider range of CRSs, inluding the Winkel tripel projection:
```{r}
world_wintri = lwgeom::st_transform_proj(world, crs = "+proj=wintri")
```
<!-- plot(world_wintri$geom) -->
<!-- plot(world_wintri$geom, graticule = TRUE) -->
```{r wintriproj, echo=FALSE, fig.cap="Winkel tripel projection of the world", warning=FALSE}
world_wintri_gr = st_graticule(lat = c(-89.9, seq(-80, 80, 20), 89.9)) %>%
lwgeom::st_transform_proj(crs = "+proj=wintri")
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_wintri_gr$geometry, main = "the Winkel tripel projection", col = "grey")
plot(world_wintri$geom, add = TRUE)
par(par_old)
```
```{block2 type='rmdnote'}
Two main functions for transformation of simple features coordinates are `sf::st_transform()` and `sf::sf_project()`.
The `st_transform` function uses the GDAL interface to PROJ.4, while `sf_project()` (which works with two-column numeric matrices, representing points) and `lwgeom::st_transform_proj()` use the PROJ.4 API directly.
The first one is appropriate in most situations, and provides a set of the most often used parameters and well defined transformations.
The second one allows for a greater customization of a projection, which includes cases when some of the PROJ.4 parameters (e.g. `+over`) or projection (`+proj=wintri`) is not available in `st_transform()`.
```
```{r, eval=FALSE, echo=FALSE}
# demo of sf_project
mat_lonlat = as.matrix(data.frame(x = 0:20, y = 50:70))
plot(mat_lonlat)
mat_projected = sf_project(from = st_crs(4326)$proj4string, to = st_crs(27700)$proj4string, pts = mat_lonlat)
plot(mat_projected)
```
Moreover, PROJ.4 parameters can be modified in most CRS definitions.
The below code transforms the coordinates to the Lambert azimuthal equal-area projection centered on longitude and latitude of `0` (Figure \@ref(fig:laeaproj1)).
```{r}
world_laea1 = st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=0 +lat_0=0")
```
<!-- plot(world_laea1$geom) -->
<!-- plot(world_laea1$geom, graticule = TRUE) -->
```{r laeaproj1, echo=FALSE, fig.cap="Lambert azimuthal equal-area projection of the world centered on longitude and latitude of 0", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_laea1$geom, graticule = TRUE, main = "the Lambert azimuthal equal-area projection")
par(par_old)
```
We can change the PROJ.4 parameters, for example the center of the projection using the `+lon_0` and `+lat_0` parameters.
The code below gives the map centered on New York City (Figure \@ref(fig:laeaproj2)).
```{r}
world_laea2 = st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=-74 +lat_0=40")
```
<!-- plot(world_laea2$geom) -->
<!-- plot(world_laea2$geom, graticule = TRUE) -->
```{r laeaproj2, echo=FALSE, fig.cap="Lambert azimuthal equal-area projection of the world centered on New York City", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_laea2$geom, graticule = TRUE, main = "the Lambert azimuthal equal-area projection")
par(par_old)
```
More information about CRS modification can be found in the [Using PROJ.4](http://proj4.org/usage/index.html) documentation.
<!-- https://github.com/r-spatial/lwgeom/issues/6 -->
<!-- ```{r} -->
<!-- # devtools::install_github("r-spatial/lwgeom") -->
<!-- library(lwgeom) -->
<!-- world_3 = lwgeom::st_transform_proj(world, crs = "+proj=wintri") -->
<!-- plot(world_3$geom) -->
<!-- ``` -->
<!-- http://bl.ocks.org/vlandham/raw/9216751/ -->
### Reprojecting raster geometries
The projection concepts described in the previous section apply equally to rasters.
However, there are important differences in reprojection of vectors and rasters:
transforming a vector object involves changing the coordinates of every vertex but this do not apply to raster data.
Rasters are are composed of rectangular cells of the same size (expressed by map units, such as degrees or meters), so it is impossible to transform coordinates of pixels separately.
Raster reprojection involves creating a new raster object, often with a different number of columns and rows than the original.
The attributes must subsequently be re-estimated, allowing the new pixels to be 'filled' with appropriate values.
This two-stage process is done with `projectRaster()` from the **raster** package.
Like the `st_transform()` function demonstrated in the previous section, `projectRaster()` takes a geographic object (a raster dataset in this case) and a `crs` argument.
However, `projectRaster()` only accepts the lengthy `proj4string` definitions of a CRS rather than concise EPSG codes.
```{block2 type='rmdnote'}
It is possible to use a EPSG code in a `proj4string` definition with `"+init=epsg:MY_NUMBER"`.
For example, one can use the `"+init=epsg:4326"` definition to set CRS to WGS84 (EPSG code of 4326).
The PROJ.4 library automatically adds the rest of parameters and converts it into `"+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"`,
```
Let's take a look at two examples of raster transformation - using categorical and continuous data.
Land cover data are usually represented by categorical maps.
The `nlcd2011.tif` file provides information for a small area in Utah, USA obtained from [National Land Cover Database 2011](https://www.mrlc.gov/nlcd2011.php) in the NAD83 / UTM zone 12N CRS.
```{r}
cat_raster = raster(system.file("raster/nlcd2011.tif", package = "spDataLarge"))
cat_raster
```
In this region, 14 land cover classes were distinguished^[Full list of NLCD2011 land cover classes can be found at https://www.mrlc.gov/nlcd11_leg.php]:
```{r}
unique(cat_raster)
```
When reprojecting categorical raster, we need to ensure that our new estimated values would still have values of our original classes.
This could be done using the nearest neighbor method (`ngb`).
In this method, value of the output cell is calculated based on the nearest cell center of the input raster.
For example, we want to change the CRS to WGS 84.
It can be desired when we want to visualize a raster data on top of a web basemaps, such as the Google or OpenStreetMap map tiles.
The first step is to obtain the proj4 definition of this CRS, which can be done using the [http://spatialreference.org](http://spatialreference.org/ref/epsg/wgs-84/) webpage.
The second and last step is to define the reprojection method in the `projectRaster()` function, which in case of categorical data is the nearest neighbor method (`ngb`):
```{r}
wgs84 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
cat_raster_wgs84 = projectRaster(cat_raster, crs = wgs84, method = "ngb")
cat_raster_wgs84
```
Many properties of the new object differs from the previous one, which include the number of columns and rows (and therefore number of cells), resolution (transformed from meters into degrees), and extent.
In the same time, it keeps the same land cover classes - `unique(cat_raster_wgs84)`.
<!-- freq(cat_raster_wgs84) -->
<!-- freq(cat_raster) -->
This process of reprojection is almost identical for continuous data.
The `srtm.tif` file contains digital elevation model for the same area in Utah from [the Shuttle Radar Topography Mission (SRTM)](https://www2.jpl.nasa.gov/srtm/).
Each value in this raster represents elevation measured in meters.
```{r}
con_raster = raster(system.file("raster/srtm.tif", package = "spDataLarge"))
con_raster
```
The nearest neighbor method should not be used for continuous raster data, as we want to preserve gradual changes in values.
Alternatively, continuous data could be reprojected in the **raster** package using the bi-linear method.
In this technique, value of the output cell is calculated based on four nearest cells in the original raster.
The new value is a weighted average of values from these four cells, adjusted for their distance from the center of the output cell.
This dataset has geographic CRS and we want to transform it into projected CRS.
```{block2 type='rmdnote'}
All the grid cells in equal-area projections have the same size.
Therefore, these projections are recommended when performing many raster operations, such as distance calculations.
```
In the fist step we need to obtain the proj4 definition of the existing projected CRS appropriate for this area or create a new one using the [Projection Wizard](http://projectionwizard.org/) online tool [@savric_projection_2016].
For this example, we used the Oblique Lambert azimuthal equal-area projection.
The second step is to define the `bilinear` reprojection method:
```{r}
equalarea = "+proj=laea +lat_0=37.32 +lon_0=-113.04"
con_raster_ea = projectRaster(con_raster, crs = equalarea, method = "bilinear")
con_raster_ea
```
Reprojection of continuous rasters also changes spatial properties, such as the number of cells, resolution, and extent.
Moreover, it slightly modifies values in the new raster, which can be seen by comparing the outputs of the `summary()` function between `con_raster` and `con_raster_ea`.
```{r, eval=FALSE}
summary(con_raster)
summary(con_raster_ea)
```
<!-- why new na? -->
<!-- res option in projectRaster? -->
<!-- note1: in most of the cases reproject vector, not raster-->
<!-- note2: equal area projections are the best for raster calculations -->
<!-- q: should we mentioned gdal_transform? -->
## Geometric operations on vector data {#geo-vec}
This section is about operations that in some way change the geometry of vector (`sf`) objects.
It is more advanced than the spatial data operations presented in the previous Chapter (in section \@ref(spatial-operations-on-vector-data)) because here we drill down into the geometry:
the functions discussed in this section work on objects of class `sfc` (simple feature geometry collections) in addition to objects of class `sf`.
### Simplification
<!-- \@ref(fig:seine-simp) -->
```{r}
seine_simp = st_simplify(seine, dTolerance = 2000)
```
```{r}
object.size(seine)
object.size(seine_simp)
```
```{r seine-simp, echo=FALSE, fig.cap="Comparision of original data of the contiguous United States and two simplified versions using `st_simplify` and `ms_simplify`.", warning=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(seine$geometry, col = 8, main = "Original data")
plot(seine_simp$geometry, col = 8, main = "st_simplify")
par(par_old)
```
<!-- \@ref(fig:us-simp) -->
<!-- - simplifications -->
<!-- st_simplify -->
```{r}
us_states_simp1 = st_simplify(us_states, dTolerance = 1)
```
<!-- line example -->
<!-- maybe river or road network to spData?? -->
<!-- rmapshaper -->
<!-- polygon example -->
```{r}
# proportion of points to retain (0-1; default 0.05)
us_states_simp2 = rmapshaper::ms_simplify(us_states, keep = 0.01, keep_shapes = TRUE)
```
```{r us-simp, echo=FALSE, fig.cap="Comparision of original data of the contiguous United States and two simplified versions using `st_simplify` and `ms_simplify`.", warning=FALSE}
par_old = par()
par(mfrow = c(1, 3), mar = c(0, 0, 1, 0))
plot(us_states$geometry, col = 8, main = "Original data")
plot(us_states_simp1$geometry, col = 8, main = "st_simplify")
plot(us_states_simp2$geometry, col = 8, main = "ms_simplify")
par(par_old)
```
### Geometry unions
Spatial aggregation can also be done in the **tidyverse**, using **dplyr** functions as follows:
```{r, eval = FALSE}
group_by(us_states, REGION) %>%
summarize(sum(pop = total_pop_15, na.rm = TRUE))
```
For attribute data aggregation the grouping variable is another variable, typically one with few unique values relative to the number of rows (see section \@ref(vector-attribute-aggregation)).
What we did not cover in that section was that attribute data aggregation dissolves the geometries of touching polygons.
The `REGION` variable in the `us_states` dataset is a good example:
there are 49 states (excluding Hawaii and Alaska) which can be aggregated into four regions.
This is demonstrated in the code chunk below, the results of which are illustrated in Figure \@ref(fig:us-regions):
```{r}
regions = aggregate(x = us_states[, "total_pop_15"], by = list(us_states$REGION),
FUN = sum, na.rm = TRUE)
```
<!--
show also tidyverse way, so what you are doing is basically a spatial join and a subsequent aggregation without a grouping variable. Didactically, it might be better to present a grouping variable.
-->
```{r, echo=FALSE}
# st_join(buff, africa[, "pop"]) %>%
# summarize(pop = sum(pop, na.rm = TRUE))
# summarize(africa[buff, "pop"], pop = sum(pop, na.rm = TRUE))
```
```{r us-regions, fig.cap="Spatial aggregation on contiguous polygons, illustrated by aggregating the population of US states into regions, with population represented by color. Note the operation automatically dissolves boundaries between states.", echo=FALSE, warning=FALSE, fig.asp=0.2, out.width="100%"}
# # with base R ----
# old_par = par()
# breaks = c(1e5, 5e6, 1e7, 7e7, 2e8)
# rdf = st_set_geometry(regions, NULL)
# us_states$region_pop = inner_join(dplyr::select(us_states, REGION),
# dplyr::select(rdf, Group.1, total_pop_15),
# by = c("REGION" = "Group.1")) %>%
# pull(total_pop_15)
#
# par(mfrow = c(1, 2))
# plot(us_states[, "total_pop_15"], main = "US states", breaks = breaks, key.pos = NULL)
# plot(regions[, "total_pop_15"], main = "US regions", breaks = breaks, key.pos = NULL)
# par(old_par)
# with tmap ----
library(tmap)
us_states_facet = dplyr::select(us_states, REGION, total_pop_15) %>%
mutate(Level = "State")
regions_facet = rename(regions, REGION = Group.1) %>%
mutate(Level = "Region")
us_facet = rbind(us_states_facet, regions_facet) %>%
mutate(Level = factor(Level, levels = c("State", "Region")))
qtm(us_facet, "total_pop_15") +
tm_facets(by = "Level", nrow = 1, drop.units = TRUE)
```
The equivalent result can be achieved using **tidyverse** functions as follows (result not shown):
```{r, eval=FALSE}
regions2 = us_states %>%
group_by(REGION) %>%
summarize(sum(pop = total_pop_15, na.rm = TRUE))
```
### Clipping
Spatial clipping is a form of spatial subsetting that involves changes to the `geometry` columns of at least some of the affected features.
Clipping can only apply to features more complex than points:
lines, polygons and their 'multi' equivalents.
To illustrate the concept we will start with a simple example:
two overlapping circles with a center point one unit away from each other and radius of one:
```{r points, fig.cap="Overlapping circles."}
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
l = c("x", "y")
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = l) # add text
```
Imagine you want to select not one circle or the other, but the space covered by both `x` *and* `y`.
This can be done using the function `st_intersection()`, illustrated using objects named `x` and `y` which represent the left and right-hand circles:
```{r}
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting area
```
The subsequent code chunk demonstrate how this works for all combinations of the 'Venn' diagram representing `x` and `y`, inspired by [Figure 5.1](http://r4ds.had.co.nz/transform.html#logical-operators) of the book R for Data Science [@grolemund_r_2016].
<!-- Todo: reference r4ds -->
```{r venn-clip, echo=FALSE, fig.cap="Spatial equivalents of logical operators.", warning=FALSE}
old_par = par()
par(mfrow = c(3, 3), mai = c(0.1, 0.1, 0.1, 0.1))
plot(b)
y_not_x = st_difference(y, x)
plot(y_not_x, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(y, x)")
plot(b)
plot(x, add = TRUE, col = "grey")
text(x = 0.5, y = 1, "x")
plot(b, add = TRUE)
x_or_y = st_union(x, y)
plot(x_or_y, col = "grey")
text(x = 0.5, y = 1, "st_union(x, y)")
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_intersection(x, y)")
# x_xor_y = st_difference(x_xor_y, x_and_y) # failing
x_not_y = st_difference(x, y)
x_xor_y = st_sym_difference(x, y)
plot(x_xor_y, col = "grey")
text(x = 0.5, y = 1, "st_sym_difference(x, y)")
plot.new()
plot(b)
plot(x_not_y, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(x, y)")
plot(b)
plot(y, col = "grey", add = TRUE)
plot(b, add = TRUE)
text(x = 0.5, y = 1, "y")
par(old_par)
```
To illustrate the relationship between subsetting and clipping spatial data, we will subset points that cover the bounding box of the circles `x` and `y` in Figure \@ref(fig:venn-clip).
Some points will be inside just one circle, some will be inside both and some will be inside neither.
There are two different ways to subset points that fit into combinations of the circles: via clipping and logical operators.
But first we must generate some points.
We will use the *simple random* sampling strategy to sample from a box representing the extent of `x` and `y`.
To generate this points will use a function not yet covered in this book, `st_sample()`.
Next we will generate the situation plotted in Figure \@ref(fig:venn-subset):
```{r venn-subset, fig.cap="Randomly distributed points within the bounding box enclosing circles x and y."}
bb = st_bbox(st_union(x, y))
pmat = matrix(c(bb[c(1, 2, 3, 2, 3, 4, 1, 4, 1, 2)]), ncol = 2, byrow = TRUE)
box = st_polygon(list(pmat))
set.seed(2017)
p = st_sample(x = box, size = 10)
plot(box)
plot(x, add = TRUE)
plot(y, add = TRUE)
plot(p, add = TRUE)
text(x = c(-0.5, 1.5), y = 1, labels = l)
```
```{r, echo=FALSE}
# An alternative way to sample from the bb
bb = st_bbox(st_union(x, y))
pmulti = st_multipoint(pmat)
box = st_convex_hull(pmulti)
```
### Centroids
<!-- centroids intro -->
There are two main functions that create single point representations of more complex vector objects - `st_centroid()` and `st_point_on_surface()`.
<!-- The first one is `st_centroid()` -->
<!-- st_centroid -->
```{r}
nz_centroid = st_centroid(nz)
```
<!-- st_point_on_surface -->
<!-- The second one -->
<!-- add lines example -->
<!-- note about computation time -->
```{r}
nz_pos = st_point_on_surface(nz)
```
```{r}
seine_centroid = st_centroid(seine)
```
```{r}
seine_pos = st_point_on_surface(seine)
```
```{r, warning=FALSE, echo=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(nz$geometry)
plot(nz_centroid$geometry, add = TRUE)
plot(nz_pos$geometry, add = TRUE, col = "red")
plot(seine$geometry)
plot(seine_centroid$geometry, add = TRUE)
plot(seine_pos$geometry, add = TRUE, col = "red")
par(par_old)
```
### Buffers
### Affine transformations
### Type transformation
Geometry casting is powerful operation which enable transformation of the geometry type.
It is implemented in the `st_cast` function from the `sf` package.
Importantly, `st_cast` behaves differently on single simple feature geometry (`sfg`) objects, and simple feature geometry column (`sfc`) and simple features objects.
Let's create a multipoint to illustrate how geometry casting works on simple feature geometry (`sfg`) objects:
```{r}
multipoint = st_multipoint(matrix(c(1, 3, 5, 1, 3, 1), ncol = 2))
```
In this case, `st_cast` can be useful to transform the new object into linestring or polygon (Figure \@ref(fig:single-cast)):
<!-- a/ points -> lines -> polygons -->
```{r}
linestring = st_cast(multipoint, "LINESTRING")
polyg = st_cast(multipoint, "POLYGON")
```
```{r single-cast, echo = FALSE, fig.cap="Examples of linestring and polygon created based on multipoint using the `st_cast` function", warning=FALSE}
par_old = par()
par(mfrow = c(1, 3), mar = c(0, 0, 1, 0))
plot(multipoint, main = "MULTIPOINT")
plot(linestring, main = "LINESTRING")
plot(polyg, col = "grey", main = "POLYGON")
par(par_old)
```
This process can be also reversed using `st_cast`:
```{r}
multipoint_2 = st_cast(linestring, "MULTIPOINT")
multipoint_3 = st_cast(polyg, "MULTIPOINT")
all.equal(multipoint, multipoint_2, multipoint_3)
```
```{block2 type='rmdnote'}
For single simple feature geometries (`sfg`), `st_cast` provides also geometry casting from non-multi to multi types (e.g. `POINT` to `MULTIPOINT`) and from multi types to non-multi types.
However, only the first element of the old object would remain in the second group of cases.
<!-- note: beware of information lost (you will get a warning) -->
```
```{r, include=FALSE}
cast_all <- function(xg) {
lapply(c("MULTIPOLYGON", "MULTILINESTRING", "MULTIPOINT", "POLYGON", "LINESTRING", "POINT"),
function(x) st_cast(xg, x))
}
t = cast_all(multipoint)
t2 = cast_all(polyg)
```
Geometry casting of simple feature geometry column (`sfc`) and simple features objects works the same as for single geometries in most of the cases.
One imporant difference is conversion between multi to non-multi types.
As a result of this process, multi-objects are split into many non-multi objects.
We would use a new object, `multilinestring_sf`, as an example (on the left in Figure \@ref(fig:line-cast)):
```{r}
multilinestring_list = list(matrix(c(1, 4, 5, 3), ncol = 2),
matrix(c(4, 4, 4, 1), ncol = 2),
matrix(c(2, 4, 2, 2), ncol = 2))
multilinestring = st_multilinestring((multilinestring_list))
multilinestring_sf = st_sf(geom = st_sfc(multilinestring))
multilinestring_sf
```
You can imagine it as a road or river network.
The new object has only one row that define all the lines.
This restrict number of operation that could be done, for example it prevent adding names to each line segment or calculating lengths of single lines.
The `st_cast` function can be used in this situation, as it separates one mutlilinestring into three linestrings:
```{r}
linestring_sf2 = st_cast(multilinestring_sf, "LINESTRING")
linestring_sf2
```
```{r line-cast, echo=FALSE, fig.cap="Examples of type casting between MULTILINESTRING (left) and LINESTRING (right).", warning=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(multilinestring_sf$geom, col = "black", main = "MULTILINESTRING")
plot(linestring_sf2$geom, col = c("red", "green", "blue"), main = "LINESTRING")
par(par_old)
```
The newely created object allows for attributes creation (see more in section \@ref(vec-attr-creation)) and length measurement:
```{r}
linestring_sf2$name = c("Riddle Rd", "Marshall Ave", "Foulke St")
linestring_sf2$length = st_length(linestring_sf2)
linestring_sf2
```
<!-- ### Class conversion -->
<!-- placeholder for: -->
<!-- sf -> sp -->
<!-- sp -> sf -->
<!-- stars; https://github.com/r-spatial/stars/blob/master/vignettes/blog1.Rmd -->
### Rasterization
<!-- - vector to raster -->
## Geometric operations on raster data {#geo-ras}
### Raster alignment
```{r, echo=FALSE}
source("code/create-rasters.R")
```
When merging or performing map algebra on rasters, their resolution, projection, origin and/or extent has to match.
Otherwise, how should we add the values of one raster with a resolution of 0.2 decimal degrees to a second with a resolution of 1 decimal degree?
The same problem arises when we would like to merge satellite imagery from different sensors with different projections and resolutions.
We can deal with such mismatches by aligning the rasters.
This section uses the `elev` object from \@ref(manipulating-raster-objects).
The `projectRaster()` function reprojects one raster to a desired projection, say from UTM to WGS84.
Equally, map algebra operations require the same extent.
Following code adds one row and two columns to each side of the raster while setting all new values to an elevation of 1000 meters (\@ref(fig:extend-example)).
```{r extend-example, fig.cap = "Original raster extended by 1 one row on each side (top, bottom) and two columns on each side (right, left)."}
elev_2 = extend(elev, c(1, 2), value = 1000)
plot(elev_2)
```
Performing an algebraic operation on two objects with differing extents in R, the **raster** package returns the result for the intersection, and says so in a warning.
```{r}
elev_3 = elev + elev_2
```
However, we can also align the extent of two rasters with the `extend()` command.
Here, we extend the `elev` object to the extend of `elev_2`.
The newly added rows and column receive the default value of the `value` parameter, i.e., `NA`.
```{r}
elev_4 = extend(elev, elev_2)
```
The `aggregate()` and `disaggregate()` functions help to change the cell size resolution of a raster.
For instance, let us aggregate `elev` from a resolution of 0.5 to a resolution of 1, that means we aggregate by a factor of 2 (Fig. \@ref(fig:aggregate-example)).
Additionally, the output cell value should correspond to the mean of the input cells (but one could use other functions as well such as `median()`, `sum()`, etc.):
```{r aggregate-example, fig.cap = "Original raster (left). Aggregated raster (right)."}
elev_agg = aggregate(elev, fact = 2, fun = mean)
par(mfrow = c(1, 2))
plot(elev)
plot(elev_agg)
```
Note that the origin of `elev_agg` has changed, too.
```{r}
origin(elev)
origin(elev_agg)
```
The origin is the point closest to (0, 0) if you moved towards it in steps of x and y resolution.
If two rasters have different origins, their cells do not overlap completely which would make map algebra impossible.
To change the origin , use `origin()`.^[If the origins of two raster datasets are just marginally apart, it sometimes is sufficient to simply increase the `tolerance` argument of `raster::rasterOptions()`.]
Looking at figure \@ref(fig:origin-example) reveals the effect of changing the origin.
```{r origin-example, fig.cap = "Plotting rasters with the same values but different origins."}
# plot the aggregated raster
plot(elev_agg)
# change the origin
origin(elev_agg) = c(0, 0)
# plot it again
plot(elev_agg, add = TRUE)
```
The `resample()` command lets you align several raster properties in one go, namely origin, extent and resolution.
Let us resample an extended `elev_agg` to the properties of `elev` again.
```{r}
# add 2 rows and columns, i.e. change the extent
elev_agg = extend(elev_agg, 2)
elev_disagg = resample(elev_agg, elev)
```
Though our disaggregated `elev_disagg` retrieved back its original resolution, cell size and extent, its values differ.
However, this is to be expected, disaggregating **cannot** predict values at a finer resolution, it simply uses an interpolation algorithm.
It is important to keep in mind that disaggregating results in a finer resolution, the corresponding values, however, are only as accurate as their lower resolution source.
Finally, if you want to align many (possibly hundreds or thousands of) images stored on disk, you might want to checkout the `gdalUtils::align_rasters()` function.
Nevertheless, you may also use **raster** with very large datasets.
This is because **raster**:
1. lets you work with raster datasets that are too large to fit into the main memory (RAM) by only processing chunks of it.
2. tries to facilitate parallel processing.
For more information have a look at the help pages of `beginCluster()` and `clusteR()`.
Additionally, check out the *Multi-core functions* section in `vignette("functions", package = "raster")`.
### Aggregation {#ras-agg}
### Vectorization
## Exercises
<!-- CRS CONVERSION -->
<!-- 1. vector reprojection exercise (e.g. modification of proj4) -->
1. Transform the `world` dataset to the transverse Mercator projection (`"+proj=tmerc"`) and plot the result.
What has changed and why?
Try to transform it back into WGS 84 and plot the new object.
Why the new object differs from the original one?
<!-- https://github.com/r-spatial/sf/issues/509 -->
<!-- ```{r} -->
<!-- world_tmerc = st_transform(world, "+proj=tmerc") -->
<!-- plot(world_tmerc$geom) -->
<!-- world_4326 = st_transform(world_tmerc, 4326) -->
<!-- plot(world_4326$geom) -->
<!-- ``` -->
1. Try to transform the categorical raster (`cat_raster`) into WGS 84 using the bi-linear interpolation method.
What has changed?
How it influences the results?
<!-- ```{r} -->
<!-- wgs84 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" -->
<!-- cat_raster_wgs84 = projectRaster(cat_raster, crs = wgs84, method = "bilinear") -->
<!-- cat_raster_wgs84 -->
<!-- ``` -->
1. Try to transform the continuous raster (`cat_raster`) into WGS 84 using the nearest neighbor interpolation method.
What has changed?
How it influences the results?
<!-- ```{r} -->
<!-- con_raster = raster(system.file("raster/srtm.tif", package="spDataLarge")) -->
<!-- con_raster_wgs84 = projectRaster(con_raster, crs = wgs84, method = "ngb") -->
<!-- con_raster_wgs84 -->
<!-- ``` -->
<!-- GEOMETRY TRANSFORMATION -->