05-transform.Rmd

# Geometric operations {#transform}

## Prerequisites {-}

- This chapter requires the following packages:^[**lwgeom** is also needed for a couple of more advanced transformations.]

```{r, message=FALSE}
library(sf)
library(raster)
library(tidyverse)
```

- It also relies on **spData** and **spDataLarge**, which load `cycle_hire_osm` dataset and provide external files:

```{r, message=FALSE}
library(spData)
library(spDataLarge)
```

The previous three chapters have demonstrated how geographic datasets are structured in R (Chapter \@ref(spatial-class)) and how to manipulate them based on their non-geographic attributes (\@ref(attr)) and spatial properties (\@ref(spatial-operations)).
This chapter goes a step further, by showing how to modify the *geometry* underlying spatial datasets.

Section \@ref(geo-vec) covers transforming vector geometries.
This includes simplifying, buffering, clipping and even shifting/scaling/rotating geometries using 'affine transformations'.
Geometry unions, which underlie spatial data aggregation, are covered in section \@ref(geometry-unions).
Advanced transformations of vector geometries include type transformations (e.g. from few multipolygons to many polygons) and 'rasterization', which sets-up the next section.

Section \@ref(geo-ras) covers geometric transformations on raster objects.
This involves changing the size and number of the underlying pixels, and assigning them new values.
It teaches how to change the resolution (also called raster aggregation and disaggregation), the extent and the origin of a raster.
These operations are especially useful if one would like to align raster datasets from diverse sources (see section \@ref(raster-alignment)).
Only making sure that these rasters share the same header information allows the jointly usage of map algebra operations (see section \@ref(map-algebra)). 

A vital type of geometry transformation is *reprojecting* from one coordinate reference system (CRS) to another.
Because of the importance of reprojection, introduced in Chapter \@ref(spatial-class) (see figure \@ref(fig:vectorplots) and section \@ref(crs-intro)), and the fact that it applies to raster and vector geometries alike, it is the topic of the first section in this chapter.

## Reprojecting geographic data 

Section \@ref(crs-intro) demonstrated the importance of understanding CRSs for geocomputation.
Many spatial operations assume that you are using a *projected* CRS (on a Euclidean grid with units of meters rather than a geographic 'lat/lon' grid with units of degrees).
The GEOS engine underlying most spatial operations in **sf**, for example, assumes your data is in a projected CRS.
For this reason **sf** contains a function for checking if geometries have a geographic or projected CRS.
This is illustrated below using the example of *London*:

```{r}
london = st_sf(geometry = st_sfc(st_point(c(-0.1, 51.5))))
st_is_longlat(london)
```

The results show that when geographic data is created from scratch, or is loaded from a source that has no CRS metadata, the CRS is unspecified by default.
The CRS can be set with `st_set_crs()`:^[The CRS can also be added when creating `sf` objects with the `crs` argument (e.g. `st_sf(geometry = st_sfc(st_point(c(-0.1, 51.5))), crs = 4326)`).
The same argument can also be used to set the CRS when creating raster datasets (e.g. `raster(crs = "+proj=longlat")`).]

```{r}
london = st_set_crs(london, 4326)
st_is_longlat(london)
```

Spatial operations on objects without a CRS run on the implicit assumption that they are projected, even when in reality they are not.
This can lead to problems, as illustrated by the following code chunk, which creates a buffer of one degree around `london`:

```{r}
london_buff = st_buffer(london, dist = 1)
```

Note the warning that informs us that the result has limited use because distances in geographic CRSs are in degrees, rather than meters or some other suitable measure of distance.
The consequences of a failure to work on projected data are illustrated in Figure \@ref(fig:crs-buf):
note how the buffer is elongated in the north-south direction.
This because lines of longitude are converge towards the Earth's poles making them closer together (lines of latitude by contrast have constant distance from each other).  

```{r crs-buf, fig.cap="Buffer on data with geographic CRS.", fig.asp=1}
plot(london_buff, graticule = st_crs(4326), axes = TRUE)
plot(london, add = TRUE)
```

Do not interpret the warning about the geographic (`longitude/latitude`) CRS as "the CRS should not be set": it almost always should be!
It is better understood as a suggestion to *reproject* the data onto a projected CRS.
This suggestion does not always need to be heeded: performing spatial and geometric operations makes little or no difference some cases (e.g. spatial subsetting).
But for operations involving distances such as buffering, the only way to ensure a good result is to create a projected copy of the data and run the operation on that.
This is done in the command below, using the **sf** function `st_transform()`:

```{r}
london_proj = st_transform(london, crs = 27700)
```

The result is a new object that is identical to `london`, but reprojected onto a suitable CRS (the British National Grid, which has an EPSG code of 27700 in this case) that has units of meters. 
We can verify that the CRS has changed using `st_crs()` as follows (only the most important components of the output are shown, other details have been replaced by `...`):

```{r, eval=FALSE}
st_crs(london_proj)
#> Coordinate Reference System:
#>   EPSG: 27700 
#>   proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 ... +units=m +no_defs"
```

Notable components of this CRS description include the EPSG code (`EPSG: 27700`), the origin (`+lat_0=49 +lon_0=-2`) and units (`+units=m`).
The fact that the units of the CRS are meters (rather than degrees) tells us that geometry operations on on `london_proj` will make sense (and not emit a warning).
This can be verified by checking that it is no longer in a geographic CRS, that `st_is_longlat(london_proj)` returns `FALSE`.
Now we can repeat the buffer operation with a meaningful measure of distance.
Moving 1 degree at the equator means moving more than 100 km (111,320 meters), the new buffer distance:

```{r}
london_proj_buff = st_buffer(london_proj, 111320)
```

The result in Figure \@ref(fig:crs-buf-proj) shows that buffers based on a projected CRS are not distorted:
it is the same distance from London to every part of the buffer's border.

```{r crs-buf-proj, fig.cap="Buffer on data with projected CRS.", fig.asp=1}
plot(london_proj_buff, graticule = st_crs(27700), axes = TRUE)
plot(london_proj, add = TRUE)
```

The importance of understanding an object's CRS (primarily whether it's projected or geographic), and reprojecting when appropriate, is illustrated above using a simple geographic vector object representing London.
The subsequent two sections go into more depth on reprojections, exploring which CRS to use and providing details about the process of reprojecting vector and raster objects.

### Which CRS to use?

While CRSs can be set manually --- as illustrated in the previous section with `st_set_crs(london, 4326)` --- it is more common in real world applications for CRSs to be set automatically when data is read-in.
The main task will be to *transform* objects provided in one CRS into another.
But when should data be transformed? And into which CRS?
Although there are no clear-cut answers to these questions, the contents of this section should help you decide on answers, on a case-by-case basis. 

The question of **when to transform** is simpler to answer.
Transformation from geographic to projected CRSs is vital in some cases:
when distance measurements or area calculations are needed, for example.
Another case when transformation is vital is when exploring the spatial relationships between two objects with different CRSs.
As illustrated in the below code chunk, which attempts to find the distance between the projected and unprojected versions of the `london` objects, this results in an error telling you that either dataset must be transformed:

```{r, eval=FALSE}
st_distance(london, london_proj)
# > Error: st_crs(x) == st_crs(y) is not TRUE
```


### Reprojecting vector geometries

Vector data on the most basic level is represented by individual points, and points create more complex objects, such as lines and polygons.
Spatial reprojection of vectors is a mathematical transformation of coordinates of these point.
Depending on projections used, reprojection could be either lossy or lossless.
For example, loss of spatial information could occur when the new CRS is only adequate for smaller area than input vector.
The precision could be also lost when transformation is between coordinate systems that have different datum - in those situations approximations are used.
However, in most cases CRS vector transformation is lossless.

The dataset `cycle_hire_osm` represents all cycle hire locations across London, taken from OpenStreetMap (OSM).
It is automatically loaded by the **spData** package, meaning we do not have to load it, and its CRS can be queried as follows:

```{r}
st_crs(cycle_hire_osm)
```

CRS in R can be described as an `epsg` code or a `proj4string` definition, as described in section \@ref(crs-in-r).
Let's create a new version of `cycle_hire_osm` in a projected CRS, using the `epsg` number of 27700:

```{r}
cycle_hire_osm_projected = st_transform(cycle_hire_osm, 27700)
st_crs(cycle_hire_osm_projected)
```

Note that the result shows that the `epsg` has been updated and that `proj4string` element of the CRS now contains, among other things `+proj=tmerc` (meaning it is a projected CRS using the [tranverse Mercator](https://en.wikipedia.org/wiki/Transverse_Mercator_projection) projection) and `+units=m` (meaning the units of the coordinates are meters).
Another function, from the **rgdal** library, provides a note containing the name of the CRS:

```{r}
crs_codes = rgdal::make_EPSG()[1:2]
dplyr::filter(crs_codes, code == 27700)
```

The result shows that the EPSG code 27700 represents the British National Grid, a result that could have been found by searching online for "[CRS 27700](https://www.google.com/search?q=CRS+27700)".
The formula that converts a geographic point into a point on the surface of the Earth is provided by the `proj4string` element of the `crs` (see [proj4.org](http://proj4.org/) for further details):

```{r}
st_crs(27700)$proj4string
```

```{block2 type='rmdnote'}
The EPSG code can be found inside the `crs` attribute of the object's geometry.
It is hidden from view for most of the time except when the object is printed but can be can identified and set using the `st_crs` function, for example `st_crs(cycle_hire_osm)$epsg`.
```

Existing CRS are well suited for most purposes.
<!-- examples -->
In the same time, `proj4string` definitions are highly modifiable and allow for CRS customization.
<!-- as we mentioned in section \@ref(crs-in-r). -->
We can present that using selected world projections.
The Mollweide projection is recommended when it is important to preserve areas [@jenny_guide_2017] (Figure \@ref(fig:mollproj)).
To use this projection, we need to specify it using the `proj4string` element, `"+proj=moll"`, in the `st_transform` function:

```{r}
world_mollweide = st_transform(world, crs = "+proj=moll")
```
<!-- plot(world_mollweide$geom) -->
<!-- plot(world_mollweide$geom, graticule = TRUE) -->

```{r mollproj, echo=FALSE, fig.cap="Mollweide projection of the world", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_mollweide$geom, graticule = TRUE, main = "the Mollweide projection")
par(par_old)
```

On the other hand, the goal for many visualization purposes is to have a map with minimized area, direction, and distance distortions.
One of the most popular projection to achieve that is Winkel tripel (Figure \@ref(fig:wintriproj)).^[This projection is used, among others, by the National Geographic Society.]
`st_transform_proj()` from the **lwgeom** package allows for coordinates transformations to a wider range of CRSs, inluding the Winkel tripel projection: 

```{r}
world_wintri = lwgeom::st_transform_proj(world, crs = "+proj=wintri")
```
<!-- plot(world_wintri$geom) -->
<!-- plot(world_wintri$geom, graticule = TRUE) -->

```{r wintriproj, echo=FALSE, fig.cap="Winkel tripel projection of the world", warning=FALSE}
world_wintri_gr = st_graticule(lat = c(-89.9, seq(-80, 80, 20), 89.9)) %>% 
  lwgeom::st_transform_proj(crs = "+proj=wintri")
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_wintri_gr$geometry, main = "the Winkel tripel projection", col = "grey")
plot(world_wintri$geom, add = TRUE)
par(par_old)
```

```{block2 type='rmdnote'}
Two main functions for transformation of simple features coordinates are `sf::st_transform()` and `sf::sf_project()`. 
The `st_transform` function uses the GDAL interface to PROJ.4, while `sf_project()` (which works with two-column numeric matrices, representing points) and `lwgeom::st_transform_proj()` use the PROJ.4 API directly.
The first one is appropriate in most situations, and provides a set of the most often used parameters and well defined transformations.
The second one allows for a greater customization of a projection, which includes cases when some of the PROJ.4 parameters (e.g. `+over`) or projection (`+proj=wintri`) is not available in `st_transform()`.
```

```{r, eval=FALSE, echo=FALSE}
# demo of sf_project
mat_lonlat = as.matrix(data.frame(x = 0:20, y = 50:70))
plot(mat_lonlat)
mat_projected = sf_project(from = st_crs(4326)$proj4string, to = st_crs(27700)$proj4string, pts = mat_lonlat)
plot(mat_projected)
```


Moreover, PROJ.4 parameters can be modified in most CRS definitions.
The below code transforms the coordinates to the Lambert azimuthal equal-area projection centered on longitude and latitude of `0` (Figure \@ref(fig:laeaproj1)).

```{r}
world_laea1 = st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=0 +lat_0=0")
```
<!-- plot(world_laea1$geom) -->
<!-- plot(world_laea1$geom, graticule = TRUE) -->

```{r laeaproj1, echo=FALSE, fig.cap="Lambert azimuthal equal-area projection of the world centered on longitude and latitude of 0", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_laea1$geom, graticule = TRUE, main = "the Lambert azimuthal equal-area projection")
par(par_old)
```

We can change the PROJ.4 parameters, for example the center of the projection using the `+lon_0` and `+lat_0` parameters. 
The code below gives the map centered on New York City (Figure \@ref(fig:laeaproj2)).

```{r}
world_laea2 = st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=-74 +lat_0=40")
```
<!-- plot(world_laea2$geom) -->
<!-- plot(world_laea2$geom, graticule = TRUE) -->

```{r laeaproj2, echo=FALSE, fig.cap="Lambert azimuthal equal-area projection of the world centered on New York City", warning=FALSE}
par_old = par()
par(mar = c(0, 0, 1, 0))
plot(world_laea2$geom, graticule = TRUE, main = "the Lambert azimuthal equal-area projection")
par(par_old)
```

More information about CRS modification can be found in the [Using PROJ.4](http://proj4.org/usage/index.html) documentation.

<!-- https://github.com/r-spatial/lwgeom/issues/6 -->
<!-- ```{r} -->
<!-- # devtools::install_github("r-spatial/lwgeom") -->
<!-- library(lwgeom) -->
<!-- world_3 = lwgeom::st_transform_proj(world, crs = "+proj=wintri") -->
<!-- plot(world_3$geom) -->
<!-- ``` -->
<!-- http://bl.ocks.org/vlandham/raw/9216751/ -->

### Reprojecting raster geometries

The projection concepts described in the previous section apply equally to rasters.
However, there are important differences in reprojection of vectors and rasters:
transforming a vector object involves changing the coordinates of every vertex but this do not apply to raster data.
Rasters are are composed of rectangular cells of the same size (expressed by map units, such as degrees or meters), so it is impossible to transform coordinates of pixels separately.

Raster reprojection involves creating a new raster object, often with a different number of columns and rows than the original.
The attributes must subsequently be re-estimated, allowing the new pixels to be 'filled' with appropriate values.
This two-stage process is done with `projectRaster()` from the **raster** package.
Like the `st_transform()` function demonstrated in the previous section, `projectRaster()` takes a geographic object (a raster dataset in this case) and a `crs` argument.
However, `projectRaster()` only accepts the lengthy `proj4string` definitions of a CRS rather than concise EPSG codes.

```{block2 type='rmdnote'}
It is possible to use a EPSG code in a `proj4string` definition with `"+init=epsg:MY_NUMBER"`.
For example, one can use the `"+init=epsg:4326"` definition to set CRS to WGS84 (EPSG code of 4326).
The PROJ.4 library automatically adds the rest of parameters and converts it into `"+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"`,
```

Let's take a look at two examples of raster transformation - using categorical and continuous data.
Land cover data are usually represented by categorical maps.
The `nlcd2011.tif` file provides information for a small area in Utah, USA obtained from [National Land Cover Database 2011](https://www.mrlc.gov/nlcd2011.php) in the NAD83 / UTM zone 12N CRS.

```{r}
cat_raster = raster(system.file("raster/nlcd2011.tif", package = "spDataLarge"))
cat_raster
```

In this region, 14 land cover classes were distinguished^[Full list of NLCD2011 land cover classes can be found at https://www.mrlc.gov/nlcd11_leg.php]:

```{r}
unique(cat_raster)
```

When reprojecting categorical raster, we need to ensure that our new estimated values would still have values of our original classes.
This could be done using the nearest neighbor method (`ngb`).
In this method, value of the output cell is calculated based on the nearest cell center of the input raster.

For example, we want to change the CRS to WGS 84.  
It can be desired when we want to visualize a raster data on top of a web basemaps, such as the Google or OpenStreetMap map tiles.
The first step is to obtain the proj4 definition of this CRS, which can be done using the [http://spatialreference.org](http://spatialreference.org/ref/epsg/wgs-84/) webpage. 
The second and last step is to define the reprojection method in the `projectRaster()` function, which in case of categorical data is the nearest neighbor method (`ngb`):

```{r}
wgs84 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
cat_raster_wgs84 = projectRaster(cat_raster, crs = wgs84, method = "ngb")
cat_raster_wgs84
```

Many properties of the new object differs from the previous one, which include the number of columns and rows (and therefore number of cells), resolution (transformed from meters into degrees), and extent.
In the same time, it keeps the same land cover classes - `unique(cat_raster_wgs84)`.
<!-- freq(cat_raster_wgs84) -->
<!-- freq(cat_raster) -->

This process of reprojection is almost identical for continuous data.
The `srtm.tif` file contains digital elevation model for the same area in Utah from [the Shuttle Radar Topography Mission (SRTM)](https://www2.jpl.nasa.gov/srtm/).
Each value in this raster represents elevation measured in meters.

```{r}
con_raster = raster(system.file("raster/srtm.tif", package = "spDataLarge"))
con_raster
```

The nearest neighbor method should not be used for continuous raster data, as we want to preserve gradual changes in values.
Alternatively, continuous data could be reprojected in the **raster** package using the bi-linear method. 
In this technique, value of the output cell is calculated based on four nearest cells in the original raster. 
The new value is a weighted average of values from these four cells, adjusted for their distance from the center of the output cell. 
This dataset has geographic CRS and we want to transform it into projected CRS.

```{block2 type='rmdnote'}
All the grid cells in equal-area projections have the same size.
Therefore, these projections are recommended when performing many raster operations, such as distance calculations.
```

In the fist step we need to obtain the proj4 definition of the existing projected CRS appropriate for this area or create a new one using the [Projection Wizard](http://projectionwizard.org/) online tool [@savric_projection_2016].
For this example, we used the Oblique Lambert azimuthal equal-area projection.
The second step is to define the `bilinear` reprojection method:

```{r}
equalarea = "+proj=laea +lat_0=37.32 +lon_0=-113.04"
con_raster_ea = projectRaster(con_raster, crs = equalarea, method = "bilinear")
con_raster_ea
```

Reprojection of continuous rasters also changes spatial properties, such as the number of cells, resolution, and extent.
Moreover, it slightly modifies values in the new raster, which can be seen by comparing the outputs of the `summary()` function between `con_raster` and `con_raster_ea`.

```{r, eval=FALSE}
summary(con_raster)
summary(con_raster_ea)
```

<!-- why new na? -->

<!-- res option in projectRaster? -->
<!-- note1: in most of the cases reproject vector, not raster-->
<!-- note2: equal area projections are the best for raster calculations -->
<!-- q: should we mentioned gdal_transform? -->

## Geometric operations on vector data {#geo-vec}

This section is about operations that in some way change the geometry of vector (`sf`) objects.
It is more advanced than the spatial data operations presented in the previous Chapter (in section \@ref(spatial-operations-on-vector-data)) because here we drill down into the geometry:
the functions discussed in this section work on objects of class `sfc` (simple feature geometry collections) in addition to objects of class `sf`.

### Simplification

<!-- \@ref(fig:seine-simp) -->

```{r}
seine_simp = st_simplify(seine, dTolerance = 2000)
```

```{r}
object.size(seine)
object.size(seine_simp)
```

```{r seine-simp, echo=FALSE, fig.cap="Comparision of original data of the contiguous United States and two simplified versions using `st_simplify` and `ms_simplify`.", warning=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(seine$geometry, col = 8, main = "Original data")
plot(seine_simp$geometry, col = 8, main = "st_simplify")
par(par_old)
```

<!-- \@ref(fig:us-simp) -->

<!-- - simplifications -->
<!-- st_simplify -->
```{r}
us_states_simp1 = st_simplify(us_states, dTolerance = 1)
```

<!-- line example -->
<!-- maybe river or road network to spData?? -->

<!-- rmapshaper -->
<!-- polygon example -->
```{r}
# proportion of points to retain (0-1; default 0.05)
us_states_simp2 = rmapshaper::ms_simplify(us_states, keep = 0.01, keep_shapes = TRUE)
```

```{r us-simp, echo=FALSE, fig.cap="Comparision of original data of the contiguous United States and two simplified versions using `st_simplify` and `ms_simplify`.", warning=FALSE}
par_old = par()
par(mfrow = c(1, 3), mar = c(0, 0, 1, 0))
plot(us_states$geometry, col = 8, main = "Original data")
plot(us_states_simp1$geometry, col = 8, main = "st_simplify")
plot(us_states_simp2$geometry, col = 8, main = "ms_simplify")
par(par_old)
```

### Geometry unions

Spatial aggregation can also be done in the **tidyverse**, using **dplyr** functions as follows:

```{r, eval = FALSE}
group_by(us_states, REGION) %>%
  summarize(sum(pop = total_pop_15, na.rm = TRUE))
```

For attribute data aggregation the grouping variable is another variable, typically one with few unique values relative to the number of rows (see section \@ref(vector-attribute-aggregation)).
What we did not cover in that section was that attribute data aggregation dissolves the geometries of touching polygons.
The `REGION` variable in the `us_states` dataset is a good example:
there are 49 states (excluding Hawaii and Alaska) which can be aggregated into four regions.
This is demonstrated in the code chunk below, the results of which are illustrated in Figure \@ref(fig:us-regions):

```{r}
regions = aggregate(x = us_states[, "total_pop_15"], by = list(us_states$REGION),
                    FUN = sum, na.rm = TRUE)
```
<!--
show also tidyverse way, so what you are doing is basically a spatial join and a subsequent aggregation without a grouping variable. Didactically, it might be better to present a grouping variable.
-->

```{r, echo=FALSE}
# st_join(buff, africa[, "pop"]) %>%
#   summarize(pop = sum(pop, na.rm = TRUE))
# summarize(africa[buff, "pop"], pop = sum(pop, na.rm = TRUE))
```

```{r us-regions, fig.cap="Spatial aggregation on contiguous polygons, illustrated by aggregating the population of US states into regions, with population represented by color. Note the operation automatically dissolves boundaries between states.", echo=FALSE, warning=FALSE, fig.asp=0.2, out.width="100%"}
# # with base R ----
# old_par = par()
# breaks = c(1e5, 5e6, 1e7, 7e7, 2e8)
# rdf = st_set_geometry(regions, NULL)
# us_states$region_pop = inner_join(dplyr::select(us_states, REGION),
#                                   dplyr::select(rdf, Group.1, total_pop_15),
#                                   by = c("REGION" = "Group.1")) %>% 
#   pull(total_pop_15)
# 
# par(mfrow = c(1, 2))
# plot(us_states[, "total_pop_15"], main = "US states", breaks = breaks, key.pos = NULL)
# plot(regions[, "total_pop_15"], main = "US regions", breaks = breaks, key.pos = NULL)
# par(old_par)
# with tmap ----
library(tmap)
us_states_facet = dplyr::select(us_states, REGION, total_pop_15) %>%
  mutate(Level = "State")
regions_facet = rename(regions, REGION = Group.1) %>%
  mutate(Level = "Region")
us_facet = rbind(us_states_facet, regions_facet) %>% 
  mutate(Level = factor(Level, levels = c("State", "Region")))

qtm(us_facet, "total_pop_15") +
  tm_facets(by = "Level", nrow = 1, drop.units = TRUE)
```

The equivalent result can be achieved using **tidyverse** functions as follows (result not shown):

```{r, eval=FALSE}
regions2 = us_states %>% 
  group_by(REGION) %>%
  summarize(sum(pop = total_pop_15, na.rm = TRUE))
```

### Clipping 

Spatial clipping is a form of spatial subsetting that involves changes to the `geometry` columns of at least some of the affected features.

Clipping can only apply to features more complex than points: 
lines, polygons and their 'multi' equivalents.
To illustrate the concept we will start with a simple example:
two overlapping circles with a center point one unit away from each other and radius of one:

```{r points, fig.cap="Overlapping circles."}
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
l = c("x", "y")
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = l) # add text
```

Imagine you want to select not one circle or the other, but the space covered by both `x` *and* `y`.
This can be done using the function `st_intersection()`, illustrated using objects named `x` and `y` which represent the left and right-hand circles:

```{r}
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting area
```

The subsequent code chunk demonstrate how this works for all combinations of the 'Venn' diagram representing `x` and `y`, inspired by [Figure 5.1](http://r4ds.had.co.nz/transform.html#logical-operators) of the book R for Data Science [@grolemund_r_2016].
<!-- Todo: reference r4ds -->

```{r venn-clip, echo=FALSE, fig.cap="Spatial equivalents of logical operators.", warning=FALSE}
old_par = par()
par(mfrow = c(3, 3), mai = c(0.1, 0.1, 0.1, 0.1))
plot(b)
y_not_x = st_difference(y, x)
plot(y_not_x, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(y, x)")
plot(b)
plot(x, add = TRUE, col = "grey")
text(x = 0.5, y = 1, "x")
plot(b, add = TRUE)
x_or_y = st_union(x, y)
plot(x_or_y, col = "grey")
text(x = 0.5, y = 1, "st_union(x, y)")
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "grey", add = TRUE) 
text(x = 0.5, y = 1, "st_intersection(x, y)")
# x_xor_y = st_difference(x_xor_y, x_and_y) # failing
x_not_y = st_difference(x, y)
x_xor_y = st_sym_difference(x, y)
plot(x_xor_y, col = "grey")
text(x = 0.5, y = 1, "st_sym_difference(x, y)")
plot.new()
plot(b)
plot(x_not_y, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(x, y)")
plot(b)
plot(y, col = "grey", add = TRUE)
plot(b, add = TRUE)
text(x = 0.5, y = 1, "y")
par(old_par)
```

To illustrate the relationship between subsetting and clipping spatial data, we will subset points that cover the bounding box of the circles `x` and `y` in Figure \@ref(fig:venn-clip).
Some points will be inside just one circle, some will be inside both and some will be inside neither.

There are two different ways to subset points that fit into combinations of the circles: via clipping and logical operators.
But first we must generate some points.
We will use the *simple random* sampling strategy to sample from a box representing the extent of `x` and `y`.
To generate this points will use a function not yet covered in this book, `st_sample()`.
Next we will generate the situation plotted in Figure \@ref(fig:venn-subset):

```{r venn-subset, fig.cap="Randomly distributed points within the bounding box enclosing circles x and y."}
bb = st_bbox(st_union(x, y))
pmat = matrix(c(bb[c(1, 2, 3, 2, 3, 4, 1, 4, 1, 2)]), ncol = 2, byrow = TRUE)
box = st_polygon(list(pmat))
set.seed(2017)
p = st_sample(x = box, size = 10)
plot(box)
plot(x, add = TRUE)
plot(y, add = TRUE)
plot(p, add = TRUE)
text(x = c(-0.5, 1.5), y = 1, labels = l)
```

```{r, echo=FALSE}
# An alternative way to sample from the bb
bb = st_bbox(st_union(x, y))
pmulti = st_multipoint(pmat)
box = st_convex_hull(pmulti)
```

### Centroids

<!-- centroids intro -->
There are two main functions that create single point representations of more complex vector objects - `st_centroid()` and `st_point_on_surface()`.

<!-- The first one is `st_centroid()`  -->
<!-- st_centroid -->

```{r}
nz_centroid = st_centroid(nz)
```

<!-- st_point_on_surface -->
<!-- The second one -->
<!-- add lines example -->
<!-- note about computation time -->

```{r}
nz_pos = st_point_on_surface(nz)
```

```{r}
seine_centroid = st_centroid(seine)
```

```{r}
seine_pos = st_point_on_surface(seine)
```

```{r, warning=FALSE, echo=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(nz$geometry)
plot(nz_centroid$geometry, add = TRUE)
plot(nz_pos$geometry, add = TRUE, col = "red")
plot(seine$geometry)
plot(seine_centroid$geometry, add = TRUE)
plot(seine_pos$geometry, add = TRUE, col = "red")
par(par_old)
```

### Buffers

### Affine transformations

### Type transformation

Geometry casting is powerful operation which enable transformation of the geometry type.
It is implemented in the `st_cast` function from the `sf` package.
Importantly, `st_cast` behaves differently on single simple feature geometry (`sfg`) objects, and simple feature geometry column (`sfc`) and simple features objects.

Let's create a multipoint to illustrate how geometry casting works on simple feature geometry (`sfg`) objects:

```{r}
multipoint = st_multipoint(matrix(c(1, 3, 5, 1, 3, 1), ncol = 2))
```

In this case, `st_cast` can be useful to transform the new object into linestring or polygon (Figure \@ref(fig:single-cast)):

<!-- a/ points -> lines -> polygons  -->
```{r}
linestring = st_cast(multipoint, "LINESTRING")
polyg = st_cast(multipoint, "POLYGON")
```

```{r single-cast, echo = FALSE, fig.cap="Examples of linestring and polygon created based on multipoint using the `st_cast` function", warning=FALSE}
par_old = par()
par(mfrow = c(1, 3), mar = c(0, 0, 1, 0))
plot(multipoint, main = "MULTIPOINT")
plot(linestring, main = "LINESTRING")
plot(polyg, col = "grey", main = "POLYGON")
par(par_old)
```

This process can be also reversed using `st_cast`:

```{r}
multipoint_2 = st_cast(linestring, "MULTIPOINT")
multipoint_3 = st_cast(polyg, "MULTIPOINT")
all.equal(multipoint, multipoint_2, multipoint_3)
```

```{block2 type='rmdnote'}
For single simple feature geometries (`sfg`), `st_cast` provides also geometry casting from non-multi to multi types (e.g. `POINT` to `MULTIPOINT`) and from multi types to non-multi types.
However, only the first element of the old object would remain in the second group of cases.
<!-- note: beware of information lost (you will get a warning) -->
```

```{r, include=FALSE}
cast_all <- function(xg) {
  lapply(c("MULTIPOLYGON", "MULTILINESTRING", "MULTIPOINT", "POLYGON", "LINESTRING", "POINT"), 
         function(x) st_cast(xg, x))
}
t = cast_all(multipoint)
t2 = cast_all(polyg)
```

Geometry casting of simple feature geometry column (`sfc`) and simple features objects works the same as for single geometries in most of the cases. 
One imporant difference is conversion between multi to non-multi types.
As a result of this process, multi-objects are split into many non-multi objects.

We would use a new object, `multilinestring_sf`, as an example (on the left in Figure \@ref(fig:line-cast)):

```{r}
multilinestring_list = list(matrix(c(1, 4, 5, 3), ncol = 2), 
                            matrix(c(4, 4, 4, 1), ncol = 2),
                            matrix(c(2, 4, 2, 2), ncol = 2))
multilinestring = st_multilinestring((multilinestring_list))
multilinestring_sf = st_sf(geom = st_sfc(multilinestring))
multilinestring_sf
```

You can imagine it as a road or river network. 
The new object has only one row that define all the lines.
This restrict number of operation that could be done, for example it prevent adding names to each line segment or calculating lengths of single lines.
The `st_cast` function can be used in this situation, as it separates one mutlilinestring into three linestrings:

```{r}
linestring_sf2 = st_cast(multilinestring_sf, "LINESTRING")
linestring_sf2
```

```{r line-cast, echo=FALSE, fig.cap="Examples of type casting between MULTILINESTRING (left) and LINESTRING (right).", warning=FALSE}
par_old = par()
par(mfrow = c(1, 2), mar = c(0, 0, 1, 0))
plot(multilinestring_sf$geom, col = "black", main = "MULTILINESTRING")
plot(linestring_sf2$geom, col = c("red", "green", "blue"), main = "LINESTRING")
par(par_old)
```

The newely created object allows for attributes creation (see more in section \@ref(vec-attr-creation)) and length measurement:

```{r}
linestring_sf2$name = c("Riddle Rd", "Marshall Ave", "Foulke St")
linestring_sf2$length = st_length(linestring_sf2)
linestring_sf2
```

<!-- ### Class conversion -->
<!-- placeholder for: -->
<!-- sf -> sp -->
<!-- sp -> sf -->
<!-- stars; https://github.com/r-spatial/stars/blob/master/vignettes/blog1.Rmd -->

### Rasterization
<!-- - vector to raster -->

## Geometric operations on raster data {#geo-ras}

### Raster alignment

```{r, echo=FALSE}
source("code/create-rasters.R")
```

When merging or performing map algebra on rasters, their resolution, projection, origin and/or extent has to match.
Otherwise, how should we add the values of one raster with a resolution of 0.2 decimal degrees to a second with a resolution of 1 decimal degree?
The same problem arises when we would like to merge satellite imagery from different sensors with different projections and resolutions.
We can deal with such mismatches by aligning the rasters.

This section uses the `elev` object from \@ref(manipulating-raster-objects).
The `projectRaster()` function reprojects one raster to a desired projection, say from UTM to WGS84.
Equally, map algebra operations require the same extent.
Following code adds one row and two columns to each side of the raster while setting all new values to an elevation of 1000 meters (\@ref(fig:extend-example)).

```{r extend-example, fig.cap = "Original raster extended by 1 one row on each side (top, bottom) and two columns on each side (right, left)."}
elev_2 = extend(elev, c(1, 2), value = 1000)
plot(elev_2)
```

Performing an algebraic operation on two objects with differing extents in R, the **raster** package returns the result for the intersection, and says so in a warning.

```{r}
elev_3 = elev + elev_2
```

However, we can also align the extent of two rasters with the `extend()` command.
Here, we extend the `elev` object to the extend of `elev_2`. 
The newly added rows and column receive the default value of the `value` parameter, i.e., `NA`.

```{r}
elev_4 = extend(elev, elev_2)
```

The `aggregate()` and `disaggregate()` functions help to change the cell size resolution of a raster.
For instance, let us aggregate `elev` from a resolution of 0.5 to a resolution of 1, that means we aggregate by a factor of 2 (Fig. \@ref(fig:aggregate-example)).
Additionally, the output cell value should correspond to the mean of the input cells (but  one could use other functions as well such as `median()`, `sum()`, etc.):

```{r aggregate-example, fig.cap = "Original raster (left). Aggregated raster (right)."}
elev_agg = aggregate(elev, fact = 2, fun = mean)
par(mfrow = c(1, 2))
plot(elev)
plot(elev_agg)
```

Note that the origin of `elev_agg` has changed, too.

```{r}
origin(elev)
origin(elev_agg)
```

The origin is the point closest to (0, 0) if you moved towards it in steps of x and y resolution.
If two rasters have different origins, their cells do not overlap completely which would make map algebra impossible.
To change the origin , use `origin()`.^[If the origins of two raster datasets are just marginally apart, it sometimes is sufficient to simply increase the `tolerance` argument  of `raster::rasterOptions()`.]
Looking at figure \@ref(fig:origin-example) reveals the effect of changing the origin.

```{r origin-example, fig.cap = "Plotting rasters with the same values but different origins."}
# plot the aggregated raster
plot(elev_agg)
# change the origin
origin(elev_agg) = c(0, 0)
# plot it again
plot(elev_agg, add = TRUE)
```

The `resample()` command lets you align several raster properties in one go, namely origin, extent and resolution.
Let us resample an extended `elev_agg` to the properties of `elev` again.

```{r}
# add 2 rows and columns, i.e. change the extent
elev_agg = extend(elev_agg, 2)
elev_disagg = resample(elev_agg, elev)
```

Though our disaggregated `elev_disagg` retrieved back its original resolution, cell size and extent, its values differ. 
However, this is to be expected, disaggregating **cannot** predict values at a finer resolution, it simply uses an interpolation algorithm.
It is important to keep in mind that disaggregating results in a finer resolution, the corresponding values, however, are only as accurate as their lower resolution source.

Finally, if you want to align many (possibly hundreds or thousands of) images stored on disk, you might want to checkout the `gdalUtils::align_rasters()` function.
Nevertheless, you may also use **raster** with very large datasets. 
This is because **raster**:

1. lets you work with raster datasets that are too large to fit into the main memory (RAM) by only processing chunks of it.
2. tries to facilitate parallel processing.
For more information have a look at the help pages of `beginCluster()` and `clusteR()`.
Additionally, check out the *Multi-core functions* section in `vignette("functions", package = "raster")`.

### Aggregation {#ras-agg}

### Vectorization

## Exercises

<!-- CRS CONVERSION -->
<!-- 1. vector reprojection exercise (e.g. modification of proj4) -->
1. Transform the `world` dataset to the transverse Mercator projection (`"+proj=tmerc"`) and plot the result.
What has changed and why?
Try to transform it back into WGS 84 and plot the new object.
Why the new object differs from the original one?
<!-- https://github.com/r-spatial/sf/issues/509 -->
<!-- ```{r} -->
<!-- world_tmerc = st_transform(world, "+proj=tmerc") -->
<!-- plot(world_tmerc$geom) -->
<!--  world_4326 = st_transform(world_tmerc, 4326) -->
<!-- plot(world_4326$geom) -->
<!-- ``` -->
1. Try to transform the categorical raster (`cat_raster`) into WGS 84 using the bi-linear interpolation method. 
What has changed?
How it influences the results?
<!-- ```{r} -->
<!-- wgs84 = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" -->
<!-- cat_raster_wgs84 = projectRaster(cat_raster, crs = wgs84, method = "bilinear") -->
<!-- cat_raster_wgs84 -->
<!-- ``` -->
1. Try to transform the continuous raster (`cat_raster`) into WGS 84 using the nearest neighbor interpolation method. 
What has changed?
How it influences the results?
<!-- ```{r} -->
<!-- con_raster = raster(system.file("raster/srtm.tif", package="spDataLarge")) -->
<!-- con_raster_wgs84 = projectRaster(con_raster, crs = wgs84, method = "ngb") -->
<!-- con_raster_wgs84 -->
<!-- ``` -->
<!-- GEOMETRY TRANSFORMATION -->