Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete rewrite of the Ecology chapter #771

Merged
merged 81 commits into from
Apr 18, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
aa1a728
start to write mlr3 code
Aug 3, 2021
d313e4b
add 2 remarks
Aug 3, 2021
6e7907d
add svm autotuner and mlr3 advantages
jannes-m Aug 3, 2021
5aa0871
rewrite case study section
jannes-m Aug 4, 2021
2dbdc49
update pat-s citation
jannes-m Aug 4, 2021
8f66590
add mlr3 book reference
jannes-m Aug 4, 2021
20f9d57
evaluate spDataLarge data code chunk
jannes-m Aug 4, 2021
9f82d04
update adv-r ref
jannes-m Aug 4, 2021
5d223b6
create a mlr3 taskclassifst
jannes-m Aug 4, 2021
a5533b7
delete superfluous following
jannes-m Aug 4, 2021
29e4b93
add to-dos
jannes-m Aug 4, 2021
e30f2f0
add missing comma
jannes-m Aug 4, 2021
a7edbfa
create log_reg learner and run resampling
jannes-m Aug 5, 2021
c74ef1f
start writing the tuning section
jannes-m Aug 7, 2021
019fd19
use lsl study area
jannes-m Aug 8, 2021
dcd952b
replace all instances of study_area by study_mask
jannes-m Aug 8, 2021
665565b
add future parallelization
jannes-m Aug 8, 2021
a83cf3b
replace mlr by mlr3 output
jannes-m Aug 10, 2021
36bf6c2
use benchmark(_grid) for comparing sp with conventional cv, rewrite p…
jannes-m Aug 11, 2021
162cb19
replace old boxplot, rewrite parallel stuff
jannes-m Aug 11, 2021
ee9a95b
update contents
jannes-m Aug 11, 2021
fb5cc34
rewrite svm stuff using mlr3
jannes-m Aug 11, 2021
aab55e3
delete old readRDS commands
jannes-m Aug 11, 2021
d7a633d
run glm and svm sp and nsp cv using benchmark
jannes-m Aug 16, 2021
f2ce68d
use only one script for glm and svm cvs
jannes-m Aug 16, 2021
c5f2c7e
update svm resampling and exercises
jannes-m Aug 16, 2021
845bd3a
mlr3 code for predicting floristic composition
jannes-m Aug 18, 2021
8a68e8d
update 14-rf_mlr3.R
jannes-m Feb 6, 2022
d245799
harmonize with main
jannes-m Feb 6, 2022
3b57594
rename code and output files
jannes-m Feb 15, 2022
9eac3cd
add exercise file
jannes-m Feb 15, 2022
f289ff4
rename code chunks and add new exercise style
jannes-m Feb 15, 2022
d83926f
use updated spDataLarge data and replace raster by terra
Feb 16, 2022
f9fb429
add encapsulate explanation
Feb 16, 2022
e94c8a8
update figures
Feb 23, 2022
6d61dde
use terra, update figures, use bm_agg instead of agg in boxplot
Feb 23, 2022
db23472
Merge branch 'main' into rewrite_cv
Feb 23, 2022
1eccbee
update and add references
jannes-m Feb 23, 2022
af6072b
sync with main
jannes-m Feb 23, 2022
f2d8de9
delete unnecessary ta_2
jannes-m Feb 23, 2022
f50d09c
rename 14-rf_mlr3.R to 15-rf_mlr3.R
jannes-m Feb 23, 2022
14c8d77
use terra instead of raster
jannes-m Feb 26, 2022
04fae2d
rename intermediat results
jannes-m Feb 26, 2022
e0cd03b
read in geotiffs with terra
jannes-m Feb 26, 2022
58973d2
update 12-cv.R
Apr 11, 2022
75786a3
update extdata/.gitignore
Apr 11, 2022
718bedd
harmonize
Apr 11, 2022
cf7d55c
Merge branch 'main' into rewrite_cv
jannes-m Apr 11, 2022
18c11a6
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m Apr 11, 2022
9978537
update eco text until the start of the dimensionality reduction
jannes-m Apr 11, 2022
fd71259
only rename two layers and add a comment
jannes-m Apr 11, 2022
e42fe33
save AutoTuner output
jannes-m Apr 12, 2022
22560b7
rewrite the modeling stuff
jannes-m Apr 12, 2022
6fc35f8
rewrite predictive mapping part
jannes-m Apr 12, 2022
852792c
add spatial prediction
jannes-m Apr 12, 2022
57903c5
updating figures of c15
jannes-m Apr 12, 2022
c2ea0c8
update code chunk name
jannes-m Apr 13, 2022
e3c0ca3
start working on exercises for c15
jannes-m Apr 13, 2022
dcd8efc
complete exercices for c15
jannes-m Apr 14, 2022
e12dfbc
polish
jannes-m Apr 14, 2022
2c2f788
add missing libraries to ex0
jannes-m Apr 14, 2022
ef941e4
delete unnecessary and outdated files
jannes-m Apr 14, 2022
7256a47
add structure to 15-rf_mlr3.R
jannes-m Apr 14, 2022
200b9ee
replace mlr by mlr3
jannes-m Apr 15, 2022
7b3e236
add reference to benchmark() and benchmark_grid()
jannes-m Apr 15, 2022
bba8caa
add benchmark and benchmark_grid() hint to _12_exercises
jannes-m Apr 15, 2022
b174dca
replace raster by terra again
jannes-m Apr 15, 2022
ce69f05
use system.file to attach ta
jannes-m Apr 16, 2022
564f4c3
replace raster by terra in _12-ex.Rmd
jannes-m Apr 16, 2022
60c0596
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m Apr 16, 2022
062c439
replace raster by terra and use new dplyr syntax
jannes-m Apr 16, 2022
20e32dd
use :: style to make clear which packages are used when
jannes-m Apr 16, 2022
127273a
fix typo
jannes-m Apr 16, 2022
13b7b65
delete old, no longer necessary sentence from comments
jannes-m Apr 16, 2022
dd58c32
delete merge conflict HEAD
jannes-m Apr 16, 2022
60d14f8
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m Apr 16, 2022
2e6c789
further polish 12-spatial-cv.Rmd
jannes-m Apr 17, 2022
b305dc2
delete no longer required mlr3 file
jannes-m Apr 17, 2022
57358be
check if not storing the models decreases substantially runtime and f…
jannes-m Apr 17, 2022
5f691a8
use only score output
jannes-m Apr 18, 2022
92c3492
Merge branch 'rewrite_cv' into rewrite_eco
jannes-m Apr 18, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion 01-introduction.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ With a wide range of packages, R also supports advanced geospatial statistics\in
\index{R!language}
New integrated development environments (IDEs\index{IDE}) such as RStudio\index{RStudio} have made R more user-friendly for many, easing map making with a panel dedicated to interactive visualization.

At its core, R is an object-oriented, [functional programming language](http://adv-r.had.co.nz/Functional-programming.html) [@wickham_advanced_2019], and was specifically designed as an interactive interface to other software [@chambers_extending_2016].
At its core, R is an object-oriented, [functional programming language](https://adv-r.hadley.nz/fp.html) [@wickham_advanced_2019], and was specifically designed as an interactive interface to other software [@chambers_extending_2016].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The latter also includes many 'bridges' to a treasure trove of GIS\index{GIS} software, 'geolibraries' and functions (see Chapter \@ref(gis)).
It is thus ideal for quickly creating 'geo-tools', without needing to master lower level languages (compared to R) such as C\index{C}, FORTRAN\index{FORTRAN} or Java\index{Java} (see Section \@ref(software-for-geocomputation)).
\index{R}
Expand Down
621 changes: 286 additions & 335 deletions 12-spatial-cv.Rmd

Large diffs are not rendered by default.

447 changes: 228 additions & 219 deletions 15-eco.Rmd

Large diffs are not rendered by default.

230 changes: 230 additions & 0 deletions _12-ex.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,230 @@
The solutions assume the following packages are attached (other packages will be attached when needed):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


```{r 12-ex-e0, message=FALSE, warning=FALSE}
library(dplyr)
# library(kernlab)
library(mlr3)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3spatiotempcv)
library(mlr3tuning)
library(qgisprocess)
library(raster)
# library(rlang)
library(sf)
library(tmap)
```

E1. Compute the following terrain attributes from the `elev` dataset loaded with `terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))$elev` with the help of R-GIS bridges (see this [Chapter](https://geocompr.robinlovelace.net/gis.html#gis)):
- Slope
- Plan curvature
- Profile curvature
- Catchment area

```{r, eval=FALSE}
# attach data
dem = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))$elev

algs = qgisprocess::qgis_algorithms()
dplyr::filter(algs, grepl("curvature", algorithm))
alg = "saga:slopeaspectcurvature"
qgisprocess::qgis_show_help(alg)
qgisprocess::qgis_arguments(alg)
# terrain attributes (ta)
out_nms = paste0(tempdir(), "/", c("slope", "cplan", "cprof"),
".sdat")
args = rlang::set_names(out_nms, c("SLOPE", "C_PLAN", "C_PROF"))
out = qgis_run_algorithm(alg, ELEVATION = dem, METHOD = 6,
UNIT_SLOPE = "[1] degree",
!!!args,
.quiet = TRUE
)
ta = out[names(args)] |> unlist() |> terra::rast()
names(ta) = c("slope", "cplan", "cprof")
# catchment area
dplyr::filter(algs, grepl("[Cc]atchment", algorithm))
alg = "saga:catchmentarea"
qgis_show_help(alg)
qgis_arguments(alg)
carea = qgis_run_algorithm(alg,
ELEVATION = dem,
METHOD = 4,
FLOW = file.path(tempdir(), "carea.sdat"))
# transform carea
carea = terra::rast(carea$FLOW[1])
log10_carea = log10(carea)
names(log10_carea) = "log10_carea"
# add log_carea and dem to the terrain attributes
ta = c(ta, dem, log10_carea)
```

E2. Extract the values from the corresponding output rasters to the `lsl` data frame (`data("lsl", package = "spDataLarge"`) by adding new variables called `slope`, `cplan`, `cprof`, `elev` and `log_carea` (see this [section](https://geocompr.robinlovelace.net/spatial-cv.html#case-landslide) for details).

```{r, eval=FALSE}
# attach terrain attribute raster stack (in case you have skipped the previous
# exercise)
data("lsl", package = "spDataLarge")
ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))
lsl = dplyr::select(lsl, x, y, lslpts)
# extract values to points, i.e., create predictors
lsl[, names(ta)] = terra::extract(ta, lsl[, c("x", "y")]) |>
dplyr::select(-ID)
```

E3. Use the derived terrain attribute rasters in combination with a GLM to make a spatial prediction map similar to that shown in this [Figure](https://geocompr.robinlovelace.net/spatial-cv.html#fig:lsl-susc).
Running `data("study_mask", package = "spDataLarge")` attaches a mask of the study area.

```{r, eval=FALSE}
# attach data (in case you have skipped exercises 1) and 2)
# landslide points with terrain attributes and terrain attribute raster stack
data("lsl", "study_mask", package = "spDataLarge")
ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))

# fit the model
fit = glm(lslpts ~ slope + cplan + cprof + elev + log10_carea,
data = lsl, family = binomial())

# make the prediction
pred = terra::predict(object = ta, model = fit, type = "response")

# make the map
lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = 32717)
study_mask = terra::vect(study_mask)
lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = 32717)
hs = terra::shade(ta$slope * pi / 180,
terra::terrain(ta$elev, v = "aspect", unit = "radians"))
rect = tmaptools::bb_poly(raster::raster(hs))
bbx = tmaptools::bb(raster::raster(hs), xlim = c(-0.00001, 1),
ylim = c(-0.00001, 1), relative = TRUE)
# white raster to only plot the axis ticks, otherwise gridlines would be visible
tm_shape(hs, bbox = bbx) +
tm_grid(col = "black", n.x = 1, n.y = 1, labels.inside.frame = FALSE,
labels.rot = c(0, 90)) +
tm_raster(palette = "white", legend.show = FALSE) +
# hillshade
tm_shape(terra::mask(hs, study_mask), bbox = bbx) +
tm_raster(palette = gray(0:100 / 100), n = 100,
legend.show = FALSE) +
# prediction raster
tm_shape(terra::mask(pred, study_mask)) +
tm_raster(alpha = 0.5, palette = "Reds", n = 6, legend.show = TRUE,
title = "Susceptibility") +
# rectangle and outer margins
qtm(rect, fill = NULL) +
tm_layout(outer.margins = c(0.04, 0.04, 0.02, 0.02), frame = FALSE,
legend.position = c("left", "bottom"),
legend.title.size = 0.9)
```

E4. Compute a 100-repeated 5-fold non-spatial cross-validation and spatial CV based on the GLM learner and compare the AUROC values from both resampling strategies with the help of boxplots (see this [Figure](https://geocompr.robinlovelace.net/spatial-cv.html#fig:boxplot-cv).
Hint: You need to specify a non-spatial resampling strategy.
Another hint: You might want to Excercises 4 to 6 in one go with the help of `mlr3::benchmark()` and `mlr3::benchmark_grid()` (for more information, please refer to https://mlr3book.mlr-org.com/perf-eval-cmp.html#benchmarking).
When doing so, keep in mind that the computation can take very long, probably several days.
This, of course, depends on your system.
Computation time will be shorter the more RAM and cores you have at your disposal.

```{r, eval=FALSE}
# attach data (in case you have skipped exercises 1) and 2)
data("lsl", package = "spDataLarge") # landslide points with terrain attributes

# create task
task = TaskClassifST$new(
id = "lsl_ecuador",
backend = mlr3::as_data_backend(lsl), target = "lslpts", positive = "TRUE",
extra_args = list(
coordinate_names = c("x", "y"),
coords_as_features = FALSE,
crs = 32717)
)

# construct learners (for all subsequent exercises)
# GLM
lrn_glm = lrn("classif.log_reg", predict_type = "prob")

# SVM
# construct SVM learner (using ksvm function from the kernlab package)
lrn_ksvm = lrn("classif.ksvm", predict_type = "prob", kernel = "rbfdot",
type = "C-svc")
# specify nested resampling and adjust learner accordingly
# five spatially disjoint partitions
tune_level = rsmp("spcv_coords", folds = 5)
# use 50 randomly selected hyperparameters
terminator = trm("evals", n_evals = 50)
tuner = tnr("random_search")
# define the outer limits of the randomly selected hyperparameters
ps = ps(
C = p_dbl(lower = -12, upper = 15, trafo = function(x) 2^x),
sigma = p_dbl(lower = -15, upper = 6, trafo = function(x) 2^x)
)
at_ksvm = AutoTuner$new(
learner = lrn_ksvm,
resampling = tune_level,
measure = msr("classif.auc"),
search_space = ps,
terminator = terminator,
tuner = tuner
)

# QDA
lrn_qda = lrn("classif.qda", predict_type = "prob")

# SVM without tuning hyperparameters
vals = lrn_ksvm$param_set$values
lrn_ksvm_notune = lrn_ksvm$clone()
lrn_ksvm_notune$param_set$values = c(vals, C = 1, sigma = 1)

# define resampling strategies
# specify the reampling method, i.e. spatial CV with 100 repetitions and 5 folds
# -> in each repetition dataset will be splitted into five folds
# method: repeated_spcv_coords -> spatial partioning
rsmp_sp = rsmp("repeated_spcv_coords", folds = 5, repeats = 100)
# method: repeated_cv -> non-spatial partitioning
rsmp_nsp = rsmp("repeated_cv", folds = 5, repeats = 100)

# (spatial) cross-validataion
#****************************
# create your design
grid = benchmark_grid(tasks = task,
learners = list(lrn_glm, at_ksvm, lrn_qda,
lrn_ksvm_notune),
resamplings = list(rsmp_sp, rsmp_nsp))
# execute the cross-validation
library(future)
# execute the outer loop sequentially and parallelize the inner loop
future::plan(list("sequential", "multisession"),
workers = floor(availableCores() / 2))
set.seed(021522)
bmr = benchmark(grid, store_backends = FALSE)
# stop parallelization
future:::ClusterRegistry("stop")
# save your result, e.g. to
# saveRDS(bmr, file = "extdata/12-bmr.rds")

# plot your result
autoplot(bmr, measure = msr("classif.auc"))
```

E5. Model landslide susceptibility using a quadratic discriminant analysis (QDA).
Assess the predictive performance of the QDA.
What is the a difference between the spatially cross-validated mean AUROC value of the QDA and the GLM?

```{r, eval=FALSE}
# attach data (in case you have skipped exercise 4)
bmr = readRDS("extdata/12-bmr.rds")

# plot your result
autoplot(bmr, measure = msr("classif.auc"))
# QDA has higher AUROC values on average than GLM which indicates moderately
# non-linear boundaries
```

E6. Run the SVM without tuning the hyperparameters.
Use the `rbfdot` kernel with $\sigma$ = 1 and *C* = 1.
Leaving the hyperparameters unspecified in **kernlab**'s `ksvm()` would otherwise initialize an automatic non-spatial hyperparameter tuning.

```{r, eval=FALSE}
# attach data (in case you have skipped exercise 4)
bmr = readRDS("extdata/12-bmr.rds")
# plot your result
autoplot(bmr, measure = msr("classif.auc"))
```
Loading