This module contains the final practical part for the Certificate of Advanced Studies in Advanced Statistical Data Science (CAS ASDS) at the University of Berne for the class of 2024.
The practical module has been packaged as an R package, so everything (data, scripts, reports, …) should be contained in a single bundle, and analyses should be reproducible.
This setup follows suggestions from Marwick, Boettiger, and Mullen (2018b, 2018a), Flight (2014), and Wickham and Bryan (2023); the last of these provided most of the instructions and toolchain recommendations used to create this package.
There are other opinions and tools for a lighter-weight approach to reproducible research, e.g. Flight (2021) and Landau (2024, 2021), which I might explore in the future.
For more inspiration and available tools, see (Blischak et al. 2024).
You can install the development version of asds2024.nils.practical from GitHub with:
# install.packages("devtools") # <- for `install_github` to be available uncomment this and run it (unless you've already installed it)
# Notes:
# - you probably want to install the suggested dependencies as well, since this package only uses suggested dependencies
# - when `install_github`-ing, you need to explicitly specify that you want the vignettes built as well
devtools::install_github(
  "nils-s/cas-asds-practical",
  dependencies = c("Depends", "Imports", "LinkingTo", "Suggests"),
  build_vignettes = TRUE
)
Since not all documents are provided as vignettes, you probably want to clone the package sources into a local directory as well:
git clone https://github.com/nils-s/cas-asds-practical.git
From there, you can more directly explore the raw data, and read documents that are not packaged as vignettes.
Assuming the devtools package is installed (and install_github is available), this package by itself should not cause problems (simply because it contains very little that could cause problems). However, it has a number of dependencies, which will be installed when installing this package’s suggested dependencies as shown in the code snippet above.
The sf package has a few dependencies of its own (not all of which are R packages). The first thing to try (after studying the error messages, of course) is to make sure all prerequisites for sf are fulfilled (e.g. the GEOS, GDAL, and PROJ libraries).
On a Fedora machine, the following should get you started:
sudo dnf install gdal gdal-devel udunits2-devel proj proj-devel geos geos-devel
See the sf documentation for more information.
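Before retrying the sf installation, it can help to confirm that the helper programs its build typically looks for are on the PATH. The following is a minimal shell sketch and not part of this package: check_tool is a hypothetical helper name, and it assumes that gdal-config and geos-config are provided by the GDAL and GEOS development packages and that PROJ is located via pkg-config.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of sf or this package):
# it reports whether the given program can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Assumption: these are the helper programs consulted when sf is built
# from source; adjust the list if your setup differs.
for tool in gdal-config geos-config pkg-config; do
  check_tool "$tool"
done
```

A "missing" line suggests that the corresponding development package is not installed yet (or not on the PATH).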
When installing packages from source (as is common on Linux), compilation errors may occur due to aggressive compiler flag settings used in conjunction with C or C++ sources and Rcpp. In case you see errors like
...
/usr/local/lib/R/site-library/Rcpp/include/Rcpp/iostream/Rstreambuf.h:53:20: warning: field precision specifier ‘.*’ expects argument of type ‘int’, but argument 2 has type ‘std::streamsize’ {aka ‘long int’} [-Wformat=]
53 | Rprintf("%.*s", num, s);
| ~~^~ ~~~
| | |
| int std::streamsize {aka long int}
...
.../include/Rcpp/print.h:30:19: error: format not a string literal and no format arguments [-Werror=format-security]
...
ERROR: compilation failed for package ...
...
you should probably open an issue in the GitHub/GitLab/whatever repo of the package that caused the error.
You should absolutely not go into $(R RHOME)/etc/Makeconf and change the compiler flags, for example by removing -Werror=format-security from the CXX14FLAGS or similar ;)
library(asds2024.nils.practical)
vignette("get-started", package = "asds2024.nils.practical")
Blischak, John, Alison Hill, Ben Marwick, Daniel Sjoberg, and Will Landau. 2024. “CRAN Task View: Reproducible Research.” February 20, 2024. https://cran.r-project.org/view=ReproducibleResearch.
Flight, Robert M. 2014. “Analyses as Packages.” July 28, 2014. https://rmflight.github.io/posts/2014-07-28-analyses-as-packages.
———. 2021. “Packages Don’t Work Well for Analyses in Practice.” March 2, 2021. https://rmflight.github.io/posts/2021-03-02-packages-dont-work-well-for-analyses-in-practice.
Landau, William Michael. 2021. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.
———. 2024. targets: Dynamic Function-Oriented Make-Like Declarative Pipelines. https://docs.ropensci.org/targets/.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018a. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.
———. 2018b. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” PeerJ Preprints 6 (March): e3192v2. https://doi.org/10.7287/peerj.preprints.3192v2.
Wickham, Hadley, and Jennifer Bryan. 2023. R Packages. 2nd ed. O’Reilly. https://r-pkgs.org.