Home
Part of the Reproducible R Toolkit:
- checkpoint is a package to help do reproducible work in R.
- checkpoint-server is the server-side component. For information about checkpoint-server, refer to https://github.com/RevolutionAnalytics/checkpoint-server
Create a folder for your project, or go to a folder that already contains a project. Let's say the folder is called ~/myproject. Then run:
checkpoint(snapshotDate = "2015-01-01")
Every time you run the checkpoint() function in a project, the entire project is scanned again for R scripts and the packages they require.
To ensure all the package dependencies are installed, simply run checkpoint() again.
checkpoint(snapshotDate = "2014-09-17")
Initialize a checkpoint project by adding this code to the top of your script.
library(checkpoint)
checkpoint(snapshotDate = "2015-01-01") # Use desired snapshot date
When you run this code, the checkpoint() function scans through your entire project and looks for library() and require() statements.
For example, if you have this script in your project, checkpoint() will recognize that ggplot2 should be installed:
library(ggplot2)
ggplot(mtcars, aes(mpg, cyl)) +
  geom_point()
checkpoint stores packages in your home folder, i.e. ~/.checkpoint.
Every time you run checkpoint(), your files are scanned and all packages used are identified. These packages are then installed inside the checkpoint home folder. For example, if you have a checkpoint snapshot date of 2015-01-01, the function creates a new folder ~/.checkpoint/2015-01-01.
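The folder naming described above can be sketched in a few lines of base R. This is only an illustration: snapshot_dir() is a hypothetical helper, not a function exported by checkpoint, and the real package may create further sub-folders below the date folder.

```r
# Hypothetical helper (not part of the checkpoint API): builds the folder
# path described above from a snapshot date.
snapshot_dir <- function(snapshot_date, checkpoint_home = "~/.checkpoint") {
  # Validate the date so that typos such as "2015-13-01" fail early.
  parsed <- as.Date(snapshot_date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("snapshot_date must be a valid date in YYYY-MM-DD format")
  }
  file.path(checkpoint_home, format(parsed, "%Y-%m-%d"))
}

path <- snapshot_dir("2015-01-01")
print(path)
```

Each snapshot date maps to its own folder, so two projects that use the same date resolve to the same location.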
Every time you run checkpoint(), your project is scanned for all R files, i.e. files with extensions .R and .Rnw. Specifically, we parse these script files for occurrences of library(...) and require(...) calls.
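To make the scanning step concrete, here is a minimal sketch in base R of how library() and require() calls could be detected. It is not checkpoint's actual implementation: find_packages_in_file() and scan_project() are hypothetical names, and the sketch only handles plain .R files (not .Rnw).

```r
# Illustrative sketch only; not the code used inside checkpoint.

# Return the package names passed to library()/require() in one R script.
find_packages_in_file <- function(path) {
  exprs <- parse(file = path, keep.source = FALSE)
  found <- character(0)

  # Recursively walk a parsed expression, recording matching calls.
  walk <- function(e) {
    if (is.call(e)) {
      fn <- e[[1]]
      if (is.name(fn) && as.character(fn) %in% c("library", "require") &&
          length(e) >= 2) {
        arg <- e[[2]]
        if (is.name(arg) || is.character(arg)) {
          found <<- c(found, as.character(arg))
        }
      }
      # Index-based loop so that empty arguments (as in x[, 1]) are safe.
      for (i in seq_along(e)) walk(e[[i]])
    }
  }

  for (i in seq_along(exprs)) walk(exprs[[i]])
  unique(found)
}

# Scan every .R file below project_dir and collect the packages used.
scan_project <- function(project_dir) {
  files <- list.files(project_dir, pattern = "\\.[Rr]$",
                      recursive = TRUE, full.names = TRUE)
  pkgs <- unlist(lapply(files, find_packages_in_file))
  sort(unique(as.character(pkgs)))
}

# Usage: a throwaway project containing a single script.
project <- file.path(tempdir(), "myproject")
dir.create(project, showWarnings = FALSE)
writeLines(
  c("library(ggplot2)",
    "suppressMessages(require(\"dplyr\"))",
    "ggplot(mtcars, aes(mpg, cyl)) + geom_point()"),
  file.path(project, "plot.R")
)
pkgs <- scan_project(project)
print(pkgs)
```

Parsing the script, rather than matching text with a regular expression, means that calls nested inside other calls (such as suppressMessages()) are found, while commented-out library() lines are ignored.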
In addition, if you have knitr installed, we also scan rmarkdown files, i.e. files with extensions .Rmd, .Rpres and .Rhtml. To scan these files, we first tangle them (extract the R code), then scan the result for library() and require() calls.
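As a rough illustration of the tangle step, the sketch below uses knitr::purl() to extract the R code from an .Rmd file. It assumes knitr is installed; tangle_file() is a hypothetical helper name, and checkpoint itself may call knitr differently.

```r
# Sketch only: assumes the knitr package is installed. tangle_file() is a
# hypothetical helper, not a checkpoint function.
tangle_file <- function(path) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("knitr is required to tangle ", basename(path))
  }
  out <- tempfile(fileext = ".R")
  on.exit(unlink(out), add = TRUE)
  # documentation = 0 keeps only the R code from the chunks.
  knitr::purl(path, output = out, documentation = 0, quiet = TRUE)
  readLines(out)
}

# Usage: tangle a tiny R Markdown document (skipped if knitr is missing).
if (requireNamespace("knitr", quietly = TRUE)) {
  fence <- strrep("`", 3)
  rmd <- tempfile(fileext = ".Rmd")
  writeLines(c("# Report", "",
               paste0(fence, "{r}"), "library(ggplot2)", fence), rmd)
  code <- tangle_file(rmd)
  print(code)
}
```

The tangled lines are ordinary R code, so they can then be searched for library() and require() calls in the same way as a regular script.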
Both checkpoint and packrat install packages required for a project to a local archive as they existed at a specified point in time. This allows specific package versions to be maintained over time and across different users.
However, the two packages differ fundamentally in how they go about this.
checkpoint uses MRAN snapshots, daily snapshots of CRAN, to install packages. Because Revolution Analytics built the server-side MRAN solution, the workload on the user is very low. Simply include checkpoint(...) at the top of your script, and the checkpoint function automatically downloads all required packages. To share your script, simply publish or email your work.
- Checkpoint is simple
- Reproducibility from one script
- Simple for recipients to reproduce results
- Only allows use of CRAN package versions that have been tested together
- Relies on availability of MRAN
In contrast, packrat requires you to manage and publish all your packages. Thus sharing a packrat project requires you to copy and upload all the required packages to a public location, e.g. github.
- Packrat is flexible and powerful
- Supports non-CRAN packages (e.g. github)
- Allows mix-and-matching package versions
- Requires shipping all package source
- Requires recipients to build packages from source
The Reproducible R Toolkit (RRT) consists of checkpoint and the server-side checkpoint-server that manages the CRAN snapshots.
MRAN is the implementation of checkpoint-server.
See https://mran.microsoft.com/documents/rro/reproducibility#reproducibility for more information.
Use the following code at the top of your script:
library(checkpoint)
checkpoint("2015-01-01")
Since the identified packages are stored in ~/.checkpoint, different projects can share packages from the same date.
Yes, but note that this will have the effect of downloading multiple complete sets of packages into your ~/.checkpoint home folder, one set for each snapshot date.
No, the checkpoint() function does this automatically. This is a big benefit when reading and sharing scripts.
You can manually delete any snapshot folder in your ~/.checkpoint home folder. This won’t harm any existing projects: the required package versions will simply be redownloaded the next time the script is run.
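If you prefer to delete a snapshot folder from within R, something along the following lines should work. This is a sketch under assumptions: remove_snapshot() is a hypothetical helper, not a checkpoint function, and the demonstration runs against a throwaway folder rather than your real ~/.checkpoint.

```r
# Hypothetical helper (not part of the checkpoint API): deletes the folder
# for one snapshot date and reports whether anything was removed.
remove_snapshot <- function(snapshot_date, checkpoint_home = "~/.checkpoint") {
  target <- file.path(path.expand(checkpoint_home), snapshot_date)
  if (!dir.exists(target)) {
    return(invisible(FALSE))
  }
  # unlink() returns 0 on success.
  invisible(unlink(target, recursive = TRUE) == 0)
}

# Usage against a temporary stand-in for the checkpoint home folder.
fake_home <- file.path(tempdir(), "checkpoint-demo")
dir.create(file.path(fake_home, "2015-01-01"),
           recursive = TRUE, showWarnings = FALSE)
removed <- remove_snapshot("2015-01-01", checkpoint_home = fake_home)
print(removed)
```

Calling the helper with the default checkpoint_home would remove the real snapshot folder, so check the date carefully before doing that.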
Simply put your script in a shared location, e.g. github or gist, or email to your collaborators.
checkpoint() only supports MRAN snapshots. This has the advantage of ensuring your packages have been tested to work together (by the daily CRAN build process).
If you wish to manually add specific packages, then take a look at packrat, which allows you to manually manage packages.
To ensure you have a specific version of a package, you have to find a snapshot date that contains this version of the package. Alternatively, use packrat instead of checkpoint.