Docker image for running R code in OpenSAFELY, both locally and in production.
- docker
- docker-compose
- just
And the tests additionally require
- curl
- python3
just build VERSION
where VERSION
is either v1 or v2.
Under the hood, this builds VERSION/Dockerfile
using docker-compose and buildkit.
In v1, we currently build a lot of packages, so an initial build on a fresh checkout can take a long time (e.g. an hour). However, to alleviate this, the v1/Dockerfile is carefully designed to use local buildkit cache, so subequent rebuilds should be very fast.
In v2, where possible we install binary R packages for Linux from the Posit Public Package Manager.
- Enough bandwidth to comfortably push potentionally gigabytes worth of Docker layers.
- Several hours worth of CPU time to re-compile all the packages (if this is the first time you've done this and don't have them cached locally).
- Push access to ghcr.io.
If you don't have all these things then please don't start.
Before adding a package, check with an OpenSAFELY team member with R experience to approve the package.
To add a package, by default it will be installed from CRAN.
just add-package-v1 PACKAGE
If you need to install a package from another CRAN-like repository, specify its URL as the REPOS argument.
just add-package-v1 PACKAGE REPOS
If you need to install a package from another CRAN-like repository, specify its URL as the REPOS argument.
just add-package PACKAGE REPOS
This will attempt to install and build the package and its dependencies, and
update the v1/renv.lock
. It will then rebuild the R image with the new lock file
and test it.
Note that the first time you do this it will need to compile every included R package (because you won't have the R package builds cached locally). This can take several hours. (When we solve the caching problem here we'll be able to do this all in CI.)
Add the package to v2/packages.csv. In this file, the first column is the package name, and the second column is for the repository. If the package is on CRAN simply leave the second column blank after the comma. If the package is not on CRAN please add it to the https://opensafely-core.r-universe.dev by adding it to the packages.json file in https://github.com/opensafely-core/opensafely-core.r-universe.dev, then enter the relevant Linux binary package URL, likely https://opensafely-core.r-universe.dev/bin/linux/noble/4.4/
.
If the package requires any runtime dependencies add those to v2/dependencies.txt
Then build the v2 image.
just build v2
You will need to configure authentication to GitHub's container registry first. See GitHub's documentation.
When you have authentication configured, run:
just publish VERSION
Commit and push the small resulting change (should only be a few extra
lines in VERSION/packages.csv
, VERSION/packages.md
, and VERSION/renv.lock
) to a branch, then get the changes
merged via pull request.
The review is a trivial exercise because the Docker image has already been pushed to GitHub.
The updated image will need pulling into production. This is covered
separately in the tech team manual. If you don't have access, ask in
#tech
.
If the package requires any system build dependencies (e.g. -dev packages with
headers), they should be added to VERSION/build-dependencies.txt
. If it requires
runtime dependencies, they should be added to VERSION/dependencies.txt
. Packages
don't advertise their system dependencies, so you may need to figure them out
by trying to add the package and reading any error output on failure.
If the package still fails to build, you may be able to install an older version.
Find a previous version at https://cran.r-project.org/src/contrib/Archive/{PACKAGE}/
, and attempt to install it specifically with
just add-package v1 PACKAGE@VERSION
The rstudio image is based on the r image including rstudio-server. To build run
just build-rstudio VERSION
To test that rstudio-server appears at http://localhost:8787
run
just test-rstudio VERSION
And then push the new rstudio image to the GitHub container registry with
just publish-rstudio VERSION
In v2, we choose a date from which to install the packages from CRAN, we strongly recommend that the version of R in the image was the release version of R on this date. R release dates can be found on the R wikipedia page.
In v2, when installing packages we use a Posit Public Package Manager (PPPM) snapshot repository on the chosen CRAN_DATE
.
We use a fixed date because CRAN follows a rolling release model. As such we know that on a particular date CRAN has tested these package versions with the release version of R. Hence this is an extremely stable approach to choosing a set of package versions. And we can add additional packages at their versions on this date reliably (and without updating dependency packages already included in the image).
The CRAN apt repository for R is available here (note you may need to amend the Ubuntu codename in the URL if using a newer base image), find the package number you require and edit the number in v2/dependencies.txt and v2/build-dependencies.txt.
Then amend the CRAN_DATE
and REPOS
arguments in v2/env.
To update run
just build v2
To test the updated image run
just test v2
Choose a version of R.
Choose a CRAN date when that version of R.
Essentially we follow a very similar approach to the versioned stack of the Rocker project. They list their R versions and CRAN dates on their wiki.
We recommend not choosing a date within the first week of a new version of R being released, because there may be alot of packages updated on CRAN during this time.
You then need to check that a PPPM snapshot repository exists for your chosen date. Navigate to https://p3m.dev/client/#/repos/cran/setup and inspect your chosen date. Set this as the REPOS
argument in v2/env.
If you choose a version of R that is not the current version of R we recommend following the rocker approach and choosing the CRAN date as the day before the next version of R was released. For example, if choosing R 4.4.1, R 4.4.2 was released on 2024-10-31 and so we would choose 2024-10-30 as the CRAN date.
You can find out when the next release of R is scheduled for on the R developer page.
We set the HTTPUserAgent
in the appropriate places so that we obtain binary R packages for Linux from the PPPM. There is additional information about this on the PPPM website.