Version 2 of the r image #167

Draft
wants to merge 75 commits into
base: main
2f0068d
Create v2 of r image
remlapmot Nov 20, 2024
3ec01a2
Add build deps as backup
remlapmot Dec 4, 2024
3cabaf3
Add as backup
remlapmot Dec 4, 2024
3495284
rsvg not installed
remlapmot Dec 4, 2024
b369d47
Add to build-dependencies.txt
remlapmot Dec 4, 2024
3ca9d65
Bump CRAN_DATE
remlapmot Dec 5, 2024
ecef702
just build v2
remlapmot Dec 5, 2024
b925305
Bump CRAN_DATE
remlapmot Dec 6, 2024
e351e45
just build v2
remlapmot Dec 6, 2024
507c68d
Add report of arrow package capabilities
remlapmot Dec 6, 2024
279a7ff
Add comment
remlapmot Dec 6, 2024
dba578a
Bump CRAN_DATE
remlapmot Dec 9, 2024
aa03edf
just build v2
remlapmot Dec 9, 2024
3055ba0
Bump CRAN_DATE
remlapmot Dec 10, 2024
a303646
just build v2
remlapmot Dec 10, 2024
d7906fd
Add test loading the base packages
remlapmot Dec 10, 2024
a72ffea
Update test.sh
remlapmot Dec 10, 2024
860eaea
Bump CRAN_DATE
remlapmot Dec 10, 2024
1015ed9
just build v2
remlapmot Dec 10, 2024
713981d
Amend test filename
remlapmot Dec 10, 2024
8b23864
Update .gitignore
remlapmot Dec 10, 2024
c079255
Bump CRAN_DATE
remlapmot Dec 11, 2024
c43dfaa
just build v2
remlapmot Dec 11, 2024
b978f18
Using ${} expansion in env file
bloodearnest Dec 11, 2024
f8d82fa
Use opensafely-core version of dd4d
remlapmot Dec 11, 2024
775c229
just build v2
remlapmot Dec 11, 2024
1ad1ffc
Update v2 dockerfile to latest docker syntax and practice
bloodearnest Dec 11, 2024
be3aac4
Amend as to AS
remlapmot Dec 12, 2024
09b9bed
Delete build-dependencies.txt
remlapmot Dec 12, 2024
e0a4237
Remove renv, just use pak
remlapmot Dec 12, 2024
bb18436
Just use pak
remlapmot Dec 12, 2024
93b1909
Don't create pkg.lock
remlapmot Dec 12, 2024
172c11d
Create test-loading-packages.R
remlapmot Dec 12, 2024
6c23392
Add message
remlapmot Dec 12, 2024
f366345
Delete renv.lock
remlapmot Dec 12, 2024
8b32937
Only update renv.lock for v1
remlapmot Dec 12, 2024
059ee06
Use test-loading-packages.R
remlapmot Dec 12, 2024
1239a15
Start v1/v2
remlapmot Dec 12, 2024
3e2ab80
Create pkg.lock
remlapmot Dec 12, 2024
ed414ff
Different lock files for v1 and v2
remlapmot Dec 12, 2024
e18f844
Ignore pkg.lock.bak
remlapmot Dec 12, 2024
eff27bc
Fix path
remlapmot Dec 12, 2024
299bf49
Create pkg.lock
remlapmot Dec 12, 2024
c291f64
Render from pkg.lock
remlapmot Dec 12, 2024
2fdcb14
Include pak in pkg.lock
remlapmot Dec 12, 2024
f3571d6
Update packages.md
remlapmot Dec 12, 2024
7fe5deb
Update pkg.lock
remlapmot Dec 12, 2024
b504e8a
Fix path to non-base packages
remlapmot Dec 12, 2024
9c43353
Bump CRAN_DATE
remlapmot Dec 12, 2024
127385a
just build v2
remlapmot Dec 12, 2024
bc0a37e
Install packages from pak lockfile
remlapmot Dec 12, 2024
b6cbb3c
Simplify Dockerfile
remlapmot Dec 12, 2024
cf2442f
No need for base-r layer
remlapmot Dec 12, 2024
3eb8bf1
Move ENV below comment
remlapmot Dec 12, 2024
3203b4c
Add some more pak caches
remlapmot Dec 12, 2024
cb4c888
Clean up pak caches for good measure
remlapmot Dec 12, 2024
12453e2
Use an R script for base package loading
remlapmot Dec 13, 2024
8414d06
Update Dockerfile
remlapmot Dec 13, 2024
cf59d35
Strip dependencies
bloodearnest Dec 13, 2024
22ef4d9
Consolidate pak caches into one
bloodearnest Dec 13, 2024
cc4a288
Remove pak_cleanup() call
bloodearnest Dec 13, 2024
6f34ef3
Apply the earlier dockerfile sytax improvements to rstudio
bloodearnest Dec 13, 2024
719a157
Amend dependency comments
remlapmot Dec 13, 2024
4bbac0f
Add space
remlapmot Dec 15, 2024
cb87e2a
Bump CRAN_DATE
remlapmot Dec 15, 2024
d12a262
just build v2
remlapmot Dec 15, 2024
092334e
Bump CRAN_DATE
remlapmot Dec 15, 2024
5d8ac10
just build v2
remlapmot Dec 15, 2024
2cde91d
Bump CRAN_DATE
remlapmot Dec 16, 2024
834abb1
just build v2
remlapmot Dec 16, 2024
8f69ee3
Bump CRAN_DATE
remlapmot Dec 17, 2024
1ef9ad0
just build v2
remlapmot Dec 17, 2024
d2807a4
Update RStudio Server
remlapmot Dec 17, 2024
2406e83
Delete &&
remlapmot Dec 18, 2024
c0457c2
Delete &&
remlapmot Dec 18, 2024
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -2,6 +2,8 @@
renv.lock.bak
renv/
.tests.R
.tests-base.R
pkg.lock.bak

# rstudio-server directories
*.config
Expand Down
104 changes: 89 additions & 15 deletions README.md
Expand Up @@ -8,19 +8,28 @@ Docker image for running R code in OpenSAFELY, both locally and in production.
* docker-compose
* [just](https://github.com/casey/just)

And the tests additionally require

* curl
* python3

## Building

```sh
just build
just build VERSION
```

Under the hood, this builds `./Dockerfile` using docker-compose and buildkit.
where `VERSION` is either v1 or v2.

Under the hood, this builds `VERSION/Dockerfile` using docker-compose and buildkit.

We currently build a lot of packages, so an initial build on a fresh checkout
In v1, we currently build a lot of packages, so an initial build on a fresh checkout
can take a long time (e.g. an hour). However, to alleviate this, the
Dockerfile is carefully designed to use local buildkit cache, so subsequent
v1/Dockerfile is carefully designed to use local buildkit cache, so subsequent
rebuilds should be very fast.

In v2, where possible we install binary R packages for Linux from the Posit Public Package Manager.

## Adding new packages

:warning: To do this you will need:
Expand All @@ -41,10 +50,18 @@ experience to approve the package.

### Install the package within Docker

#### Under v1

To add a package, by default it will be installed from CRAN.

```sh
just add-package PACKAGE
just add-package-v1 PACKAGE
```

If you need to install a package from another CRAN-like repository, specify its URL as the REPOS argument.

```sh
just add-package-v1 PACKAGE REPOS
```

If you need to install a package from another CRAN-like repository, specify its URL as the REPOS argument.
Expand All @@ -54,14 +71,26 @@ just add-package PACKAGE REPOS
```

This will attempt to install and build the package and its dependencies, and
update the `renv.lock`. It will then rebuild the R image with the new lock file
update the `v1/renv.lock`. It will then rebuild the R image with the new lock file
and test it.

Note that the first time you do this it will need to compile every
included R package (because you won't have the R package builds cached
locally). This can take **several hours**. (When we solve the caching
problem here we'll be able to do this all in CI.)

#### Under v2

Add the package to _v2/packages.csv_. In this file, the first column is the package name and the second column is the repository. If the package is on CRAN, leave the second column blank after the comma. If the package is not on CRAN, add it to <https://opensafely-core.r-universe.dev> by adding it to the _packages.json_ file in <https://github.com/opensafely-core/opensafely-core.r-universe.dev>, then enter the relevant Linux binary package URL in the second column, likely `https://opensafely-core.r-universe.dev/bin/linux/noble/4.4/`.

If the package requires any runtime dependencies, add those to _v2/dependencies.txt_.

Then build the v2 image.

```sh
just build v2
```
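
As a rough sketch of the row format described above, the snippet below writes one CRAN row and one non-CRAN row to a scratch file. The package names `examplepkg` and `internalpkg` are hypothetical, and the snippet deliberately writes to a temporary file rather than the real _v2/packages.csv_, whose existing contents are not shown here.

```sh
#!/usr/bin/env bash
set -euo pipefail

# Work on a scratch file so the real v2/packages.csv is untouched.
csv="$(mktemp)"

# CRAN package (hypothetical name): the repository column is left blank after the comma.
printf '%s,%s\n' "examplepkg" "" >> "$csv"

# Non-CRAN package (hypothetical name): the second column holds the
# r-universe Linux binary repository URL.
printf '%s,%s\n' "internalpkg" "https://opensafely-core.r-universe.dev/bin/linux/noble/4.4/" >> "$csv"

cat "$csv"
```

Once the rows look right, the same two-column lines would be added to _v2/packages.csv_ before running the build.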

### Push the new Docker image to Github Container Registry

You will need to configure authentication to GitHub's container registry first.
Expand All @@ -70,13 +99,13 @@ See [GitHub's documentation](https://docs.github.com/en/packages/working-with-a-
When you have authentication configured, run:

```sh
just publish
just publish VERSION
```

### Commit changes to this repository

Commit and push the small resulting change (should only be a few extra
lines in `packages.csv` and `renv.lock`) to a branch, then get the changes
lines in `VERSION/packages.csv`, `VERSION/packages.md`, and the lock file, `v1/renv.lock` or `v2/pkg.lock`) to a branch, then get the changes
merged via pull request.

The review is a trivial exercise because the Docker image has already been
Expand All @@ -93,37 +122,82 @@ separately in the tech team manual. If you don't have access, ask in
#### System dependencies

If the package requires any system build dependencies (e.g. -dev packages with
headers), they should be added to `build-dependencies.txt`. If it requires
runtime dependencies, they should be added to `dependencies.txt`. Packages
headers), they should be added to `VERSION/build-dependencies.txt`. If it requires
runtime dependencies, they should be added to `VERSION/dependencies.txt`. Packages
don't advertise their system dependencies, so you may need to figure them out
by trying to add the package and reading any error output on failure.

#### Installing an older version
#### Installing an older version in the v1 image only

If the package still fails to build, you may be able to install an older version.

Find a previous version at `https://cran.r-project.org/src/contrib/Archive/{PACKAGE}/`, and attempt to install it specifically with

```sh
just add-package PACKAGE@VERSION
just add-package-v1 PACKAGE@VERSION
```

## Building, testing, and publishing the rstudio image

The rstudio image is based on the r image and additionally includes rstudio-server. To build it, run

```sh
just build-rstudio
just build-rstudio VERSION
```

To test that rstudio-server appears at `http://localhost:8787` run

```sh
just test-rstudio
just test-rstudio VERSION
```

And then push the new rstudio image to the GitHub container registry with

```sh
just publish-rstudio
just publish-rstudio VERSION
```

## How to update the version of R and the packages

In v2, we choose a date from which to install the packages from CRAN. We strongly recommend that the version of R in the image is the version that was the current release on this date. R release dates can be found on the [R Wikipedia page](https://en.wikipedia.org/wiki/R_(programming_language)#Version_names).

In v2, when installing packages we use a Posit Public Package Manager (PPPM) snapshot repository on the chosen `CRAN_DATE`.

We use a fixed date because CRAN follows a rolling release model.
As such we know that on a particular date CRAN has tested these package versions with the release version of R.
Hence this is an extremely stable approach to choosing a set of package versions.
And we can add additional packages at their versions on this date reliably (and without updating dependency packages already included in the image).

The CRAN apt repository for R is available [here](https://cran.r-project.org/bin/linux/ubuntu/noble-cran40/) (note that you may need to amend the Ubuntu codename in the URL if using a newer base image). Find the package version number you require and edit the version number in _v2/dependencies.txt_ and _v2/build-dependencies.txt_.

Then amend the `CRAN_DATE` and `REPOS` arguments in _v2/env_.

To update run

```sh
just build v2
```

To test the updated image run

```sh
just test v2
```
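
To illustrate how the `CRAN_DATE` and `REPOS` settings fit together, here is a minimal sketch that writes a scratch stand-in for _v2/env_ and sources it the way the `build` recipe sources `VERSION/env`. The values are illustrative assumptions, not the contents of the real file; in particular, the snapshot URL pattern `https://p3m.dev/cran/__linux__/noble/DATE` is an assumption that should be checked against the PPPM setup page.

```sh
#!/usr/bin/env bash
set -euo pipefail

# Scratch stand-in for v2/env; every value below is an illustrative assumption.
envfile="$(mktemp)"
cat > "$envfile" <<'EOF'
MAJOR_VERSION=v2
CRAN_DATE=2024-10-30
REPOS=https://p3m.dev/cran/__linux__/noble/${CRAN_DATE}
EOF

# Source the file so that ${CRAN_DATE} is expanded inside REPOS.
source "$envfile"
echo "R packages would be installed from: $REPOS"
```

Deriving `REPOS` from `CRAN_DATE` with `${}` expansion means only the date has to change when the snapshot is bumped.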

### How to choose a version of R and CRAN date

Choose a version of R.

Choose a CRAN date on which that version of R was the current release.

Essentially we follow a very similar approach to the versioned stack of the Rocker project. They list their R versions and CRAN dates on their [wiki](https://github.com/rocker-org/rocker-versioned2/wiki/Versions).

We recommend not choosing a date within the first week of a new version of R being released, because a lot of packages may be updated on CRAN during this time.

You then need to check that a PPPM snapshot repository exists for your chosen date. Navigate to <https://p3m.dev/client/#/repos/cran/setup> and inspect your chosen date. Set the snapshot repository URL for that date as the `REPOS` argument in _v2/env_.

If you choose a version of R that is not the current version, we recommend following the Rocker approach and choosing the day before the next version of R was released as the CRAN date. For example, if choosing R 4.4.1: R 4.4.2 was released on 2024-10-31, so we would choose 2024-10-30 as the CRAN date.
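
The "day before the next release" rule can be sketched as a small date calculation. This is only an illustration and assumes GNU `date` (as found on Ubuntu); the variable names are hypothetical.

```sh
#!/usr/bin/env bash
set -euo pipefail

# Release date of the *next* R version (R 4.4.2 in the example above).
next_release="2024-10-31"

# Convert to seconds since the epoch at midnight UTC, subtract one day
# (86400 seconds), and format the result as YYYY-MM-DD. Requires GNU date.
next_epoch="$(date -u -d "$next_release" +%s)"
cran_date="$(date -u -d "@$((next_epoch - 86400))" +%F)"

echo "CRAN_DATE=$cran_date"
```

Working in UTC avoids daylight-saving shifts changing the length of the subtracted day.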

You can find out when the next release of R is scheduled for on the [R developer page](https://developer.r-project.org/).

We set the `HTTPUserAgent` in the appropriate places so that we obtain binary R packages for Linux from the PPPM. There is additional information about this on the [PPPM website](https://p3m.dev/__docs__/admin/serving-binaries/#binary-user-agents).
13 changes: 9 additions & 4 deletions docker-compose.yaml
@@ -1,20 +1,25 @@
services:
# used to build the production image
r:
image: r
image: r:${MAJOR_VERSION}
build:
context: .
dockerfile: ${MAJOR_VERSION}/Dockerfile
target: r
cache_from: # should speed up the build in CI, where we have a cold cache
- ghcr.io/opensafely-core/base-docker
- ghcr.io/opensafely-core/r
- ghcr.io/opensafely-core/base-docker:${BASE}
- ghcr.io/opensafely-core/r:${MAJOR_VERSION}
args:
# this makes the image work for later cache_from: usage
- BUILDKIT_INLINE_CACHE=1
# env vars supplied by make/just
- BUILD_DATE
- REVISION
- VERSION
- BASE
- MAJOR_VERSION
- CRAN_DATE
- REPOS
init: true
platform: linux/amd64
add-package:
Expand All @@ -28,7 +33,7 @@ services:
platform: linux/amd64
rstudio:
extends: r
image: rstudio
image: rstudio:${MAJOR_VERSION}
build:
target: rstudio
args:
Expand Down
73 changes: 49 additions & 24 deletions justfile
Expand Up @@ -5,40 +5,55 @@ export DOCKER_BUILDKIT := "1"
export COMPOSE_DOCKER_CLI_BUILD := "1"

# build the R image locally
build:
build version:
#!/usr/bin/env bash
set -euo pipefail

source {{ version }}/env

# set build args for prod builds
export BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
export GITREF=$(git rev-parse --short HEAD)

# build the thing
docker-compose build --pull r
docker-compose --env-file {{ version }}/env build --pull r

if [ "{{ version }}" = "v1" ]; then
# update renv.lock
cp ${MAJOR_VERSION}/renv.lock ${MAJOR_VERSION}/renv.lock.bak
# cannot use docker-compose run as it mangles the output
docker run --platform linux/amd64 --rm r:{{ version }} cat /renv/renv.lock > ${MAJOR_VERSION}/renv.lock
elif [ "{{ version }}" = "v2" ]; then
# update pkg.lock
cp ${MAJOR_VERSION}/pkg.lock ${MAJOR_VERSION}/pkg.lock.bak
# cannot use docker-compose run as it mangles the output
docker run --platform linux/amd64 --rm r:{{ version }} cat /pkg.lock > ${MAJOR_VERSION}/pkg.lock
fi

# render the packages.md file
{{ just_executable() }} render {{ version }}

# render the version/packages.md file
render version:
docker run --platform linux/amd64 --env-file {{ version }}/env --entrypoint bash --rm -v "/$PWD:/out" -v "$PWD/scripts:/out/scripts" r:{{ version }} "/out/scripts/render.sh"

# build and add a package and its dependencies to the image
add-package package repos="NULL":
bash ./add-package.sh {{ package }} {{ repos }}
add-package-v1 package repos="NULL":
bash v1/scripts/add-package.sh {{ package }} {{ repos }}

# r image containing rstudio-server
build-rstudio:
#!/usr/bin/env bash
set -euo pipefail

# Set RStudio Server .deb filename
export RSTUDIO_BASE_URL=https://download2.rstudio.org/server/focal/amd64/
export RSTUDIO_DEB=rstudio-server-2024.09.0-375-amd64.deb
docker-compose build --pull rstudio
build-rstudio version:
docker-compose --env-file {{ version }}/env build --pull rstudio

# test the locally built image
test image="r": build
bash ./test.sh "{{ image }}"
test version:
#!/usr/bin/env bash
source {{ version }}/env
bash tests/test.sh {{ version }}

# test rstudio-server launches
test-rstudio: _env
bash ./test-rstudio.sh
test-rstudio version: _env
bash tests/test-rstudio.sh {{ version }}

_env:
#!/bin/bash
Expand All @@ -47,14 +62,24 @@ _env:
echo "HOSTPLATFORM=$(docker info -f '{{{{ lower .ClientInfo.Os }}')" >> .env

# lint source code
lint:
lint version:
docker pull hadolint/hadolint
docker run --rm -i hadolint/hadolint < Dockerfile
docker run --rm -i hadolint/hadolint < {{ version }}/Dockerfile

publish:
docker tag r ghcr.io/opensafely-core/r:latest
docker push ghcr.io/opensafely-core/r:latest
publish version:
#!/usr/bin/env bash
docker tag r:{{ version }} ghcr.io/opensafely-core/r:{{ version }}
docker push ghcr.io/opensafely-core/r:{{ version }}
if [ "{{ version }}" = "v1" ]; then
docker tag r:{{ version }} ghcr.io/opensafely-core/r:latest
docker push ghcr.io/opensafely-core/r:latest
fi

publish-rstudio:
docker tag rstudio ghcr.io/opensafely-core/rstudio:latest
docker push ghcr.io/opensafely-core/rstudio:latest
publish-rstudio version:
#!/usr/bin/env bash
docker tag rstudio:{{ version }} ghcr.io/opensafely-core/rstudio:{{ version }}
docker push ghcr.io/opensafely-core/rstudio:{{ version }}
if [ "{{ version }}" = "v1" ]; then
docker tag rstudio:{{ version }} ghcr.io/opensafely-core/rstudio:latest
docker push ghcr.io/opensafely-core/rstudio:latest
fi
2 changes: 0 additions & 2 deletions packages-tmp.csv

This file was deleted.
