The publicWW package implements a spatio-temporal modelling framework that produces probabilistic estimates of SARS-CoV-2 viral concentration in wastewater at high spatio-temporal resolution across England. The modelling framework utilises the publicly available wastewater data collected as part of the Environmental Monitoring for Health Protection (EMHP) wastewater monitoring programme. The functionality of the package includes data management, model fitting, model evaluation and output visualisation. All data files are in the inst/extdata/ directory. Outputs of this modelling framework are visualised via a dynamic and interactive dashboard.
Installation requires a GitHub personal access token (PAT). Follow the instructions at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token to create one, assign the token as a string to the object `PAT`, and then run the following code to install the package:
```r
PAT <- "" # enter your PAT here
devtools::install_github("gqlNU/publicWW", auth_token = PAT)
```
We use INLA to fit a spatio-temporal model to the wastewater data observed across the 303 sewage treatment works (STWs) in England from 1 June 2021 to 30 March 2022. The script that fits the model in INLA can be found via
The top of the script has a `USER INPUT` section where some of the model parameters can be changed, as described in the script comments. By default, the script includes the following covariates in the model:
- IMD_score - Index of Multiple Deprivation for 2019
- old_prop - age structure of the population, defined as the percentage of adults older than 75 (ONS 2019)
- young_prop - age structure of the population, defined as the percentage of the population younger than 16 (ONS 2019)
- bame_proportion - Black, Asian and Minority Ethnic (BAME) proportion in each area (2011 Census)
- population_density - population density (ONS 2019 estimates)
- f_industrial
Genomic covariates are not included by default (`include_genomic_covars <- FALSE`).
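As an illustration of how the covariate list and the `include_genomic_covars` switch could feed into a model formula, here is a minimal base-R sketch. The response name `log_conc` and the genomic covariate name are hypothetical, and the random effects of the actual INLA model (such as the SPDE spatial term) are deliberately left out, so this is not the script's own formula.

```r
# Covariates included by default (names taken from the list above).
covars <- c("IMD_score", "old_prop", "young_prop",
            "bame_proportion", "population_density", "f_industrial")

# Switch mirroring include_genomic_covars; the genomic covariate
# name below is a hypothetical placeholder, not the package's own.
include_genomic_covars <- FALSE
genomic_covars <- c("genomic_covar_1")  # hypothetical

if (include_genomic_covars) covars <- c(covars, genomic_covars)

# Fixed-effect part of the formula only; "log_conc" is a hypothetical
# response name. Random effects (e.g. f(...) terms) are omitted here.
fixed_formula <- reformulate(covars, response = "log_conc")
print(fixed_formula)
```

Setting `include_genomic_covars <- TRUE` in this sketch simply appends the extra covariate names before the formula is built.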
The resulting model fit is saved in an `RData` file, together with the auxiliary variables needed to run the predictions (such as the covariate list, the regional-effect setting and the SPDE mesh).
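Saving the fit together with its auxiliary objects means the prediction step can restore everything from a single file. A minimal sketch using base R `save()` and `load()`, where the object names and the file name are hypothetical placeholders rather than those used by the package:

```r
# Hypothetical objects standing in for the fitted model and auxiliaries.
fit      <- list(summary = "placeholder INLA fit")
covars   <- c("IMD_score", "old_prop")
regional <- TRUE
mesh     <- list(n = 0)  # stands in for the SPDE mesh

fit_file <- file.path(tempdir(), "model_fit.RData")  # hypothetical file name
save(fit, covars, regional, mesh, file = fit_file)

# Load into a fresh environment so nothing in the session is overwritten.
fit_env <- new.env()
loaded  <- load(fit_file, envir = fit_env)
print(loaded)
```

Loading into a separate environment avoids silently overwriting objects of the same name in the current session.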
The following script uses the model fit from the previous step to predict weekly viral concentrations at the population-weighted centroids of all 32,844 LSOAs in England:
`predict_lsoas.R` loads the fitted `RData` file. Because of memory constraints, predictions for all LSOAs are by default made in batches of iterations.
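Batching bounds memory use because only one batch of draws is held at a time. The sketch below shows one way to split iteration indices into consecutive batches; `n_iter` and `batch_size` are hypothetical values, not the defaults in `predict_lsoas.R`.

```r
# Hypothetical sizes: total number of iterations and iterations per batch.
n_iter     <- 1000
batch_size <- 200

# Split iteration indices 1..n_iter into consecutive batches.
batches <- split(seq_len(n_iter), ceiling(seq_len(n_iter) / batch_size))

for (b in seq_along(batches)) {
  iters <- batches[[b]]
  # In the real workflow, draws for these iterations would be generated
  # for all LSOAs here and written to disk before the next batch.
  message(sprintf("batch %d: iterations %d-%d", b, min(iters), max(iters)))
}
```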
The predictions are saved in an `outputs` directory. Two outputs are saved for each batch:
- All iterations for the batch.
- Summary statistics for the batch.
The number of iterations is set by `nsims`. Running all LSOAs sequentially takes about 8 hours, and storing the iteration-level predictions requires at least 8.2 GB of disk space.
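For each batch, the summary output can be derived from the iteration-level draws. A minimal sketch, assuming the draws for a batch are held as an LSOA-by-iteration matrix (a hypothetical layout) and that the summaries are the mean, median and a 95% interval (an assumption; the script may report other statistics):

```r
set.seed(1)
# Hypothetical batch: rows are LSOAs, columns are iterations (draws).
n_lsoa <- 5
draws  <- matrix(rlnorm(n_lsoa * 200), nrow = n_lsoa,
                 dimnames = list(paste0("LSOA_", seq_len(n_lsoa)), NULL))

# Per-LSOA posterior summaries: mean, median and a 95% interval.
summarise_draws <- function(x) {
  q <- quantile(x, probs = c(0.025, 0.5, 0.975), names = FALSE)
  c(mean = mean(x), lower = q[1], median = q[2], upper = q[3])
}
batch_summary <- t(apply(draws, 1, summarise_draws))
print(round(batch_summary, 2))
```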
The LSOA-level weekly concentrations can be aggregated to other administrative levels in England, including
- the Lower Tier Local Authority (LTLA) level
- the Clinical Commissioning Group (CCG) level
- the regional level
- the national level
This spatial aggregation is carried out by the script `inst/scripts/aggregate_LSOApred.R`.
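Keeping predictions at iteration level allows aggregation to be done draw by draw, so the uncertainty of the aggregated estimates is carried through. The sketch below assumes a population-weighted mean of LSOA concentrations within each LTLA; the weighting scheme, the lookup table and its column names are hypothetical, and `aggregate_LSOApred.R` may aggregate differently (for example on another scale).

```r
set.seed(2)
# Hypothetical lookup: each LSOA's LTLA and population (used as weights).
lookup <- data.frame(
  lsoa = paste0("LSOA_", 1:6),
  ltla = c("A", "A", "A", "B", "B", "B"),
  pop  = c(1500, 1200, 1800, 1600, 1400, 1700)
)
# Hypothetical iteration-level predictions: rows = LSOAs, cols = iterations.
draws <- matrix(rlnorm(6 * 100), nrow = 6)

# Population-weighted mean per LTLA, computed separately for each iteration.
w   <- lookup$pop
num <- rowsum(draws * w, group = lookup$ltla)  # sum of pop * conc per LTLA
den <- rowsum(w, group = lookup$ltla)          # total pop per LTLA
ltla_draws <- num / as.vector(den)             # rows = LTLAs, cols = iterations

print(dim(ltla_draws))
```

The same pattern would apply to CCGs, regions or England by swapping the grouping column; posterior summaries are then computed from the aggregated draws rather than by averaging LSOA-level summaries.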
The links below provide the iteration-level predictions at LTLA, CCG, region and England level.
Download the files above and run `inst/scripts/results4paper_onGit.R` to produce the figures in the paper.
A 10-fold cross-validation can be run using the script `inst/scripts/run_cross_validation.R`.
As in the fitting section, the script is ready to run with the full dataset (1 June 2021 to 30 March 2022) and all the covariates, but it can be modified. The held-out sites for each fold were selected at random without replacement. The list of sites held out in each fold is in `inst/extdata/cv_sites_10folds_30Mar2022.csv`. New folds can also be created if necessary.
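If new folds are needed, one simple approach is to shuffle a balanced vector of fold labels so that each site is held out exactly once. A minimal sketch with hypothetical site IDs; the layout of `cv_sites_10folds_30Mar2022.csv` is not described here, so the two-column `site`/`fold` table is an assumption.

```r
set.seed(2022)
# Hypothetical site identifiers standing in for the 303 STWs.
sites <- sprintf("STW_%03d", 1:303)
k <- 10

# Shuffle a balanced vector of fold labels: every site is assigned to
# exactly one fold (sampling without replacement), and fold sizes differ by at most 1.
fold  <- sample(rep(seq_len(k), length.out = length(sites)))
folds <- data.frame(site = sites, fold = fold)

print(table(folds$fold))

# Held-out sites for fold 1; the model would be refitted without them.
holdout_1 <- folds$site[folds$fold == 1]
```

With 303 sites and 10 folds, three folds hold 31 sites and seven hold 30.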
The `inst/scripts/` directory also contains a few other scripts used to explore genomic variables, aggregate batches of predictions and aggregate predictions to the LTLA level.