diff --git a/.Rbuildignore b/.Rbuildignore index 91114bf2..158350c7 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -1,2 +1,16 @@ ^.*\.Rproj$ ^\.Rproj\.user$ +^.*\.sh +^.vscode$ +^LICENSE\.md$ +^\.github$ +^codecov\.yml$ +^README\.Rmd$ +^_pkgdown\.yml$ +^docs$ +^pkgdown$ +^CODE_OF_CONDUCT\.md$ +^doc$ +^Meta$ +^.lintr$ +^.*/tests/docker/.*$ diff --git a/.github/.gitignore b/.github/.gitignore new file mode 100644 index 00000000..2d19fc76 --- /dev/null +++ b/.github/.gitignore @@ -0,0 +1 @@ +*.html diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 00000000..2e5cdb42 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,40 @@ +# Contributing to serofoi + +This outlines how to propose a change to serofoi. + +## Making changes + +If you want to make a change, it's a good idea to first file an issue and make sure someone from the team agrees that it’s needed. +If you’ve found a bug, please file an issue that illustrates the bug with a minimal +[reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed). See [bug report template](../.github/ISSUE_TEMPLATE/bug_report.md). If you have a feature request see [feature request](../.github/ISSUE_TEMPLATE/feature_request.md). + +### Pull request process + +See [pull request template](../.github/PULL_REQUEST_TEMPLATE/pull_request_template.md) + +* Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("epiverse-trace/serofoi", fork = TRUE)`. + +* Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. + If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. +* Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. + +* Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. + The title of your PR should briefly describe the change. + The body of your PR should contain `Fixes #issue-number`. + +* For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in . + +### Code style + +* New code should follow the tidyverse [style guide](https://style.tidyverse.org). + You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. + +* We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd-formatting.html), for documentation. + +* We use [testthat](https://cran.r-project.org/package=testthat) for unit tests. + Contributions with test cases included are easier to accept. + +## Code of Conduct + +Please note that the serofoi project is released with a +[Contributor Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md). By contributing to this project you agree to abide by its terms. diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 00000000..1050d1d9 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,26 @@ +--- +name: Bug report +about: Create a report to help us improve +title: '' +labels: '' +assignees: '' + +--- + +Please place an "x" in all the boxes that apply +--------------------------------------------- + + - [ ] I have the most recent version of linelist and R + - [ ] I have found a bug + - [ ] I have a [reproducible example](http://reprex.tidyverse.org/articles/reprex-dos-and-donts.html) + - [ ] I want to request a new feature + +-------- + +Please include a brief description of the problem with a code example: + +```r +# insert reprex here +``` + +--------- diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 00000000..df81be2b --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,17 @@ +--- +name: Feature request +about: Suggest an idea for this project +title: '' +labels: '' +assignees: '' + +--- + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] + +**Describe the solution you'd like** +A clear and concise description of what you want to happen. + +**Additional context** +Add any other context or screenshots about the feature request here. diff --git a/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md b/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md new file mode 100644 index 00000000..45e9f48b --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md @@ -0,0 +1,25 @@ +* **Please check if the PR fulfills these requirements** + +- [ ] I have read the CONTRIBUTING guidelines +- [ ] The commit message follows our guidelines +- [ ] Tests for the changes have been added (for bug fixes / features) +- [ ] Docs have been added / updated (for bug fixes / features) + + +* **What kind of change does this PR introduce?** (Bug fix, feature, docs update, ...) + + + +* **What is the current behavior?** (You can also link to an open issue here) + + + +* **What is the new behavior (if this is a feature change)?** + + + +* **Does this PR introduce a breaking change?** (What changes might users need to make in their application due to this PR?) + + + +* **Other information**: diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml new file mode 100644 index 00000000..fbfc2e01 --- /dev/null +++ b/.github/workflows/R-CMD-check.yaml @@ -0,0 +1,51 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help +on: + push: + branches: [main, master] + pull_request: + branches: [main, master] + +name: R-CMD-check + +jobs: + R-CMD-check: + runs-on: ${{ matrix.config.os }} + + name: ${{ matrix.config.os }} (${{ matrix.config.r }}) + + strategy: + fail-fast: false + matrix: + config: + # - {os: macos-latest, r: 'release'} + # - {os: windows-latest, r: 'release'} + # - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} + - {os: ubuntu-latest, r: 'release'} + # - {os: ubuntu-latest, r: 'oldrel-1'} + + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + R_KEEP_PKG_SOURCE: yes + + steps: + - uses: actions/checkout@v3 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + r-version: ${{ matrix.config.r }} + http-user-agent: ${{ matrix.config.http-user-agent }} + use-public-rspm: true + extra-repositories: https://mc-stan.org/r-packages/ + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::rcmdcheck + needs: check + + - uses: r-lib/actions/check-r-package@v2 + with: + upload-snapshots: true + error-on: '"error"' diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml new file mode 100644 index 00000000..087f0b05 --- /dev/null +++ b/.github/workflows/pkgdown.yaml @@ -0,0 +1,46 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help +on: + push: + branches: [main, master] + pull_request: + branches: [main, master] + release: + types: [published] + workflow_dispatch: + +name: pkgdown + +jobs: + pkgdown: + runs-on: ubuntu-latest + # Only restrict concurrency for non-PR jobs + concurrency: + group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + steps: + - uses: actions/checkout@v3 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::pkgdown, local::. + needs: website + + - name: Build site + run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) + shell: Rscript {0} + + - name: Deploy to GitHub pages 🚀 + if: github.event_name != 'pull_request' + uses: JamesIves/github-pages-deploy-action@v4.4.1 + with: + clean: false + branch: gh-pages + folder: docs diff --git a/.github/workflows/test-coverage.yaml b/.github/workflows/test-coverage.yaml new file mode 100644 index 00000000..b0db8768 --- /dev/null +++ b/.github/workflows/test-coverage.yaml @@ -0,0 +1,32 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help +on: + push: + branches: [main, master] + pull_request: + branches: [main, master] + +name: test-coverage + +jobs: + test-coverage: + runs-on: ubuntu-latest + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + + steps: + - uses: actions/checkout@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + extra-repositories: https://mc-stan.org/r-packages/ + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::covr + needs: coverage + + - name: Test coverage + run: covr::codecov(quiet = FALSE) + shell: Rscript {0} diff --git a/.gitignore b/.gitignore index ba3114cc..aa3ed4e1 100644 --- a/.gitignore +++ b/.gitignore @@ -44,4 +44,14 @@ vignettes/*.pdf .DS_Store # .RDS Stan compilation files -inst/extdata/stanmodels/*.RDS \ No newline at end of file +inst/extdata/stanmodels/*.RDS +inst/extdata/stanmodels/*.rds +inst/doc +docs +/Meta/ +_snaps/ +!tests/testthat/_snaps +tests/testthat/Rplots.pdf +t, +tests/testthat/extdata/plots/actual/*.png +tests/testthat/extdata/dataframes/actual/*.csv \ No newline at end of file diff --git a/.lintr b/.lintr new file mode 100644 index 00000000..a1b0fdc3 --- /dev/null +++ b/.lintr @@ -0,0 +1,4 @@ +linters: with_defaults( + line_length_linter=line_length_linter(120), + object_usage_linter=NULL, + commented_code_linter=NULL) diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 00000000..4950e189 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,3 @@ +{ + "scm.diffDecorationsIgnoreTrimWhitespace": "true" +} \ No newline at end of file diff --git a/.vscode/tasks.json b/.vscode/tasks.json new file mode 100644 index 00000000..09f44efb --- /dev/null +++ b/.vscode/tasks.json @@ -0,0 +1,43 @@ +{ + "version": "2.0.0", + "tasks": [ + { + "type": "R", + "code": [ + "devtools::install_deps(upgrade=TRUE)" + ], + "problemMatcher": [], + "group": { + "kind": "none", + "isDefault": true + }, + "label": "R: upgrade deps" + }, + { + "type": "R", + "code": [ + "devtools::test_active_file(\"${file}\")" + ], + "problemMatcher": [ + "$testthat" + ], + "group": { + "kind": "test", + "isDefault": true + }, + "label": "R: Test Active File" + }, + { + "type": "R", + "code": [ + "devtools::check(manual=FALSE, cran = TRUE, error_on=c('error'))" + ], + "problemMatcher": [], + "group": { + "kind": "test", + "isDefault": true + }, + "label": "R Package: Check Strict" + } + ] +} \ No newline at end of file diff --git a/DESCRIPTION b/DESCRIPTION index b4191615..58fdf6f4 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,26 +1,72 @@ Package: serofoi Type: Package -Title: Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies. +Title: Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies Version: 0.1.0 -Author: Zulma M. Cucunubá, Miguel Gamez and Nicolás Torres -Maintainer: The package maintainer -Description: Provides functions for estimate the Force-of-Infection of a given pathogen from population based sero-prevalence studies on a bayesian framework. +Authors@R: + c( + person( + given = "Zulma M.", + family = "Cucunubá", + role = c("aut", "cre"), + email = "zulma.cucunuba@javeriana.edu.co", + comment = c(ORCID = "https://orcid.org/0000-0002-8165-3198") + ), + person( + given = "Nicolás", + family = "Torres", + role = c("aut") + ), + person( + given = "Ben", + family = "Lambert", + role = c("aut") + ), + person( + given = "Pierre", + family = "Nouvellet", + role = c("aut") + ) + ) +Contributor: Miguel Gámez, Geraldine Gómez, Jaime A. Pavlich-Mariscal +Maintainer: The package maintainer +Description: R package to estimate time-varying Force-of-Infection of a given pathogen from population based sero-prevalence studies using a bayesian framework. License: MIT + file LICENSE Encoding: UTF-8 LazyData: true -RoxygenNote: 7.2.2 +RoxygenNote: 7.2.3 +Depends: + R (>= 3.5.0) Imports: config, - rstan, + rstan (>= 2.21.1), + StanHeaders, tidyverse, reshape2, bayesplot, loo, - gridExtra, + cowplot, Hmisc, dplyr, - gsubfn + gsubfn, + usethis, + testthat (>= 3.0.0), + vdiffr (>= 1.0.0), + devtools, + methods, + Rcpp, + ggplot2, + BH, + RcppEigen, + RcppParallel, + pracma, + purrr Suggests: knitr, rmarkdown VignetteBuilder: knitr +URL: https://trace-lac.github.io/serofoi/ +Additional_repositories: + https://mc-stan.org/r-packages/ +Config/testthat/edition: 3 +Remotes: + tidyverse/purrr diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..c1893295 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2023 TRACE-LAC + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/NAMESPACE b/NAMESPACE index 63942dfe..1c975746 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,19 +1,31 @@ # Generated by roxygen2: do not edit by hand -export(extract_summary_model) -export(fit_model) -export(fit_model_log) +export(extract_seroprev_model_summary) +export(fit_seroprev_model) +export(generate_sim_data) +export(get_comparison_table) +export(get_exposure_ages) export(get_exposure_matrix) -export(get_posterior_summary) export(get_prev_expanded) +export(get_sim_counts) export(get_table_rhats) -export(make_yexpo) +export(group_sim_data) export(plot_foi) export(plot_info_table) -export(plot_model) export(plot_rhats) export(plot_seroprev) -export(prepare_data) -export(run_model) +export(plot_seroprev_fitted) +export(plot_seroprev_model) +export(plot_seroprev_models_grid) +export(prepare_bin_data) +export(prepare_seroprev_data) +export(run_seroprev_model) +export(save_or_load_model) +import(Rcpp) import(dplyr) +import(methods) importFrom(dplyr,"%>%") +importFrom(graphics,text) +importFrom(rstan,sampling) +importFrom(stats,quantile) +importFrom(utils,read.table) diff --git a/R/chagas2012.R b/R/chagas2012.R new file mode 100644 index 00000000..e58bc111 --- /dev/null +++ b/R/chagas2012.R @@ -0,0 +1,15 @@ +#' Seroprevalence data on serofoi +#' +#' Data from a serological surveys +#' +#' @docType data +#' +#' @usage chagas2012 +#' +#' @format An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +#' +#' @keywords datasets +#' +#' @examples +#' chagas2012 +"chagas2012" \ No newline at end of file diff --git a/R/chik2015.R b/R/chik2015.R new file mode 100644 index 00000000..e72f7357 --- /dev/null +++ b/R/chik2015.R @@ -0,0 +1,15 @@ +#' Seroprevalence data on serofoi +#' +#' Data from a serological surveys +#' +#' @docType data +#' +#' @usage chik2015 +#' +#' @format An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +#' +#' @keywords datasets +#' +#' @examples +#' chik2015 +"chik2015" \ No newline at end of file diff --git a/R/model_comparison.R b/R/model_comparison.R index 0dcfb81d..0e62231b 100644 --- a/R/model_comparison.R +++ b/R/model_comparison.R @@ -1,18 +1,105 @@ #' Get Table Rhats #' -#' Función que hace la tabla de los rhats #' Function that makes the rhats table #' @param model_object model_object #' @return rhats table +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +#' model_object <- run_seroprev_model( +#' seroprev_data = seroprev_data, seroprev_model_name = "constant_foi_bi") +#' get_table_rhats (model_object) +#' } #' @export get_table_rhats <- function(model_object) { - rhats <- bayesplot::rhat(model_object$fit, "foi") if (any(is.nan(rhats))) { - rhats[which(is.nan(rhats))] <- 0} - model_rhats <- data.frame(year = model_object$real_yexpo, rhat = rhats) - model_rhats$rhat[model_rhats$rhat == 0] <- NA # This is because I'm not estimating these foi values + rhats[which(is.nan(rhats))] <- 0 + } + model_rhats <- data.frame(year = model_object$exposure_years, rhat = rhats) + model_rhats$rhat[model_rhats$rhat == 0] <- NA return(model_rhats) } + +#' Get Model Table Comparison +#' Provides a table with statistics for comparison between models and selection +#' @param model_objects_list model_objects to compare +#' @return comparison table +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_0 <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000) +#' +#' model_1 <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "continuous_foi_normal_bi", +#' n_iters = 1000) +#' +#' model_2 <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "continuous_foi_normal_log", +#' n_iters = 1000) +#' comp_table <- get_comparison_table(model_objects_list = c(m0 = model_0, +#' m1 = model_1, +#' m2 = model_2)) +#' } +#' @export +get_comparison_table <- function(model_objects_list) { + + + dif_m0_m1 <- loo::loo_compare(model_objects_list$m0.loo_fit, + model_objects_list$m1.loo_fit) + + dif_m0_m2 <- loo::loo_compare(model_objects_list$m0.loo_fit, + model_objects_list$m2.loo_fit) + + # Aquí pendiente revisar desde la función summary_model + # No estoy segura que este parámetro venga bien desde allá ni tampoco que esté bien acá + + # model_comp$better <- NA + # model_comp$better[model_comp$difference > 0] <- 'Yes' + # model_comp$better[model_comp$difference <= 0] <-'No' + # model_comp$better[model_comp$model == 'constant_foi_bi'] <- "-" + + + model_objects_list$m0.model_summary$difference <- 0 + model_objects_list$m0.model_summary$diff_se <- 1 + + model_objects_list$m1.model_summary$difference <- dif_m0_m1[1] + model_objects_list$m1.model_summary$diff_se <- dif_m0_m1[2] + + model_objects_list$m2.model_summary$difference <- dif_m0_m2[1] + model_objects_list$m2.model_summary$diff_se <- dif_m0_m2[2] + + model_comp <- rbind(model_objects_list$m0.model_summary, + model_objects_list$m1.model_summary, + model_objects_list$m2.model_summary) + + model_comp$converged[model_comp$elpd == -1.000e+10] <- 'No' # + ds_one <- dplyr::filter(model_comp, converged == 'Yes') + print(paste0('number of converged models = ', NROW(ds_one))) + + # Ordering the best model based on elpd values + elps_order <- rev(sort(ds_one$elpd)) + best <- dplyr::filter(model_comp, elpd %in% elps_order) %>% dplyr::arrange(-.data$elpd)# This is to make sure I keep only three + best_model1 <- as.character(best$model[1]) + best_model2 <- as.character(best$model[2]) + best_model3 <- as.character(best$model[3]) + + model_comp$best_elpd <- NA + model_comp$best_elpd[model_comp$model == best_model1] <- 1 + model_comp$best_elpd[model_comp$model == best_model2] <- 2 + model_comp$best_elpd[model_comp$model == best_model3] <- 3 + model_comp <- model_comp %>% dplyr::arrange(.data$best_elpd) + + # Estimating p-values to check the difference between the models m0 and other models is actually important + model_comp <- model_comp %>% dplyr::mutate(pvalue = 1 - stats::pnorm(difference/diff_se,0,1)) + model_comp$pvalue[is.nan(model_comp$pvalue)] <- 1 + model_comp$pvalue <- model_comp$pvalue * stats::runif(NROW(model_comp), min = 1, max = 1.0001)# I make this just to ensure I get different values + model_comp$pvalue[model_comp$model == 'constant_foi_bi'] <- 0 + model_comp$pvalue <- round(model_comp$pvalue, 6) + + return(model_comp) +} diff --git a/R/modelling.R b/R/modelling.R index cb24d80e..a0c5bea4 100644 --- a/R/modelling.R +++ b/R/modelling.R @@ -1,459 +1,463 @@ -#' Get Exposure Matrix +# TODO For some reason, the examples cannot access the serodata variable +#' Run the specified stan model for the force-of-infection #' -#' Function that generates the exposure matrix -#' @param model_data refers to the model data that has been selected -#' @param yexpo what the make yexpo function returns -#' @return exposure_output +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. +#' This data frame must contain the following columns: +#' \tabular{ll}{ +#' \code{survey} \tab survey Label of the current survey \cr \tab \cr +#' \code{total} \tab Number of samples for each age group\cr \tab \cr +#' \code{counts} \tab Number of positive samples for each age group\cr \tab \cr +#' \code{age_min} \tab age_min \cr \tab \cr +#' \code{age_max} \tab age_max \cr \tab \cr +#' \code{year_init} \tab year_init \cr \tab \cr +#' \code{year_end} \tab year_end \cr \tab \cr +#' \code{tsur} \tab Year in which the survey took place \cr \tab \cr +#' \code{country} \tab The country where the survey took place \cr \tab \cr +#' \code{test} \tab The type of test taken \cr \tab \cr +#' \code{antibody} \tab antibody \cr \tab \cr +#' \code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +#' \code{sample_size} \tab The size of the sample \cr \tab \cr +#' \code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +#' \code{prev_obs} \tab Observed prevalence \cr \tab \cr +#' \code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +#' \code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +#' } +#' The last six colums can be added to \code{seroprev_data} by means of the function \code{\link{prepare_seroprev_data}}. +#' @param seroprev_model_name Name of the selected model. Current version provides three options: +#' \describe{ +#' \item{\code{"constant_foi_bi"}}{Runs a constant model} +#' \item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +#' \item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +#' } +#' @param n_iters Number of interations for eah chain including the warmup. \code{iter} in \link[rstan]{sampling}. +#' @param n_thin Positive integer specifying the period for saving samples. \code{thin} in \link[rstan]{sampling}. +#' @param delta Real number between 0 and 1 that represents the target average acceptance probability. +#' Increasing the value of \code{delta} will result in a smaller step size and fewer divergences. +#' For further details refer to the \code{control} parameter in \link[rstan]{sampling} or \href{https://mc-stan.org/rstanarm/reference/adapt_delta.html}{here}. +#' @param m_treed Maximum tree depth for the binary tree used in the NUTS stan sampler. For further details refer to the \code{control} parameter in \link[rstan]{sampling}. +#' @param decades Number of decades covered by the survey data. +#' @return \code{model_object}. An object containing relevant information about the implementation of the model. For further details refer to \link{fit_seroprev_model}. +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(serodata) +#' run_seroprev_model (seroprev_data, +#' seroprev_model_name = "constant_foi_bi") +#' } #' @export -get_exposure_matrix <- function(model_data, - yexpo) { - age_class <- model_data$age_mean_f - ly <- length(yexpo) - exposure <- matrix(0,nrow = length(age_class), ncol = ly) - for (k in 1:length(age_class)) exposure[k,(ly - age_class[k] + 1):ly] <- 1 - exposure_output <- exposure - return(exposure_output) - +run_seroprev_model <- function(seroprev_data, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000, + n_thin = 2, + delta = 0.90, + m_treed = 10, + decades = 0) { + survey <- unique(seroprev_data$survey) + if (length(survey) > 1) warning("You have more than 1 surveys or survey codes") + model_object <- fit_seroprev_model(seroprev_data = seroprev_data, + seroprev_model_name = seroprev_model_name, + n_iters = n_iters, + n_thin = n_thin, + delta = delta, + m_treed = m_treed, + decades = decades); print(paste0("serofoi model ", + seroprev_model_name, + " finished running ------")) + print(t(model_object$model_summary)) + return(model_object) } -#' Get Prevalence Expanded +#' Save or load model #' -#' Function that generates the expanded prevalence -#' @param model_data refers to the model data that has been selected -#' @param foi force of infection -#' @return prev_final +#' This function determines whether the corresponding .RDS file of the selected model exists or not. +#' In case the .RDS file exists, it is read and returned; otherwise, the object model is created through the +#' \link[rstan]{stan_model} function, saved as an .RDS file and returned as the output of the function. +#' @param seroprev_model_name Name of the selected model. Current version provides three options: +#' \describe{ +#' \item{\code{"constant_foi_bi"}}{Runs a constant model} +#' \item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +#' \item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +#' } +#' @return \code{model}. The rstan model object corresponding to the selected model. +#' @examples +#' save_or_load_model(seroprev_model_name="constant_foi_bi") #' @export -get_prev_expanded <- function(foi, - model_data) { - - ndata <- data.frame(age = 1:80) - dim_foi <- dim(foi)[2] - if (dim_foi < 80) { - oldest_year <- 80 - dim_foi + 1 - foin <- matrix(NA, nrow = dim(foi)[1], 80) - foin[, oldest_year:80] <- foi - foin[, 1:(oldest_year - 1) ] <- rowMeans(foi[,1:5]) - } else { - foin <- foi} - - foi_expanded <- foin - - age_class <- 1:NCOL(foi_expanded) - ly <- NCOL(foi_expanded) - exposure <- matrix(0, nrow = length(age_class), ncol = ly) - for (k in 1:length(age_class)) exposure[k,(ly - age_class[k] + 1):ly] <- 1 - exposure_expanded <- exposure - - iterf <- NROW(foi_expanded) - age_max <- NROW(exposure_expanded) - PrevPn <- matrix(NA, nrow = iterf, ncol = age_max) - for (i in 1:iterf) { - PrevPn[i,] <- 1 - exp( -exposure_expanded %*% foi_expanded[i,]) - } - - lower <- apply(PrevPn, 2, function(x) quantile(x, 0.1)) - upper <- apply(PrevPn, 2, function(x) quantile(x, 0.9)) - medianv <- apply(PrevPn, 2, function(x) quantile(x, 0.5)) - - predicted_prev <- data.frame(age = 1:80, - predicted_prev = medianv, - predicted_prev_lower = lower, - predicted_prev_upper = upper) - - observed_prev <- model_data %>% - dplyr::select(age_mean_f, prev_obs, prev_obs_lower, prev_obs_upper, total, counts) %>% - dplyr::rename(age = age_mean_f, sample_by_age = total, positives = counts) - - prev_expanded <- base::merge(predicted_prev, observed_prev, by = "age", all.x = TRUE) %>% dplyr::mutate(survey = model_data$survey[1]) - - # I added this here for those cases when binned is prefered for plotting - if (model_data$age_max[1] - model_data$age_min[1] < 3) { - model_data$cut_ages <- cut(as.numeric(model_data$age_mean_f), seq(1,101, by = 5), include.lowest = TRUE) - xx <- model_data %>% dplyr::group_by(.data$cut_ages) %>% dplyr::summarise(bin_size = sum(.data$total), bin_pos = sum(.data$counts)) - labs <- read.table(text = gsub("[^.0-9]", " ", levels(xx$cut_ages)), col.names = c("lower", "upper")) %>% - dplyr::mutate(lev = levels(xx$cut_ages), midAge = round((lower + upper)/2)) %>% dplyr::select(.data$midAge, .data$lev) - xx$midAge <- labs$midAge[labs$lev %in% xx$cut_ages] - conf <- data.frame(Hmisc::binconf(xx$bin_pos, xx$bin_size,method = "exact")) - xx <- cbind(xx, conf) %>% dplyr::rename(age = .data$midAge, p_obs_bin = .data$PointEst, - p_obs_bin_l = .data$Lower,p_obs_bin_u = .data$Upper) - - prev_final <- base::merge(prev_expanded, xx, by = "age", all.x = TRUE) - - } else { - - prev_final <- prev_expanded %>% dplyr::mutate(cut_ages = "original", bin_size = .data$sample_by_age, - bin_pos = .data$positives, p_obs_bin = .data$prev_obs, - p_obs_bin_l = .data$prev_obs_lower, p_obs_bin_u = .data$prev_obs_upper) - +save_or_load_model <- function(seroprev_model_name = "constant_foi_bi") { + base_path <- config::get("stan_models_base_path", + file = system.file("config.yml", package = "serofoi", mustWork = TRUE)) + rds_path <- system.file(base_path, paste(seroprev_model_name, ".rds", sep = ""), package = getPackageName()) + if (!file.exists(rds_path)) { + message(sprintf("\nNo rds file found for model %s. Compiling stan model...", seroprev_model_name)) } + stan_path <- system.file(base_path, paste(seroprev_model_name, ".stan", sep = ""), package = getPackageName()) - return(prev_final) - -} + model <- rstan::stan_model(stan_path, auto_write = TRUE) -#' Make yexpo -#' -#' Function that generates Yexpo -#' @param model_data refers to the model data that has been selected -#' @return yexpo -#' @export -make_yexpo <- function(model_data) { - yexpo <- (seq_along(min(model_data$birth_year):model_data$tsur[1])) + return(model) } -#' Get Posterior Summary -#' -#' Function that gets a posterior summary -#' @param model_objects model_objects_chain -#' @return model_object -#' @export -get_posterior_summary <- function(model_objects_chain) { - model_object <- sapply(model_objects_chain, - function(i) c(quantile(i, c(0.5, 0.025, 0.975)))) - row.names(model_object) <- c("Median", "Lower", "Upper") - return(model_object) -} #' Fit Model #' -#' Function that fits the model to the data -#' @param model_data refers to the model data that has been selected -#' @param model refers to model selected -#' @param model_name name of the model selected -#' @param n_iters number of iterations. Each model has a default number. -#' @param n_thin Each model has a default number. -#' @param delta This value comes by default but it can be changed -#' @param m_treed This value comes by default but it can be changed -#' @param decades The decades covered by the survey data -#' @return model_object +#' Function that fits the selected model to the data +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}. +#' @param seroprev_model_name Name of the selected model. Current version provides three options: +#' \describe{ +#' \item{\code{"constant_foi_bi"}}{Runs a constant model} +#' \item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +#' \item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +#' } +#' @param n_iters Number of interations for eah chain including the warmup. \code{iter} in \link[rstan]{sampling}. +#' @param n_thin Positive integer specifying the period for saving samples. \code{thin} in \link[rstan]{sampling}. +#' @param delta Real number between 0 and 1 that represents the target average acceptance probability. +#' Increasing the value of \code{delta} will result in a smaller step size and fewer divergences. +#' For further details refer to the \code{control} parameter in \link[rstan]{sampling} or \href{https://mc-stan.org/rstanarm/reference/adapt_delta.html}{here}. +#' @param m_treed Maximum tree depth for the binary tree used in the NUTS stan sampler. For further details refer to the \code{control} parameter in \link[rstan]{sampling}. +#' @param decades Number of decades covered by the survey data. +#' @return \code{model_object}. An object containing relevant information about the implementation of the model. It contains the following: +#' \tabular{ll}{ +#' \code{fit} \tab \code{stanfit} object returned by the function \link[rstan]{sampling} \cr \tab \cr +#' \code{seroprev_data} \tab A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}.\cr \tab \cr +#' \code{stan_data} \tab List containing \code{Nobs}, \code{Npos}, \code{Ntotal}, \code{Age}, \code{Ymax}, \code{AgeExpoMatrix} and \code{NDecades}. +#' This object is used as an input for the \link[rstan]{sampling} function \cr \tab \cr +#' \code{exposure_years} \tab Integer atomic vector containing the actual exposure years (1946, ..., 2007 e.g.) \cr \tab \cr +#' \code{exposure_ages} \tab Integer atomic vector containing the numeration of the exposure ages. \cr \tab \cr +#' \code{n_iters} \tab Number of interations for eah chain including the warmup. \cr \tab \cr +#' \code{n_thin} \tab Positive integer specifying the period for saving samples. \cr \tab \cr +#' \code{n_warmup} \tab Number of warm up iterations. Set by default as n_iters/2. \cr \tab \cr +#' \code{seroprev_model_name} \tab The name of the model\cr \tab \cr +#' \code{delta} \tab Real number between 0 and 1 that represents the target average acceptance probability. \cr \tab \cr +#' \code{m_treed} \tab Maximum tree depth for the binary tree used in the NUTS stan sampler. \cr \tab \cr +#' \code{loo_fit} \tab Efficient approximate leave-one-out cross-validation. Refer to \link[loo]{loo} for further details. \cr \tab \cr +#' \code{foi_cent_est} \tab A data fram e containing \code{year} (corresponding to \code{exposure_years}), \code{lower}, \code{upper}, and \code{medianv} \cr \tab \cr +#' \code{foi_post_s} \tab Sample n rows from a table. Refer to \link[dplyr]{sample_n} for further details. \cr \tab \cr +#' \code{model_summary} \tab A data fram containing the summary of the model. Refer to \link{extract_seroprev_model_summary} for further details. \cr \tab \cr +#' } + +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(serodata) +#' fit_seroprev_model (seroprev_data, +#' seroprev_model_name = "constant_foi_bi") +#' } +#' #' @export -fit_model <- function(model, - model_data, - model_name, - n_iters = n_iters, - n_thin = n_thin, - delta = delta, - m_treed = m_treed, - decades = decades){ - - # add a warning to the warming process because there are exceptions where a minimal amount of iterations need to be run - n_warmup = floor(n_iters/2) - - yexpo <- make_yexpo(model_data) - yexpo <- yexpo[-length(yexpo)] - real_yexpo <- (min(model_data$birth_year):model_data$tsur[1])[-1] - exposure_matrix <- get_exposure_matrix(model_data, yexpo) - Nobs <- nrow(model_data) +fit_seroprev_model <- function(seroprev_data, + seroprev_model_name, + n_iters = 1000, + n_thin = 2, + delta = 0.90, + m_treed = 10, + decades = 0) { + # add a warning because there are exceptions where a minimal amount of iterations need to be run + model <- save_or_load_model(seroprev_model_name) + exposure_ages <- get_exposure_ages(seroprev_data) + exposure_years <- (min(seroprev_data$birth_year):seroprev_data$tsur[1])[-1] + exposure_matrix <- get_exposure_matrix(seroprev_data) + Nobs <- nrow(seroprev_data) stan_data <- list( - Nobs = Nobs, - Npos = model_data$counts, - Ntotal = model_data$total, - Age = model_data$age_mean_f, - Ymax = max(yexpo), + Nobs = Nobs, + Npos = seroprev_data$counts, + Ntotal = seroprev_data$total, + Age = seroprev_data$age_mean_f, + Ymax = max(exposure_ages), AgeExpoMatrix = exposure_matrix, - NDecades = decades) + NDecades = decades + ) + + n_warmup <- floor(n_iters / 2) + + if (seroprev_model_name == "continuous_foi_normal_log") { + f_init <- function() { + list(log_foi = rep(-3, length(exposure_ages))) + } + lower_quantile = 0.1 + upper_quantile = 0.9 + medianv_quantile = 0.5 + } - f_init <- function(){ - list(foi = rep(0.01, length(yexpo))) + else { + f_init <- function() { + list(foi = rep(0.01, length(exposure_ages))) } + lower_quantile = 0.05 + upper_quantile = 0.95 + medianv_quantile = 0.5 - fit <- rstan::sampling(model, - data = stan_data , - iter = n_iters, - chains = 4, - init = f_init, - warmup = n_warmup, - verbose = FALSE, - refresh = 0, - control = list(adapt_delta = delta, - max_treedepth = m_treed), - seed = "12345", - thin = n_thin + } + + fit <- rstan::sampling( + model, + data = stan_data, + iter = n_iters, + chains = 4, + init = f_init, + warmup = n_warmup, + verbose = FALSE, + refresh = 0, + control = list(adapt_delta = delta, + max_treedepth = m_treed), + seed = "12345", + thin = n_thin, + chain_id = 0 # https://github.com/stan-dev/rstan/issues/761#issuecomment-647029649 ) - if (class(fit@sim$samples) != "NULL") { + if (class(fit@sim$samples) != "NULL") { loo_fit <- loo::loo(fit, save_psis = TRUE, "logLikelihood") foi <- rstan::extract(fit, "foi", inc_warmup = FALSE)[[1]] - + # foi <- rstan::extract(fit, "foi", inc_warmup = TRUE, permuted=FALSE)[[1]] # generates central estimations - foi_cent_est <- data.frame(year = real_yexpo, - lower = apply(foi, 2, function(x) quantile(x, 0.05)), - upper = apply(foi, 2, function(x) quantile(x, 0.95)), - medianv = apply(foi, 2, function(x) quantile(x, 0.5))) + foi_cent_est <- data.frame( + year = exposure_years, + lower = apply(foi, 2, function(x) quantile(x, lower_quantile)), + + upper = apply(foi, 2, function(x) quantile(x, upper_quantile)), + + medianv = apply(foi, 2, function(x) quantile(x, medianv_quantile)) + ) # generates a sample of iterations - if (n_iters >= 2000){ + if (n_iters >= 2000) { foi_post_s <- dplyr::sample_n(as.data.frame(foi), size = 1000) - colnames(foi_post_s) <- real_yexpo - } - else{ + colnames(foi_post_s) <- exposure_years + } else { foi_post_s <- as.data.frame(foi) - colnames(foi_post_s) <- real_yexpo + colnames(foi_post_s) <- exposure_years } - model_object <- list(fit = fit, - model_data = model_data, - stan_data = stan_data, - real_yexpo = real_yexpo, - yexpo = yexpo, - n_iters = n_iters, - n_thin = n_thin, - n_warmup = n_warmup, - model = model_name, - delta = delta, - m_treed = m_treed, - loo_fit = loo_fit, - foi_cent_est = foi_cent_est, - foi_post_s = foi_post_s) - model_object$model_summary <- extract_summary_model(model_object) + model_object <- list( + fit = fit, + seroprev_data = seroprev_data, + stan_data = stan_data, + exposure_years = exposure_years, + exposure_ages = exposure_ages, + n_iters = n_iters, + n_thin = n_thin, + n_warmup = n_warmup, + seroprev_model_name = seroprev_model_name, + delta = delta, + m_treed = m_treed, + loo_fit = loo_fit, + foi_cent_est = foi_cent_est, + foi_post_s = foi_post_s + ) + model_object$model_summary <- + extract_seroprev_model_summary(model_object) } else { loo_fit <- c(-1e10, 0) - model_object <- list(fit = "no model", - model_data = model_data, - stan_data = stan_data, - real_yexpo = real_yexpo, - yexpo = yexpo, - n_iters = n_iters, - n_thin = n_thin, - n_warmup = n_warmup, - model = model_name, - delta = delta, - m_treed = m_treed, - loo_fit = loo_fit, - model_summary = NA) + model_object <- list( + fit = "no model", + seroprev_data = seroprev_data, + stan_data = stan_data, + exposure_years = exposure_years, + exposure_ages = exposure_ages, + n_iters = n_iters, + n_thin = n_thin, + n_warmup = n_warmup, + model = seroprev_model_name, + delta = delta, + m_treed = m_treed, + loo_fit = loo_fit, + model_summary = NA + ) } return(model_object) - } -#' Fit Model Log + +#' Get exposure years #' -#' Function that fits the logarithmic model to the data -#' @param model_data model_data -#' @param model refers to model selected -#' @param model_name name of the model selected -#' @param n_iters number of iterations. This value comes by default but it can be changed -#' @param n_thinThis value comes by default but it can be changed -#' @param delta This value comes by default but it can be changed -#' @param m_treed This value comes by default but it can be changed -#' @param decades The decades covered by the survey data -#' @return model_object +#' Function that generates an atomic vector with the exposition years in seroprev_data. The exposition years to the disease for each individual corresponds to the time from birth to the moment of the survey. +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. This data frame must contain the year of birth for each individual (birth_year) and the time of the survey (tsur). birth_year can be constructed by means of the \link{prepare_seroprev_data} function. +#' @return \code{exposure_ages}. An atomic vector with the numeration of the exposition years in seroprev_data +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +#' exposure_ages <- get_exposure_ages(seroprev_data) +#' } #' @export -fit_model_log <- function(model, - model_data, - model_name, - n_iters = 3000, - n_thin = 2, - delta = 0.90, - m_treed = 10, - decades = 0){ - - yexpo <- make_yexpo(model_data) - yexpo <- yexpo[-length(yexpo)] - real_yexpo <- (min(model_data$birth_year):model_data$tsur[1])[-1] - exposure_matrix <- get_exposure_matrix(model_data, yexpo) - Nobs <- nrow(model_data) - - stan_data <- list( - Nobs = nrow(model_data), - Npos = model_data$counts, - Ntotal = model_data$total, - Age = model_data$age_mean_f, - Ymax = max(yexpo), - AgeExpoMatrix = exposure_matrix, - NDecades = decades) +get_exposure_ages <- function(seroprev_data) { + # TODO Verify if this change is correct + return(seq_along(min(seroprev_data$birth_year):(seroprev_data$tsur[1] - 1))) +} - n_warmup = floor(n_iters/2) - f_init <- function(){ - list(log_foi = rep(-3, length(yexpo))) - } - - fit <- rstan::sampling(model, - data = stan_data , - iter = n_iters, - chains = 4, - init = f_init, - warmup = n_warmup, - verbose = FALSE, - refresh = 0, - control = list(adapt_delta = delta, - max_treedepth = m_treed), - seed = "12345", - thin = n_thin - ) +#' Get Exposure Matrix +#' +#' Function that generates the exposure matrix for a seroprevalence survey. +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. This data frame must contain the year of birth for each individual (birth_year) and the time of the survey (tsur). birth_year can be constructed by means of the \link{prepare_seroprev_data} function. +#' @return \code{exposure_output}. An atomic matrix containing the expositions for each entry of \code{seroprev_data} by year. +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +#' exposure_matrix <- get_exposure_matrix(seroprev_data = seroprev_data) +#' } +#' @export +get_exposure_matrix <- function(seroprev_data) { + age_class <- seroprev_data$age_mean_f + exposure_ages <- get_exposure_ages(seroprev_data) + ly <- length(exposure_ages) + exposure <- matrix(0, nrow = length(age_class), ncol = ly) + for (k in 1:length(age_class)) + exposure[k, (ly - age_class[k] + 1):ly] <- 1 + exposure_output <- exposure + return(exposure_output) +} - if (class(fit@sim$samples) != "NULL") { - loo_fit <- loo::loo(fit, save_psis = TRUE, "logLikelihood") - foi <- rstan::extract(fit, "foi", inc_warmup = FALSE)[[1]] +#' Extract Model Summary +#' +#' Function to generate a summary of a model. +#' @param model_object \code{model_object}. An object containing relevant information about the implementation of the model. Refer to \link{fit_seroprev_model} for further details. +#' @return \code{model_summary}. Object with a summary of \code{model_object} containing the following: +#' \tabular{ll}{ +#' \code{seroprev_model_name} \tab Name of the selected model. For further details refer to \link{save_or_load_model}. \cr \tab \cr +#' \code{data_set} \tab Seroprevalence survey label.\cr \tab \cr +#' \code{country} \tab Name of the country were the survey was conducted in. \cr \tab \cr +#' \code{year} \tab Year in which the survey was conducted. \cr \tab \cr +#' \code{test} \tab Type of test of the survey. \cr \tab \cr +#' \code{antibody} \tab Antibody \cr \tab \cr +#' \code{n_sample} \tab Total number of samples in the survey. \cr \tab \cr +#' \code{n_agec} \tab Number of age groups considered. \cr \tab \cr +#' \code{n_iter} \tab Number of interations for eah chain including the warmup. \cr \tab \cr +#' \code{elpd} \tab elpd \cr \tab \cr +#' \code{se} \tab se \cr \tab \cr +#' \code{converged} \tab convergence \cr \tab \cr +#' } +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model(seroprev_data = seroprev_data, +#' seroprev_model_name = "constant_foi_bi") +#' extract_seroprev_model_summary (model_object) +#' } +#' @export +extract_seroprev_model_summary <- function(model_object) { + seroprev_model_name <- model_object$seroprev_model_name + seroprev_data <- model_object$seroprev_data + #------- Loo estimates - foi_cent_est <- data.frame(year = real_yexpo, - lower = apply(foi, 2, function(x) quantile(x, 0.1)), - upper = apply(foi, 2, function(x) quantile(x, 0.9)), - medianv = apply(foi, 2, function(x) quantile(x, 0.5))) - - foi_post_s <- dplyr::sample_n(as.data.frame(foi), size = 1000) - colnames(foi_post_s) <- real_yexpo - - - model_object <- list(fit = fit, - model_data = model_data, - stan_data = stan_data, - real_yexpo = real_yexpo, - yexpo = yexpo, - n_iters = n_iters, - n_thin = n_thin, - n_warmup = n_warmup, - model = model_name, - delta = delta, - m_treed = m_treed, - loo_fit = loo_fit, - foi_cent_est = foi_cent_est, - foi_post_s = foi_post_s) - - model_object$model_summary <- extract_summary_model(model_object) + loo_fit <- model_object$loo_fit + if (sum(is.na(loo_fit)) < 1) { + lll <- as.numeric((round(loo_fit$estimates[1, ], 2))) } else { - - loo_fit <- c(-1e10, 0) - model_object <- list(fit = "no model", - stan_data = stan_data, - real_yexpo = real_yexpo, - yexpo = yexpo, - n_iters = n_iters, - n_thin = n_thin, - n_warmup = n_warmup, - model = model_name, - delta = delta, - m_treed = m_treed, - loo_fit = loo_fit) - model_object$model_summary <- extract_summary_model(model_object) + lll <- c(-1e10, 0) } + model_summary <- data.frame( + seroprev_model_name = seroprev_model_name, + dataset = seroprev_data$survey[1], + country = seroprev_data$country[1], + year = seroprev_data$tsur[1], + test = seroprev_data$test[1], + antibody = seroprev_data$antibody[1], + n_sample = sum(seroprev_data$total), + n_agec = length(seroprev_data$age_mean_f), + n_iter = model_object$n_iters, + elpd = lll[1], + se = lll[2], + converged = NA + ) - return(model_object) + rhats <- get_table_rhats(model_object) + if (any(rhats$rhat > 1.1) == FALSE) { + model_summary$converged <- "Yes" + } + return(model_summary) } -#' Save or read model -#' Function that saves the .RDS file of the model -#' @param model_name name of the model selected -save_or_read_model <- function(model_name="constant_foi_bi") { - rds_path <- config::get(model_name)$rds_path - stan_path <- config::get(model_name)$stan_path - - if (!file.exists(rds_path)){ - model <- rstan::stan_model(stan_path) - saveRDS(model, rds_path) - } - else{ - model <- readRDS(rds_path) +#' Get Prevalence Expanded +#' +#' Function that generates the expanded prevalence +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}. +#' @param foi Object containing the information of the force of infection. It is obtained from \code{rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]]}. +#' @return \code{prev_final}. The expanded prevalence data. This is used for plotting purposes in the \code{visualization} module. +#' @examples +#' \dontrun{ +#' seroprev_data <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model(seroprev_data = seroprev_data, +#' seroprev_model_name = "constant_foi_bi") +#' foi <- rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]] +#' get_prev_expanded <- function(foi, seroprev_data) +#' } +#' @export +get_prev_expanded <- function(foi, + seroprev_data) { + dim_foi <- dim(foi)[2] + if (dim_foi < 80) { + oldest_year <- 80 - dim_foi + 1 + foin <- matrix(NA, nrow = dim(foi)[1], 80) + foin[, oldest_year:80] <- foi + foin[, 1:(oldest_year - 1)] <- rowMeans(foi[, 1:5]) + } else { + foin <- foi } - return(model) -} + foi_expanded <- foin -#' Run model -#' Function that runs the specified model -#' @param model_data model_data -#' @param survey survey -#' @param model refers to model selected -#' @param model_name name of the model selected -#' @param n_iters number of iterations. Each model has a value by default. -#' @return model_object of model -#' @export -run_model <- function(model_data, - model_name="constant_foi_bi", - n_iters=1000, - n_thin = 2, - delta = 0.90, - m_treed = 10, - decades = 0) { + age_class <- 1:NCOL(foi_expanded) + ly <- NCOL(foi_expanded) + exposure <- matrix(0, nrow = length(age_class), ncol = ly) + for (k in 1:length(age_class)) + exposure[k, (ly - age_class[k] + 1):ly] <- 1 + exposure_expanded <- exposure - model_data <- model_data %>% dplyr::arrange(.data$age_mean_f) %>% dplyr::mutate(birth_year = .data$tsur - .data$age_mean_f) - survey <- unique(model_data$survey) - if (length(survey)>1) warning("You have more than 1 surveys or survey codes") - - if (model_name == "constant_foi_bi"){ - model_0 <- save_or_read_model(model_name = model_name) - model_object <- fit_model(model = model_0, - model_data = model_data, - model_name = model_name, - n_iters = n_iters, - n_thin = n_thin, - delta = delta, - m_treed = m_treed, - decades = decades); print(paste0(survey, "finished ------ model_0")) - } - if (model_name == "continuous_foi_normal_bi"){ - model_1 <- save_or_read_model(model_name = model_name) - model_object <- fit_model(model = model_1, - model_data = model_data, - model_name = model_name, - n_iters = n_iters, - n_thin = n_thin, - delta = delta, - m_treed = m_treed, - decades = decades); print(paste0(survey, "finished ------ model_1")) - } - if (model_name == "continuous_foi_normal_log"){ - model_2 <- save_or_read_model(model_name = model_name) - model_object <- fit_model_log(model = model_2, - model_data = model_data, - model_name = model_name, - n_iters = n_iters, - n_thin = n_thin, - delta = delta, - m_treed = m_treed, - decades = decades); print(paste0(survey, "finished ------ model_2")) + iterf <- NROW(foi_expanded) + age_max <- NROW(exposure_expanded) + prev_pn <- matrix(NA, nrow = iterf, ncol = age_max) + for (i in 1:iterf) { + prev_pn[i, ] <- 1 - exp(-exposure_expanded %*% foi_expanded[i, ]) } - print(model_object$model_summary) - return(model_object) -} -#' Extract Summary Model -#' -#' Function that summarizes the models -#' @param model_object what the run model function returns -#' @param model_data refers to data of the model -#' @return summary of the models -#' @export -extract_summary_model <- function(model_object) { + lower <- apply(prev_pn, 2, function(x) quantile(x, 0.1)) - model_name <- model_object$model - #------- Loo estimates + upper <- apply(prev_pn, 2, function(x) quantile(x, 0.9)) - loo_fit <- model_object$loo_fit - if (sum(is.na(loo_fit)) < 1) - { - lll <- as.numeric((round(loo_fit$estimates[1,],2)))} else - { - lll <- c(-1e10, 0) - } + medianv <- apply(prev_pn, 2, function(x) quantile(x, 0.5)) - model_data <- model_object$model_data - summary_model <- data.frame(model = model_object$model, - dataset = model_data$survey[1], - country = model_data$country[1], - year = model_data$tsur[1], - test = model_data$test[1], - antibody = model_data$antibody[1], - n_sample = sum(model_data$total), - n_agec = length(model_data$age_mean_f), - n_iter = model_object$n_iters, - performance = "_____", - elpd = lll[1], - se = lll[2], - converged = NA + predicted_prev <- data.frame( + age = 1:80, + predicted_prev = medianv, + predicted_prev_lower = lower, + predicted_prev_upper = upper ) - rhats <- get_table_rhats(model_object) - if (any(rhats$rhat > 1.1 ) == FALSE) { - summary_model$converged = "Yes" } + observed_prev <- seroprev_data %>% + dplyr::select(age_mean_f, + prev_obs, + prev_obs_lower, + prev_obs_upper, + total, + counts) %>% + dplyr::rename(age = age_mean_f, + sample_by_age = total, + positives = counts) + + prev_expanded <- + base::merge(predicted_prev, + observed_prev, + by = "age", + all.x = TRUE) %>% dplyr::mutate(survey = seroprev_data$survey[1]) + + # I added this here for those cases when binned is prefered for plotting + if (seroprev_data$age_max[1] - seroprev_data$age_min[1] < 3) { + xx <- prepare_bin_data(seroprev_data) + prev_final <- + base::merge(prev_expanded, xx, by = "age", all.x = TRUE) + } else { + prev_final <- prev_expanded %>% dplyr::mutate( + cut_ages = "original", + bin_size = .data$sample_by_age, + bin_pos = .data$positives, + p_obs_bin = .data$prev_obs, + p_obs_bin_l = .data$prev_obs_lower, + p_obs_bin_u = .data$prev_obs_upper + ) + } - return(summary_model) + return(prev_final) } diff --git a/R/serodata.R b/R/serodata.R new file mode 100644 index 00000000..1404ed4f --- /dev/null +++ b/R/serodata.R @@ -0,0 +1,15 @@ +#' Seroprevalence data on serofoi +#' +#' Data from a serological surveys +#' +#' @docType data +#' +#' @usage serodata +#' +#' @format An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +#' +#' @keywords datasets +#' +#' @examples +#' serodata +"serodata" \ No newline at end of file diff --git a/R/serofoi_package.R b/R/serofoi_package.R index 31764ab2..de07169d 100644 --- a/R/serofoi_package.R +++ b/R/serofoi_package.R @@ -4,5 +4,11 @@ ## usethis namespace: start #' @import dplyr #' @importFrom dplyr %>% +#' @import Rcpp +#' @import methods +#' @importFrom graphics text +#' @importFrom utils read.table +#' @importFrom rstan sampling +#' @importFrom stats quantile ## usethis namespace: end NULL diff --git a/R/seroprevalence_data.R b/R/seroprevalence_data.R index 39bdb582..e3299a4b 100644 --- a/R/seroprevalence_data.R +++ b/R/seroprevalence_data.R @@ -1,20 +1,249 @@ #' Prepare data #' #' Function that prepares the data for modelling -#' @param model_data dataset to be processed -#' @param alpha probability of a type I error (Hmisc::binconf) -#' @return model_data with additional columns necessary for the analysis +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. +#' This data frame must contain the following columns: +#' \tabular{ll}{ +#' \code{survey} \tab survey Label of the current survey \cr \tab \cr +#' \code{total} \tab Number of samples for each age group\cr \tab \cr +#' \code{counts} \tab Number of positive samples for each age group\cr \tab \cr +#' \code{age_min} \tab age_min \cr \tab \cr +#' \code{age_max} \tab age_max \cr \tab \cr +#' \code{year_init} \tab year_init \cr \tab \cr +#' \code{year_end} \tab year_end \cr \tab \cr +#' \code{tsur} \tab Year in which the survey took place \cr \tab \cr +#' \code{country} \tab The country where the survey took place \cr \tab \cr +#' \code{test} \tab The type of test taken \cr \tab \cr +#' \code{antibody} \tab antibody \cr \tab \cr +#' } +#' @param alpha probability of a type I error. For further details refer to \link{Hmisc::binconf}. +#' @return seroprev_data with additional columns necessary for the analysis. These columns are: +#' \tabular{ll}{ +#' \code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +#' \code{sample_size} \tab The size of the sample \cr \tab \cr +#' \code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +#' \code{prev_obs} \tab Observed prevalence \cr \tab \cr +#' \code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +#' \code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +#' } +#' @examples +#'\dontrun{ +#' data_test <- readRDS("data/data.RDS") +#' data_test <- prepare_seroprev_data(seroprev_data, alpha) +#' } #' @export -prepare_data <- function(model_data, - alpha = 0.05) { - model_data <- model_data %>% - dplyr::mutate(age_mean_f = floor((age_min+age_max)/2), sample_size = sum(total)) %>% - cbind(Hmisc::binconf(model_data$counts, - model_data$total, - alpha = alpha, - method="exact", - return.df = TRUE)) %>% - dplyr::rename(prev_obs = PointEst, prev_obs_lower = Lower, prev_obs_upper = Upper) - - return(model_data) +prepare_seroprev_data <- function(seroprev_data = serodata, + alpha = 0.05, + add_age_mean_f = TRUE) { + if(add_age_mean_f){ + seroprev_data <- seroprev_data %>% + dplyr::mutate(age_mean_f = floor((age_min + age_max) / 2), sample_size = sum(total)) %>% + dplyr::mutate(birth_year = .data$tsur - .data$age_mean_f) + } + seroprev_data <- seroprev_data %>% + cbind( + Hmisc::binconf( + seroprev_data$counts, + seroprev_data$total, + alpha = alpha, + method = "exact", + return.df = TRUE + ) + ) %>% + dplyr::rename( + prev_obs = PointEst, + prev_obs_lower = Lower, + prev_obs_upper = Upper + ) %>% + dplyr::arrange(.data$age_mean_f) + + return(seroprev_data) +} + + +#' Prepare data to plot binomial confidence intervals +#' +#' Function that prepares the data for modelling +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. For more information see the function \link{run_seroprev_model}. +#' This data frame must contain the following columns: +#' \tabular{ll}{ +#' \code{survey} \tab survey Label of the current survey \cr \tab \cr +#' \code{total} \tab Number of samples for each age group\cr \tab \cr +#' \code{counts} \tab Number of positive samples for each age group\cr \tab \cr +#' \code{age_min} \tab age_min \cr \tab \cr +#' \code{age_max} \tab age_max \cr \tab \cr +#' \code{year_init} \tab year_init \cr \tab \cr +#' \code{year_end} \tab year_end \cr \tab \cr +#' \code{tsur} \tab Year in which the survey took place \cr \tab \cr +#' \code{country} \tab The country where the survey took place \cr \tab \cr +#' \code{test} \tab The type of test taken \cr \tab \cr +#' \code{antibody} \tab antibody \cr \tab \cr +#' \code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +#' \code{sample_size} \tab The size of the sample \cr \tab \cr +#' \code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +#' \code{prev_obs} \tab Observed prevalence \cr \tab \cr +#' \code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +#' \code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +#' } +#' The last six colums can be added to \code{seroprev_data} by means of the function \code{\link{prepare_seroprev_data}}. +#' @return data set with the binomial confidence intervals +#' @examples +#'\dontrun{ +#' prepare_bin_data (seroprev_data) +#' } +#' @export +prepare_bin_data <- function(seroprev_data) { + seroprev_data$cut_ages <- + cut(as.numeric(seroprev_data$age_mean_f), + seq(1, 101, by = 5), + include.lowest = TRUE) + xx <- seroprev_data %>% + dplyr::group_by(.data$cut_ages) %>% + dplyr::summarise(bin_size = sum(.data$total), + bin_pos = sum(.data$counts)) + labs <- + read.table( + text = gsub("[^.0-9]", " ", levels(xx$cut_ages)), + col.names = c("lower", "upper") + ) %>% + dplyr::mutate(lev = levels(xx$cut_ages), mid_age = round((lower + upper) / 2)) %>% + dplyr::select(.data$mid_age, .data$lev) + xx$mid_age <- labs$mid_age[labs$lev %in% xx$cut_ages] + conf <- + data.frame(Hmisc::binconf(xx$bin_pos, xx$bin_size, method = "exact")) + xx <- cbind(xx, conf) %>% dplyr::rename( + age = .data$mid_age, + p_obs_bin = .data$PointEst, + p_obs_bin_l = .data$Lower, + p_obs_bin_u = .data$Upper + ) + return(xx) +} + +# TODO: Complete the documentation of get_sim_counts +#' Function that randomly generates a sample of counts for a simulated dataset +#' +#' @param sim_data A dataframe object containing the following columns: +#' \tabular{ll}{ +#' \code{birth_year} \tab List of years in which the subjects were borned \cr \tab \cr +#' \code{tsur} \tab Year of the survey\cr \tab \cr +#' \code{country} \tab Default to 'none'.\cr \tab \cr +#' \code{survey} \tab Survey label \cr \tab \cr +#' \code{age_mean_f} \tab Age \cr \tab \cr +#' } +#' @return A simulated list of counts following a binomial distribution in accordance with a given force of infection and age class sizes. +#' @examples +#'\dontrun{ +#' +#' } +#' @export +get_sim_counts <- function(sim_data, foi, size_age_class, seed = 1234){ + exposure_ages <- get_exposure_ages(sim_data) + exposure_matrix <- get_exposure_matrix(sim_data) + + set.seed(seed = seed) + sim_probabilities <- purrr::map_dbl(exposure_ages, ~1-exp(-dot(exposure_matrix[., ], foi))) + sim_counts <- purrr::map_int(sim_probabilities, ~rbinom(1, size_age_class, .)) + + return(sim_counts) +} + +# TODO: Complete the documentation of generate_sim_data +#' Function that generates simulated data from a given Force-of-Infection +#' +#' @param foi Numeric atomic vector corresponding to the desired Force-of-Infection +#' @return Dataframe object containing the simulated data generated from \code{foi} +#' @examples +#'\dontrun{ +#' size_age_class = 5 +#' foi <- rep(0.02, 50) +#' sim_data <- generate_sim_data(foi = foi, +#' size_age_class = size_age_class, +#' tsur = 2050, +#' birth_year_min = 2000, +#' survey_label = 'sim_constant_foi') +#' } +#' @export +generate_sim_data <- function(foi, + size_age_class, + tsur, + birth_year_min, + survey_label, + test = "fake", + antibody = "IgG", + seed = 1234 + ){ + sim_data <- data.frame(birth_year = c(birth_year_min:(tsur - 1))) %>% + mutate(tsur = tsur, + country = 'None', + test = test, + antibody = antibody, + survey = survey_label, + age_mean_f = tsur - birth_year) + sim_data <- sim_data %>% + mutate(counts = get_sim_counts(sim_data, foi, size_age_class, seed = seed), + total = size_age_class) %>% + prepare_seroprev_data(add_age_mean_f = FALSE) + + return(sim_data) +} + +# TODO: Complete the documentation of group_sim_data +#' Function that generates grouped simulated data from a given Force-of-Infection +#' +#' @param sim_data Dataframe with the structure of the output of \code{\linl{generate_sim_data}}. +#' @return Dataframe object containing grouped simulated data generated from \code{foi} +#' @examples +#'\dontrun{ +#' size_age_class = 5 +#' foi <- rep(0.02, 50) +#' sim_data <- generate_sim_data(foi = foi, +#' size_age_class = size_age_class, +#' tsur = 2050, +#' birth_year_min = 2000, +#' survey_label = 'sim_constant_foi') +#' sim_data_grouped <- group_sim_data(sim_data = sim_data, +#' foi = foi, +#' size_age_class = size_age_class, +#' tsur = 2050, +#' birth_year_min = 2000, +#' survey_label = 'sim_constant_foi_grouped') +#' } +#' @export +group_sim_data <- function(sim_data, + foi, + size_age_class, + tsur, + birth_year_min, + survey_label, + test = "fake", + antibody = "IgG", + seed = 1234) { + + sim_data <- sim_data %>% mutate(age_group = 'NA', age = age_mean_f) %>% arrange(age) + sim_data$age_group[sim_data$age > 0 & sim_data$age < 5] <- '01-04' + sim_data$age_group[sim_data$age > 4 & sim_data$age < 10] <- '05-09' + sim_data$age_group[sim_data$age > 9 & sim_data$age < 15] <- '10-14' + sim_data$age_group[sim_data$age > 14 & sim_data$age < 20] <- '15-19' + sim_data$age_group[sim_data$age > 19 & sim_data$age < 25] <- '20-24' + sim_data$age_group[sim_data$age > 24 & sim_data$age < 30] <- '25-29' + sim_data$age_group[sim_data$age > 29 & sim_data$age < 35] <- '30-34' + sim_data$age_group[sim_data$age > 34 & sim_data$age < 40] <- '35-39' + sim_data$age_group[sim_data$age > 39 & sim_data$age < 45] <- '40-44' + sim_data$age_group[sim_data$age > 44 & sim_data$age < 51] <- '45-50' + + + sim_data_grouped <- sim_data %>% group_by(age_group) %>% + dplyr::summarise(total = sum(total), counts = sum(counts)) %>% + mutate(tsur = sim_data$tsur[1], + country = "None", + survey = survey_label, + test = test, + antibody = antibody) %>% + mutate(age_min = as.numeric(substr(age_group, 1, 2)), + age_max = as.numeric(substr(age_group, 4, 5))) %>% + prepare_seroprev_data() + + return(sim_data_grouped) + } diff --git a/R/test_vignettes.R b/R/test_vignettes.R new file mode 100644 index 00000000..a8deac9e --- /dev/null +++ b/R/test_vignettes.R @@ -0,0 +1,4 @@ +#library(remotes) + +#remotes::install_github("TRACE-LAC/serofoi", ref = "dev") +#library(serofoi) diff --git a/R/veev2012.R b/R/veev2012.R new file mode 100644 index 00000000..a19088c0 --- /dev/null +++ b/R/veev2012.R @@ -0,0 +1,15 @@ +#' Seroprevalence data on serofoi +#' +#' Data from a serological surveys +#' +#' @docType data +#' +#' @usage veev2012 +#' +#' @format An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +#' +#' @keywords datasets +#' +#' @examples +#' veev2012 +"veev2012" \ No newline at end of file diff --git a/R/visualisation.R b/R/visualisation.R deleted file mode 100644 index 853f2037..00000000 --- a/R/visualisation.R +++ /dev/null @@ -1,248 +0,0 @@ -#' Generate sero-positivity plot -#' -#' Function that generates the sero positivity plot -#' @param model_object Object that the run_model function returns with the results of the fit -#' @param model Refers to the model selected -#' @param xlabel Label of axis x -#' @param ylabel Label of axis y -#' @return The sero-positivity plot -#' @export -plot_seroprev <- function(model_object, - size_text = 25) { - - if (is.character(model_object$fit) == FALSE) { - if (class(model_object$fit@sim$samples) != "NULL" ) { - - foi <- rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]] - prev_expanded <- get_prev_expanded(foi, model_data = model_object$model_data) - - prev_plot <- - ggplot2::ggplot(prev_expanded) + - ggplot2::geom_ribbon(ggplot2::aes(x = age, ymin = predicted_prev_lower, ymax = predicted_prev_upper), fill = "#c994c7") + - ggplot2::geom_line(ggplot2::aes(x = age, y = predicted_prev), colour = "#7a0177") + - ggplot2::geom_errorbar(ggplot2::aes(age, ymin = p_obs_bin_l, ymax = p_obs_bin_u), width = 0.1) + - ggplot2::geom_point(ggplot2::aes(age, p_obs_bin, size = bin_size), fill = "#7a0177", colour = "black", shape = 21) + - ggplot2::theme_bw(size_text) + - ggplot2::coord_cartesian(xlim = c(0, 60), ylim = c(0,1)) + - ggplot2::theme(legend.position = "none") + - ggplot2::ylab("Sero-positivity") + ggplot2::xlab("Age") - - } } else - - { - print("model did not run") - print_warning <- "errors" - df <- data.frame() - - prev_plot <- ggplot2::ggplot(df) + ggplot2::geom_point() + ggplot2::xlim(0, 10) + ggplot2::ylim(0, 10) + - ggplot2::annotate("text", x = 4, y = 5, label = print_warning) + - ggplot2::theme_bw(25) + - ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank()) + - ggplot2::ylab(" ") + ggplot2::xlab(" ") - ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) - - } - - return(prev_plot) - -} - -#' Generate Force-of-Infection Plot -#' -#' Function that generates the force of infection plot -#' @param model_object Object that the run_model function returns with the results of the fit -#' @param model Refers to model selected -#' @param xlabel Label of axis x -#' @param ylabel Label of axis y -#' @return Force of infection plot -#' @export -plot_foi <- function(model_object, - lambda_sim = NA, - max_lambda = NA, - size_text = 25) { - - if (is.character(model_object$fit) == FALSE) { - if (class(model_object$fit@sim$samples) != "NULL" ) { - - foi <- rstan::extract(model_object$fit, - "foi", - inc_warmup = FALSE)[[1]] - - #-------- This bit is to get the actual length of the foi data - foi_data <- model_object$foi_cent_est - - if (!is.na(lambda_sim)) { - lambda_mod_length <- NROW(foi_data) - lambda_sim_length <- length(lambda_sim) - - if (lambda_mod_length < lambda_sim_length) { - remove_x_values <- lambda_sim_length - lambda_mod_length - lambda_sim <- lambda_sim[-c(1:remove_x_values)] - } - - foi_data$simulated <- lambda_sim - - } - - #-------- - foi_data$medianv[1] <- NA - foi_data$lower[1] <- NA - foi_data$upper[1] <- NA - - foi_plot <- - ggplot2::ggplot(foi_data) + - ggplot2::geom_ribbon(ggplot2::aes(x = year, ymin = lower, ymax = upper), fill = "#41b6c4", alpha = 0.5) + - ggplot2::geom_line(ggplot2::aes(x = year, y = medianv), colour = "#253494", size = size_text/8) + - ggplot2::theme_bw(size_text) + - ggplot2::coord_cartesian(ylim = c(0, max_lambda)) + - ggplot2::ylab("Force-of-Infection") + ggplot2::xlab("Year") - } - - } else - - { - print("model did not run") - print_warning <- "errors" - df <- data.frame() - - foi_plot <- ggplot2::ggplot(df) + ggplot2::geom_point() + ggplot2::xlim(0, 10) + ggplot2::ylim(0, 10) + - ggplot2::annotate("text", x = 4, y = 5, label = print_warning) + - ggplot2::theme_bw(25) + - ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank()) + - ggplot2::ylab(" ") + ggplot2::xlab(" ") - ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) - - } - - return(foi_plot) - -} - -#' Generate Rhats-Convergence Plot -#' -#' Function that generates the convergence graph of a model -#' @param model_object Object that the run_model function returns with the results of the fit -#' @param model Refers to model selected -#' @param xlabel Label of axis x -#' @param ylabel Label of axis y -#' @return The rhats-convergence plot of the selected model -#' @export -plot_rhats <- function(model_object, - size_text = 25) { - - if (is.character(model_object$fit) == FALSE) { - if (class(model_object$fit@sim$samples) != "NULL" ) { - - rhats <- get_table_rhats(model_object) - - rhats_plot <- ggplot2::ggplot(rhats, ggplot2::aes(year, rhat)) + - ggplot2::geom_line(colour = "purple") + - ggplot2::geom_point() + - ggplot2::coord_cartesian(ylim = c(0.7, 2)) + - ggplot2::geom_hline(yintercept = 1.1, colour = "blue", size = size_text/12) + - ggplot2::theme_bw(size_text) + - ggplot2::ylab("Convergence (R^)") - - } } else - - { - print("model did not run") - print_warning <- "errors" - df <- data.frame() - - rhats_plot <- ggplot2::ggplot(df) + ggplot2::geom_point() + ggplot2::xlim(0, 10) + ggplot2::ylim(0, 10) + - ggplot2::annotate("text", x = 4, y = 5, label = print_warning) + - ggplot2::theme_bw(25) + - ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank()) + - ggplot2::ylab(" ") + ggplot2::xlab(" ") - ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) - - } - - return(rhats_plot) - -} - - -#' Generate a vertical arrange of plots summarizing the results of the model implementation -#' -#' Function that generates the combined graph -#' @param model_object Object that the run_model function returns with the results of the fit -#' @param model Refers to model selected -#' @param xlabel Label of axis x -#' @param ylabel Label of axis y -#' @return The combined plots -#' @export -plot_model <- function(model_object, - lambda_sim = NA, - max_lambda = NA, - size_text = 25) { - - if (is.character(model_object$fit) == FALSE) { - if (class(model_object$fit@sim$samples) != "NULL" ) { - prev_plot <- plot_seroprev(model_object = model_object, - size_text = size_text) - - foi_plot <- plot_foi(model_object = model_object, - lambda_sim = lambda_sim, - max_lambda = max_lambda, - size_text = size_text) - - rhats_plot <- plot_rhats(model_object = model_object, - size_text = size_text) - - summary_plot <- plot_info_table(t(model_object$model_summary), size_text = size_text) - - plot_arrange <- gridExtra::grid.arrange(summary_plot, - prev_plot, - foi_plot, - rhats_plot, nrow = 4, - heights = c(1.5, 1, 1, 1)) - - } } else - - { - print("model did not run") - print_warning <- "errors" - df <- data.frame() - - g0 <- ggplot2::ggplot(df) + ggplot2::geom_point() + ggplot2::xlim(0, 10) + ggplot2::ylim(0, 10) + - ggplot2::annotate("text", x = 4, y = 5, label = print_warning) + - ggplot2::theme_bw(25) + - ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank()) + - ggplot2::ylab(" ") + ggplot2::xlab(" ") - g1 <- g0 - g0 <- g0 + ggplot2::labs(subtitle = model_object$model) + - ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) - - plot_arrange <- gridExtra::grid.arrange(g0, g1, g1, g1, g1, nrow = 5) - - } - - return(plot_arrange) - -} - - -#' Plot Info Table -#' -#' Function that generates the information table -#' @param model_data refers to data of the each model -#' @param info the information that will be contained in the table -#' @param size_text text size -#' @return The previous expanded graphic -#' @export -plot_info_table <- function(info, size_text){ - - dato <- data.frame( - y = NROW(info):1, - text = paste0(rownames(info), ": ",info[,1]) - ) - p <- ggplot2::ggplot(dato, ggplot2::aes(x = 1, y = y)) + - ggplot2::scale_y_continuous(limits = c(0, NROW(info) + 1), breaks = NULL) + - # scale_x_continuous(breaks=NULL) + - ggplot2::theme_void() + - ggplot2::geom_text(ggplot2::aes(label = text), size = size_text/2.5, fontface = "bold") - - return(p) -} diff --git a/R/visualization.R b/R/visualization.R new file mode 100644 index 00000000..ffac5a0c --- /dev/null +++ b/R/visualization.R @@ -0,0 +1,417 @@ +#' Generate sero-positivity plot from raw data +#' +#' Function that generates the sero positivity plot from raw data +#' @param seroprev_data A data frame containing the data from a seroprevalence survey. +#' This data frame must contain the following columns: +#' \tabular{ll}{ +#' \code{survey} \tab survey Label of the current survey \cr \tab \cr +#' \code{total} \tab Number of samples for each age group\cr \tab \cr +#' \code{counts} \tab Number of positive samples for each age group\cr \tab \cr +#' \code{age_min} \tab age_min \cr \tab \cr +#' \code{age_max} \tab age_max \cr \tab \cr +#' \code{year_init} \tab year_init \cr \tab \cr +#' \code{year_end} \tab year_end \cr \tab \cr +#' \code{tsur} \tab Year in which the survey took place \cr \tab \cr +#' \code{country} \tab The country where the survey took place \cr \tab \cr +#' \code{test} \tab The type of test taken \cr \tab \cr +#' \code{antibody} \tab antibody \cr \tab \cr +#' } +#' @param size_text Text size use in the theme of the graph returned by the function. +#' @return A ggplot object containing the seropositivity-vs-age graph of the raw data of a given seroprevalence survey with its corresponging binomial confidence interval. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#') +#' plot_seroprev(model_object, size_text = 15) +#' } +#' @export +plot_seroprev <- function(seroprev_data, + size_text = 6) { + xx <- prepare_bin_data(seroprev_data) + seroprev_plot <- + ggplot2::ggplot(data = xx) + + ggplot2::geom_errorbar(ggplot2::aes(age, ymin = p_obs_bin_l, ymax = p_obs_bin_u), width = 0.1) + + ggplot2::geom_point(ggplot2::aes(age, p_obs_bin, size = xx$bin_size), fill = "#7a0177", colour = "black", shape = 21) + + ggplot2::theme_bw(size_text) + + ggplot2::coord_cartesian(xlim = c(0, 60), ylim = c(0,1)) + + ggplot2::theme(legend.position = "none") + + ggplot2::ylab("Sero-positivity") + ggplot2::xlab("Age") + + return(seroprev_plot) + +} + +#' Generate sero-positivity plot with fitted model +#' +#' Function that generates the seropositivity graph with fitted model. Age is located on the x axis and seropositivity on the y axis with its corresponding confidence interval. +#' @param model_object Object containing the results of fitting a model by means of \link{run_seroprev_model}. +#' @param size_text Text size of the graph returned by the function. +#' @return A ggplot object containing the seropositivity-vs-age graph including the data, the fitted model and their corresponding confindence intervals. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#') +#' plot_seroprev_fitted(model_object, size_text = 15) +#' } +#' @export +plot_seroprev_fitted <- function(model_object, + size_text = 6) { + + if (is.character(model_object$fit) == FALSE) { + if (class(model_object$fit@sim$samples) != "NULL" ) { + + foi <- rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]] + prev_expanded <- get_prev_expanded(foi, seroprev_data = model_object$seroprev_data) + prev_plot <- + ggplot2::ggplot(prev_expanded) + + ggplot2::geom_ribbon( + ggplot2::aes( + x = age, + ymin = predicted_prev_lower, + ymax = predicted_prev_upper + ), + fill = "#c994c7" + ) + + ggplot2::geom_line(ggplot2::aes(x = age, y = predicted_prev), colour = "#7a0177") + + ggplot2::geom_errorbar(ggplot2::aes(age, ymin = p_obs_bin_l, ymax = p_obs_bin_u), + width = 0.1) + + ggplot2::geom_point( + ggplot2::aes(age, p_obs_bin, size = bin_size), + fill = "#7a0177", + colour = "black", + shape = 21 + ) + + ggplot2::theme_bw(size_text) + + ggplot2::coord_cartesian(xlim = c(0, 60), ylim = c(0, 1)) + + ggplot2::theme(legend.position = "none") + + ggplot2::ylab("Sero-positivity") + + ggplot2::xlab("Age") + } + } else { + print("model did not run") + print_warning <- "errors" + df <- data.frame() + + prev_plot <- ggplot2::ggplot(df) + + ggplot2::geom_point() + + ggplot2::xlim(0, 10) + + ggplot2::ylim(0, 10) + + ggplot2::annotate("text", + x = 4, + y = 5, + label = print_warning) + + ggplot2::theme_bw(25) + + ggplot2::theme( + axis.text.x = ggplot2::element_blank(), + axis.text.y = ggplot2::element_blank() + ) + + ggplot2::ylab(" ") + + ggplot2::xlab(" ") + ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) + } + + return(prev_plot) +} + +#' Generate Force-of-Infection Plot +#' +#' Function that generates the Force-of-Infection plot. The x axis corresponds to the decades covered by the survey the y axis to the Force-of-Infection. +#' @param model_object Object containing the results of fitting a model by means of \link{run_seroprev_model}. +#' @param size_text Text size use in the theme of the graph returned by the function. +#' @return A ggplot2 object containing the Force-of-infection-vs-time including the corresponding confidence interval. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#' ) +#' plot_foi(model_object, size_text = 15) +#' } +#' @export +plot_foi <- function(model_object, + lambda_sim = NA, + max_lambda = NA, + size_text = 25) { + if (is.character(model_object$fit) == FALSE) { + if (class(model_object$fit@sim$samples) != "NULL") { + foi <- rstan::extract(model_object$fit, + "foi", + inc_warmup = FALSE)[[1]] + + #-------- This bit is to get the actual length of the foi data + foi_data <- model_object$foi_cent_est + + if (!is.na(lambda_sim)) { + lambda_mod_length <- NROW(foi_data) + lambda_sim_length <- length(lambda_sim) + + if (lambda_mod_length < lambda_sim_length) { + remove_x_values <- lambda_sim_length - lambda_mod_length + lambda_sim <- lambda_sim[-c(1:remove_x_values)] + } + + foi_data$simulated <- lambda_sim + } + + #-------- + foi_data$medianv[1] <- NA + foi_data$lower[1] <- NA + foi_data$upper[1] <- NA + + foi_plot <- + ggplot2::ggplot(foi_data) + + ggplot2::geom_ribbon( + ggplot2::aes( + x = year, + ymin = lower, + ymax = upper + ), + fill = "#41b6c4", + alpha = 0.5 + ) + + ggplot2::geom_line(ggplot2::aes(x = year, y = medianv), + colour = "#253494", + size = size_text / 8) + + ggplot2::theme_bw(size_text) + + ggplot2::coord_cartesian(ylim = c(0, max_lambda)) + + ggplot2::ylab("Force-of-Infection") + + ggplot2::xlab("Year") + } + } else { + print("model did not run") + print_warning <- "errors" + df <- data.frame() + + foi_plot <- ggplot2::ggplot(df) + + ggplot2::geom_point() + + ggplot2::xlim(0, 10) + + ggplot2::ylim(0, 10) + + ggplot2::annotate("text", + x = 4, + y = 5, + label = print_warning) + + ggplot2::theme_bw(25) + + ggplot2::theme( + axis.text.x = ggplot2::element_blank(), + axis.text.y = ggplot2::element_blank() + ) + + ggplot2::ylab(" ") + + ggplot2::xlab(" ") + ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) + } + + return(foi_plot) +} + +#' Generate Rhats-Convergence Plot +#' +#' Function that generates the convergence graph of a model. The x axis corresponds to the decades covered by the survey and the y axis to the value of the rhats. +#' All rhats must be smaller than 1 to ensure convergence. +#' @param model_object Object containing the results of fitting a model by means of \link{run_seroprev_model}. +#' @param size_text Text size use in the theme of the graph returned by the function. +#' @return The rhats-convergence plot of the selected model. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#') +#' plot_rhats(model_object, size_text = 15) +#' } +#' @export +plot_rhats <- function(model_object, + size_text = 25) { + if (is.character(model_object$fit) == FALSE) { + if (class(model_object$fit@sim$samples) != "NULL") { + rhats <- get_table_rhats(model_object) + + rhats_plot <- + ggplot2::ggplot(rhats, ggplot2::aes(year, rhat)) + + ggplot2::geom_line(colour = "purple") + + ggplot2::geom_point() + + ggplot2::coord_cartesian(ylim = c(0.7, 2)) + + ggplot2::geom_hline(yintercept = 1.1, + colour = "blue", + size = size_text / 12) + + ggplot2::theme_bw(size_text) + + ggplot2::ylab("Convergence (R^)") + } + } else { + print("model did not run") + print_warning <- "errors" + df <- data.frame() + + rhats_plot <- ggplot2::ggplot(df) + + ggplot2::geom_point() + + ggplot2::xlim(0, 10) + + ggplot2::ylim(0, 10) + + ggplot2::annotate("text", + x = 4, + y = 5, + label = print_warning) + + ggplot2::theme_bw(25) + + ggplot2::theme( + axis.text.x = ggplot2::element_blank(), + axis.text.y = ggplot2::element_blank() + ) + + ggplot2::ylab(" ") + + ggplot2::xlab(" ") + ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) + } + + return(rhats_plot) +} + + +#' Generate a vertical arrange of plots summarizing the results of the model implementation +#' +#' Function that generates the combined plots summarizing the results of the model implementation +#' @param model_object Object containing the results of fitting a model by means of \link{run_seroprev_model}. +#' @param size_text Text size use in the theme of the graph returned by the function. +#' @return A ggplot object with a vertical arrange containing the seropositivity, force of infection, and convergence plots. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#') +#' plot_seroprev_model(model_object, size_text = 15) +#' } +#' @export +plot_seroprev_model <- function(model_object, + lambda_sim = NA, + max_lambda = NA, + size_text = 25) { + if (is.character(model_object$fit) == FALSE) { + if (class(model_object$fit@sim$samples) != "NULL") { + prev_plot <- plot_seroprev_fitted(model_object = model_object, + size_text = size_text) + + foi_plot <- plot_foi( + model_object = model_object, + lambda_sim = lambda_sim, + max_lambda = max_lambda, + size_text = size_text + ) + + rhats_plot <- plot_rhats(model_object = model_object, + size_text = size_text) + + summary_table <- t( + dplyr::select(model_object$model_summary, + c('seroprev_model_name', 'dataset', 'elpd', 'se', 'converged'))) + summary_plot <- + plot_info_table(summary_table, size_text = size_text) + + plot_arrange <- cowplot::plot_grid( + summary_plot, + prev_plot, + foi_plot, + rhats_plot, + ncol = 1, + nrow = 4, + rel_heights = c(0.5, 1, 1, 1) + ) + } + } else { + print("model did not run") + print_warning <- "errors" + df <- data.frame() + + g0 <- ggplot2::ggplot(df) + + ggplot2::geom_point() + + ggplot2::xlim(0, 10) + + ggplot2::ylim(0, 10) + + ggplot2::annotate("text", + x = 4, + y = 5, + label = print_warning) + + ggplot2::theme_bw(25) + + ggplot2::theme( + axis.text.x = ggplot2::element_blank(), + axis.text.y = ggplot2::element_blank() + ) + + ggplot2::ylab(" ") + + ggplot2::xlab(" ") + g1 <- g0 + g0 <- g0 + ggplot2::labs(subtitle = model_object$model) + + ggplot2::theme(plot.title = ggplot2::element_text(size = 10)) + + plot_arrange <- + cowplot::plot_grid(g0, g1, g1, g1, g1, ncol = 1, nrow = 5) + } + + return(plot_arrange) +} + + +#' Plot Info Table +#' +#' Function that generates the information table +#' @param info the information that will be contained in the table +#' @param size_text Text size of the graph returned by the function +#' @return p, a variable that will be used in the \link{visualisation} module +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object <- run_seroprev_model( +#' seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi", +#' n_iters = 1000 +#') +#' info = t(model_object$model_summary) +#' plot_info_table (info, size_text = 15) +#' } +#' @export +plot_info_table <- function(info, size_text) { + dato <- data.frame(y = NROW(info):seq_len(1), + text = paste0(rownames(info), ": ", info[, 1])) + p <- ggplot2::ggplot(dato, ggplot2::aes(x = 1, y = y)) + + ggplot2::scale_y_continuous(limits = c(0, NROW(info) + 1), breaks = NULL) + + ggplot2::theme_void() + + ggplot2::geom_text(ggplot2::aes(label = text), + size = size_text / 2.5, + fontface = "bold") + + return(p) +} + + +#' Function that generates a plot showing the results for several models. +#' +#' @param ... The first n arguments of the function are taken as plots ti be arranged in a single plot. +#' This objects can be obtained by means of \link{run_seroprev_model}. +#' @param n_row Number of rows of the plot array. +#' @param n_col Number of columns of the plot array. +#' @return A ggplot object containing an array with the plots in \code{models_list}. +#' @examples +#' \dontrun{ +#' data_test <- prepare_seroprev_data(serodata) +#' model_object_constant <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "constant_foi_bi") +#' model_object_normalbi <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "continuous_foi_normal_bi") +#' model_object_normallog <- run_seroprev_model(seroprev_data = data_test, +#' seroprev_model_name = "continuous_foi_normal_log") +#' plot_seroprev_models_list(model_object_constant, model_object_normalbi, model_object_normallog, n_row = 1, n_col = 3) +#' } +#' @export +plot_seroprev_models_grid <- function(..., n_row = NULL, n_col = NULL) { + # TODO Distribute rows and columns according to the number of rows. + plot_seroprev_models <- cowplot::plot_grid(..., nrow = n_row, ncol = n_col) + return(plot_seroprev_models) +} \ No newline at end of file diff --git a/README.Rmd b/README.Rmd index 8b73605d..745c0010 100644 --- a/README.Rmd +++ b/README.Rmd @@ -1,5 +1,10 @@ --- output: github_document +editor_options: + markdown: + wrap: 72 +bibliography: references.bib +link-citations: true --- ```{r, include = FALSE} @@ -11,52 +16,199 @@ knitr::opts_chunk$set( ) ``` -## *serofoi* - -Estimates the _Force-of-Infection_ of a given pathogen from population based sero-prevalence studies on a bayesian framework. +## *serofoi*: force-of-infection from population based serosurveys with age-disagregated data -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -[![R-CMD-check](https://github.com/epiverse-trace/readepi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiverse-trace/readepi/actions/workflows/R-CMD-check.yaml) -[![Codecov test coverage](https://codecov.io/gh/epiverse-trace/readepi/branch/main/graph/badge.svg)](https://app.codecov.io/gh/epiverse-trace/readepi?branch=main) -[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept) + +[![License: +MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +[![R-CMD-check](https://github.com/TRACE-LAC/serofoi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/TRACE-LAC/serofoi/actions/workflows/R-CMD-check.yaml) +[![Codecov test +coverage](https://codecov.io/gh/TRACE-LAC/serofoi/branch/dev/graph/badge.svg)](https://app.codecov.io/gh/TRACE-LAC/serofoi/tree/dev/R?displayType=list) +[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept) + +***serofoi*** is an R package to estimates the *Force-of-Infection* of a +given pathogen from age-disaggregated population based sero-prevalence +studies, using a Bayesian framework. + +***serofoi*** implements methods outlined in [@cucunubá2017] and +[@carrera2020] + +***serofoi*** relies on +[`rstan`](https://mc-stan.org/users/interfaces/rstan) + +***serofoi*** is part of the [Epiverse +Initiative](https://data.org/initiatives/epiverse/). ## Installation + You can install the **development version** of `serofoi` from [GitHub](https://github.com/) with: ``` r # install.packages("remotes") -remotes::install_github("TRACE-LAC/serofoi") +# remotes::install_github("TRACE-LAC/serofoi") +library(serofoi) ``` ## Quick start -These examples illustrate some of the current functionalities: +```{r cleaning, include = FALSE, echo = TRUE} +library(serofoi) +rownames(serodata) <- NULL + +``` + +The package provides an example dataset of the observed serosurvey data, +`serodata`. This example is the basic entry for the package. + +```{r ex, include = TRUE} + +# Load example serodata data included with the package +data(serodata) +head(serodata, 5) + +``` + +The function `prepare_seroprev_data` will prepare the entry data for entering the +modelling functions. The seroprevalence *prepared data* can be +visualised with the `plot_seroprev` function. This function also plots +the binomial confidence interval of the observed data. + +```{r data_test, include = TRUE, fig.height=6, fig.width=6, message=FALSE} + +data_test <- prepare_seroprev_data(serodata) + +plot_seroprev(data_test, size_text = 15) + +``` + +#### Current version of the package runs ***three*** different FoI models + +The `run_seroprev_model` function allows specifying the Bayesian model from *R*, +while running in the back from `rstan`. The number of iterations, +thinning, and other parameters can be customised. + +::: {.alert .alert-primary} +NOTE: Running the *serofoi* models for the first time on your local +computer make take a few minutes while the *rstan* code is compiled +locally. Afterwards, no further local compilation is needed. +::: + +#### Model 1. Constant Force-of-Infection (endemic model) + +For the *endemic model* a small number of iterations is enough for +achieving convergence, as it only fits one parameter (the constant FoI) +from a binomial distribution. + +```{r model_1, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_1 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +``` + +#### Model 2. Time-varying Force-of-Infection (epidemic model) + +For the *epidemic model,* a larger number of iterations is required for +achieving convergence, as it fits yearly FoI values from a binomial +distribution. The number of iterations required may depend on the number +of years, reflected by the difference between year of the serosurvey and +the maximum age-class sampled. + +```{r model_2, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_2 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) +``` + +#### Model 3. Time-varying Force-of-Infection (fast epidemic model) + +For the *fast* *epidemic model,* a larger number of iterations is +required for achieving convergence, compared to the previous models. -```{r example} -##library(serofoi) +```{r model_3, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_3 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +``` + +For each model, the plot_seroprev_model function generate a vertical arrange of +plots summarising the results of the model implementation plotting +functions. Crucially, it shows the (expected) log-predictive density +`elpd`, standard error `se`, and allows to check convergence based on +`R-hat` convergence diagnostics. +```{r plot_seroprev_model, include = FALSE, echo = TRUE, errors = FALSE, warning = FALSE, message = FALSE, fig.height=12, fig.width=6} + +plot_seroprev_model(model_2, size_text = 15) ``` -### Lifecycle -This package is currently a *concept*, as defined by the [RECON software -lifecycle](https://www.reconverse.org/lifecycle.html). This means that essential -features and mechanisms are still being developed, and the package is not ready -for use outside of the development team. +Also, the `plot_seroprev_models_grid` allows a visual a comparison of +the models based on the (expected) log-predictive density `elpd`, +standard error `se`, and allows to check convergence based on `R-hat` +convergence diagnostics. +```{r, model_comp, include=FALSE, echo=TRUE, warning=FALSE, message=FALSE, fig.height=12, message=FALSE} +p1 <- plot_seroprev_model(model_1, size_text = 15) +p2 <- plot_seroprev_model(model_2, size_text = 15) +p3 <- plot_seroprev_model(model_3, size_text = 15) + +plot_seroprev_models_grid(p1, p2, p3, n_row = 1, n_col = 3) +``` + +For more detailed information and examples, please check the [online +documentation](https://epiverse-trace.github.io/serofoi/articles) as +package vignettes under Get Started. ### Contributions -Contributions are welcome via [pull requests](https://github.com/TRACE-LAC/serofoi/pulls). Contributors to the project include: -- [Zulma M. Cucunubá](https://github.com/zmcucunuba) (author) -- [Miguel Gamez](https://github.com/megamezl) (author) +- [Zulma M. Cucunubá](https://github.com/zmcucunuba) (author, + maintainer) + +- [Nicolás Tórres](https://github.com/ntorresd) (author) -### Code of Conduct -Please note that the linelist project is released with a -[Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). +- [Benjamin Lambert](https://ben-lambert.com/about/) (author) + +- [Pierre Nouvellet](https://github.com/pnouvellet) (author) + +- [Miguel Gamez](https://github.com/megamezl) (contributor) + +- [Geraldine Gómez](https://github.com/megamezl) (contributor) + +- [Jaime A. Pavlich-Mariscal](https://github.com/jpavlich) + (contributor) + +## Package vignettes + +More details on how to use ***serofoi*** can be found in the [online +documentation as package +vignettes](https://epiverse-trace.github.io/serofoi/), under "Get +Started". + +## Help + +To report a bug please open an +[issue](https://github.com/epiverse-trace/serofoi/issues/new/choose). + +## Contribute + +Contributions to ***serofoi*** are welcomed. Please follow the [package +contributing +guide](https://github.com/epiverse-trace/serofoi/blob/main/.github/CONTRIBUTING.md). + +## Code of conduct + +Please note that the ***serofoi*** project is released with a +[Contributor Code of +Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms. + +## References diff --git a/README.md b/README.md index 02089e84..14212826 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,30 @@ -## *serofoi* -Estimates the *Force-of-Infection* of a given pathogen from population -based sero-prevalence studies on a bayesian framework. +## *serofoi*: force-of-infection from population based serosurveys with age-disagregated data [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -[![R-CMD-check](https://github.com/epiverse-trace/readepi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiverse-trace/readepi/actions/workflows/R-CMD-check.yaml) +[![R-CMD-check](https://github.com/TRACE-LAC/serofoi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/TRACE-LAC/serofoi/actions/workflows/R-CMD-check.yaml) [![Codecov test -coverage](https://codecov.io/gh/epiverse-trace/readepi/branch/main/graph/badge.svg)](https://app.codecov.io/gh/epiverse-trace/readepi?branch=main) +coverage](https://codecov.io/gh/TRACE-LAC/serofoi/branch/dev/graph/badge.svg)](https://app.codecov.io/gh/TRACE-LAC/serofoi/tree/dev/R?displayType=list) [![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept) + +***serofoi*** is an R package to estimates the *Force-of-Infection* of a +given pathogen from age-disaggregated population based sero-prevalence +studies, using a Bayesian framework. + +***serofoi*** implements methods outlined in (Cucunubá et al. +[2017](#ref-cucunubá2017)) and (Carrera et al. [2020](#ref-carrera2020)) + +***serofoi*** relies on +[`rstan`](https://mc-stan.org/users/interfaces/rstan) + +***serofoi*** is part of the [Epiverse +Initiative](https://data.org/initiatives/epiverse/). + ## Installation You can install the **development version** of `serofoi` from @@ -20,37 +32,228 @@ You can install the **development version** of `serofoi` from ``` r # install.packages("remotes") -remotes::install_github("TRACE-LAC/serofoi") +# remotes::install_github("TRACE-LAC/serofoi") +library(serofoi) ``` ## Quick start -These examples illustrate some of the current functionalities: +The package provides an example dataset of the observed serosurvey data, +`serodata`. This example is the basic entry for the package. + +``` r + +# Load example serodata data included with the package +data(serodata) +head(serodata, 5) +#> survey total counts age_min age_max year_init year_end tsur country test +#> 1 COL-035-93 34 0 1 1 2012 2012 2012 COL ELISA +#> 2 COL-035-93 25 0 2 2 2012 2012 2012 COL ELISA +#> 3 COL-035-93 35 1 3 3 2012 2012 2012 COL ELISA +#> 4 COL-035-93 29 0 4 4 2012 2012 2012 COL ELISA +#> 5 COL-035-93 36 0 5 5 2012 2012 2012 COL ELISA +#> antibody +#> 1 IgG anti-T.cruzi +#> 2 IgG anti-T.cruzi +#> 3 IgG anti-T.cruzi +#> 4 IgG anti-T.cruzi +#> 5 IgG anti-T.cruzi +``` + +The function `prepare_seroprev_data` will prepare the entry data for +entering the modelling functions. The seroprevalence *prepared data* can +be visualised with the `plot_seroprev` function. This function also +plots the binomial confidence interval of the observed data. ``` r -##library(serofoi) + +data_test <- prepare_seroprev_data(serodata) + +plot_seroprev(data_test, size_text = 15) ``` -### Lifecycle + -This package is currently a *concept*, as defined by the [RECON software -lifecycle](https://www.reconverse.org/lifecycle.html). This means that -essential features and mechanisms are still being developed, and the -package is not ready for use outside of the development team. +#### Current version of the package runs ***three*** different FoI models -### Contributions +The `run_seroprev_model` function allows specifying the Bayesian model +from *R*, while running in the back from `rstan`. The number of +iterations, thinning, and other parameters can be customised. + +
+ +NOTE: Running the *serofoi* models for the first time on your local +computer make take a few minutes while the *rstan* code is compiled +locally. Afterwards, no further local compilation is needed. + +
+ +#### Model 1. Constant Force-of-Infection (endemic model) + +For the *endemic model* a small number of iterations is enough for +achieving convergence, as it only fits one parameter (the constant FoI) +from a binomial distribution. + +``` r +model_1 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) +#> [1] "serofoi model constant_foi_bi finished running ------" +#> [,1] +#> seroprev_model_name "constant_foi_bi" +#> dataset "COL-035-93" +#> country "COL" +#> year "2012" +#> test "ELISA" +#> antibody "IgG anti-T.cruzi" +#> n_sample "747" +#> n_agec "72" +#> n_iter "500" +#> elpd "-92.88" +#> se "6.44" +#> converged "Yes" +``` + +#### Model 2. Time-varying Force-of-Infection (epidemic model) + +For the *epidemic model,* a larger number of iterations is required for +achieving convergence, as it fits yearly FoI values from a binomial +distribution. The number of iterations required may depend on the number +of years, reflected by the difference between year of the serosurvey and +the maximum age-class sampled. + +``` r +model_2 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) +#> [1] "serofoi model continuous_foi_normal_bi finished running ------" +#> [,1] +#> seroprev_model_name "continuous_foi_normal_bi" +#> dataset "COL-035-93" +#> country "COL" +#> year "2012" +#> test "ELISA" +#> antibody "IgG anti-T.cruzi" +#> n_sample "747" +#> n_agec "72" +#> n_iter "1500" +#> elpd "-74.69" +#> se "5.83" +#> converged "Yes" +``` + +#### Model 3. Time-varying Force-of-Infection (fast epidemic model) + +For the *fast* *epidemic model,* a larger number of iterations is +required for achieving convergence, compared to the previous models. -Contributions are welcome via [pull -requests](https://github.com/TRACE-LAC/serofoi/pulls). +``` r +model_3 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +#> [1] "serofoi model continuous_foi_normal_log finished running ------" +#> [,1] +#> seroprev_model_name "continuous_foi_normal_log" +#> dataset "COL-035-93" +#> country "COL" +#> year "2012" +#> test "ELISA" +#> antibody "IgG anti-T.cruzi" +#> n_sample "747" +#> n_agec "72" +#> n_iter "1500" +#> elpd "-72.22" +#> se "7.15" +#> converged "Yes" +``` + +For each model, the plot\_seroprev\_model function generate a vertical +arrange of plots summarising the results of the model implementation +plotting functions. Crucially, it shows the (expected) log-predictive +density `elpd`, standard error `se`, and allows to check convergence +based on `R-hat` convergence diagnostics. + +Also, the `plot_seroprev_models_grid` allows a visual a comparison of +the models based on the (expected) log-predictive density `elpd`, +standard error `se`, and allows to check convergence based on `R-hat` +convergence diagnostics. + +For more detailed information and examples, please check the [online +documentation](https://epiverse-trace.github.io/serofoi/articles) as +package vignettes under Get Started. + +### Contributions Contributors to the project include: -- [Zulma M. Cucunubá](https://github.com/zmcucunuba) (author) -- [Miguel Gamez](https://github.com/megamezl) (author) + - [Zulma M. Cucunubá](https://github.com/zmcucunuba) (author, + maintainer) + + - [Nicolás Tórres](https://github.com/ntorresd) (author) + + - [Benjamin Lambert](https://ben-lambert.com/about/) (author) + + - [Pierre Nouvellet](https://github.com/pnouvellet) (author) + + - [Miguel Gamez](https://github.com/megamezl) (contributor) + + - [Geraldine Gómez](https://github.com/megamezl) (contributor) + + - [Jaime A. Pavlich-Mariscal](https://github.com/jpavlich) + (contributor) + +## Package vignettes + +More details on how to use ***serofoi*** can be found in the [online +documentation as package +vignettes](https://epiverse-trace.github.io/serofoi/), under “Get +Started”. + +## Help + +To report a bug please open an +[issue](https://github.com/epiverse-trace/serofoi/issues/new/choose). + +## Contribute + +Contributions to ***serofoi*** are welcomed. Please follow the [package +contributing +guide](https://github.com/epiverse-trace/serofoi/blob/main/.github/CONTRIBUTING.md). + +## Code of conduct + +Please note that the ***serofoi*** project is released with a +[Contributor Code of +Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md). +By contributing to this project, you agree to abide by its terms. + +## References + +
+ +
+ +Carrera, Jean-Paul, Zulma M. Cucunubá, Karen Neira, Ben Lambert, Yaneth +Pittí, Jesus Liscano, Jorge L. Garzón, et al. 2020. “Endemic and +Epidemic Human Alphavirus Infections in Eastern Panama: An Analysis of +Population-Based Cross-Sectional Surveys.” *The American Journal of +Tropical Medicine and Hygiene* 103 (6): 2429–37. +. + +
+ +
+ +Cucunubá, Zulma M, Pierre Nouvellet, Lesong Conteh, Mauricio Javier +Vera, Victor Manuel Angulo, Juan Carlos Dib, Gabriel Jaime Parra -Henao, +and María Gloria Basáñez. 2017. “Modelling Historical Changes in the +Force-of-Infection of Chagas Disease to Inform Control and Elimination +Programmes: Application in Colombia.” *BMJ Global Health* 2 (3): +e000345. . -### Code of Conduct +
-Please note that the linelist project is released with a [Contributor -Code of -Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). -By contributing to this project, you agree to abide by its terms. \ No newline at end of file +
diff --git a/_pkgdown.yml b/_pkgdown.yml new file mode 100644 index 00000000..89edf49d --- /dev/null +++ b/_pkgdown.yml @@ -0,0 +1,4 @@ +url: https://trace-lac.github.io/serofoi/ +template: + bootstrap: 5 + diff --git a/config.yml b/config.yml deleted file mode 100644 index b07d7b43..00000000 --- a/config.yml +++ /dev/null @@ -1,10 +0,0 @@ -default: - constant_foi_bi: - stan_path: "inst/extdata/stanmodels/constant_foi_bi.stan" - rds_path: "inst/extdata/stanmodels/ConstantUniformFOI.RDS" - continuous_foi_normal_bi: - stan_path: "inst/extdata/stanmodels/continuous_foi_normal_bi.stan" - rds_path: "inst/extdata/stanmodels/ContinuousNormalFOI.RDS" - continuous_foi_normal_log: - stan_path: "inst/extdata/stanmodels/continuous_foi_normal_log.stan" - rds_path: "inst/extdata/stanmodels/ContinuousNormalLogFOI_lowt.RDS" diff --git a/data/chagas2012.RDS b/data/chagas2012.RDS new file mode 100644 index 00000000..885b2727 Binary files /dev/null and b/data/chagas2012.RDS differ diff --git a/data/chagas2012.RData b/data/chagas2012.RData new file mode 100644 index 00000000..352fe7b7 Binary files /dev/null and b/data/chagas2012.RData differ diff --git a/data/chik2015.RDS b/data/chik2015.RDS new file mode 100644 index 00000000..390300e3 Binary files /dev/null and b/data/chik2015.RDS differ diff --git a/data/chik2015.RData b/data/chik2015.RData new file mode 100644 index 00000000..af87681d Binary files /dev/null and b/data/chik2015.RData differ diff --git a/data/data.RDS b/data/data.RDS deleted file mode 100644 index 78f058b9..00000000 Binary files a/data/data.RDS and /dev/null differ diff --git a/data/fake_data_1569526695.RDS b/data/fake_data_1569526695.RDS new file mode 100644 index 00000000..9d0ad0b3 Binary files /dev/null and b/data/fake_data_1569526695.RDS differ diff --git a/data/mydata.RData b/data/mydata.RData new file mode 100644 index 00000000..2e8a51fe Binary files /dev/null and b/data/mydata.RData differ diff --git a/data/serodata.RData b/data/serodata.RData new file mode 100644 index 00000000..1d653bfe Binary files /dev/null and b/data/serodata.RData differ diff --git a/data/veev2012.RDS b/data/veev2012.RDS new file mode 100644 index 00000000..c021ca47 Binary files /dev/null and b/data/veev2012.RDS differ diff --git a/data/veev2012.RData b/data/veev2012.RData new file mode 100644 index 00000000..8d33994f Binary files /dev/null and b/data/veev2012.RData differ diff --git a/doc/serofoi.R b/doc/serofoi.R new file mode 100644 index 00000000..fe57b4ec --- /dev/null +++ b/doc/serofoi.R @@ -0,0 +1,12 @@ +## ---- include = FALSE--------------------------------------------------------- +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) + +## ----setup-------------------------------------------------------------------- +library(serofoi) + +## ----------------------------------------------------------------------------- +data_test <- prepare_data(serodata) + diff --git a/doc/serofoi.Rmd b/doc/serofoi.Rmd new file mode 100644 index 00000000..78260813 --- /dev/null +++ b/doc/serofoi.Rmd @@ -0,0 +1,24 @@ +--- +title: "serofoi" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{serofoi} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup} +library(serofoi) +``` + +# Prepare the data for using serofoi +```{r} +data_test <- prepare_data(serodata) +``` diff --git a/doc/serofoi.html b/doc/serofoi.html new file mode 100644 index 00000000..e4798c74 --- /dev/null +++ b/doc/serofoi.html @@ -0,0 +1,365 @@ + + + + + + + + + + + + + + +serofoi + + + + + + + + + + + + + + + + + + + + + + + + + + +

serofoi

+ + + +
library(serofoi)
+
+

Prepare the data for using serofoi

+
data_test <- prepare_data(serodata)
+
+ + + + + + + + + + + diff --git a/inst/config.yml b/inst/config.yml new file mode 100644 index 00000000..9fc90915 --- /dev/null +++ b/inst/config.yml @@ -0,0 +1,2 @@ +default: + stan_models_base_path: "extdata/stanmodels" diff --git a/inst/extdata/config.yml b/inst/extdata/config.yml deleted file mode 100644 index bdbd0d4c..00000000 --- a/inst/extdata/config.yml +++ /dev/null @@ -1,10 +0,0 @@ -default: - constant_foi_bi: - stan_path: "stanmodels/constant_foi_bi.stan" - rds_path: "stanmodels/ConstantUniformFOI.RDS" - continuous_foi_normal_bi: - stan_path: "stanmodels/continuous_foi_normal_bi.stan" - rds_path: "stanmodels/ContinuousNormalFOI.RDS" - continuous_foi_normal_log: - stan_path: "stanmodels/continuous_foi_normal_log.stan" - rds_path: "stanmodels/ContinuousNormalLogFOI_lowt.RDS" diff --git a/man/chagas2012.Rd b/man/chagas2012.Rd new file mode 100644 index 00000000..39d0832a --- /dev/null +++ b/man/chagas2012.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/chagas2012.R +\docType{data} +\name{chagas2012} +\alias{chagas2012} +\title{Seroprevalence data on serofoi} +\format{ +An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +} +\usage{ +chagas2012 +} +\description{ +Data from a serological surveys +} +\examples{ +chagas2012 +} +\keyword{datasets} diff --git a/man/chik2015.Rd b/man/chik2015.Rd new file mode 100644 index 00000000..044418ba --- /dev/null +++ b/man/chik2015.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/chik2015.R +\docType{data} +\name{chik2015} +\alias{chik2015} +\title{Seroprevalence data on serofoi} +\format{ +An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +} +\usage{ +chik2015 +} +\description{ +Data from a serological surveys +} +\examples{ +chik2015 +} +\keyword{datasets} diff --git a/man/extract_seroprev_model_summary.Rd b/man/extract_seroprev_model_summary.Rd new file mode 100644 index 00000000..498b3b1f --- /dev/null +++ b/man/extract_seroprev_model_summary.Rd @@ -0,0 +1,39 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/modelling.R +\name{extract_seroprev_model_summary} +\alias{extract_seroprev_model_summary} +\title{Extract Model Summary} +\usage{ +extract_seroprev_model_summary(model_object) +} +\arguments{ +\item{model_object}{\code{model_object}. An object containing relevant information about the implementation of the model. Refer to \link{fit_seroprev_model} for further details.} +} +\value{ +\code{model_summary}. Object with a summary of \code{model_object} containing the following: +\tabular{ll}{ +\code{seroprev_model_name} \tab Name of the selected model. For further details refer to \link{save_or_load_model}. \cr \tab \cr +\code{data_set} \tab Seroprevalence survey label.\cr \tab \cr +\code{country} \tab Name of the country were the survey was conducted in. \cr \tab \cr +\code{year} \tab Year in which the survey was conducted. \cr \tab \cr +\code{test} \tab Type of test of the survey. \cr \tab \cr +\code{antibody} \tab Antibody \cr \tab \cr +\code{n_sample} \tab Total number of samples in the survey. \cr \tab \cr +\code{n_agec} \tab Number of age groups considered. \cr \tab \cr +\code{n_iter} \tab Number of interations for eah chain including the warmup. \cr \tab \cr +\code{elpd} \tab elpd \cr \tab \cr +\code{se} \tab se \cr \tab \cr +\code{converged} \tab convergence \cr \tab \cr +} +} +\description{ +Function to generate a summary of a model. +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(serodata) +model_object <- run_seroprev_model(seroprev_data = seroprev_data, + seroprev_model_name = "constant_foi_bi") +extract_seroprev_model_summary (model_object) +} +} diff --git a/man/extract_summary_model.Rd b/man/extract_summary_model.Rd deleted file mode 100644 index d4dcb888..00000000 --- a/man/extract_summary_model.Rd +++ /dev/null @@ -1,19 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{extract_summary_model} -\alias{extract_summary_model} -\title{Extract Summary Model} -\usage{ -extract_summary_model(model_object) -} -\arguments{ -\item{model_object}{what the run model function returns} - -\item{model_data}{refers to data of the model} -} -\value{ -summary of the models -} -\description{ -Function that summarizes the models -} diff --git a/man/figures/README-data_test-1.png b/man/figures/README-data_test-1.png new file mode 100644 index 00000000..6b3f80ae Binary files /dev/null and b/man/figures/README-data_test-1.png differ diff --git a/man/figures/README-ex-1.png b/man/figures/README-ex-1.png new file mode 100644 index 00000000..5ab8b89a Binary files /dev/null and b/man/figures/README-ex-1.png differ diff --git a/man/figures/README-model_comp-1.png b/man/figures/README-model_comp-1.png new file mode 100644 index 00000000..a58c9464 Binary files /dev/null and b/man/figures/README-model_comp-1.png differ diff --git a/man/figures/README-plot_model-1.png b/man/figures/README-plot_model-1.png new file mode 100644 index 00000000..268d4f26 Binary files /dev/null and b/man/figures/README-plot_model-1.png differ diff --git a/man/figures/README-plot_seroprev_model-1.png b/man/figures/README-plot_seroprev_model-1.png new file mode 100644 index 00000000..062f61c6 Binary files /dev/null and b/man/figures/README-plot_seroprev_model-1.png differ diff --git a/man/figures/plot_foi_example.png b/man/figures/plot_foi_example.png new file mode 100644 index 00000000..86d082fb Binary files /dev/null and b/man/figures/plot_foi_example.png differ diff --git a/man/figures/plot_model_example.png b/man/figures/plot_model_example.png new file mode 100644 index 00000000..be42bfb1 Binary files /dev/null and b/man/figures/plot_model_example.png differ diff --git a/man/figures/plot_rhats_example.png b/man/figures/plot_rhats_example.png new file mode 100644 index 00000000..27ff5dcb Binary files /dev/null and b/man/figures/plot_rhats_example.png differ diff --git a/man/figures/plot_seroprev_example.png b/man/figures/plot_seroprev_example.png new file mode 100644 index 00000000..b2f4885e Binary files /dev/null and b/man/figures/plot_seroprev_example.png differ diff --git a/man/figures/plot_seroprev_fitted_example.png b/man/figures/plot_seroprev_fitted_example.png new file mode 100644 index 00000000..afd5348d Binary files /dev/null and b/man/figures/plot_seroprev_fitted_example.png differ diff --git a/man/figures/serofoi-logo.png b/man/figures/serofoi-logo.png new file mode 100644 index 00000000..ce1a0f03 Binary files /dev/null and b/man/figures/serofoi-logo.png differ diff --git a/man/fit_model.Rd b/man/fit_model.Rd deleted file mode 100644 index 913bf3cc..00000000 --- a/man/fit_model.Rd +++ /dev/null @@ -1,40 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{fit_model} -\alias{fit_model} -\title{Fit Model} -\usage{ -fit_model( - model, - model_data, - model_name, - n_iters = n_iters, - n_thin = n_thin, - delta = delta, - m_treed = m_treed, - decades = decades -) -} -\arguments{ -\item{model}{refers to model selected} - -\item{model_data}{refers to the model data that has been selected} - -\item{model_name}{name of the model selected} - -\item{n_iters}{number of iterations. Each model has a default number.} - -\item{n_thin}{Each model has a default number.} - -\item{delta}{This value comes by default but it can be changed} - -\item{m_treed}{This value comes by default but it can be changed} - -\item{decades}{The decades covered by the survey data} -} -\value{ -model_object -} -\description{ -Function that fits the model to the data -} diff --git a/man/fit_model_log.Rd b/man/fit_model_log.Rd deleted file mode 100644 index 8fccda55..00000000 --- a/man/fit_model_log.Rd +++ /dev/null @@ -1,40 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{fit_model_log} -\alias{fit_model_log} -\title{Fit Model Log} -\usage{ -fit_model_log( - model, - model_data, - model_name, - n_iters = 3000, - n_thin = 2, - delta = 0.9, - m_treed = 10, - decades = 0 -) -} -\arguments{ -\item{model}{refers to model selected} - -\item{model_data}{model_data} - -\item{model_name}{name of the model selected} - -\item{n_iters}{number of iterations. This value comes by default but it can be changed} - -\item{delta}{This value comes by default but it can be changed} - -\item{m_treed}{This value comes by default but it can be changed} - -\item{decades}{The decades covered by the survey data} - -\item{n_thinThis}{value comes by default but it can be changed} -} -\value{ -model_object -} -\description{ -Function that fits the logarithmic model to the data -} diff --git a/man/fit_seroprev_model.Rd b/man/fit_seroprev_model.Rd new file mode 100644 index 00000000..34f23b2d --- /dev/null +++ b/man/fit_seroprev_model.Rd @@ -0,0 +1,70 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/modelling.R +\name{fit_seroprev_model} +\alias{fit_seroprev_model} +\title{Fit Model} +\usage{ +fit_seroprev_model( + seroprev_data, + seroprev_model_name, + n_iters = 1000, + n_thin = 2, + delta = 0.9, + m_treed = 10, + decades = 0 +) +} +\arguments{ +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}.} + +\item{seroprev_model_name}{Name of the selected model. Current version provides three options: +\describe{ +\item{\code{"constant_foi_bi"}}{Runs a constant model} +\item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +\item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +}} + +\item{n_iters}{Number of interations for eah chain including the warmup. \code{iter} in \link[rstan]{sampling}.} + +\item{n_thin}{Positive integer specifying the period for saving samples. \code{thin} in \link[rstan]{sampling}.} + +\item{delta}{Real number between 0 and 1 that represents the target average acceptance probability. +Increasing the value of \code{delta} will result in a smaller step size and fewer divergences. +For further details refer to the \code{control} parameter in \link[rstan]{sampling} or \href{https://mc-stan.org/rstanarm/reference/adapt_delta.html}{here}.} + +\item{m_treed}{Maximum tree depth for the binary tree used in the NUTS stan sampler. For further details refer to the \code{control} parameter in \link[rstan]{sampling}.} + +\item{decades}{Number of decades covered by the survey data.} +} +\value{ +\code{model_object}. An object containing relevant information about the implementation of the model. It contains the following: +\tabular{ll}{ +\code{fit} \tab \code{stanfit} object returned by the function \link[rstan]{sampling} \cr \tab \cr +\code{seroprev_data} \tab A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}.\cr \tab \cr +\code{stan_data} \tab List containing \code{Nobs}, \code{Npos}, \code{Ntotal}, \code{Age}, \code{Ymax}, \code{AgeExpoMatrix} and \code{NDecades}. +This object is used as an input for the \link[rstan]{sampling} function \cr \tab \cr +\code{exposure_years} \tab Integer atomic vector containing the actual exposure years (1946, ..., 2007 e.g.) \cr \tab \cr +\code{exposure_ages} \tab Integer atomic vector containing the numeration of the exposure ages. \cr \tab \cr +\code{n_iters} \tab Number of interations for eah chain including the warmup. \cr \tab \cr +\code{n_thin} \tab Positive integer specifying the period for saving samples. \cr \tab \cr +\code{n_warmup} \tab Number of warm up iterations. Set by default as n_iters/2. \cr \tab \cr +\code{seroprev_model_name} \tab The name of the model\cr \tab \cr +\code{delta} \tab Real number between 0 and 1 that represents the target average acceptance probability. \cr \tab \cr +\code{m_treed} \tab Maximum tree depth for the binary tree used in the NUTS stan sampler. \cr \tab \cr +\code{loo_fit} \tab Efficient approximate leave-one-out cross-validation. Refer to \link[loo]{loo} for further details. \cr \tab \cr +\code{foi_cent_est} \tab A data fram e containing \code{year} (corresponding to \code{exposure_years}), \code{lower}, \code{upper}, and \code{medianv} \cr \tab \cr +\code{foi_post_s} \tab Sample n rows from a table. Refer to \link[dplyr]{sample_n} for further details. \cr \tab \cr +\code{model_summary} \tab A data fram containing the summary of the model. Refer to \link{extract_seroprev_model_summary} for further details. \cr \tab \cr +} +} +\description{ +Function that fits the selected model to the data +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(serodata) +fit_seroprev_model (seroprev_data, + seroprev_model_name = "constant_foi_bi") +} + +} diff --git a/man/generate_sim_data.Rd b/man/generate_sim_data.Rd new file mode 100644 index 00000000..449564c7 --- /dev/null +++ b/man/generate_sim_data.Rd @@ -0,0 +1,37 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/seroprevalence_data.R +\name{generate_sim_data} +\alias{generate_sim_data} +\title{Function that generates simulated data from a given Force-of-Infection} +\usage{ +generate_sim_data( + foi, + size_age_class, + tsur, + birth_year_min, + survey_label, + test = "fake", + antibody = "IgG", + seed = 1234 +) +} +\arguments{ +\item{foi}{Numeric atomic vector corresponding to the desired Force-of-Infection} +} +\value{ +Dataframe object containing the simulated data generated from \code{foi} +} +\description{ +Function that generates simulated data from a given Force-of-Infection +} +\examples{ +\dontrun{ +size_age_class = 5 +foi <- rep(0.02, 50) +sim_data <- generate_sim_data(foi = foi, + size_age_class = size_age_class, + tsur = 2050, + birth_year_min = 2000, + survey_label = 'sim_constant_foi') +} +} diff --git a/man/get_comparison_table.Rd b/man/get_comparison_table.Rd new file mode 100644 index 00000000..e75c8333 --- /dev/null +++ b/man/get_comparison_table.Rd @@ -0,0 +1,38 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/model_comparison.R +\name{get_comparison_table} +\alias{get_comparison_table} +\title{Get Model Table Comparison +Provides a table with statistics for comparison between models and selection} +\usage{ +get_comparison_table(model_objects_list) +} +\arguments{ +\item{model_objects_list}{model_objects to compare} +} +\value{ +comparison table +} +\description{ +Get Model Table Comparison +Provides a table with statistics for comparison between models and selection +} +\examples{ +\dontrun{ +data_test <- prepare_seroprev_data(serodata) +model_0 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000) + +model_1 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1000) + +model_2 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1000) +comp_table <- get_comparison_table(model_objects_list = c(m0 = model_0, + m1 = model_1, + m2 = model_2)) +} +} diff --git a/man/get_exposure_ages.Rd b/man/get_exposure_ages.Rd new file mode 100644 index 00000000..71eeb3ed --- /dev/null +++ b/man/get_exposure_ages.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/modelling.R +\name{get_exposure_ages} +\alias{get_exposure_ages} +\title{Get exposure years} +\usage{ +get_exposure_ages(seroprev_data) +} +\arguments{ +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. This data frame must contain the year of birth for each individual (birth_year) and the time of the survey (tsur). birth_year can be constructed by means of the \link{prepare_seroprev_data} function.} +} +\value{ +\code{exposure_ages}. An atomic vector with the numeration of the exposition years in seroprev_data +} +\description{ +Function that generates an atomic vector with the exposition years in seroprev_data. The exposition years to the disease for each individual corresponds to the time from birth to the moment of the survey. +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +exposure_ages <- get_exposure_ages(seroprev_data) +} +} diff --git a/man/get_exposure_matrix.Rd b/man/get_exposure_matrix.Rd index 9ad72f6a..e988a53a 100644 --- a/man/get_exposure_matrix.Rd +++ b/man/get_exposure_matrix.Rd @@ -4,16 +4,20 @@ \alias{get_exposure_matrix} \title{Get Exposure Matrix} \usage{ -get_exposure_matrix(model_data, yexpo) +get_exposure_matrix(seroprev_data) } \arguments{ -\item{model_data}{refers to the model data that has been selected} - -\item{yexpo}{what the make yexpo function returns} +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. This data frame must contain the year of birth for each individual (birth_year) and the time of the survey (tsur). birth_year can be constructed by means of the \link{prepare_seroprev_data} function.} } \value{ -exposure_output +\code{exposure_output}. An atomic matrix containing the expositions for each entry of \code{seroprev_data} by year. } \description{ -Function that generates the exposure matrix +Function that generates the exposure matrix for a seroprevalence survey. +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +exposure_matrix <- get_exposure_matrix(seroprev_data = seroprev_data) +} } diff --git a/man/get_posterior_summary.Rd b/man/get_posterior_summary.Rd deleted file mode 100644 index 5ee7f9c0..00000000 --- a/man/get_posterior_summary.Rd +++ /dev/null @@ -1,17 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{get_posterior_summary} -\alias{get_posterior_summary} -\title{Get Posterior Summary} -\usage{ -get_posterior_summary(model_objects_chain) -} -\arguments{ -\item{model_objects}{model_objects_chain} -} -\value{ -model_object -} -\description{ -Function that gets a posterior summary -} diff --git a/man/get_prev_expanded.Rd b/man/get_prev_expanded.Rd index dbf891ed..94549724 100644 --- a/man/get_prev_expanded.Rd +++ b/man/get_prev_expanded.Rd @@ -4,16 +4,25 @@ \alias{get_prev_expanded} \title{Get Prevalence Expanded} \usage{ -get_prev_expanded(foi, model_data) +get_prev_expanded(foi, seroprev_data) } \arguments{ -\item{foi}{force of infection} +\item{foi}{Object containing the information of the force of infection. It is obtained from \code{rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]]}.} -\item{model_data}{refers to the model data that has been selected} +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. For further details refer to \link{run_seroprev_model}.} } \value{ -prev_final +\code{prev_final}. The expanded prevalence data. This is used for plotting purposes in the \code{visualization} module. } \description{ -Function that obtains the expanded prevalence +Function that generates the expanded prevalence +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(serodata) +model_object <- run_seroprev_model(seroprev_data = seroprev_data, + seroprev_model_name = "constant_foi_bi") +foi <- rstan::extract(model_object$fit, "foi", inc_warmup = FALSE)[[1]] +get_prev_expanded <- function(foi, seroprev_data) +} } diff --git a/man/get_sim_counts.Rd b/man/get_sim_counts.Rd new file mode 100644 index 00000000..b75538d1 --- /dev/null +++ b/man/get_sim_counts.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/seroprevalence_data.R +\name{get_sim_counts} +\alias{get_sim_counts} +\title{Function that randomly generates a sample of counts for a simulated dataset} +\usage{ +get_sim_counts(sim_data, foi, size_age_class, seed = 1234) +} +\arguments{ +\item{sim_data}{A dataframe object containing the following columns: +\tabular{ll}{ +\code{birth_year} \tab List of years in which the subjects were borned \cr \tab \cr +\code{tsur} \tab Year of the survey\cr \tab \cr +\code{country} \tab Default to 'none'.\cr \tab \cr +\code{survey} \tab Survey label \cr \tab \cr +\code{age_mean_f} \tab Age \cr \tab \cr +}} +} +\value{ +A simulated list of counts following a binomial distribution in accordance with a given force of infection and age class sizes. +} +\description{ +Function that randomly generates a sample of counts for a simulated dataset +} +\examples{ +\dontrun{ + +} +} diff --git a/man/get_table_rhats.Rd b/man/get_table_rhats.Rd index 034d353e..5a5ea6d7 100644 --- a/man/get_table_rhats.Rd +++ b/man/get_table_rhats.Rd @@ -13,6 +13,13 @@ get_table_rhats(model_object) rhats table } \description{ -Función que hace la tabla de los rhats Function that makes the rhats table } +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05) +model_object <- run_seroprev_model( +seroprev_data = seroprev_data, seroprev_model_name = "constant_foi_bi") +get_table_rhats (model_object) +} +} diff --git a/man/group_sim_data.Rd b/man/group_sim_data.Rd new file mode 100644 index 00000000..fcde7ad0 --- /dev/null +++ b/man/group_sim_data.Rd @@ -0,0 +1,44 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/seroprevalence_data.R +\name{group_sim_data} +\alias{group_sim_data} +\title{Function that generates grouped simulated data from a given Force-of-Infection} +\usage{ +group_sim_data( + sim_data, + foi, + size_age_class, + tsur, + birth_year_min, + survey_label, + test = "fake", + antibody = "IgG", + seed = 1234 +) +} +\arguments{ +\item{sim_data}{Dataframe with the structure of the output of \code{\linl{generate_sim_data}}.} +} +\value{ +Dataframe object containing grouped simulated data generated from \code{foi} +} +\description{ +Function that generates grouped simulated data from a given Force-of-Infection +} +\examples{ +\dontrun{ +size_age_class = 5 +foi <- rep(0.02, 50) +sim_data <- generate_sim_data(foi = foi, + size_age_class = size_age_class, + tsur = 2050, + birth_year_min = 2000, + survey_label = 'sim_constant_foi') +sim_data_grouped <- group_sim_data(sim_data = sim_data, + foi = foi, + size_age_class = size_age_class, + tsur = 2050, + birth_year_min = 2000, + survey_label = 'sim_constant_foi_grouped') +} +} diff --git a/man/make_yexpo.Rd b/man/make_yexpo.Rd deleted file mode 100644 index ffa89ab3..00000000 --- a/man/make_yexpo.Rd +++ /dev/null @@ -1,17 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{make_yexpo} -\alias{make_yexpo} -\title{Make yexpo} -\usage{ -make_yexpo(model_data) -} -\arguments{ -\item{model_data}{refers to the model data that has been selected} -} -\value{ -yexpo -} -\description{ -Function that generates Yexpo -} diff --git a/man/plot_foi.Rd b/man/plot_foi.Rd index 79a6d49a..45d0e987 100644 --- a/man/plot_foi.Rd +++ b/man/plot_foi.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/visualisation.R +% Please edit documentation in R/visualization.R \name{plot_foi} \alias{plot_foi} \title{Generate Force-of-Infection Plot} @@ -7,17 +7,24 @@ plot_foi(model_object, lambda_sim = NA, max_lambda = NA, size_text = 25) } \arguments{ -\item{model_object}{Object that the run_model function returns with the results of the fit} +\item{model_object}{Object containing the results of fitting a model by means of \link{run_seroprev_model}.} -\item{model}{Refers to model selected} - -\item{xlabel}{Label of axis x} - -\item{ylabel}{Label of axis y} +\item{size_text}{Text size use in the theme of the graph returned by the function.} } \value{ -Force of infection plot +A ggplot2 object containing the Force-of-infection-vs-time including the corresponding confidence interval. } \description{ -Function that generates the force of infection plot +Function that generates the Force-of-Infection plot. The x axis corresponds to the decades covered by the survey the y axis to the Force-of-Infection. +} +\examples{ +\dontrun{ + data_test <- prepare_seroprev_data(serodata) + model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +plot_foi(model_object, size_text = 15) +} } diff --git a/man/plot_info_table.Rd b/man/plot_info_table.Rd index 7b474ee8..f30f7e5e 100644 --- a/man/plot_info_table.Rd +++ b/man/plot_info_table.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/visualisation.R +% Please edit documentation in R/visualization.R \name{plot_info_table} \alias{plot_info_table} \title{Plot Info Table} @@ -9,13 +9,23 @@ plot_info_table(info, size_text) \arguments{ \item{info}{the information that will be contained in the table} -\item{size_text}{text size} - -\item{model_data}{refers to data of the each model} +\item{size_text}{Text size of the graph returned by the function} } \value{ -The previous expanded graphic +p, a variable that will be used in the \link{visualisation} module } \description{ Function that generates the information table } +\examples{ +\dontrun{ + data_test <- prepare_seroprev_data(serodata) + model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +info = t(model_object$model_summary) +plot_info_table (info, size_text = 15) +} +} diff --git a/man/plot_model.Rd b/man/plot_model.Rd deleted file mode 100644 index f80569de..00000000 --- a/man/plot_model.Rd +++ /dev/null @@ -1,23 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/visualisation.R -\name{plot_model} -\alias{plot_model} -\title{Generate a vertical arrange of plots summarizing the results of the model implementation} -\usage{ -plot_model(model_object, lambda_sim = NA, max_lambda = NA, size_text = 25) -} -\arguments{ -\item{model_object}{Object that the run_model function returns with the results of the fit} - -\item{model}{Refers to model selected} - -\item{xlabel}{Label of axis x} - -\item{ylabel}{Label of axis y} -} -\value{ -The combined plots -} -\description{ -Function that generates the combined graph -} diff --git a/man/plot_rhats.Rd b/man/plot_rhats.Rd index 706a70a1..e79241c8 100644 --- a/man/plot_rhats.Rd +++ b/man/plot_rhats.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/visualisation.R +% Please edit documentation in R/visualization.R \name{plot_rhats} \alias{plot_rhats} \title{Generate Rhats-Convergence Plot} @@ -7,17 +7,25 @@ plot_rhats(model_object, size_text = 25) } \arguments{ -\item{model_object}{Object that the run_model function returns with the results of the fit} +\item{model_object}{Object containing the results of fitting a model by means of \link{run_seroprev_model}.} -\item{model}{Refers to model selected} - -\item{xlabel}{Label of axis x} - -\item{ylabel}{Label of axis y} +\item{size_text}{Text size use in the theme of the graph returned by the function.} } \value{ -The rhats-convergence plot of the selected model +The rhats-convergence plot of the selected model. } \description{ -Function that generates the convergence graph of a model +Function that generates the convergence graph of a model. The x axis corresponds to the decades covered by the survey and the y axis to the value of the rhats. +All rhats must be smaller than 1 to ensure convergence. +} +\examples{ +\dontrun{ +data_test <- prepare_seroprev_data(serodata) +model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +plot_rhats(model_object, size_text = 15) +} } diff --git a/man/plot_seroprev.Rd b/man/plot_seroprev.Rd index ee97665e..90fff635 100644 --- a/man/plot_seroprev.Rd +++ b/man/plot_seroprev.Rd @@ -1,23 +1,44 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/visualisation.R +% Please edit documentation in R/visualization.R \name{plot_seroprev} \alias{plot_seroprev} -\title{Generate sero-positivity plot} +\title{Generate sero-positivity plot from raw data} \usage{ -plot_seroprev(model_object, size_text = 25) +plot_seroprev(seroprev_data, size_text = 6) } \arguments{ -\item{model_object}{Object that the run_model function returns with the results of the fit} +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. +This data frame must contain the following columns: +\tabular{ll}{ +\code{survey} \tab survey Label of the current survey \cr \tab \cr +\code{total} \tab Number of samples for each age group\cr \tab \cr +\code{counts} \tab Number of positive samples for each age group\cr \tab \cr +\code{age_min} \tab age_min \cr \tab \cr +\code{age_max} \tab age_max \cr \tab \cr +\code{year_init} \tab year_init \cr \tab \cr +\code{year_end} \tab year_end \cr \tab \cr +\code{tsur} \tab Year in which the survey took place \cr \tab \cr +\code{country} \tab The country where the survey took place \cr \tab \cr +\code{test} \tab The type of test taken \cr \tab \cr +\code{antibody} \tab antibody \cr \tab \cr +}} -\item{model}{Refers to the model selected} - -\item{xlabel}{Label of axis x} - -\item{ylabel}{Label of axis y} +\item{size_text}{Text size use in the theme of the graph returned by the function.} } \value{ -The sero-positivity plot +A ggplot object containing the seropositivity-vs-age graph of the raw data of a given seroprevalence survey with its corresponging binomial confidence interval. } \description{ -Function that generates the sero positivity plot +Function that generates the sero positivity plot from raw data +} +\examples{ +\dontrun{ + data_test <- prepare_seroprev_data(serodata) + model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +plot_seroprev(model_object, size_text = 15) +} } diff --git a/man/plot_seroprev_fitted.Rd b/man/plot_seroprev_fitted.Rd new file mode 100644 index 00000000..72d84c52 --- /dev/null +++ b/man/plot_seroprev_fitted.Rd @@ -0,0 +1,30 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/visualization.R +\name{plot_seroprev_fitted} +\alias{plot_seroprev_fitted} +\title{Generate sero-positivity plot with fitted model} +\usage{ +plot_seroprev_fitted(model_object, size_text = 6) +} +\arguments{ +\item{model_object}{Object containing the results of fitting a model by means of \link{run_seroprev_model}.} + +\item{size_text}{Text size of the graph returned by the function.} +} +\value{ +A ggplot object containing the seropositivity-vs-age graph including the data, the fitted model and their corresponding confindence intervals. +} +\description{ +Function that generates the seropositivity graph with fitted model. Age is located on the x axis and seropositivity on the y axis with its corresponding confidence interval. +} +\examples{ +\dontrun{ + data_test <- prepare_seroprev_data(serodata) + model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +plot_seroprev_fitted(model_object, size_text = 15) +} +} diff --git a/man/plot_seroprev_model.Rd b/man/plot_seroprev_model.Rd new file mode 100644 index 00000000..ae613bb5 --- /dev/null +++ b/man/plot_seroprev_model.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/visualization.R +\name{plot_seroprev_model} +\alias{plot_seroprev_model} +\title{Generate a vertical arrange of plots summarizing the results of the model implementation} +\usage{ +plot_seroprev_model( + model_object, + lambda_sim = NA, + max_lambda = NA, + size_text = 25 +) +} +\arguments{ +\item{model_object}{Object containing the results of fitting a model by means of \link{run_seroprev_model}.} + +\item{size_text}{Text size use in the theme of the graph returned by the function.} +} +\value{ +A ggplot object with a vertical arrange containing the seropositivity, force of infection, and convergence plots. +} +\description{ +Function that generates the combined plots summarizing the results of the model implementation +} +\examples{ +\dontrun{ +data_test <- prepare_seroprev_data(serodata) +model_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 +) +plot_seroprev_model(model_object, size_text = 15) +} +} diff --git a/man/plot_seroprev_models_grid.Rd b/man/plot_seroprev_models_grid.Rd new file mode 100644 index 00000000..6205ccd4 --- /dev/null +++ b/man/plot_seroprev_models_grid.Rd @@ -0,0 +1,34 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/visualization.R +\name{plot_seroprev_models_grid} +\alias{plot_seroprev_models_grid} +\title{Function that generates a plot showing the results for several models.} +\usage{ +plot_seroprev_models_grid(..., n_row = NULL, n_col = NULL) +} +\arguments{ +\item{...}{The first n arguments of the function are taken as plots ti be arranged in a single plot. +This objects can be obtained by means of \link{run_seroprev_model}.} + +\item{n_row}{Number of rows of the plot array.} + +\item{n_col}{Number of columns of the plot array.} +} +\value{ +A ggplot object containing an array with the plots in \code{models_list}. +} +\description{ +Function that generates a plot showing the results for several models. +} +\examples{ +\dontrun{ + data_test <- prepare_seroprev_data(serodata) + model_object_constant <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi") + model_object_normalbi <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi") + model_object_normallog <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log") + plot_seroprev_models_list(model_object_constant, model_object_normalbi, model_object_normallog, n_row = 1, n_col = 3) +} +} diff --git a/man/prepare_bin_data.Rd b/man/prepare_bin_data.Rd new file mode 100644 index 00000000..4bd20caa --- /dev/null +++ b/man/prepare_bin_data.Rd @@ -0,0 +1,43 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/seroprevalence_data.R +\name{prepare_bin_data} +\alias{prepare_bin_data} +\title{Prepare data to plot binomial confidence intervals} +\usage{ +prepare_bin_data(seroprev_data) +} +\arguments{ +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. For more information see the function \link{run_seroprev_model}. +This data frame must contain the following columns: +\tabular{ll}{ +\code{survey} \tab survey Label of the current survey \cr \tab \cr +\code{total} \tab Number of samples for each age group\cr \tab \cr +\code{counts} \tab Number of positive samples for each age group\cr \tab \cr +\code{age_min} \tab age_min \cr \tab \cr +\code{age_max} \tab age_max \cr \tab \cr +\code{year_init} \tab year_init \cr \tab \cr +\code{year_end} \tab year_end \cr \tab \cr +\code{tsur} \tab Year in which the survey took place \cr \tab \cr +\code{country} \tab The country where the survey took place \cr \tab \cr +\code{test} \tab The type of test taken \cr \tab \cr +\code{antibody} \tab antibody \cr \tab \cr +\code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +\code{sample_size} \tab The size of the sample \cr \tab \cr +\code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +\code{prev_obs} \tab Observed prevalence \cr \tab \cr +\code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +\code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +} +The last six colums can be added to \code{seroprev_data} by means of the function \code{\link{prepare_seroprev_data}}.} +} +\value{ +data set with the binomial confidence intervals +} +\description{ +Function that prepares the data for modelling +} +\examples{ +\dontrun{ +prepare_bin_data (seroprev_data) +} +} diff --git a/man/prepare_seroprev_data.Rd b/man/prepare_seroprev_data.Rd new file mode 100644 index 00000000..208264ca --- /dev/null +++ b/man/prepare_seroprev_data.Rd @@ -0,0 +1,51 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/seroprevalence_data.R +\name{prepare_seroprev_data} +\alias{prepare_seroprev_data} +\title{Prepare data} +\usage{ +prepare_seroprev_data( + seroprev_data = serodata, + alpha = 0.05, + add_age_mean_f = TRUE +) +} +\arguments{ +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. +This data frame must contain the following columns: +\tabular{ll}{ +\code{survey} \tab survey Label of the current survey \cr \tab \cr +\code{total} \tab Number of samples for each age group\cr \tab \cr +\code{counts} \tab Number of positive samples for each age group\cr \tab \cr +\code{age_min} \tab age_min \cr \tab \cr +\code{age_max} \tab age_max \cr \tab \cr +\code{year_init} \tab year_init \cr \tab \cr +\code{year_end} \tab year_end \cr \tab \cr +\code{tsur} \tab Year in which the survey took place \cr \tab \cr +\code{country} \tab The country where the survey took place \cr \tab \cr +\code{test} \tab The type of test taken \cr \tab \cr +\code{antibody} \tab antibody \cr \tab \cr +}} + +\item{alpha}{probability of a type I error. For further details refer to \link{Hmisc::binconf}.} +} +\value{ +seroprev_data with additional columns necessary for the analysis. These columns are: +\tabular{ll}{ +\code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +\code{sample_size} \tab The size of the sample \cr \tab \cr +\code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +\code{prev_obs} \tab Observed prevalence \cr \tab \cr +\code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +\code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +} +} +\description{ +Function that prepares the data for modelling +} +\examples{ +\dontrun{ +data_test <- readRDS("data/data.RDS") +data_test <- prepare_seroprev_data(seroprev_data, alpha) +} +} diff --git a/man/run_model.Rd b/man/run_model.Rd deleted file mode 100644 index ac0a92e4..00000000 --- a/man/run_model.Rd +++ /dev/null @@ -1,35 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{run_model} -\alias{run_model} -\title{Run model -Function that runs the specified model} -\usage{ -run_model( - model_data, - model_name = "constant_foi_bi", - n_iters = 1000, - n_thin = 2, - delta = 0.9, - m_treed = 10, - decades = 0 -) -} -\arguments{ -\item{model_data}{model_data} - -\item{model_name}{name of the model selected} - -\item{n_iters}{number of iterations. Each model has a value by default.} - -\item{survey}{survey} - -\item{model}{refers to model selected} -} -\value{ -model_object of model -} -\description{ -Run model -Function that runs the specified model -} diff --git a/man/run_seroprev_model.Rd b/man/run_seroprev_model.Rd new file mode 100644 index 00000000..89923bc5 --- /dev/null +++ b/man/run_seroprev_model.Rd @@ -0,0 +1,72 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/modelling.R +\name{run_seroprev_model} +\alias{run_seroprev_model} +\title{Run the specified stan model for the force-of-infection} +\usage{ +run_seroprev_model( + seroprev_data, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000, + n_thin = 2, + delta = 0.9, + m_treed = 10, + decades = 0 +) +} +\arguments{ +\item{seroprev_data}{A data frame containing the data from a seroprevalence survey. +This data frame must contain the following columns: +\tabular{ll}{ +\code{survey} \tab survey Label of the current survey \cr \tab \cr +\code{total} \tab Number of samples for each age group\cr \tab \cr +\code{counts} \tab Number of positive samples for each age group\cr \tab \cr +\code{age_min} \tab age_min \cr \tab \cr +\code{age_max} \tab age_max \cr \tab \cr +\code{year_init} \tab year_init \cr \tab \cr +\code{year_end} \tab year_end \cr \tab \cr +\code{tsur} \tab Year in which the survey took place \cr \tab \cr +\code{country} \tab The country where the survey took place \cr \tab \cr +\code{test} \tab The type of test taken \cr \tab \cr +\code{antibody} \tab antibody \cr \tab \cr +\code{age_mean_f} \tab Floor value of the average between age_min and age_max \cr \tab \cr +\code{sample_size} \tab The size of the sample \cr \tab \cr +\code{birth_year} \tab The year in which the individuals of each age group were bornt \cr \tab \cr +\code{prev_obs} \tab Observed prevalence \cr \tab \cr +\code{prev_obs_lower} \tab Lower limit of the confidence interval for the observed prevalence \cr \tab \cr +\code{prev_obs_upper} \tab Upper limit of the confidence interval for the observed prevalence \cr \tab \cr +} +The last six colums can be added to \code{seroprev_data} by means of the function \code{\link{prepare_seroprev_data}}.} + +\item{seroprev_model_name}{Name of the selected model. Current version provides three options: +\describe{ +\item{\code{"constant_foi_bi"}}{Runs a constant model} +\item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +\item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +}} + +\item{n_iters}{Number of interations for eah chain including the warmup. \code{iter} in \link[rstan]{sampling}.} + +\item{n_thin}{Positive integer specifying the period for saving samples. \code{thin} in \link[rstan]{sampling}.} + +\item{delta}{Real number between 0 and 1 that represents the target average acceptance probability. +Increasing the value of \code{delta} will result in a smaller step size and fewer divergences. +For further details refer to the \code{control} parameter in \link[rstan]{sampling} or \href{https://mc-stan.org/rstanarm/reference/adapt_delta.html}{here}.} + +\item{m_treed}{Maximum tree depth for the binary tree used in the NUTS stan sampler. For further details refer to the \code{control} parameter in \link[rstan]{sampling}.} + +\item{decades}{Number of decades covered by the survey data.} +} +\value{ +\code{model_object}. An object containing relevant information about the implementation of the model. For further details refer to \link{fit_seroprev_model}. +} +\description{ +Run the specified stan model for the force-of-infection +} +\examples{ +\dontrun{ +seroprev_data <- prepare_seroprev_data(serodata) +run_seroprev_model (seroprev_data, + seroprev_model_name = "constant_foi_bi") +} +} diff --git a/man/save_or_load_model.Rd b/man/save_or_load_model.Rd new file mode 100644 index 00000000..49832931 --- /dev/null +++ b/man/save_or_load_model.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/modelling.R +\name{save_or_load_model} +\alias{save_or_load_model} +\title{Save or load model} +\usage{ +save_or_load_model(seroprev_model_name = "constant_foi_bi") +} +\arguments{ +\item{seroprev_model_name}{Name of the selected model. Current version provides three options: +\describe{ +\item{\code{"constant_foi_bi"}}{Runs a constant model} +\item{\code{"continuous_foi_normal_bi"}}{Runs a normal model} +\item{\code{"continuous_foi_normal_log"}}{Runs a normal logarithmic model} +}} +} +\value{ +\code{model}. The rstan model object corresponding to the selected model. +} +\description{ +This function determines whether the corresponding .RDS file of the selected model exists or not. +In case the .RDS file exists, it is read and returned; otherwise, the object model is created through the +\link[rstan]{stan_model} function, saved as an .RDS file and returned as the output of the function. +} +\examples{ +save_or_load_model(seroprev_model_name="constant_foi_bi") +} diff --git a/man/save_or_read_model.Rd b/man/save_or_read_model.Rd deleted file mode 100644 index f6448902..00000000 --- a/man/save_or_read_model.Rd +++ /dev/null @@ -1,16 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/modelling.R -\name{save_or_read_model} -\alias{save_or_read_model} -\title{Save or read model -Function that saves the .RDS file of the model} -\usage{ -save_or_read_model(model_name = "constant_foi_bi") -} -\arguments{ -\item{model_name}{name of the model selected} -} -\description{ -Save or read model -Function that saves the .RDS file of the model -} diff --git a/man/serodata.Rd b/man/serodata.Rd new file mode 100644 index 00000000..a0644814 --- /dev/null +++ b/man/serodata.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/serodata.R +\docType{data} +\name{serodata} +\alias{serodata} +\title{Seroprevalence data on serofoi} +\format{ +An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +} +\usage{ +serodata +} +\description{ +Data from a serological surveys +} +\examples{ +serodata +} +\keyword{datasets} diff --git a/man/serofoi-package.Rd b/man/serofoi-package.Rd index 060ef941..6c1380ea 100644 --- a/man/serofoi-package.Rd +++ b/man/serofoi-package.Rd @@ -4,8 +4,26 @@ \name{serofoi-package} \alias{serofoi} \alias{serofoi-package} -\title{serofoi: Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies.} +\title{serofoi: Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies} \description{ -Provides functions for estimate the Force-of-Infection of a given pathogen from population based sero-prevalence studies on a bayesian framework. +R package to estimate time-varying Force-of-Infection of a given pathogen from population based sero-prevalence studies using a bayesian framework. +} +\seealso{ +Useful links: +\itemize{ + \item \url{https://trace-lac.github.io/serofoi/} +} + +} +\author{ +\strong{Maintainer}: Zulma M. Cucunubá \email{zulma.cucunuba@javeriana.edu.co} (\href{https://orcid.org/0000-0002-8165-3198}{ORCID}) + +Authors: +\itemize{ + \item Nicolás Torres + \item Ben Lambert + \item Pierre Nouvellet +} + } \keyword{internal} diff --git a/man/veev2012.Rd b/man/veev2012.Rd new file mode 100644 index 00000000..c095cc7f --- /dev/null +++ b/man/veev2012.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/veev2012.R +\docType{data} +\name{veev2012} +\alias{veev2012} +\title{Seroprevalence data on serofoi} +\format{ +An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +} +\usage{ +veev2012 +} +\description{ +Data from a serological surveys +} +\examples{ +veev2012 +} +\keyword{datasets} diff --git a/references.bib b/references.bib new file mode 100644 index 00000000..fc3a76e1 --- /dev/null +++ b/references.bib @@ -0,0 +1,30 @@ + +@article{cucunubá2017, + title = {Modelling historical changes in the force-of-infection of Chagas disease to inform control and elimination programmes: application in Colombia}, + author = {{Cucunubá}, Zulma M and Nouvellet, Pierre and Conteh, Lesong and Vera, Mauricio Javier and Angulo, Victor Manuel and Dib, Juan Carlos and Parra -Henao, Gabriel Jaime and {Basáñez}, {María Gloria}}, + year = {2017}, + month = {09}, + date = {2017-09}, + journal = {BMJ Global Health}, + pages = {e000345}, + volume = {2}, + number = {3}, + doi = {10.1136/bmjgh-2017-000345}, + url = {http://dx.doi.org/10.1136/bmjgh-2017-000345}, + langid = {en} +} + +@article{carrera2020, + title = {Endemic and Epidemic Human Alphavirus Infections in Eastern Panama: An Analysis of Population-Based Cross-Sectional Surveys}, + author = {Carrera, Jean-Paul and {Cucunubá}, Zulma M. and Neira, Karen and Lambert, Ben and {Pittí}, Yaneth and Liscano, Jesus and {Garzón}, Jorge L. and Beltran, Davis and Collado-Mariscal, Luisa and Saenz, Lisseth and Sosa, {Néstor} and Rodriguez-Guzman, Luis D. and {González}, Publio and Lezcano, {Andrés G.} and {Pereyra-Elías}, {Reneé} and Valderrama, Anayansi and Weaver, Scott C. and Vittor, Amy Y. and {Armién}, Blas and Pascale, Juan-Miguel and Donnelly, Christl A.}, + year = {2020}, + month = {12}, + date = {2020-12-02}, + journal = {The American Journal of Tropical Medicine and Hygiene}, + pages = {2429--2437}, + volume = {103}, + number = {6}, + doi = {10.4269/ajtmh.20-0408}, + url = {http://dx.doi.org/10.4269/ajtmh.20-0408}, + langid = {en} +} diff --git a/test/test_individual_models.R b/test/test_individual_models.R deleted file mode 100644 index 42661275..00000000 --- a/test/test_individual_models.R +++ /dev/null @@ -1,48 +0,0 @@ -rm(list = ls()) - -library(devtools) -library(dplyr) - -# install.packages("github/serofoi") #pendiente -source("R/modelling.R") -source("R/seroprevalence_data.R") -source("R/model_comparison.R") -source("R/visualisation.R") - -#----- Folder where results will be stored -# test_dir <- epitrix::clean_labels(paste0("tests_", Sys.time())) - -#----- Read and prepare data -data_test <- readRDS("data/data.RDS") %>% prepare_data(alpha=0.05) - -#----- Test each model -model_0_object <- run_model(model_data = data_test, - model_name = "constant_foi_bi", - n_iters = 1000) - -# model_1_object <- run_model(model_data = data_test, -# model_name = "continuous_foi_normal_bi", -# n_iters = 1000) -# -# model_2_object <- run_model(model_data = data_test, -# model_name = "continuous_foi_normal_log", -# n_iters = 1000) - -#----- Generate all plots for each model - -# model_0_plot <- plot_model(model_0_object, size_text = 6) -# model_1_plot <- plot_model(model_1_object, size_text = 6) -# model_2_plot <- plot_model(model_2_object, size_text = 6) - -#----- Generate each individual plot - -# plot_seroprev(model_0_object, size_text = 15) -# plot_foi(model_0_object, size_text = 15) -# plot_rhats(model_0_object, size_text = 15) - -# bayesplot::mcmc_trace(model_1_object$fit, pars="lambda0") - - -xxx <- "Zulma MILENA cucunuba 3455" -epitrix::clean_labels(xxx) - diff --git a/test/test_vignette.R b/test/test_vignette.R new file mode 100644 index 00000000..3a1d29f7 --- /dev/null +++ b/test/test_vignette.R @@ -0,0 +1,14 @@ +library(remotes) +remotes::install_github("TRACE-LAC/serofoi", + ref = "dev-zulma-vignette", + force = TRUE) +library(serofoi) + + +library(usethis) +# usethis::use_pkgdown() +# pkgdown::build_site() +# use_github_pages(branch = "dev-zulma-vignette", path = "/", cname = NA) + +library(roxygen2) +roxygen2::roxygenise() diff --git a/test/testing_vignette.R b/test/testing_vignette.R new file mode 100644 index 00000000..426415e4 --- /dev/null +++ b/test/testing_vignette.R @@ -0,0 +1,16 @@ +# TODO Convert to testthat. Move to `tests/` +library(remotes) +remotes::install_github("TRACE-LAC/serofoi", + ref = "dev", + force = TRUE) + + +library(serofoi) +library(usethis) +library(roxygen2) + +devtools::document() +usethis::use_pkgdown_github_pages() +pkgdown::build_site() +use_github_pages(branch = "dev", path = "/", cname = NA) + diff --git a/tests/docker/.bashrc b/tests/docker/.bashrc new file mode 100644 index 00000000..be8da80e --- /dev/null +++ b/tests/docker/.bashrc @@ -0,0 +1 @@ +export R_LIBS=/root/.R/site-library \ No newline at end of file diff --git a/tests/docker/Dockerfile b/tests/docker/Dockerfile new file mode 100644 index 00000000..b1370942 --- /dev/null +++ b/tests/docker/Dockerfile @@ -0,0 +1,13 @@ +FROM rocker/tidyverse:4.2.2 +VOLUME [ "/root/.R/site-library"] +ARG docker_folder="tests/docker" +WORKDIR /package +ADD . . +ADD ${docker_folder}/*.sh ./ +# ADD ${docker_folder}/install_deps.sh . + +WORKDIR /root +ADD ${docker_folder}/.bashrc . +# -i ensures that any command will source .bashrc https://stackoverflow.com/a/37286648 +# ENTRYPOINT ["/bin/bash", "-c", "-i"] +ENTRYPOINT ["/bin/bash", "-c"] \ No newline at end of file diff --git a/tests/docker/check.sh b/tests/docker/check.sh new file mode 100755 index 00000000..7034525d --- /dev/null +++ b/tests/docker/check.sh @@ -0,0 +1,2 @@ +#!/bin/bash +R --no-echo --no-restore -e 'devtools::check(manual=FALSE, cran = TRUE, error_on=c("error"))' \ No newline at end of file diff --git a/tests/docker/docker_scripts.R b/tests/docker/docker_scripts.R new file mode 100644 index 00000000..4af7954d --- /dev/null +++ b/tests/docker/docker_scripts.R @@ -0,0 +1,33 @@ +# Clear everything +system( + "docker container rm rtest-container; docker volume rm rtest-site-library; docker image rm -f rtest-image" +) +# Rebuild Docker image +system( + "docker container kill rtest-container; rm -f inst/extdata/stanmodels/*.rds; docker build -t rtest-image . -f tests/docker/Dockerfile" +) + +# Install R dependencies +system( + "docker container kill rtest-container ; docker container rm rtest-container ; docker run --name rtest-container --env R_LIBS=/root/.R/site-library -v rtest-site-library:/root/.R/site-library rtest-image 'cd /package && ./install_deps.sh'" +) + +# Run tests +system( + "docker container kill rtest-container ; docker container rm rtest-container ; docker container run --rm --name rtest-container --env R_LIBS=/root/.R/site-library -v rtest-site-library:/root/.R/site-library rtest-image 'cd /package && ./run_tests.sh'" +) + +# R CMD Check +system( + "docker container kill rtest-container ; docker container rm rtest-container ; docker container run --rm --name rtest-container --env R_LIBS=/root/.R/site-library -v rtest-site-library:/root/.R/site-library rtest-image 'cd /package && ./check.sh'" +) + +# Bash shell +system( + "docker container run -it --rm --env R_LIBS=/root/.R/site-library -v rtest-site-library:/root/.R/site-library rtest-image bash" +) + +# R Shell +system( + "docker container run -it --rm --env R_LIBS=/root/.R/site-library -v rtest-site-library:/root/.R/site-library rtest-image R" +) diff --git a/tests/docker/install_deps.sh b/tests/docker/install_deps.sh new file mode 100755 index 00000000..f7232f18 --- /dev/null +++ b/tests/docker/install_deps.sh @@ -0,0 +1,3 @@ +#!/bin/bash +Rscript -e 'install.packages("testthat", force=FALSE)' +Rscript -e 'devtools::install_deps(upgrade="always")' \ No newline at end of file diff --git a/tests/docker/run_tests.sh b/tests/docker/run_tests.sh new file mode 100755 index 00000000..602617e8 --- /dev/null +++ b/tests/docker/run_tests.sh @@ -0,0 +1,2 @@ +#!/bin/bash +R --no-echo --no-restore -e 'devtools::test()' \ No newline at end of file diff --git a/tests/testepidemics.R b/tests/testepidemics.R new file mode 100644 index 00000000..be7a80b2 --- /dev/null +++ b/tests/testepidemics.R @@ -0,0 +1,2 @@ +# TODO Convert to testthat + diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 00000000..75afe8e5 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,4 @@ +library(testthat) +library(serofoi) + +test_check("serofoi") \ No newline at end of file diff --git a/tests/testthat/_snaps/comparison/comp_table.csv b/tests/testthat/_snaps/comparison/comp_table.csv new file mode 100644 index 00000000..d70bd29d --- /dev/null +++ b/tests/testthat/_snaps/comparison/comp_table.csv @@ -0,0 +1,4 @@ +"","seroprev_model_name","dataset","country","year","test","antibody","n_sample","n_agec","n_iter","elpd","se","converged","difference","diff_se","best_elpd","pvalue" +"1","constant_foi_bi","COL-035-93","COL",2012,"ELISA","IgG anti-T.cruzi",747,72,1000,-92.75,6.4,"Yes",0,1,NA,0.500041 +"2","continuous_foi_normal_bi","COL-035-93","COL",2012,"ELISA","IgG anti-T.cruzi",747,72,1000,-74.46,5.88,"Yes",0,-18.2967809927967,NA,0.500009 +"3","continuous_foi_normal_log","COL-035-93","COL",2012,"ELISA","IgG anti-T.cruzi",747,72,1000,-71.93,7.11,"Yes",0,-20.8238925781359,NA,0.500029 diff --git a/tests/testthat/clean_expected_files.R b/tests/testthat/clean_expected_files.R new file mode 100644 index 00000000..5fdd7a6f --- /dev/null +++ b/tests/testthat/clean_expected_files.R @@ -0,0 +1,14 @@ +# This script removes the svg and csv files used in the automatic tests +# This is usually required when a new version of rstan is released +# Random number generation changes for each new version of +# Rstan, https://mc-stan.org/docs/2_18/reference-manual/reproducibility-chapter.html +# thus all files with expected results (both tables and graphs) become +# obsolete and need to be regenerated + +library(testthat) +paths <- c(testthat::test_path("_snaps", "*"), testthat::test_path("extdata", "dataframes", "expected", "*")) + +for (path in paths) { + cat("Deleting", path, "\n") + unlink(path, recursive = TRUE) +} diff --git a/tests/testthat/extdata/case_A_sim_data_grouped.csv b/tests/testthat/extdata/case_A_sim_data_grouped.csv new file mode 100644 index 00000000..863a63f2 --- /dev/null +++ b/tests/testthat/extdata/case_A_sim_data_grouped.csv @@ -0,0 +1,11 @@ +"age_group","total","counts","tsur","country","survey","test","antibody","age_min","age_max","age_mean_f","sample_size","birth_year","prev_obs","prev_obs_lower","prev_obs_upper" +"01-04",20,1,2050,"None","sim_foi","fake","IgG",1,4,2,250,2048,0.05,0.00126508949794981,0.248732762772028 +"05-09",25,2,2050,"None","sim_foi","fake","IgG",5,9,7,250,2043,0.08,0.00983959001879751,0.260305842105214 +"10-14",25,7,2050,"None","sim_foi","fake","IgG",10,14,12,250,2038,0.28,0.120716688504067,0.493876821806256 +"15-19",25,5,2050,"None","sim_foi","fake","IgG",15,19,17,250,2033,0.2,0.068311464012484,0.407037432278677 +"20-24",25,9,2050,"None","sim_foi","fake","IgG",20,24,22,250,2028,0.36,0.179716820583655,0.57479365044615 +"25-29",25,7,2050,"None","sim_foi","fake","IgG",25,29,27,250,2023,0.28,0.120716688504067,0.493876821806256 +"30-34",25,8,2050,"None","sim_foi","fake","IgG",30,34,32,250,2018,0.32,0.14949542261357,0.535000717497372 +"35-39",25,11,2050,"None","sim_foi","fake","IgG",35,39,37,250,2013,0.44,0.244023665147208,0.650718366008664 +"40-44",25,16,2050,"None","sim_foi","fake","IgG",40,44,42,250,2008,0.64,0.425206349553849,0.820283179416345 +"45-50",30,18,2050,"None","sim_foi","fake","IgG",45,50,47,250,2003,0.6,0.406034930051819,0.773442351171406 diff --git a/tests/testthat/extdata/case_A_sim_data_no_grouped.csv b/tests/testthat/extdata/case_A_sim_data_no_grouped.csv new file mode 100644 index 00000000..363fc0f5 --- /dev/null +++ b/tests/testthat/extdata/case_A_sim_data_no_grouped.csv @@ -0,0 +1,51 @@ +"birth_year","tsur","country","test","antibody","survey","age_mean_f","counts","total","prev_obs","prev_obs_lower","prev_obs_upper","age_min","age_max" +2049,2050,"None","fake","IgG","sim_foi",1,0,5,0,0,0.521823750104981,1,1 +2048,2050,"None","fake","IgG","sim_foi",2,0,5,0,0,0.521823750104981,2,2 +2047,2050,"None","fake","IgG","sim_foi",3,0,5,0,0,0.521823750104981,3,3 +2046,2050,"None","fake","IgG","sim_foi",4,1,5,0.2,0.00505076337946806,0.716417936118089,4,4 +2045,2050,"None","fake","IgG","sim_foi",5,0,5,0,0,0.521823750104981,5,5 +2044,2050,"None","fake","IgG","sim_foi",6,0,5,0,0,0.521823750104981,6,6 +2043,2050,"None","fake","IgG","sim_foi",7,1,5,0.2,0.00505076337946806,0.716417936118089,7,7 +2042,2050,"None","fake","IgG","sim_foi",8,0,5,0,0,0.521823750104981,8,8 +2041,2050,"None","fake","IgG","sim_foi",9,1,5,0.2,0.00505076337946806,0.716417936118089,9,9 +2040,2050,"None","fake","IgG","sim_foi",10,1,5,0.2,0.00505076337946806,0.716417936118089,10,10 +2039,2050,"None","fake","IgG","sim_foi",11,2,5,0.4,0.0527449505263169,0.853367200365327,11,11 +2038,2050,"None","fake","IgG","sim_foi",12,4,5,0.8,0.283582063881911,0.994949236620532,12,12 +2037,2050,"None","fake","IgG","sim_foi",13,0,5,0,0,0.521823750104981,13,13 +2036,2050,"None","fake","IgG","sim_foi",14,0,5,0,0,0.521823750104981,14,14 +2035,2050,"None","fake","IgG","sim_foi",15,2,5,0.4,0.0527449505263169,0.853367200365327,15,15 +2034,2050,"None","fake","IgG","sim_foi",16,0,5,0,0,0.521823750104981,16,16 +2033,2050,"None","fake","IgG","sim_foi",17,1,5,0.2,0.00505076337946806,0.716417936118089,17,17 +2032,2050,"None","fake","IgG","sim_foi",18,1,5,0.2,0.00505076337946806,0.716417936118089,18,18 +2031,2050,"None","fake","IgG","sim_foi",19,1,5,0.2,0.00505076337946806,0.716417936118089,19,19 +2030,2050,"None","fake","IgG","sim_foi",20,1,5,0.2,0.00505076337946806,0.716417936118089,20,20 +2029,2050,"None","fake","IgG","sim_foi",21,0,5,0,0,0.521823750104981,21,21 +2028,2050,"None","fake","IgG","sim_foi",22,3,5,0.6,0.146632799634673,0.947255049473683,22,22 +2027,2050,"None","fake","IgG","sim_foi",23,3,5,0.6,0.146632799634673,0.947255049473683,23,23 +2026,2050,"None","fake","IgG","sim_foi",24,2,5,0.4,0.0527449505263169,0.853367200365327,24,24 +2025,2050,"None","fake","IgG","sim_foi",25,3,5,0.6,0.146632799634673,0.947255049473683,25,25 +2024,2050,"None","fake","IgG","sim_foi",26,1,5,0.2,0.00505076337946806,0.716417936118089,26,26 +2023,2050,"None","fake","IgG","sim_foi",27,0,5,0,0,0.521823750104981,27,27 +2022,2050,"None","fake","IgG","sim_foi",28,1,5,0.2,0.00505076337946806,0.716417936118089,28,28 +2021,2050,"None","fake","IgG","sim_foi",29,2,5,0.4,0.0527449505263169,0.853367200365327,29,29 +2020,2050,"None","fake","IgG","sim_foi",30,2,5,0.4,0.0527449505263169,0.853367200365327,30,30 +2019,2050,"None","fake","IgG","sim_foi",31,1,5,0.2,0.00505076337946806,0.716417936118089,31,31 +2018,2050,"None","fake","IgG","sim_foi",32,1,5,0.2,0.00505076337946806,0.716417936118089,32,32 +2017,2050,"None","fake","IgG","sim_foi",33,2,5,0.4,0.0527449505263169,0.853367200365327,33,33 +2016,2050,"None","fake","IgG","sim_foi",34,2,5,0.4,0.0527449505263169,0.853367200365327,34,34 +2015,2050,"None","fake","IgG","sim_foi",35,1,5,0.2,0.00505076337946806,0.716417936118089,35,35 +2014,2050,"None","fake","IgG","sim_foi",36,3,5,0.6,0.146632799634673,0.947255049473683,36,36 +2013,2050,"None","fake","IgG","sim_foi",37,1,5,0.2,0.00505076337946806,0.716417936118089,37,37 +2012,2050,"None","fake","IgG","sim_foi",38,3,5,0.6,0.146632799634673,0.947255049473683,38,38 +2011,2050,"None","fake","IgG","sim_foi",39,3,5,0.6,0.146632799634673,0.947255049473683,39,39 +2010,2050,"None","fake","IgG","sim_foi",40,2,5,0.4,0.0527449505263169,0.853367200365327,40,40 +2009,2050,"None","fake","IgG","sim_foi",41,3,5,0.6,0.146632799634673,0.947255049473683,41,41 +2008,2050,"None","fake","IgG","sim_foi",42,2,5,0.4,0.0527449505263169,0.853367200365327,42,42 +2007,2050,"None","fake","IgG","sim_foi",43,4,5,0.8,0.283582063881911,0.994949236620532,43,43 +2006,2050,"None","fake","IgG","sim_foi",44,5,5,1,0.478176249895019,1,44,44 +2005,2050,"None","fake","IgG","sim_foi",45,3,5,0.6,0.146632799634673,0.947255049473683,45,45 +2004,2050,"None","fake","IgG","sim_foi",46,2,5,0.4,0.0527449505263169,0.853367200365327,46,46 +2003,2050,"None","fake","IgG","sim_foi",47,3,5,0.6,0.146632799634673,0.947255049473683,47,47 +2002,2050,"None","fake","IgG","sim_foi",48,3,5,0.6,0.146632799634673,0.947255049473683,48,48 +2001,2050,"None","fake","IgG","sim_foi",49,3,5,0.6,0.146632799634673,0.947255049473683,49,49 +2000,2050,"None","fake","IgG","sim_foi",50,4,5,0.8,0.283582063881911,0.994949236620532,50,50 diff --git a/tests/testthat/extdata/case_B_sim_data_grouped.csv b/tests/testthat/extdata/case_B_sim_data_grouped.csv new file mode 100644 index 00000000..aa57d1e2 --- /dev/null +++ b/tests/testthat/extdata/case_B_sim_data_grouped.csv @@ -0,0 +1,11 @@ +"age_group","total","counts","tsur","country","survey","test","antibody","age_min","age_max","age_mean_f","sample_size","birth_year","prev_obs","prev_obs_lower","prev_obs_upper" +"01-04",20,0,2050,"None","sim_foi","fake","IgG",1,4,2,250,2048,0,0,0.168433470983085 +"05-09",25,0,2050,"None","sim_foi","fake","IgG",5,9,7,250,2043,0,0,0.137185171530713 +"10-14",25,0,2050,"None","sim_foi","fake","IgG",10,14,12,250,2038,0,0,0.137185171530713 +"15-19",25,3,2050,"None","sim_foi","fake","IgG",15,19,17,250,2033,0.12,0.0254653966477332,0.312190307286235 +"20-24",25,7,2050,"None","sim_foi","fake","IgG",20,24,22,250,2028,0.28,0.120716688504067,0.493876821806256 +"25-29",25,21,2050,"None","sim_foi","fake","IgG",25,29,27,250,2023,0.84,0.639171545540728,0.95462054762829 +"30-34",25,25,2050,"None","sim_foi","fake","IgG",30,34,32,250,2018,1,0.862814828469287,1 +"35-39",25,23,2050,"None","sim_foi","fake","IgG",35,39,37,250,2013,0.92,0.739694157894786,0.990160409981202 +"40-44",25,25,2050,"None","sim_foi","fake","IgG",40,44,42,250,2008,1,0.862814828469287,1 +"45-50",30,30,2050,"None","sim_foi","fake","IgG",45,50,47,250,2003,1,0.884296691777972,1 diff --git a/tests/testthat/extdata/case_B_sim_data_no_grouped.csv b/tests/testthat/extdata/case_B_sim_data_no_grouped.csv new file mode 100644 index 00000000..80750082 --- /dev/null +++ b/tests/testthat/extdata/case_B_sim_data_no_grouped.csv @@ -0,0 +1,51 @@ +"birth_year","tsur","country","test","antibody","survey","age_mean_f","counts","total","prev_obs","prev_obs_lower","prev_obs_upper","age_min","age_max" +2049,2050,"None","fake","IgG","sim_foi",1,0,5,0,0,0.521823750104981,1,1 +2048,2050,"None","fake","IgG","sim_foi",2,0,5,0,0,0.521823750104981,2,2 +2047,2050,"None","fake","IgG","sim_foi",3,0,5,0,0,0.521823750104981,3,3 +2046,2050,"None","fake","IgG","sim_foi",4,0,5,0,0,0.521823750104981,4,4 +2045,2050,"None","fake","IgG","sim_foi",5,0,5,0,0,0.521823750104981,5,5 +2044,2050,"None","fake","IgG","sim_foi",6,0,5,0,0,0.521823750104981,6,6 +2043,2050,"None","fake","IgG","sim_foi",7,0,5,0,0,0.521823750104981,7,7 +2042,2050,"None","fake","IgG","sim_foi",8,0,5,0,0,0.521823750104981,8,8 +2041,2050,"None","fake","IgG","sim_foi",9,0,5,0,0,0.521823750104981,9,9 +2040,2050,"None","fake","IgG","sim_foi",10,0,5,0,0,0.521823750104981,10,10 +2039,2050,"None","fake","IgG","sim_foi",11,0,5,0,0,0.521823750104981,11,11 +2038,2050,"None","fake","IgG","sim_foi",12,0,5,0,0,0.521823750104981,12,12 +2037,2050,"None","fake","IgG","sim_foi",13,0,5,0,0,0.521823750104981,13,13 +2036,2050,"None","fake","IgG","sim_foi",14,0,5,0,0,0.521823750104981,14,14 +2035,2050,"None","fake","IgG","sim_foi",15,0,5,0,0,0.521823750104981,15,15 +2034,2050,"None","fake","IgG","sim_foi",16,0,5,0,0,0.521823750104981,16,16 +2033,2050,"None","fake","IgG","sim_foi",17,1,5,0.2,0.00505076337946806,0.716417936118089,17,17 +2032,2050,"None","fake","IgG","sim_foi",18,1,5,0.2,0.00505076337946806,0.716417936118089,18,18 +2031,2050,"None","fake","IgG","sim_foi",19,1,5,0.2,0.00505076337946806,0.716417936118089,19,19 +2030,2050,"None","fake","IgG","sim_foi",20,2,5,0.4,0.0527449505263169,0.853367200365327,20,20 +2029,2050,"None","fake","IgG","sim_foi",21,0,5,0,0,0.521823750104981,21,21 +2028,2050,"None","fake","IgG","sim_foi",22,1,5,0.2,0.00505076337946806,0.716417936118089,22,22 +2027,2050,"None","fake","IgG","sim_foi",23,1,5,0.2,0.00505076337946806,0.716417936118089,23,23 +2026,2050,"None","fake","IgG","sim_foi",24,3,5,0.6,0.146632799634673,0.947255049473683,24,24 +2025,2050,"None","fake","IgG","sim_foi",25,2,5,0.4,0.0527449505263169,0.853367200365327,25,25 +2024,2050,"None","fake","IgG","sim_foi",26,4,5,0.8,0.283582063881911,0.994949236620532,26,26 +2023,2050,"None","fake","IgG","sim_foi",27,5,5,1,0.478176249895019,1,27,27 +2022,2050,"None","fake","IgG","sim_foi",28,5,5,1,0.478176249895019,1,28,28 +2021,2050,"None","fake","IgG","sim_foi",29,5,5,1,0.478176249895019,1,29,29 +2020,2050,"None","fake","IgG","sim_foi",30,5,5,1,0.478176249895019,1,30,30 +2019,2050,"None","fake","IgG","sim_foi",31,5,5,1,0.478176249895019,1,31,31 +2018,2050,"None","fake","IgG","sim_foi",32,5,5,1,0.478176249895019,1,32,32 +2017,2050,"None","fake","IgG","sim_foi",33,5,5,1,0.478176249895019,1,33,33 +2016,2050,"None","fake","IgG","sim_foi",34,5,5,1,0.478176249895019,1,34,34 +2015,2050,"None","fake","IgG","sim_foi",35,4,5,0.8,0.283582063881911,0.994949236620532,35,35 +2014,2050,"None","fake","IgG","sim_foi",36,5,5,1,0.478176249895019,1,36,36 +2013,2050,"None","fake","IgG","sim_foi",37,4,5,0.8,0.283582063881911,0.994949236620532,37,37 +2012,2050,"None","fake","IgG","sim_foi",38,5,5,1,0.478176249895019,1,38,38 +2011,2050,"None","fake","IgG","sim_foi",39,5,5,1,0.478176249895019,1,39,39 +2010,2050,"None","fake","IgG","sim_foi",40,5,5,1,0.478176249895019,1,40,40 +2009,2050,"None","fake","IgG","sim_foi",41,5,5,1,0.478176249895019,1,41,41 +2008,2050,"None","fake","IgG","sim_foi",42,5,5,1,0.478176249895019,1,42,42 +2007,2050,"None","fake","IgG","sim_foi",43,5,5,1,0.478176249895019,1,43,43 +2006,2050,"None","fake","IgG","sim_foi",44,5,5,1,0.478176249895019,1,44,44 +2005,2050,"None","fake","IgG","sim_foi",45,5,5,1,0.478176249895019,1,45,45 +2004,2050,"None","fake","IgG","sim_foi",46,5,5,1,0.478176249895019,1,46,46 +2003,2050,"None","fake","IgG","sim_foi",47,5,5,1,0.478176249895019,1,47,47 +2002,2050,"None","fake","IgG","sim_foi",48,5,5,1,0.478176249895019,1,48,48 +2001,2050,"None","fake","IgG","sim_foi",49,5,5,1,0.478176249895019,1,49,49 +2000,2050,"None","fake","IgG","sim_foi",50,5,5,1,0.478176249895019,1,50,50 diff --git a/tests/testthat/extdata/data.RDS b/tests/testthat/extdata/data.RDS new file mode 100644 index 00000000..885b2727 Binary files /dev/null and b/tests/testthat/extdata/data.RDS differ diff --git a/tests/testthat/extdata/data.RData b/tests/testthat/extdata/data.RData new file mode 100644 index 00000000..c87e8a6a Binary files /dev/null and b/tests/testthat/extdata/data.RData differ diff --git a/tests/testthat/extdata/data_test.R b/tests/testthat/extdata/data_test.R new file mode 100644 index 00000000..49390a4d --- /dev/null +++ b/tests/testthat/extdata/data_test.R @@ -0,0 +1,17 @@ +#' Seroprevalence data on serofoi +#' +#' Data from a serological surveys +#' +#' @docType data +#' +#' @usage data_test +#' +#' @format An object of class \code{"cross"}; see \code{\link[qtl]{read.cross}}. +#' +#' @keywords datasets +#' +#' @examples +#' \dontrun{ +#' data_test +#' } +"data_test" diff --git a/tests/testthat/test_comparison.R b/tests/testthat/test_comparison.R new file mode 100644 index 00000000..43a779aa --- /dev/null +++ b/tests/testthat/test_comparison.R @@ -0,0 +1,63 @@ +test_that("comparison", { + # So far we are skipping tests on these platforms until + # we find an efficient way to update rstan testthat snapshots on all of them + skip_on_os(c("windows", "mac")) + source("testing_utils.R") + set.seed(1234) # For reproducibility + + package <- "serofoi" + + # TODO For some reason it is not recognizing the global `serodata` variable, so we need to explicitly load it + serodata <- readRDS(testthat::test_path("extdata", "data.RDS")) + + data_test <- prepare_seroprev_data(serodata) + + model_0 <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 + ) + + model_1 <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1000 + ) + + model_2 <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1000 + ) + + comp_table <- get_comparison_table( + model_objects_list = c( + m0 = model_0, + m1 = model_1, + m2 = model_2 + ) + ) + + column_comparation_functions <- list( + model = equal_exact(), + dataset = equal_exact(), + country = equal_exact(), + year = equal_exact(), + test = equal_exact(), + antibody = equal_exact(), + n_sample = equal_exact(), + n_agec = equal_exact(), + n_iter = equal_exact(), + elpd = equal_with_tolerance(), + se = equal_with_tolerance(), + converged = equal_exact(), + difference = equal_exact(), + diff_se = equal_with_tolerance(), + best_elpd = equal_exact(), + pvalue = equal_with_tolerance() + ) + + expect_similar_dataframes( + "comp_table", comp_table, column_comparation_functions + ) +}) diff --git a/tests/testthat/test_individual_models.R b/tests/testthat/test_individual_models.R new file mode 100644 index 00000000..adcd1710 --- /dev/null +++ b/tests/testthat/test_individual_models.R @@ -0,0 +1,58 @@ +test_that("individual models", { + # So far we are skipping tests on these platforms until + # we find an efficient way to update rstan testthat snapshots on all of them + skip_on_os(c("windows", "mac")) + skip_on_ci() + + library(devtools) + library(dplyr) + library(vdiffr) + + #----- Read and prepare data + data_test_path <- testthat::test_path( + "extdata", "data.RDS" + ) + data_test <- readRDS(data_test_path) %>% prepare_seroprev_data(alpha = 0.05) + + #----- Test each model + model_0_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 + ) + + model_1_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1000 + ) + + model_2_object <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1000 + ) + + #----- Generate all plots for each model + + model_0_plot <- plot_seroprev_model(model_0_object, size_text = 6) + vdiffr::expect_doppelganger("model_0_plot", model_0_plot) + model_1_plot <- plot_seroprev_model(model_1_object, size_text = 6) + vdiffr::expect_doppelganger("model_1_plot", model_1_plot) + model_2_plot <- plot_seroprev_model(model_2_object, size_text = 6) + vdiffr::expect_doppelganger("model_2_plot", model_2_plot) + + #----- Generate each individual plot + + sp_fitted_individual_plot <- plot_seroprev_fitted(model_2_object, size_text = 15) + vdiffr::expect_doppelganger("seroprev_fitted_individual_plot", sp_fitted_individual_plot) + + foi_individual_plot <- plot_foi(model_2_object, size_text = 15) + vdiffr::expect_doppelganger("foi_individual_plot", foi_individual_plot) + + rhats_individual_plot <- plot_rhats(model_2_object, size_text = 15) + vdiffr::expect_doppelganger("rhats_individual_plot", rhats_individual_plot) + + + # bayesplot::mcmc_trace(model_1_object$fit, pars="lambda0") +}) diff --git a/tests/testthat/test_plot_functions.R b/tests/testthat/test_plot_functions.R new file mode 100644 index 00000000..8c3f66eb --- /dev/null +++ b/tests/testthat/test_plot_functions.R @@ -0,0 +1,68 @@ +test_that("plot_seroprev_fitted", { + # So far we are skipping tests on these platforms until + # we find an efficient way to update rstan testthat snapshots on all of them + skip_on_os(c("windows", "mac")) + skip_on_ci() + print("*** Test info ****") + print(R.Version()) + cat("Interactive: ", interactive()) + library(devtools) + library(dplyr) + library(vdiffr) + set.seed(1234) # For reproducibility + data_test <- readRDS(testthat::test_path("extdata", "data.RDS")) %>% prepare_seroprev_data() + + actual_plot_seroprev <- plot_seroprev(data_test, size_text = 15) + + vdiffr::expect_doppelganger("plot_seroprev", actual_plot_seroprev) + + # Constant Model + model_object_constant <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000 + ) + + plot_seroprev_model_constant <- plot_seroprev_model(model_object_constant, size_text = 6) + + vdiffr::expect_doppelganger("plot_seroprev_model_constant", plot_seroprev_model_constant) + + plot_seroprev_fitted_constant <- plot_seroprev_fitted(model_object_constant, size_text = 15) + + vdiffr::expect_doppelganger("plot_seroprev_fitted_constant", plot_seroprev_fitted_constant) + + # Normal Bi Model + model_object_normalbi <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1000 + ) + + plot_seroprev_model_normalbi <- plot_seroprev_model(model_object_normalbi, size_text = 6) + + vdiffr::expect_doppelganger("plot_seroprev_model_normalbi", plot_seroprev_model_normalbi) + + plot_seroprev_fitted_normalbi <- plot_seroprev_fitted(model_object_normalbi, size_text = 15) + + vdiffr::expect_doppelganger("plot_seroprev_fitted_normalbi", plot_seroprev_fitted_normalbi) + + # Normal Log Model + model_object_normallog <- run_seroprev_model( + seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1000 + ) + + plot_seroprev_model_normallog <- plot_seroprev_model(model_object_normallog, size_text = 6) + + vdiffr::expect_doppelganger("plot_seroprev_model_normallog", plot_seroprev_model_normallog) + + plot_seroprev_fitted_normallog <- plot_seroprev_fitted(model_object_normallog, size_text = 15) + + vdiffr::expect_doppelganger("plot_seroprev_fitted_normallog", plot_seroprev_fitted_normallog) + + # Models Comparison Plot + plot_arrange_models <- plot_seroprev_models_grid(plot_seroprev_model_constant, plot_seroprev_model_normalbi, plot_seroprev_model_normallog, n_col = 3) + + vdiffr::expect_doppelganger("plot_arrange_models", plot_arrange_models) +}) diff --git a/tests/testthat/test_simdata_cases.R b/tests/testthat/test_simdata_cases.R new file mode 100644 index 00000000..ad707716 --- /dev/null +++ b/tests/testthat/test_simdata_cases.R @@ -0,0 +1,130 @@ +test_that("simulated data", { + # So far we are skipping tests on these platforms until + # we find an efficient way to update rstan testthat snapshots on all of them + skip_on_os(c("windows", "mac")) + skip_on_ci() + + library(dplyr) + library(serofoi) + library(pracma) + + seed = 1234 + size_age_class = 5 + #----- Test foi case A + sim_foi <- rep(0.02, 50) + case_label <- "case_A_" + max_lambda <- 0.035 + + #----- Test foi case B + # no_transm <- 0.0000000001 + # sim_foi <- c(rep(0.2, 25), rep(0.1, 10), rep(no_transm, 15)) + # case_label <- "case_B_" + # max_lambda <- 0.3 + + #----- Results paths + data_path_no_grouped <- testthat::test_path( + "extdata", paste0(case_label, "sim_data_no_grouped.csv") + ) + + data_path_grouped <- testthat::test_path( + "extdata", paste0(case_label, "sim_data_grouped.csv") + ) + + #----- Data simulation + sim_data <- generate_sim_data(foi = sim_foi, + size_age_class = size_age_class, + tsur = 2050, + birth_year_min = 2000, + survey_label = 'sim_foi', + seed = seed) + + sim_data <- sim_data %>% mutate(age_min = age_mean_f, age_max = age_mean_f) + write.csv(sim_data, data_path_no_grouped, row.names = FALSE) + + model_constant <- run_seroprev_model(seroprev_data = sim_data, + seroprev_model_name = "constant_foi_bi") + + model_normal <- run_seroprev_model(seroprev_data = sim_data, + seroprev_model_name = "continuous_foi_normal_bi") + + model_log <- run_seroprev_model(seroprev_data = sim_data, + seroprev_model_name = "continuous_foi_normal_log") + + model_plot_constant <- plot_seroprev_model(model_constant, size_text = 6, , max_lambda = max_lambda) + model_plot_normal <- plot_seroprev_model(model_normal, size_text = 6, , max_lambda = max_lambda) + model_plot_log <- plot_seroprev_model(model_log, size_text = 6, max_lambda = max_lambda) + model_plot_arrange <- plot_seroprev_models_grid(model_plot_constant, + model_plot_normal, + model_plot_log, + size_text = 6, + n_col = 3, + n_row = 1) + vdiffr::expect_doppelganger(paste0(case_label, "no_group_models_", seed), model_plot_arrange) + + sim_foi_plot_constant <- plot_foi(model_constant, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_constant$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + + ggplot2::ggtitle(paste0(case_label, "no_group_", seed)) + + sim_foi_plot_normal <- plot_foi(model_normal, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_normal$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + sim_foi_plot_log <- plot_foi(model_log, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_log$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + sim_foi_plot_arrange <- plot_seroprev_models_grid(sim_foi_plot_constant, + sim_foi_plot_normal, + sim_foi_plot_log, + n_col = 1, n_row = 3) + + vdiffr::expect_doppelganger(paste0(case_label, "no_group_foi_", seed), sim_foi_plot_arrange) + + + sim_data_grouped <- group_sim_data(sim_data = sim_data, + foi = sim_foi, + size_age_class = size_age_class, + tsur = 2050, + birth_year_min = 2000, + survey_label = 'sim_foi') + write.csv(sim_data_grouped, data_path_grouped, row.names = FALSE) + + model_grouped_constant <- run_seroprev_model(seroprev_data = sim_data_grouped, + seroprev_model_name = "constant_foi_bi") + model_grouped_normal <- run_seroprev_model(seroprev_data = sim_data_grouped, + seroprev_model_name = "continuous_foi_normal_bi") + model_grouped_log <- run_seroprev_model(seroprev_data = sim_data_grouped, + seroprev_model_name = "continuous_foi_normal_log") + + model_grouped_constant_plot <- plot_seroprev_model(model_grouped_constant, size_text = 6, max_lambda = max_lambda) + model_grouped_normal_plot <- plot_seroprev_model(model_grouped_normal, size_text = 6, max_lambda = max_lambda) + model_grouped_log_plot <- plot_seroprev_model(model_grouped_log, size_text = 6, max_lambda = max_lambda) + model_grouped_plot_arrange <- plot_seroprev_models_grid(model_grouped_constant_plot, + model_grouped_normal_plot, + model_grouped_log_plot, + n_col = 3, n_row = 1) + vdiffr::expect_doppelganger(paste0(case_label, "group_models_", seed), model_grouped_plot_arrange) + rm(list = ls(pat = "*_plot")) + + sim_foi_grouped_constant_plot <- plot_foi(model_grouped_constant, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_constant$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + + ggplot2::ggtitle(paste0(case_label, "group_", seed)) + sim_foi_grouped_normal_plot <- plot_foi(model_grouped_normal, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_normal$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + sim_foi_grouped_log_plot <- plot_foi(model_grouped_log, size_text = 10, max_lambda = max_lambda) + + ggplot2::geom_point(data = data.frame(year = model_log$exposure_years, + foi = sim_foi), + ggplot2::aes(year, foi)) + sim_foi_grouped_plot_arrange <- plot_seroprev_models_grid(sim_foi_grouped_constant_plot, + sim_foi_grouped_normal_plot, + sim_foi_grouped_log_plot, + n_col = 1, n_row = 3) + vdiffr::expect_doppelganger(paste0(case_label, "group_foi_", seed), sim_foi_grouped_plot_arrange) +}) +# TODO: solve error 'address 0x18, cause 'memory not mapped' when testing. diff --git a/tests/testthat/testing_utils.R b/tests/testthat/testing_utils.R new file mode 100644 index 00000000..f1eae1de --- /dev/null +++ b/tests/testthat/testing_utils.R @@ -0,0 +1,69 @@ +# TODO Move to separate package ### +# TODO Document all functions and provide examples +library(testthat) +library(vdiffr) +equal_with_tolerance <- function(tolerance = 2e-1) { + function(a, b) { + c <- base::mapply(function(x, y) { + if (is.na(x) && is.na(y)) { + return(0) + } else if (is.na(x) || is.na(y)) { + return(tolerance + 1) + } else { + return(abs(x - y)) + } + }, a, b) + return(base::all(c < tolerance)) + } +} +equal_exact <- function() { + function(a, b) { + x <- base::mapply(function(x, y) x == y || (is.na(x) && is.na(y)), a, b) + return(base::all(x == TRUE)) + } +} + +# TODO use testthat snapshots +expect_similar_dataframes <- function(name, actual_df, column_comparation_functions) { + actual_df_filename <- file.path(tempdir(), paste(name, "csv", sep = ".")) + write.csv(actual_df, actual_df_filename) + compare_fun <- function(expected_df_filename, actual_df_filename) { + return(compare_dataframes(expected_df_filename, actual_df_filename, column_comparation_functions)) + } + expect_snapshot_file(actual_df_filename, compare = compare_fun) +} + + +compare_dataframes <- function(expected_df_filename, actual_df_filename, column_comparation_functions) { + expected_df <- read.csv(expected_df_filename) + actual_df <- read.csv(actual_df_filename) + + all_columns_ok <- TRUE + for (col in base::names(column_comparation_functions)) { + if (col %in% colnames(expected_df) && col %in% colnames(actual_df)) { + compare_function <- column_comparation_functions[[col]] + col_ok <- compare_function(expected_df[[col]], actual_df[[col]]) + if (!col_ok) { + cat("Column", col, "differs ", expected_df[[col]], "!=", actual_df[[col]], "\n") + } + all_columns_ok <- all_columns_ok && col_ok + } else { + if (!(col %in% colnames(expected_df))) { + cat("Column", col, "not in first dataframe") + } + if (!(col %in% colnames(expected_df))) { + cat("Column", col, "not in second dataframe") + } + } + } + return(all_columns_ok) +} + +expect_same_plot <- function(plot_name, actual_plot) { + if (per_platform_snapshots) { + title <- file.path(r_version_id(), plot_name) + } else { + title <- plot_name + } + return(vdiffr::expect_doppelganger(title, actual_plot)) +} diff --git a/vignettes/references.bib b/vignettes/references.bib new file mode 100644 index 00000000..185bcf95 --- /dev/null +++ b/vignettes/references.bib @@ -0,0 +1,60 @@ + +@article{Cucunubá2017, + title = {Modelling historical changes in the force-of-infection of Chagas disease to inform control and elimination programmes: application in Colombia}, + author = {{Cucunubá}, Zulma M and Nouvellet, Pierre and Conteh, Lesong and Vera, Mauricio Javier and Angulo, Victor Manuel and Dib, Juan Carlos and Parra -Henao, Gabriel Jaime and {Basáñez}, {María Gloria}}, + year = {2017}, + month = {09}, + date = {2017-09}, + journal = {BMJ Global Health}, + pages = {e000345}, + volume = {2}, + number = {3}, + doi = {10.1136/bmjgh-2017-000345}, + url = {http://dx.doi.org/10.1136/bmjgh-2017-000345}, + langid = {en} +} + +@article{Carrera2020, + title = {Endemic and Epidemic Human Alphavirus Infections in Eastern Panama: An Analysis of Population-Based Cross-Sectional Surveys}, + author = {Carrera, Jean-Paul and {Cucunubá}, Zulma M. and Neira, Karen and Lambert, Ben and {Pittí}, Yaneth and Liscano, Jesus and {Garzón}, Jorge L. and Beltran, Davis and Collado-Mariscal, Luisa and Saenz, Lisseth and Sosa, {Néstor} and Rodriguez-Guzman, Luis D. and {González}, Publio and Lezcano, {Andrés G.} and {Pereyra-Elías}, {Reneé} and Valderrama, Anayansi and Weaver, Scott C. and Vittor, Amy Y. and {Armién}, Blas and Pascale, Juan-Miguel and Donnelly, Christl A.}, + year = {2020}, + month = {12}, + date = {2020-12-02}, + journal = {The American Journal of Tropical Medicine and Hygiene}, + pages = {2429--2437}, + volume = {103}, + number = {6}, + doi = {10.4269/ajtmh.20-0408}, + url = {http://dx.doi.org/10.4269/ajtmh.20-0408}, + langid = {en} +} + +@article{cunha2017, + title = {Seroprevalence of Chikungunya Virus in a Rural Community in Brazil}, + author = {Cunha, Rivaldo V. and Trinta, Karen S. and Montalbano, Camila A. and Sucupira, Michel V. F. and de Lima, Maricelia M. and Marques, Erenilde and Romanholi, Izilyanne H. and Croda, Julio}, + editor = {Barrera, Roberto}, + year = {2017}, + month = {01}, + date = {2017-01-20}, + journal = {PLOS Neglected Tropical Diseases}, + pages = {e0005319}, + volume = {11}, + number = {1}, + doi = {10.1371/journal.pntd.0005319}, + url = {http://dx.doi.org/10.1371/journal.pntd.0005319}, + langid = {en} +} + +@article{dias2018, + title = {Seroprevalence of Chikungunya Virus after Its Emergence in Brazil}, + author = {Dias, Juarez P. and Costa, {Maria da Conceição N.} and Campos, Gubio Soares and {Paixão}, Enny S. and Natividade, Marcio S. and Barreto, Florisneide R. and Itaparica, Martha Suely C. and Goes, Cristina and Oliveira, Francisca L.S. and Santana, Eloisa B. and Silva, Neusa S.J. and Brito, Carlos A.A. and Rodrigues, Laura C. and Sardi, Silvia Inez and Saavedra, Ramon C. and Teixeira, {Maria Glória}}, + year = {2018}, + month = {04}, + date = {2018-04}, + journal = {Emerging Infectious Diseases}, + pages = {617--624}, + volume = {24}, + number = {4}, + doi = {10.3201/eid2404.171370}, + url = {http://dx.doi.org/10.3201/eid2404.171370} +} diff --git a/vignettes/serofoi.Rmd b/vignettes/serofoi.Rmd new file mode 100644 index 00000000..7d8b042a --- /dev/null +++ b/vignettes/serofoi.Rmd @@ -0,0 +1,24 @@ +--- +title: "serofoi" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{serofoi} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup} +library(serofoi) +``` + +# Prepare the data for using serofoi +```{r} +data_test <- prepare_seroprev_data(serodata) +``` diff --git a/vignettes/simulated_data.Rmd b/vignettes/simulated_data.Rmd new file mode 100644 index 00000000..620d6da1 --- /dev/null +++ b/vignettes/simulated_data.Rmd @@ -0,0 +1,239 @@ +--- +title: "Simulated data" +output: rmarkdown::html_vignette +bibliography: references.bib +link-citations: true +vignette: > + %\VignetteIndexEntry{simdata} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r cleaning, include = FALSE, echo = TRUE, message=FALSE} +library(serofoi) +library(tidyverse) +rownames(serodata) <- NULL +data_test <- prepare_seroprev_data(serodata) + + +``` + +# Epidemiological scenarios for the *serofoi* models + +**serofoi** is an R package that allows estimations of the Force-of-Infection from a population based serosurvey data. + +## **Sero-survey data** + +Surveys had to meet all of the following [inclusion criteria]{.underline}: + +1. Be population-based (not hospital based) + +2. Specify individuals' age or age group + +3. Indicate diagnostic test(s) used. The current version of serofoi only applies to *IgG* antibodies) + +4. Identify the location and date (year) of sample collection + +## Model assumptions + +Current version of **serofoi** includes the following assumptions on the underlying biological process: + +- The Force-of-infection (FoI) is estimated using a catalytic model + +- There is no sero-reversion (from positive to negative). It means IgG antibodies are life-long duration with no waning immunity. This may not be the case for several pathogens. This feature is planned for future versions of *serofoi***.** + +- There is no age-dependency. This may not be the case for several pathogens. This feature is planned for future versions of *serofoi***.** + +- There is no impact from migration processes in the sampled population + +- There are no differences in the mortality rate of infected versus susceptible individuals + +::: {.alert .alert-primary} +NOTE: Running the *serofoi* models for the first time on your local computer make take a few minutes while the *rstan* code is compiled locally. Afterwards, no further local compilation is needed. +::: + +# **Constant vs Time-varying** FoI models + +## Constant Force-of-Infection (endemic model) + +For *constant* *endemic model*, the rate of infection acquisition―rate of seroconversion―is constant over time, and infection (sero)prevalence will increase monotonically with age as cumulative exposure increases. + +Denoting $n(a,t)$ the number of seropositive individuals of age $a$ at time $t$ , $N$ the serosurvey sample size, and $P(a,t)$ the underlying seroprevalence at age $a$ at time $t$, we assumed that the number of seropositive subjects follows a Binomial distribution $n(a,t)$ \~ $B(N,P(a,t))$. + +For the force-of-infection (FoI) that is constant over time, denoted $\lambda$, we modelled seroprevalence for age $a$ in year $t$ (i.e. the time when the serosurvey occurred) as: + +$$ P(a,t) = 1−exp(−\lambda{a})$$ + +The corresponding code for the constant-FoI model with binomial distribution is: + +```{r model_1, include = FALSE, echo=TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_1 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +``` + +## Time-varying Force-of-Infection + +For the time-varying FoI model, the FoI values are estimated from a binomial distribution. The sero-prevalence as age a can be expressed as: + +$$ +𝑃(𝑎,𝑡)=1−exp(−∑_{i=t-a+1}^{t}𝜆{i}) +$$ + +Therefore, a serosurvey completed at time $t$ and including ages $a$ from {$age_{min}, age_{max}$} is informative on exposure (and FoI) between {$t, age_{max}$} and $t$. + +Currently, two models within *serofoi* allow time-varying FoI, with corresponding code: + +```{r model_2, include = FALSE, echo = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_2 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) +``` + +```{r model_3, include = FALSE, echo = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_3 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +``` + +## Fitting process + +Models are implemented in **RStan** using the No-U-Turn sampler, a type of Hamiltonian Monte Carlo sampling. After convergence, the best-fitting models can be selected based on the expected log predictive density for an out-of-sample data point (elpd). + +## Case study 1. Chagas disease (endemic disease) + +Based on the data and analysis shown in [@Cucunubá2017], we use one of the datasets for measuring the sero-prevalence of IgG antibodies against *Trypanosoma cruzi* infection in rural area of Colombia in 2012. The dataset is part of the serofoi package as `chagas2012`. + +```{r data_chagas, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +data("chagas2012") +head(chagas2012, 5) + +``` + +After preparing the data we can run two models (constant and time varying) available on serofoi. + +```{r three models, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +chagas2012p <- prepare_seroprev_data(chagas2012) + +m1_cha <- run_seroprev_model(seroprev_data = chagas2012p, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +m2_cha <- run_seroprev_model(seroprev_data = chagas2012p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) + +``` + +Now, we can plot the results of the two models to compare (Figure 1). As shown in @Cucunubá2017, interventions for Chagas control have been ongoing from the 1980s in Colombia having heterogeneous impact depending on the type of population. For this serosurvey, which is from tradittional indigenous rural area, the serofoi models are able to detect a modest still relevant slow decreasing trend consistent (model 2) as slightly better supported than the constant model. Notice that model 3 does not converge despite a high number of iterations. + +```{r, model_comp_cha, include=TRUE, echo=TRUE, fig.width = 8, fig.height=10, warning=FALSE, message=FALSE, message=FALSE} +p1_cha <- plot_seroprev_model(m1_cha, size_text = 12, max_lambda = 0.02) +p2_cha <- plot_seroprev_model(m2_cha, size_text = 12, max_lambda = 0.02) + +plot_seroprev_models_grid(p1_cha, p2_cha, n_row = 1, n_col = 2) +``` + +## Case study 2. Hidden Alphaviruses epidemics in Panama + +As shown in [@Carrera2020], hidden epidemic and endemic transmission of alphaviruses in Eastern Panama have been around for decades. From this paper we use a dataset measuring IgG antibodies againts Venezuelan Equine Encephalitis Virus (VEEV) in a rural village in Panamá in 2017. This dataset, `veev2017` is included in serofoi. + +```{r data_veev, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } + +data("veev2012") +head(veev2012, 5) + +``` + +```{r models_veee, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +veev2012p <- prepare_seroprev_data(veev2012) + +m1_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +m2_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 500, + n_thin = 2) + +m3_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 500, + n_thin = 2) +``` + +Now, we can plot the results of the three models to compare. On Figure 3, we can observe a large increase in the estimated FOI. As suggested in @Carrera2020, an important increase in the transmission of VEEV in this region is inferred. Although the three fitted models converge well, we see much larger support for a more sudden increase in recent years with highest FoI values at 0.25 (consistent with an epidemic). + +```{r, model_comp_veev, include=TRUE, echo=TRUE, fig.width = 10, fig.height=10, warning=FALSE, message=FALSE, message=FALSE} +p1_veev <- plot_seroprev_model(m1_veev, size_text = 10, max_lambda = 0.6) +p2_veev <- plot_seroprev_model(m2_veev, size_text = 10, max_lambda = 0.6) +p3_veev <- plot_seroprev_model(m3_veev, size_text = 10, max_lambda = 0.6) + +plot_seroprev_models_grid(p1_veev, p2_veev, p3_veev, n_row = 1, n_col = 3) + +``` + +Figure 3. Results of fitted models, Force-of-Infection estimates and convergence. + +## Case study 3. A known Large Chikungunya epidemic in Brazil + +Chikungunya outbreaks ocurred rapidly after the introduction of the virus to Brazil in 2013-2014. We use the dataset from [@dias2018] that conducts a population-based study through household interviews and serologic surveys (measuring IgG antibodies against Chikungunya virus) in Bahia, Brazil during October-December 2015, right after a large CHIKV epidemic that occurred in that area. + +```{r data_chik, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } + +data("chik2015") + +head(chik2015, 5) + +``` + +```{r models_chik, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +chik2015p <- prepare_seroprev_data(chik2015) + +mod1_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000, + n_thin = 2) + +mod2_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) + +mod3_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +``` + +In Figure 4, we can observe the comparison between the three serofoi models. Here serofoi shows strong statistical support for a sudden increase in the transmission of CHIKV close to the year of the serosurvey (2015). The exact year is not possible to estimate, mainly given the data used is largely aggregated by 20-years age groups. Despite that, the results are consistent with the empirical evidence shown by [@dias2018] with both interviews, and IgM testing. + +```{r plot_chik, include=TRUE, echo=TRUE, fig.width = 10, fig.height = 10, warning = FALSE, message = FALSE} + +p1_chik <- plot_seroprev_model(mod1_chik, size_text = 10, max_lambda = 0.08) +p2_chik <- plot_seroprev_model(mod2_chik, size_text = 10, max_lambda = 0.08) +p3_chik <- plot_seroprev_model(mod3_chik, size_text = 10, max_lambda = 0.08) + +plot_seroprev_models_grid(p1_chik, p2_chik, p3_chik, n_row = 1, n_col = 3) + +``` + +## References diff --git a/vignettes/simulated_data_files/figure-html/model_comp_cha-1.png b/vignettes/simulated_data_files/figure-html/model_comp_cha-1.png new file mode 100644 index 00000000..74dcd047 Binary files /dev/null and b/vignettes/simulated_data_files/figure-html/model_comp_cha-1.png differ diff --git a/vignettes/tmp.txt b/vignettes/tmp.txt deleted file mode 100644 index e69de29b..00000000 diff --git a/vignettes/use_cases.Rmd b/vignettes/use_cases.Rmd new file mode 100644 index 00000000..5ffff56f --- /dev/null +++ b/vignettes/use_cases.Rmd @@ -0,0 +1,235 @@ +--- +title: "Use cases" +output: rmarkdown::html_vignette +bibliography: references.bib +link-citations: true +vignette: > + %\VignetteIndexEntry{serofoi} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r cleaning, include = FALSE, echo = TRUE, message=FALSE} +library(serofoi) +library(tidyverse) +rownames(serodata) <- NULL +data_test <- prepare_seroprev_data(serodata) + + +``` + +# Epidemiological scenarios for the *serofoi* models + +**serofoi** is an R package that allows estimations of the Force-of-Infection from a population based serosurvey data. + +## **Sero-survey data** + +Surveys had to meet all of the following [inclusion criteria]{.underline}: + +1. Be population-based (not hospital based) + +2. Specify individuals' age or age group + +3. Indicate diagnostic test(s) used. The current version of serofoi only applies to *IgG* antibodies) + +4. Identify the location and date (year) of sample collection + +## Model assumptions + +Current version of **serofoi** includes the following assumptions on the underlying biological process: + +- The Force-of-infection (FoI) is estimated using a catalytic model + +- There is no sero-reversion (from positive to negative). It means IgG antibodies are life-long duration with no waning immunity. This may not be the case for several pathogens. This feature is planned for future versions of *serofoi***.** + +- There is no age-dependency. This may not be the case for several pathogens. This feature is planned for future versions of *serofoi***.** + +- There is no impact from migration processes in the sampled population + +- There are no differences in the mortality rate of infected versus susceptible individuals + +::: {.alert .alert-primary} +NOTE: Running the *serofoi* models for the first time on your local computer make take a few minutes while the *rstan* code is compiled locally. Afterwards, no further local compilation is needed. +::: + +# **Constant vs Time-varying** FoI models + +## Constant Force-of-Infection (endemic model) + +For *constant* *endemic model*, the rate of infection acquisition―rate of seroconversion―is constant over time, and infection (sero)prevalence will increase monotonically with age as cumulative exposure increases. + +Denoting $n(a,t)$ the number of seropositive individuals of age $a$ at time $t$ , $N$ the serosurvey sample size, and $P(a,t)$ the underlying seroprevalence at age $a$ at time $t$, we assumed that the number of seropositive subjects follows a Binomial distribution $n(a,t)$ \~ $B(N,P(a,t))$. + +For the force-of-infection (FoI) that is constant over time, denoted $\lambda$, we modelled seroprevalence for age $a$ in year $t$ (i.e. the time when the serosurvey occurred) as: + +$$ P(a,t) = 1−exp(−\lambda{a})$$ + +The corresponding code for the constant-FoI model with binomial distribution is: + +```{r model_1, include = FALSE, echo=TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_1 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +``` + +## Time-varying Force-of-Infection + +For the time-varying FoI model, the FoI values are estimated from a binomial distribution. The sero-prevalence as age a can be expressed as: + +$$ +𝑃(𝑎,𝑡)=1−exp(−∑_{i=t-a+1}^{t}𝜆{i}) +$$ + +Therefore, a serosurvey completed at time $t$ and including ages $a$ from {$age_{min}, age_{max}$} is informative on exposure (and FoI) between {$t, age_{max}$} and $t$. + +Currently, two models within *serofoi* allow time-varying FoI, with corresponding code: + +```{r model_2, include = FALSE, echo = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +model_2 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) + +model_3 <- run_seroprev_model(seroprev_data = data_test, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +``` + +## Fitting process + +Models are implemented in **RStan** using the No-U-Turn sampler, a type of Hamiltonian Monte Carlo sampling. After convergence, the best-fitting models can be selected based on the expected log predictive density for an out-of-sample data point (elpd). + +## Case study 1. Chagas disease (endemic disease) + +Based on the data and analysis shown in [@Cucunubá2017], we use one of the datasets for measuring the sero-prevalence of IgG antibodies against *Trypanosoma cruzi* infection in rural area of Colombia in 2012. The dataset is part of the serofoi package as `chagas2012`. + +```{r data_chagas, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } +chagas2012 <- readRDS("../data/chagas2012.RDS") # This can be removed once data is incorporated +head(chagas2012, 5) + +``` + +After preparing the data we can run two models (constant and time varying) available on serofoi. + +```{r three models, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +chagas2012p <- prepare_seroprev_data(chagas2012) + +m1_cha <- run_seroprev_model(seroprev_data = chagas2012p, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +m2_cha <- run_seroprev_model(seroprev_data = chagas2012p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) + +``` + +Now, we can plot the results of the two models to compare (Figure 1). As shown in @Cucunubá2017, interventions for Chagas control have been ongoing from the 1980s in Colombia having heterogeneous impact depending on the type of population. For this serosurvey, which is from tradittional indigenous rural area, the serofoi models are able to detect a modest still relevant slow decreasing trend consistent (model 2) as slightly better supported than the constant model. Notice that model 3 does not converge despite a high number of iterations. + +```{r, model_comp_cha, include=TRUE, echo=TRUE, fig.width = 8, fig.height=10, warning=FALSE, message=FALSE, message=FALSE} +p1_cha <- plot_seroprev_model(m1_cha, size_text = 12, max_lambda = 0.02) +p2_cha <- plot_seroprev_model(m2_cha, size_text = 12, max_lambda = 0.02) + +plot_seroprev_models_grid(p1_cha, p2_cha, n_row = 1, n_col = 2) +``` + +## Case study 2. Hidden Alphaviruses epidemics in Panama + +As shown in [@Carrera2020], hidden epidemic and endemic transmission of alphaviruses in Eastern Panama have been around for decades. From this paper we use a dataset measuring IgG antibodies againts Venezuelan Equine Encephalitis Virus (VEEV) in a rural village in Panamá in 2017. This dataset, `veev2017` is included in serofoi. + +```{r data_veev, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } + +veev2012 <- readRDS("../data/veev2012.RDS") +head(veev2012, 5) + +``` + +```{r models_veee, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +veev2012p <- prepare_seroprev_data(veev2012) + +m1_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "constant_foi_bi", + n_iters = 500, + n_thin = 2) + +m2_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 500, + n_thin = 2) + +m3_veev <- run_seroprev_model(seroprev_data = veev2012p, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 500, + n_thin = 2) +``` + +Now, we can plot the results of the three models to compare. On Figure 3, we can observe a large increase in the estimated FOI. As suggested in @Carrera2020, an important increase in the transmission of VEEV in this region is inferred. Although the three fitted models converge well, we see much larger support for a more sudden increase in recent years with highest FoI values at 0.25 (consistent with an epidemic). + +```{r, model_comp_veev, include=TRUE, echo=TRUE, fig.width = 10, fig.height=10, warning=FALSE, message=FALSE, message=FALSE} +p1_veev <- plot_seroprev_model(m1_veev, size_text = 10, max_lambda = 0.6) +p2_veev <- plot_seroprev_model(m2_veev, size_text = 10, max_lambda = 0.6) +p3_veev <- plot_seroprev_model(m3_veev, size_text = 10, max_lambda = 0.6) + +plot_seroprev_models_grid(p1_veev, p2_veev, p3_veev, n_row = 1, n_col = 3) + +``` + +Figure 3. Results of fitted models, Force-of-Infection estimates and convergence. + +## Case study 3. A known Large Chikungunya epidemic in Brazil + +Chikungunya outbreaks ocurred rapidly after the introduction of the virus to Brazil in 2013-2014. We use the dataset from [@dias2018] that conducts a population-based study through household interviews and serologic surveys (measuring IgG antibodies against Chikungunya virus) in Bahia, Brazil during October-December 2015, right after a large CHIKV epidemic that occurred in that area. + +```{r data_chik, include = TRUE, errors = FALSE, warning = FALSE, message = FALSE } + +chik2015 <- readRDS("../data/chik2015.RDS") +head(chik2015, 5) + +``` + +```{r models_chik, include = TRUE, errors = FALSE, warning = FALSE, message=FALSE } + +chik2015p <- prepare_seroprev_data(chik2015) + +mod1_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "constant_foi_bi", + n_iters = 1000, + n_thin = 2) + +mod2_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "continuous_foi_normal_bi", + n_iters = 1500, + n_thin = 2) + +mod3_chik <- run_seroprev_model(seroprev_data = chik2015p, + seroprev_model_name = "continuous_foi_normal_log", + n_iters = 1500, + n_thin = 2) +``` + +In Figure 4, we can observe the comparison between the three serofoi models. Here serofoi shows strong statistical support for a sudden increase in the transmission of CHIKV close to the year of the serosurvey (2015). The exact year is not possible to estimate, mainly given the data used is largely aggregated by 20-years age groups. Despite that, the results are consistent with the empirical evidence shown by [@dias2018] with both interviews, and IgM testing. + +```{r, plot_chik, include=TRUE, echo=TRUE, fig.width = 10, fig.height=10, warning=FALSE, message=FALSE, message=FALSE} +p1_chik <- plot_seroprev_model(mod1_chik, size_text = 10, max_lambda = 0.08) +p2_chik <- plot_seroprev_model(mod2_chik, size_text = 10, max_lambda = 0.08) +p3_chik <- plot_seroprev_model(mod3_chik, size_text = 10, max_lambda = 0.08) + +plot_seroprev_models_grid(p1_chik, p2_chik, p3_chik, n_row = 1, n_col = 3) + +``` + +## References