Merge dev -> main. (#38)
* Documentation of the visualisation and modelling modules

* automatic documentation of the visualisation and modeling modules

* minor changes to the documentation. Typos and minor bugs corrected.

* feat: define a preliminary version of the minimal dataset needed to use the library, now stored in data/data.RDS (data/data_bu.RDS is a backup of the old dataset). The prepare_data() function in seroprevalence_data processes the dataset, creating 5 columns needed for the analysis (age_mean_f, sample_size, prev_obs, prev_obs_lower and prev_obs_upper). This function is tested in test/test_minimal_data and used in test_individual_models.

* fix: remove unused functions from all modules in preparation for a compilation test. Remove the corresponding documentation files. Remove the folder R/stantmodels (it's redundant with inst/extdata/stanmodels). Update dependencies in DESCRIPTION (epitrix and cowplot removed).

* fix: prepare data_test for compilation of the package

* Dev compilation test nicolas (#10)

* minor changes in how the summary is shown

* change summary message

* minor change in extract_summary_model

* create template of vignette

* delete temporary files

* testing vignette

* vignette's test

* adding workflows sca and r-cmd-check

* adding workflows sca and r-cmd-check (#11)

* Static code review with lintr in package modules

* changing description

* adding info into run-model function

* adding a script for model comparison

* adding a script for model comparison

* adding a script for model comparison

* adding a script for model comparison

* adding a script for model comparison

* adding a script for model comparison

* adding a script for model comparison

* returning logo

* adding a script for model comparison

* adding a script for model comparison

* changing README.Rmd file

* changing README.Rmd file

* changing README.Rmd file

* changing README.Rmd file

* adding plots to the readme file

* adding plots to the readme file

* adding function for plotting raw seroprevalence data

* adding the package name standard

* adding data reference for dplyr

* update function documentation

* Static code analysis for package modules

* Static code analysis for the modeling module

* Clean test/test_comparison.R. Add a warning for the first compilation of the models. Minor syntax changes. Slightly change the name of the visualization module.

* fix: Calculate the binomial confidence interval from the raw seroprevalence data in function plot_seroprev. The function is tested in test_plot_functions.R. Add compilation line to test_individual_models.R.

* Compilation test in test_plot_functions.R

* minor changes before merging with dev

* feat: Add function prepare_bin_data to seroprevalence_data module. This function prepares the data to plot the binomial confidence intervals and makes it possible to remove redundant code in the visualization and modelling modules.

* minor changes before merging with dev

* fix: corrects the bin size in plot_seroprev(). Minor syntax changes.

* doc: Add the corresponding documentation for plot_seroprev() to README.Rmd and README.md including the example image file man/figures/plot_seroprev_example.png

* Dev docu mg (#16)

* Added examples in core functions documentation

* Example of the functions in the documentation

* updating links of R-CMD check and Codecov test

* Dev doc nicolas (#18)

* doc: Update documentation for run_model, save_or_load_model and fit_model. The names of some functions and parameters were changed for the sake of clarity (make_yexpo -> get_exposure_years, save_or_read_model -> save_or_load_model, yexpo -> exposure_years). The order of the function definitions in the modeling module was changed; they now appear in hierarchical order starting from run_model, since this is the most important function in the module.

* refac: fit_model_log function removed. The function fit_model now has a special case that implements the logarithmic model, as the removed function used to. fit_model documentation updated. Minor changes in the names of the objects returned by the functions.

* doc: modelling module documentation updated. get_posterior_summary function removed (unused).

* updating package version and contributors

* Dev doc nicolas (#20)

* doc: Update documentation for run_model, save_or_load_model and fit_model. The names of some functions and parameters were changed for the sake of clarity (make_yexpo -> get_exposure_years, save_or_read_model -> save_or_load_model, yexpo -> exposure_years). The order of the function definitions in the modeling module was changed; they now appear in hierarchical order starting from run_model, since this is the most important function in the module.

* refac: fit_model_log function removed. The function fit_model now has a special case that implements the logarithmic model, as the removed function used to. fit_model documentation updated. Minor changes in the names of the objects returned by the functions.

* doc: modelling module documentation updated. get_posterior_summary function removed (unused).

* doc: minor changes to the modelling module. Quotation marks added for string variables in the documentation and some minor errors fixed.

* Dev docker tests (#22)

* 1st version of Dockerfile

* Added auto dep install for docker container

* Refactor docker folder

* Changed process to obtain path of stan and RDS files to make it compatible with testthat

* First version of tests

* Factored testing functions

* config.yml now only stores the base path of stan models

* Added automated test tasks for vscode

* Added more tests

* misc changes to vscode tasks

* Misc improvements to containers

* small fixes to container

* misc docker refactor

* moved docker scripts to an R file (tested only on Linux)

* Temporary change to test github actions on this branch

* added testthat to deps

* Added devtools to deps

* removed erroneous code in unit test

* moving config.yml to inst

* same

* temp changes to github actions files for testing

* doc: Update documentation for run_model, save_or_load_model and fit_model. The names of some functions and parameters were changed for the sake of clarity (make_yexpo -> get_exposure_years, save_or_read_model -> save_or_load_model, yexpo -> exposure_years). The order of the function definitions in the modeling module was changed; they now appear in hierarchical order starting from run_model, since this is the most important function in the module.

* Added R CMD Check to Docker

* refac: fit_model_log function removed. The function fit_model now has a special case that implements the logarithmic model, as the removed function used to. fit_model documentation updated. Minor changes in the names of the objects returned by the functions.

* Added more files to rbuildignore

* more testing of github actions

* misc fixes

* testing windows

* adding BH dep

* Added linking deps for rstan

* added suggest deps for rstan

* doc: modelling module documentation updated. get_posterior_summary function removed (unused).

* Add LinkingTo field (#19)

* Add LinkingTo field

* Add roxygen comments from rstantools::use_rstan()

* temporarily removed some deps

* temp remove of this branch from yaml

* added vscode configs

* Fixes linking errors in R CMD Check

* Fixed examples

* Fixed tests for latest changes in function and var names

* added missing deps

---------

Co-authored-by: Nicolas Torres <[email protected]>
Co-authored-by: Hugo Gruson <[email protected]>

* Now most examples run without errors. Those that do not are temporarily enclosed in \dontrun

* Documentation of the seroprevalence_data and visualisation modules (#24)

Co-authored-by: Nicolás Torres Domínguez <[email protected]>

* Added test functions for plots

* doc: Update author's information in DESCRIPTION.

* Added myself to contributors

* Removed test/ folder

* Added a TODO

* More automatic tests

* Update .gitignore

Added dataframes to the actual test folder

* Fixed save_or_load_model to avoid DLL Bug

* same

* dontrun some examples

* same

* misc changes

* Some corrections to the documentation

* R CMD Check now seems to be working without errors (hopefully :)

* Some fixes to module documentation

* R CMD Check works without errors (locally)

* doc: review and correct visualization module documentation.

* doc: minor changes to seroprevalence_data module documentation.

* Add function to generate comparative plot of the models (#29)

* test: Add plot tests for each model to test_plot_functions.

* fix: Replace the GridExtra dependency with cowplot (visualization module). Add plots for the 3 models to test_plot_functions.

* feat: Add function plot_models_list to the visualization module. This function arranges plots in a grid by means of cowplot::plot_grid. A change that is still needed is to add proper default values for n_row and n_col, or to handle the case when they are passed as NULL. An example of the use of this function can be found at the end of test_plot_functions, and the corresponding result can be viewed in plot-arrange-models.

* Minor changes to individual_models .svg files.

* Testing all platforms in github actions

* testing coverage

* added missing BH dep to make it work on windows

* added more missing deps

* doc: minor change to fit_model function documentation.

* add back the data folder to use the mydata object when importing the library.

* doc: generate documentation with devtools::document().

* updated RMD Check tasks for vscode and docker to make them more similar to github actions'

* Added some deps to avoid warnings in R CMD Check

* Updated man pages with roxygen2

* Added dep to TBB to hopefully fix compilation problems in windows

* Upgraded rstan to v2.26.11. Added required TBB dep

* Updated SVGs and CSVs to match results from rstan v2.26.11

* Added mc-stan as extra repo to support rstan 2.26.11

* Misc fixes

* Rename plot_models_list to plot_seroprev_models_grid.

* doc: add documentation for plot_seroprev_models_grid function.

* Added more docker-related functionality

* Added script to clean SVGs and CSVs when rstan needs to be updated and tests fail

* misc changes

* Dev webrd (#32)

* test epidemics

* changes to vignette

* testing epidemics

* testing site

* improved vignette

* improved vignette

* improved vignette

* improved vignette with contributions

* improved vignette with contributions

* improved vignette with references

* improved vignette with references

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* use cases

* use cases

* update preloaded package datasets. mydata and serodata contain a copy of the same dataset for the time being.

* add additional changes to add multiple datasets to the package.

* doc: add datasets documentation files.

* use cases

* adding veev panama

* adding chik 2015

* adding chik 2015

* removing unnecessary data

* correcting chik data for nicaragua

* correcting chagas data for Colombia

* correcting chagas data for Colombia

* doc: add mydata and serodata documentation and .R files.

* doc: add documentation files for the additional incorporated datasets chagas2012, chik2015 and veev2012.

* doc: add documentation files for the additional incorporated datasets chagas2012, chik2015 and veev2012.

---------

Co-authored-by: Zulma Cucunubá <[email protected]>
Co-authored-by: Zulma M Cucunubá <[email protected]>

* dependency rstan (>= 2.26.11) changed to rstan (>= 2.21.1); 2.26.11 was generating an error during the installation of the package.

* Back to rstan (>= 2.26.11).

* Adding multiplatform tests

* test: run all tests for the new test dataset.

* update plot_functions test figures.

* doc: minor change to plot_seroprev_models_grid documentation.

* style: Update the names of the functions to specify that they refer to seroprevalence models (seroprev suffix). Update the documentation correspondingly.

* style: mydata changed to serodata. The current dataset is chagas2012.RDS, but this will be changed to a simulated dataset in the future.

* update data_test dataset. The dataset now corresponds to chagas2012. This will be replaced by a simulated dataset in the future.

* removed unused code

* Temporarily skipping tests on windows and mac, until we find an efficient way to test in those platforms without worrying about reproducibility

* branch change for testing

* updated testing snapshots

* misc changes

* added install deps task for vscode

* Added TODOs

* Added missing deps

* Added missing testthat snapshots

* Created new function `expect_similar_dataframes` to test dataframes using snapshots. It is compatible with column_comparation_functions

* Increased default tolerance to deal with rstan shenanigans

* testing ci

* Skipping these tests on CI

* Misc changes

* Temporary changes while we improve tests

* fix: solve minor typo in the name of function prepare_seroprev_data.

* Dev zulma vignette (#34)

* vignette draft

* vignette draft

* vignette draft

* testing vignette

* testing vignette

* updating vignette

* updating vignette

* remove doc of .gitignore

---------

Co-authored-by: zmcucunuba <[email protected]>
Co-authored-by: GeraldineGomez <[email protected]>

* Dev webrd (#35)

* test epidemics

* changes to vignette

* testing epidemics

* testing site

* improved vignette

* improved vignette

* improved vignette

* improved vignette with contributions

* improved vignette with contributions

* improved vignette with references

* improved vignette with references

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* improved vignette with use cases

* use cases

* use cases

* update preloaded package datasets. mydata and serodata contain a copy of the same dataset for the time being.

* add additional changes to add multiple datasets to the package.

* doc: add datasets documentation files.

* use cases

* adding veev panama

* adding chik 2015

* adding chik 2015

* removing unnecessary data

* correcting chik data for nicaragua

* correcting chagas data for Colombia

* correcting chagas data for Colombia

* doc: add mydata and serodata documentation and .R files.

* doc: add documentation files for the additional incorporated datasets chagas2012, chik2015 and veev2012.

* doc: add documentation files for the additional incorporated datasets chagas2012, chik2015 and veev2012.

* changing rstan version from 2.26.11 (nonexistent) to > 2.21.1

* simulated fake data

* refac: test_sim_data is refactored. I cleaned the code and added a function to plot the simulated datasets obtained for each foi example.

* Save selected simulated data for scenarios A, B and D to tests/sim_data. I took all the grouped datasets for n=5.

* chik-seroinference-simulations

---------

Co-authored-by: Zulma Cucunubá <[email protected]>
Co-authored-by: Zulma M Cucunubá <[email protected]>

* fix: function get_exposure_years was returning ages that were not consistent with the survey time and the minimal birth_year in the dataset. I changed the name of the function to get_exposure_ages for consistency with the output.

* fix: description typo.

* change the name of get_exposure_years to get_exposure_ages (for lack of a better name). Update mydata to serodata. Reran all tests (slight changes in the test results).

* fix: updating functions and variables names in the vignettes files. This was causing the R-CMD github checks to fail.

* Removed dev from actions scripts

* Updated badges in README.Rmd. Updated README.md with latest changes from README.Rmd

* fix: add default value for seroprev_data to the prepare_seroprev_data function.

* Update prepare_seroprev_data documentation.

* doc: minor corrections to simulated_data.Rmd.

* webpage publication

* fixed bug "recompiling to avoid crashing R session"

* Update use_cases.Rmd

* Add simulated data generation (#36)

* Add conditional to prepare_seroprev_data. In some cases, such as when datasets are being simulated, the columns age_mean_f and birth_year need to be added before the data preparation in order to compute the exposure matrix.

* refac: modification of get_exposure_matrix. It no longer depends explicitly on get_exposure_ages, as this was a redundant dependency.

* testing changes in extract_seroprev_model_summary.

* feat: add functions get_sim_counts, generate_sim_data and generate_sim_data_grouped to module seroprevalence_data. These functions can be used to generate simulated datasets, as shown in test_simdata_caseA.R

* add results obtained by running the test test_simdata_caseA.R

* delete redundant or unnecessary tests and their corresponding results.

* remove old simulated data.

* Add simulated data test script (tests/testthat/test_simdata_cases.R) and results for a constant foi (case A) and for a stepwise decreasing foi (case B).

* doc: add documentation for the data simulation functions.

* Add title identifying the specific case of each simulation for the test test_simdata_cases.R. Update the corresponding figures.

* Save simulated data into testthat/exdata/ for cases A and B. They are stored automatically when running test_simdata_cases.

* remove unused man files.

* refac: Removed functions still in the development stage that will be added in a future version. In particular, all functions for data simulation are removed. The function plot_seroprev_models_grid can, in its current state, be replaced by the cowplot::plot_grid() function. Function get_comparison_table is unused.

* refac: Remove redundant tests and refactor test_individual_models (now test_models), which now runs the models in a for loop instead of running each model one by one. Remove unnecessary folder test/.

* Remove unused file R/test_vignettes.R

* Remove unused data files.

* Minor changes to vignettes.

* Remove cowplot and pracma from dependencies and unnecessary test file test_plot_functions.

* chore: change seroprev_model to seromodel.

* chore: change seroprev_data to serodata.

* chore: change model_object for seromodel_object.

* Add option print_summary with default TRUE to run_seromodel (modelling module).

* doc: update documentation for model_comparison and modelling modules.

* doc: update documentation for the visualization, modelling and seroprevalence_data modules.

---------

Co-authored-by: megamezl <[email protected]>
Co-authored-by: tracelac <[email protected]>
Co-authored-by: zmcucunuba <[email protected]>
Co-authored-by: GeraldineGomez <[email protected]>
Co-authored-by: Miguel Enrique Gámez López <[email protected]>
Co-authored-by: Jaime Pavlich-Mariscal <[email protected]>
Co-authored-by: Hugo Gruson <[email protected]>
Co-authored-by: JAIME ANDRÉS PAVLICH MARISCAL <[email protected]>
Co-authored-by: Zulma Cucunubá <[email protected]>
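The commit message says prepare_data() derives prev_obs, prev_obs_lower and prev_obs_upper (along with age_mean_f and sample_size) from the survey data, and that plot_seroprev() uses a binomial confidence interval. The R sketch below illustrates only the three prevalence columns. It assumes input columns named `counts` (seropositives) and `total` (individuals tested) and uses the exact Clopper-Pearson interval from base R's `binom.test()`; the helper name `add_prev_obs()` is hypothetical, and the package's own interval method may differ. age_mean_f and sample_size are left out because their definitions are not stated here.

```r
# Hypothetical sketch, not the package's prepare_data()/prepare_serodata():
# observed seroprevalence per age group with an exact (Clopper-Pearson)
# binomial confidence interval. The column names `counts` and `total`
# are assumptions.
add_prev_obs <- function(serodata, alpha = 0.05) {
  # binom.test() returns the exact interval in $conf.int (length 2);
  # mapply() collects one interval per row into a 2 x n matrix.
  ci <- mapply(
    function(x, n) stats::binom.test(x, n, conf.level = 1 - alpha)$conf.int,
    serodata$counts, serodata$total
  )
  serodata$prev_obs <- serodata$counts / serodata$total
  serodata$prev_obs_lower <- ci[1, ]
  serodata$prev_obs_upper <- ci[2, ]
  serodata
}

# Usage with made-up counts for two age groups
example <- data.frame(total = c(50, 40), counts = c(5, 20))
out <- add_prev_obs(example)
print(out)
```

The exact interval never leaves [0, 1], even for groups with zero or all seropositives, which is one reason it is a common default for small serosurvey age groups.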
10 people authored Mar 28, 2023
1 parent 46aa0cb commit 4572dfd
Showing 71 changed files with 1,130 additions and 8,699 deletions.
2 changes: 0 additions & 2 deletions DESCRIPTION
@@ -44,7 +44,6 @@ Imports:
reshape2,
bayesplot,
loo,
cowplot,
Hmisc,
dplyr,
gsubfn,
@@ -58,7 +57,6 @@ Imports:
BH,
RcppEigen,
RcppParallel,
pracma,
purrr
Suggests:
knitr,
15 changes: 5 additions & 10 deletions NAMESPACE
@@ -1,25 +1,20 @@
# Generated by roxygen2: do not edit by hand

export(extract_seroprev_model_summary)
export(fit_seroprev_model)
export(generate_sim_data)
export(get_comparison_table)
export(extract_seromodel_summary)
export(fit_seromodel)
export(get_exposure_ages)
export(get_exposure_matrix)
export(get_prev_expanded)
export(get_sim_counts)
export(get_table_rhats)
export(group_sim_data)
export(plot_foi)
export(plot_info_table)
export(plot_rhats)
export(plot_seromodel)
export(plot_seroprev)
export(plot_seroprev_fitted)
export(plot_seroprev_model)
export(plot_seroprev_models_grid)
export(prepare_bin_data)
export(prepare_seroprev_data)
export(run_seroprev_model)
export(prepare_serodata)
export(run_seromodel)
export(save_or_load_model)
import(Rcpp)
import(dplyr)
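The NAMESPACE change renames five exports from the `seroprev` prefix to `seromodel`/`serodata`, and drops generate_sim_data, get_comparison_table, get_sim_counts, group_sim_data and plot_seroprev_models_grid without replacement. As a rough aid for updating scripts, the hypothetical `migrate_calls()` below rewrites the renamed function names, plus the two argument renames visible in the updated get_table_rhats() example (`seroprev_data` to `serodata`, `seroprev_model_name` to `seromodel_name`), using whole-word regex replacement. It is a text-level sketch under those assumptions, not a parser, and it does not handle the `model_object` to `seromodel_object` rename in get_table_rhats().

```r
# Hypothetical helper (not part of the package): rewrite R source text that
# uses names renamed in this commit. The mapping is read off the NAMESPACE
# diff and the updated get_table_rhats() example.
renamed <- c(
  extract_seroprev_model_summary = "extract_seromodel_summary",
  fit_seroprev_model             = "fit_seromodel",
  plot_seroprev_model            = "plot_seromodel",
  prepare_seroprev_data          = "prepare_serodata",
  run_seroprev_model             = "run_seromodel",
  # argument names
  seroprev_data                  = "serodata",
  seroprev_model_name            = "seromodel_name"
)

migrate_calls <- function(code) {
  for (old in names(renamed)) {
    # \\b is a PCRE word boundary; `_` counts as a word character, so
    # `seroprev_data` does not match inside `prepare_seroprev_data` and
    # `plot_seroprev_model` does not match inside `plot_seroprev_models_grid`.
    code <- gsub(paste0("\\b", old, "\\b"), renamed[[old]], code, perl = TRUE)
  }
  code
}

print(migrate_calls(
  'm <- run_seroprev_model(seroprev_data = d, seroprev_model_name = "constant_foi_bi")'
))
```

Because it works on raw text, it also renames any user variable called `seroprev_data`, so the output should be reviewed before it replaces the original script.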
108 changes: 15 additions & 93 deletions R/model_comparison.R
@@ -1,105 +1,27 @@
#' Get Table Rhats
#'
#' Function that makes the rhats table
#' @param model_object model_object
#' Method for extracting a dataframe containing the R-hat estimates for a given serological model.
#'
#' This method relies on the function \link[bayesplot]{rhat} to extract the R-hat estimates of the serological model object
#' \code{seromodel_object} and returns a dataframe with the estimates for each year of birth.
#' @param seromodel_object seromodel_object
#' @return rhats table
#' @examples
#' \dontrun{
#' seroprev_data <- prepare_seroprev_data(seroprev_data = serodata, alpha = 0.05)
#' model_object <- run_seroprev_model(
#' seroprev_data = seroprev_data, seroprev_model_name = "constant_foi_bi")
#' get_table_rhats (model_object)
#' data("serodata")
#' data_test <- prepare_serodata(serodata = serodata)
#' model_constant <- run_seromodel(serodata = data_test,
#' seromodel_name = "constant_foi_bi",
#' n_iters = 1500)
#' get_table_rhats(seromodel_object = model_constant)
#' }
#' @export
get_table_rhats <- function(model_object) {
rhats <- bayesplot::rhat(model_object$fit, "foi")
get_table_rhats <- function(seromodel_object) {
rhats <- bayesplot::rhat(seromodel_object$fit, "foi")

if (any(is.nan(rhats))) {
rhats[which(is.nan(rhats))] <- 0
}
model_rhats <- data.frame(year = model_object$exposure_years, rhat = rhats)
model_rhats <- data.frame(year = seromodel_object$exposure_years, rhat = rhats)
model_rhats$rhat[model_rhats$rhat == 0] <- NA

return(model_rhats)
}
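get_table_rhats() converts NaN R-hats to 0 and then every 0 to NA. The sketch below does the same clean-up in one step on a plain named vector standing in for `bayesplot::rhat()` output, so it runs without a fitted Stan model; `rhat_table()` is a hypothetical name. Because R-hat values are positive, the result should match the two-step version in practice.

```r
# Sketch of the post-processing inside get_table_rhats(). A plain named
# numeric vector stands in for bayesplot::rhat(seromodel_object$fit, "foi"),
# so the example runs without fitting a Stan model.
rhat_table <- function(rhats, exposure_years) {
  # Mark NaN R-hats as missing in a single step
  rhats[is.nan(rhats)] <- NA
  # unname() keeps the parameter labels out of the data frame's row names
  data.frame(year = exposure_years, rhat = unname(rhats))
}

rhats <- c(`foi[1]` = 1.01, `foi[2]` = NaN, `foi[3]` = 1.002)
print(rhat_table(rhats, exposure_years = 2001:2003))
```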

#' Get Model Table Comparison
#' Provides a table with statistics for comparison between models and selection
#' @param model_objects_list model_objects to compare
#' @return comparison table
#' @examples
#' \dontrun{
#' data_test <- prepare_seroprev_data(serodata)
#' model_0 <- run_seroprev_model(seroprev_data = data_test,
#' seroprev_model_name = "constant_foi_bi",
#' n_iters = 1000)
#'
#' model_1 <- run_seroprev_model(seroprev_data = data_test,
#' seroprev_model_name = "continuous_foi_normal_bi",
#' n_iters = 1000)
#'
#' model_2 <- run_seroprev_model(seroprev_data = data_test,
#' seroprev_model_name = "continuous_foi_normal_log",
#' n_iters = 1000)
#' comp_table <- get_comparison_table(model_objects_list = c(m0 = model_0,
#' m1 = model_1,
#' m2 = model_2))
#' }
#' @export
get_comparison_table <- function(model_objects_list) {


dif_m0_m1 <- loo::loo_compare(model_objects_list$m0.loo_fit,
model_objects_list$m1.loo_fit)

dif_m0_m2 <- loo::loo_compare(model_objects_list$m0.loo_fit,
model_objects_list$m2.loo_fit)

# Pending here: check <diference> coming from the summary_model function
# I am not sure this parameter arrives correctly from there, nor that it is right here

# model_comp$better <- NA
# model_comp$better[model_comp$difference > 0] <- 'Yes'
# model_comp$better[model_comp$difference <= 0] <-'No'
# model_comp$better[model_comp$model == 'constant_foi_bi'] <- "-"


model_objects_list$m0.model_summary$difference <- 0
model_objects_list$m0.model_summary$diff_se <- 1

model_objects_list$m1.model_summary$difference <- dif_m0_m1[1]
model_objects_list$m1.model_summary$diff_se <- dif_m0_m1[2]

model_objects_list$m2.model_summary$difference <- dif_m0_m2[1]
model_objects_list$m2.model_summary$diff_se <- dif_m0_m2[2]

model_comp <- rbind(model_objects_list$m0.model_summary,
model_objects_list$m1.model_summary,
model_objects_list$m2.model_summary)

model_comp$converged[model_comp$elpd == -1.000e+10] <- 'No' #
ds_one <- dplyr::filter(model_comp, converged == 'Yes')
print(paste0('number of converged models = ', NROW(ds_one)))

# Ordering the best model based on elpd values
elps_order <- rev(sort(ds_one$elpd))
best <- dplyr::filter(model_comp, elpd %in% elps_order) %>% dplyr::arrange(-.data$elpd)# This is to make sure I keep only three
best_model1 <- as.character(best$model[1])
best_model2 <- as.character(best$model[2])
best_model3 <- as.character(best$model[3])

model_comp$best_elpd <- NA
model_comp$best_elpd[model_comp$model == best_model1] <- 1
model_comp$best_elpd[model_comp$model == best_model2] <- 2
model_comp$best_elpd[model_comp$model == best_model3] <- 3
model_comp <- model_comp %>% dplyr::arrange(.data$best_elpd)

# Estimating p-values to check the difference between the models m0 and other models is actually important
model_comp <- model_comp %>% dplyr::mutate(pvalue = 1 - stats::pnorm(difference/diff_se,0,1))
model_comp$pvalue[is.nan(model_comp$pvalue)] <- 1
model_comp$pvalue <- model_comp$pvalue * stats::runif(NROW(model_comp), min = 1, max = 1.0001)# I make this just to ensure I get different values
model_comp$pvalue[model_comp$model == 'constant_foi_bi'] <- 0
model_comp$pvalue <- round(model_comp$pvalue, 6)

return(model_comp)
}
}
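The removed get_comparison_table() computes `1 - pnorm(difference / diff_se)` with `difference` and `diff_se` read positionally as `dif_m0_m1[1]` and `dif_m0_m1[2]`. Assuming `loo::loo_compare()` returns its documented matrix (one row per model, best model first with `elpd_diff = 0`, and columns including `elpd_diff` and `se_diff`), those positional indices walk down the first column in R's column-major order, giving the best model's `elpd_diff` (0) and the other model's `elpd_diff` rather than a difference and its standard error. The sketch below reads the cells by name instead and applies a one-sided normal approximation to `|elpd_diff| / se_diff`; the function name `elpd_pvalue()` and the hand-built input matrix are illustrative assumptions.

```r
# Sketch: normal-approximation p-value for the ELPD difference between two
# models, reading a loo::loo_compare()-style matrix by column name.
# The matrix below is hand-built (made-up numbers), not real loo output.
elpd_pvalue <- function(cmp) {
  # Row 1 holds the best model (elpd_diff = 0); row 2 the other model
  z <- abs(cmp[2, "elpd_diff"]) / cmp[2, "se_diff"]
  p <- 1 - stats::pnorm(z)
  # 0/0 (identical models) gives NaN; like the original code, report p = 1
  if (is.nan(p)) p <- 1
  p
}

cmp <- matrix(c(0, -4, 0, 2), nrow = 2,
              dimnames = list(c("model_a", "model_b"),
                              c("elpd_diff", "se_diff")))
print(elpd_pvalue(cmp))  # 1 - pnorm(2), roughly 0.0228
```

In real use, `cmp` would be the result of `loo::loo_compare()` applied to two `loo` objects.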