vignettes/pathogen_marburg.Rmd

---
title: "Marburg Virus Disease (MVD)"
date: "Latest update: `r Sys.Date()`"
output: html_document
---

In 2018, the World Health Organization (WHO) published a list of nine known pathogens (in addition to an unknown _Pathogen X_) for research and development (R&D) prioritisation, due to both their epidemic and pandemic potential and the absence of licensed vaccines or therapeutics. Among these prioritised pathogens is MVD, a highly-lethal infectious _Filoviridae_ single-stranded RNA virus first described in Germany and Serbia (formerly Yugoslavia) in 1967. Subsequent outbreaks of this virus have primarily occurred in sub-Saharan Africa. The Pathogen Epidemiology Review Group (PERG) has published a systematic review for MVD, if you use any of our results please cite our paper:

> @article{marburg_systematic_review_2023,
  title={{Marburg virus disease outbreaks, mathematical models, and disease parameters: a systematic review}},
  author={Gina Cuomo-Dannenburg and Kelly McCain and Ruth McCabe and H Juliette T Unwin and Patrick Doohan and Rebecca K Nash and Joseph T Hicks and Kelly Charniga and Cyril Geismar and Ben Lambert and Dariya Nikitin and Janetta E Skarp and Jack Wardle and Pathogen Epidemiology Review Group and Mara Kont and Sangeeta Bhatia and Natsuko Imai and Sabine L van Elsland and Anne Cori and Christian Morgenstern},
  journal={The Lancet Infectious Diseases},
  year={2023},
  doi={10.1016/S1473-3099(23)00515-7},
  URL = {https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(23)00515-7/fulltext},
  publisher={Elsevier}
}

All Tables and Figures from the paper are re-produced below on the **latest** available data in our data set. For convenience we label the Figures and Tables with the same numbers as in the paper.

```{css echo=FALSE}
.flextable-shadow-host{
  overflow: scroll;
  white-space: nowrap;
}
```


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
remotes::install_github("mrc-ide/epireview@v0.1.0")
library(epireview)
library(patchwork)
library(ggplot2)
library(flextable)
library(officer)
library(tm)
library(meta)
library(tidyverse)
```

## Outbreaks
**Table 1:** Overview of MVD outbreaks i.e. location, timing, and size, as reported in the studies included in this review. We report in bold the country and outbreak year, the location refers to the place of the actual outbreak in the country if known. Blank cells correspond to information which we were unable to find in or extract from the literature.

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_outbreak_table(pathogen = "marburg","../")
outbreak_table(df, "marburg")
```

## Parameters


### Reproduction numbers

**Figure 3 (A):** Estimates of the reproduction number. The blue and red points correspond to estimates of the effective reproduction number (Re) and basic reproduction number (R0) respectively, with associated uncertainty shown by the solid lines where available. The dashed vertical line presents the threshold for epidemic growth. 

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_forest_plots(pathogen = "marburg", prepend = "../", exclude = c(15, 17))
forest_plot_r(df)
```

### Severity
**Figure 2 (A):** Case Fatality Ratio (CFR) meta-analyses, using logit-transformed proportions and a generalized linear mixed-effects model (full details in SI A.4 in the paper). The forest plot displays studies includedin each meta-analyis: the red squares indicate study weight, and for each study, a 95% binomial confidence interval is provided. As summaries we display the total common effects, where all data are effectively pooled and assumed to come from a single data-generating process with one common CFR and total random effect estimates, which allow the CFR to vary by study and accordingly give different weights to each study when determining an overall estimate [19]. CFR estimates reported in the included studies. 


```{r cfr_panel_A, echo=FALSE, fig.height=4, fig.width=9, message=FALSE, warning=FALSE}
df <- data_forest_plots(pathogen = "marburg", prepend = "../", exclude = c(15, 17))

file_path_ob <- system.file("extdata", "marburg_outbreak.csv", package = "epireview")
if(file_path_ob=="") file_path_ob <- paste0('../inst/extdata/marburg_outbreak.csv')
outbreak     <- read_csv(file_path_ob)

file_path_ar <- system.file("extdata","marburg_article.csv", package = "epireview")
if(file_path_ar=="") file_path_ar <- paste0('../inst/extdata/marburg_article.csv')
articles     <- read_csv(file_path_ar)

df_out <- left_join(outbreak, articles %>% dplyr::select(covidence_id, first_author_surname, year_publication), by = "covidence_id") %>%
  mutate(article_label = as.character(paste0(first_author_surname, " ", year_publication)),
         outbreak_country = str_replace_all(outbreak_country, ";", ", ")) %>%
  dplyr::arrange(article_label, -year_publication) %>%
  dplyr::filter(article_id %in% c(17,15) == FALSE) %>%
  filter(!is.na(deaths) & !is.na(cases_confirmed)) %>% 
  mutate(cases_suspected = replace_na(cases_suspected, 0),
         parameter_value = deaths/(cases_confirmed + cases_suspected)*100,
         cfr_ifr_numerator = deaths,
         cfr_ifr_denominator = cases_confirmed + cases_suspected,
         cfr_ifr_method = "Naive",
         parameter_class = "Severity",
         parameter_type = "Severity - case fatality rate (CFR)",
         parameter_uncertainty_lower_value = NA,
         parameter_uncertainty_upper_value = NA,
         outbreak_year_cnt = str_replace(paste0(outbreak_country, " (",outbreak_start_year, ")"), "NA", "unknown") ) %>%
  dplyr::arrange(outbreak_year_cnt) %>% distinct()
df_out$parameter_data_id <- seq(1, dim(df_out)[1], 1)

df_out <- df_out %>% mutate(outbreak_country_v2=replace(outbreak_country,str_starts(outbreak_country,"Germany"),"Germany"),
                            outbreak_start_v2 = outbreak_start_year )

# nasty hack to deal with historic outbreak data + data capture
df_out[df_out$outbreak_id==20,]$outbreak_start_v2 <- 1968
df_out[df_out$outbreak_id==10,]$outbreak_start_v2 <- 2004

selection <- df_out %>% mutate(outbreak_country_v2=replace(outbreak_country,str_starts(outbreak_country,"Germany"),"Germany")) %>%
  dplyr::select(outbreak_country,outbreak_country_v2,outbreak_start_v2,cfr_ifr_denominator,parameter_data_id) %>% group_by(outbreak_country_v2,outbreak_start_v2) %>% summarize(max=max(cfr_ifr_denominator))

selection_index <- rep(0,dim(selection)[1])

for(i in 1:length(selection_index))
{
  sec <- selection[i,]
  ind <- df_out %>% filter(outbreak_country_v2==sec$outbreak_country_v2 & outbreak_start_v2==sec$outbreak_start_v2 & cfr_ifr_denominator==sec$max) %>% pull(parameter_data_id)
  selection_index[i] <- ind[length(ind)]
}
selection_index <- na.omit(unique(selection_index))

df_out$keep_record <- 0
df_out[selection_index,]$keep_record <- 1

cfr <- df %>% filter(cfr_ifr_denominator>0 & parameter_class=="Severity")

meta_cfr <- metaprop(cfr_ifr_numerator, cfr_ifr_denominator, studlab=article_label, sm="PLOGIT", data=cfr, method="GLMM", method.tau="ML")

forest(meta_cfr, layout="RevMan5", xlab="Proportion", comb.r=T, comb.f=F, xlim = c(0,1), fontsize=10, digits=3)
```

**Figure 2 (B):** CFR estimated from extracted outbreak data, including only one observation per outbreak using the study with the longest duration of the outbreak reported ensuring no case is double counted.

```{r cfr_panel_B, echo=FALSE, fig.height=5.5, fig.width=9, message=FALSE, warning=FALSE}
cfr_outbreak <- df_out %>% filter(keep_record == 1)

meta_cfr_outbreak <- metaprop(cfr_ifr_numerator,
                              cfr_ifr_denominator,
                              studlab = article_label,
                              sm = "PLOGIT",
                              data = cfr_outbreak,
                              method = "GLMM",
                              method.tau = "ML")

forest(meta_cfr_outbreak,
       layout = "RevMan5",
       xlab = "Proportion",
            comb.r = T,
            comb.f = F,
            xlim = c(0,1),
            fontsize = 10,
            digits = 3)
```

**Figure S2 (A):** Overview of the estimates of the case fatality ratio (CFR) obtained from the included studies. CFR estimates reported in the included studies, stratified according to estimation method. Points represent central estimates. Error bars represent an uncertainty interval associated with the point estimate, as reported in the original study. 

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_forest_plots(pathogen = "marburg", prepend = "../", exclude = c(15, 17))
forest_plot_severity(df)
```
**Figure S2 (B):** Overview of the estimates of the case fatality ratio (CFR) obtained from the included studies. CFR estimated from extracted outbreak data, including only one observation per outbreak using the study with the longest duration of the outbreak reported ensuring each case is not double counted. Shaded bars represents the imputed Binomial confidence interval for studies with a sample size, n > 1. Vertical dotted lines represent 0% and 100% CFR.
```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
forest_plot_severity(df_out,outbreak_naive = TRUE)
```


### Delays
**Table 2:** Overview of the delay parameter estimates extracted from the included studies. These are stratified into five categories: Generation Time, Incubation Period, Time in Care, Time from Symptom to Careseeking and Time from Symptom to Outcome. Estimates and associated uncertainty are provided, along with information regarding the population group corresponding to the estimate and the timing and location of the outbreak. ‘Other’ refers to a range of different values which are specified in the underlying papers.


```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_param_table(pathogen = "marburg", prepend = "../",exclude = c(17, 15))
delay_table(df, "marburg")
```

**Figure 3 (B):**  Delay parameters, stratified into five categories: Generation Time, Incubation Period, Time in Care, Time from Symptom to Careseeking and Time from Symptom to Outcome as indicated by different colours. 

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_forest_plots(pathogen = "marburg", prepend = "../", exclude = c(15, 17))
forest_plot_delay(df)
```

### Seroprevalence
**Table 4:** Overview of seroprevalence estimates as reported in the included studies. Estimates were primarily reported as percentages, as shown here. Associated uncertainty and sample sizes are provided where these were reported. Where available, additional information regarding the location and timing of the estimates, the antibody being tested for, the target population, the timing in relation to any ongoing outbreak and the availability of disaggregated data is also summarised.

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_param_table(pathogen = "marburg", prepend = "../", exclude = c(17, 15))
sero_table(df, "marburg")
```

### Molecular evolutionary rates
**Figure 3 (C):**  Evolutionary rates. Colours indicate different genome types; points represent central estimates. Solid lines represent an uncertainty interval associated with the point estimate while ribbons indicate a parameter value +/- standard error with a minimum value of 0.

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_forest_plots(pathogen = "marburg", prepend = "../", exclude = c(15, 17))
forest_plot_mutations(df)
```

## Risk Factors
**Table 3:** Aggregated information on risk factors associated with MVD infection and seropositivity. Risk factors were mapped onto our risk factor classification (see Supplement) by interpreting the authors’ descriptions. Adjusted refers to whether estimates are adjusted (i.e. from a multivariate analysis) or not (i.e. from a univariate analysis), with unknown showing that this information is not clearly stated in the original study. Statistical significance was determined according to the original authors’ statistical approaches when specified, or using a p-value of 0.05 otherwise. The numbers in the significant and not significant columns represent the total sample size included in the analyses for this risk factor and outcome category.

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
df <- data_param_table(pathogen = "marburg", prepend = "../", exclude = c(15, 17))
risk_table(df, "marburg")
```


**Figure S4:** More detailed table of risk factor data from the extracted studies, giving countries, times and contexts of surveys and non-aggregated information on each risk factor assessed in the four relevant studies.


```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
risk_table(df,'marburg',supplement=TRUE)
```

## Quality Assessment

**Figure S3:** (A) Count of papers for each quality assessment question scoring Yes, No or not applicable. (B) Quality Assessment Score defined as proportion of Yes votes for each paper relative to sum of Yes and No answers, removing NAs. The time trend is fitted using Local Polynomial Regression Fitting.

```{r echo=FALSE, fig.height=4, fig.width=8, message=FALSE, warning=FALSE, paged.print=TRUE}
quality_assessment_plots(pathogen = "marburg", prepend = "../")
```