vignettes/NewYorkInfluenza.Rmd

---
title: "New York Influenza"
author: "Michael Karcher"
date: "October 12, 2015"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{New York Influenza}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

This vignette shows a case study of New York influenza data, using `phylodyn`.
We start by loading the `phylodyn` package.

```{r message=FALSE}
library(phylodyn)
```

In preparation, we aligned the sequences using the software MUSCLE,
and inferred a maximum clade credibility genealogy using the software BEAST[^1].
We packaged this genealogy in a `phylo` object `NY_flu`, and we load it now.

[^1]: We infer the genealogy branch lengths in units of years using a strict molecular clock, a constant effective population size prior, and an HKY substitution model with the first two nucleotides of a codon sharing the same estimated transition matrix, while the third nucleotide's transition matrix is estimated separately.

```{r}
data(NY_flu)
```

We use BNPR and BNPR_PS to calculate approximate marginals (without and with a sampling model).

```{r}
NY_cond = BNPR(data = NY_flu, lengthout = 100)
NY_pref = BNPR_PS(data = NY_flu, lengthout = 100)
```

We plot the results (we use a scaling factor because we stored the object with units of weeks, and we standardize on units of years).

```{r, fig.width=8.5, fig.height=3.5}
axlabs = list(x = seq(0, 12*52, by=52) + 10, labs = seq(2005, 1993, by=-1))

par(mfrow=c(1,3), cex=0.9, cex.lab=1.5, cex.main=1.7, oma=c(2.5, 2, 0, 0)+0.1,
    mar=c(2,1.5,2,1), mgp = c(2.5,1,0), xpd=NA,
    fig=c(0, 0.32, 0, 1))
plot_BNPR(NY_cond, main="BNPR", ylim = c(10, 500)/52, yscale = 1/52,
          col = rgb(0.829, 0.680, 0.306), axlabs = axlabs, heatmap_labels_side = "left")

par(fig=c(0.32, 0.64, 0, 1), new=TRUE)
plot_BNPR(NY_pref, main="BNPR-PS", ylim = c(10, 500)/52, yscale = 1/52, ylab = "",
          col = rgb(0.330, 0.484, 0.828), axlabs = axlabs, heatmap_labels = FALSE)

par(mar=c(2,3.5,2,1), fig=c(0.64, 1.0, 0, 1), new=TRUE)
plot_mrw(list(NY_cond, NY_pref), axlabs = axlabs, ylab = "Mean Relative Width",
         cols = c(rgb(0.829, 0.680, 0.306), rgb(0.330, 0.484, 0.828)),
         legends = c("BNPR", "BNPR-PS"), legend_place = "topright", legend_cex = 0.8)
```

## References

1. R. C. Edgar.
MUSCLE: Multiple sequence alignment with high accuracy and high through- put.
*Nucleic Acids Research*, 32:1792–1797, 2004.

2. A. J. Drummond, M. A. Suchard, D. Xie, and A. Rambaut.
Bayesian phylogenetics with BEAUti and the BEAST 1.7.
*Molecular Biology and Evolution*, 29:1969–1973, 2012.

3. A. Rambaut, O. G. Pybus, M. I. Nelson, C. Viboud, J. K. Taubenberger, and E. C. Holmes.
The genomic and epidemiological dynamics of human influenza A virus.
*Nature*, 453 (7195):615–619, 2008.

4. M. D. Karcher, J. A. Palacios, T. Bedford, M. A. Suchard, and V. N. Minin.
Quantifying and mitigating the effect of preferential sampling on phylodynamic inference.
*arXiv preprint arXiv*:1510.00775, 2015.